US20210110895A1 - Systems and methods for mental health assessment - Google Patents
Systems and methods for mental health assessment
- Publication number
- US20210110895A1 (application no. US 17/130,649)
- Authority
- US
- United States
- Prior art keywords
- patient
- model
- data
- subject
- logic
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/16—Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
- A61B5/164—Lie detection
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/16—Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
- A61B5/165—Evaluating the state of mind, e.g. depression, anxiety
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/48—Other medical applications
- A61B5/4803—Speech analysis specially adapted for diagnostic purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
-
- G—PHYSICS
- G09—EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
- G09B—EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
- G09B19/00—Teaching not covered by other main groups of this subclass
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/20—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/10—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
- G16H40/60—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
- G16H40/67—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for remote operation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/50—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/40—Detecting, measuring or recording for evaluating the nervous system
- A61B5/4076—Diagnosing or monitoring particular conditions of the nervous system
- A61B5/4088—Diagnosing or monitoring cognitive diseases, e.g. Alzheimer, prion diseases or dementia
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7271—Specific aspects of physiological measurement analysis
- A61B5/7275—Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Definitions
- a method for assessing a mental state of a subject in a single session or over multiple different sessions is provided.
- the method can comprise using an automated module to present and/or formulate at least one query based in part on one or more target mental states to be assessed.
- the at least one query can be configured to elicit at least one response from the subject.
- the method may also comprise transmitting the at least one query in an audio, visual, and/or textual format to the subject to elicit the at least one response.
- the method may also comprise receiving data comprising the at least one response from the subject in response to transmitting the at least one query.
- the data can comprise speech data.
- the method may further comprise processing the data using one or more individual, joint, or fused models comprising a natural language processing (NLP) model, an acoustic model, and/or a visual model.
- the method may further comprise generating, for the single session, for each of the multiple different sessions, or upon completion of one or more sessions of the multiple different sessions, one or more assessments of the mental state associated with the subject.
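To make the method above concrete, the following is a minimal, hypothetical Python skeleton of a single session: formulate queries for the target mental states, elicit responses, run the NLP/acoustic/visual models, and aggregate an assessment. Every function, query, and score here is an illustrative placeholder, not the patent's implementation.

```python
# Hypothetical single-session skeleton; all logic is a placeholder.

def formulate_queries(target_states: list[str]) -> list[str]:
    """Automated module: one opening query per target mental state."""
    return [f"Lately, how have you been doing with your {s}?" for s in target_states]

def run_models(response: str) -> dict[str, float]:
    """Stub outputs standing in for the NLP, acoustic, and visual models."""
    return {"nlp": 0.6, "acoustic": 0.5, "visual": 0.4}  # invented numbers

def assess_session(target_states: list[str], answer) -> float:
    """Transmit each query, collect the response, and average model outputs."""
    scores = []
    for query in formulate_queries(target_states):
        response = answer(query)            # elicit the subject's response
        outputs = run_models(response)      # process the response data
        scores.append(sum(outputs.values()) / len(outputs))
    return sum(scores) / len(scores)        # session-level assessment

print(assess_session(["mood", "sleep"], answer=lambda q: "okay, I guess"))
```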
- the one or more individual, joint, or fused models may comprise a metadata model.
- the metadata model can be configured to use demographic information and/or a medical history of the subject to generate the one or more assessments of the mental state associated with the subject.
- the at least one query can comprise a plurality of queries and the at least one response can comprise a plurality of responses.
- the plurality of queries can be transmitted in a sequential manner to the subject and configured to systematically elicit the plurality of responses from the subject.
- the plurality of queries can be structured in a hierarchical manner such that each subsequent query of the plurality of queries is structured as a logical follow-on to the subject's response to a preceding query, and can be designed to assess or draw inferences on a plurality of aspects of the mental state of the subject.
- the automated module can be further configured to present and/or formulate the at least one query based in part on a profile of the subject.
- the one or more target mental states can be selected from the group consisting of depression, anxiety, post-traumatic stress disorder (PTSD), schizophrenia, suicidality, and bipolar disorder.
- the one or more target mental states can comprise one or more conditions or disorders associated or comorbid with a list of predefined mental disorders.
- the list of predefined mental disorders may include mental disorders as defined or provided in the Diagnostic and Statistical Manual of Mental Disorders.
- the one or more associated or comorbid conditions or disorders can comprise fatigue, loneliness, low motivation, or stress.
- the assessment can comprise a score that indicates whether the subject is (i) more likely than others to experience at least one of the target mental states or (ii) more likely than others to experience at least one of the target mental states at a future point in time.
- the future point in time can be within a clinically actionable future.
- the method can further comprise: transmitting the assessment to a healthcare provider to be used in evaluating the mental state of the subject.
- the transmitting can be performed in real-time during the assessment, just-in-time, or after the assessment has been completed.
- the plurality of queries can be designed to test for or detect a plurality of aspects of the mental state of the subject.
- the assessment can comprise a score that indicates whether the subject is (i) more likely than others to experience at least one of the target mental states or (ii) more likely than others to experience at least one of the target mental states at a future point in time.
- the score can be calculated based on processed data obtained from the subject's plurality of responses to the plurality of queries. In some embodiments, the score can be continuously updated with processed data obtained from each of the subject's follow-on responses to a preceding query.
- the method can further comprise, based on the at least one response, identifying additional information to be elicited from the subject.
- the method can further comprise transmitting a subsequent query to the subject.
- the subsequent query relates to the additional information and can be configured to elicit a subsequent response from the subject.
- the method can further comprise receiving data comprising the subsequent response from the subject in response to transmitting the subsequent query.
- the method can further comprise processing the subsequent response to update the assessment of the mental state of the subject.
- identifying additional information to be elicited from the subject can comprise: identifying (i) one or more elements of substantive content or (ii) one or more patterns in the data that are material to the mental state of the subject.
- the method can further comprise: for each of the one or more elements of substantive content or the one or more patterns: identifying one or more items of follow-up information that are related to the one or more elements or the one or more patterns to be asked of the subject, and generating a subsequent query.
- the subsequent query can relate to the one or more items of follow-up information.
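A minimal sketch of this follow-up mechanism, with a naive keyword map standing in for the patent's (unspecified) logic for identifying material content elements; the keywords and questions are invented:

```python
# Hypothetical map from salient content elements to follow-up queries.
FOLLOW_UPS = {
    "sleep": "You mentioned sleep. How many hours did you sleep last night?",
    "appetite": "Has your appetite changed recently?",
    "alone": "How often do you spend time with friends or family?",
}

def identify_salient_elements(response: str) -> list[str]:
    """Find content elements material to the mental state (naive keyword spotting)."""
    text = response.lower()
    return [k for k in FOLLOW_UPS if k in text]

def next_queries(response: str) -> list[str]:
    """Generate one follow-up query per salient element found."""
    return [FOLLOW_UPS[e] for e in identify_salient_elements(response)]

for q in next_queries("I haven't been sleeping well and I mostly stay alone."):
    print(q)
```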
- the NLP model can be selected from the group consisting of a sentiment model, a statistical language model, a topic model, a syntactic model, an embedding model, a dialog or discourse model, an emotion or affect model, and a speaker personality model.
- the data can further comprise images or video of the subject.
- the data can be further processed using the visual model to generate the assessment of the mental state of the subject.
- the visual model can be selected from the group consisting of a facial cue model, a body movement/motion model, and an eye activity model.
- the at least one query can be transmitted in a conversational context in a form of a question, statement, or comment that is configured to elicit the at least one response from the subject.
- the conversational context can be designed to promote elicitation of truthful, reflective, thoughtful, or candid responses from the subject.
- the conversational context can be designed to affect an amount of time that the subject takes to compose the at least one response.
- the method can further comprise: transmitting one or more prompts in the audio and/or visual format to the subject when a time latency threshold is exceeded.
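A small sketch of the latency-triggered prompt; the threshold, polling interval, and prompt text are invented for illustration:

```python
import itertools
import time

LATENCY_THRESHOLD_S = 1.2  # hypothetical latency threshold (seconds)

def wait_for_response(poll, prompt, timeout=LATENCY_THRESHOLD_S):
    """Poll for a response; emit `prompt` once if the wait exceeds `timeout`."""
    start = time.monotonic()
    prompted = False
    while True:
        response = poll()
        if response is not None:
            return response
        if not prompted and time.monotonic() - start > timeout:
            prompt("Take your time. Whenever you're ready, just start speaking.")
            prompted = True
        time.sleep(0.5)

# Demo: the subject "answers" on the sixth poll, after the nudge has fired.
answers = itertools.chain([None] * 5, ["I've been feeling tired."])
print(wait_for_response(lambda: next(answers), prompt=print))
```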
- the conversational context can be designed to enhance one or more performance metrics of the assessment of the mental state of the subject.
- the one or more performance metrics can be selected from the group consisting of an F1 score, an area under the curve (AUC), a sensitivity, a specificity, a positive predictive value (PPV), and an equal error rate.
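For reference, these metrics can be computed as follows with scikit-learn; the labels and scores are made up, not data from the patent:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                     # ground-truth labels
y_score = np.array([0.1, 0.4, 0.8, 0.65, 0.9, 0.3, 0.55, 0.2])  # model scores
y_pred = (y_score >= 0.5).astype(int)                           # thresholded decisions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true positive rate (recall)
specificity = tn / (tn + fp)   # true negative rate
ppv = tp / (tp + fp)           # positive predictive value (precision)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)

# Equal error rate: the ROC point where the false positive rate
# equals the false negative rate (1 - TPR).
fpr, tpr, _ = roc_curve(y_true, y_score)
eer = fpr[np.nanargmin(np.abs(fpr - (1 - tpr)))]

print(f"F1={f1:.2f} AUC={auc:.2f} Se={sensitivity:.2f} "
      f"Sp={specificity:.2f} PPV={ppv:.2f} EER={eer:.2f}")
```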
- the at least one query need not be transmitted or provided in the format of a standardized test or questionnaire.
- the at least one query can comprise subject matter that has been adapted or modified from a standardized test or questionnaire.
- the standardized test or questionnaire can be selected from the group consisting of PHQ-9, GAD-7, HAM-D, and BDI.
- the standardized test or questionnaire can be another similar test or questionnaire for assessing a patient's mental health state.
- the one or more individual, joint, or fused models can comprise a regression model.
- the at least one query can be designed to be open-ended without limiting the at least one response from the subject to be a binary yes-or-no response.
- the score can be used to calculate one or more scores with a clinical value.
- the assessment can comprise a quantized score estimate of the mental state of the subject.
- the quantized score estimate can comprise a calibrated score estimate.
- the quantized score estimate can comprise a binary score estimate.
- the plurality of queries can be represented as a series of edges and the plurality of responses can be represented as a series of nodes in a nodal network.
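One possible (hypothetical) realization of this nodal network, with responses as nodes and the queries that elicited them as edges:

```python
from dataclasses import dataclass, field

@dataclass
class ResponseNode:
    text: str                                   # the subject's response
    children: list["Edge"] = field(default_factory=list)

@dataclass
class Edge:
    query: str                                  # the query that elicited the child
    child: "ResponseNode"

root = ResponseNode("session start")
r1 = ResponseNode("I've been feeling very tired lately.")
root.children.append(Edge("How have you been feeling this week?", r1))
r2 = ResponseNode("Maybe four hours a night.")
r1.children.append(Edge("How much sleep are you getting?", r2))

def walk(node: ResponseNode, depth: int = 0) -> None:
    """Print the session as a query/response tree."""
    print("  " * depth + node.text)
    for edge in node.children:
        print("  " * depth + "-> " + edge.query)
        walk(edge.child, depth + 1)

walk(root)
```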
- the mental state can comprise one or more medical, psychological, or psychiatric conditions or symptoms.
- the method can be configured to further assess a physical state of the subject as manifested in the speech data of the subject.
- the method can further comprise: processing the data using the one or more individual, joint, or fused models to generate an assessment of the physical state of the subject.
- the assessment of the physical state can comprise a score that indicates whether the subject is (i) more likely than others to experience at least one of a plurality of physiological conditions or (ii) more likely than others to experience at least one of the physiological conditions at a future point in time.
- the physical state of the subject is manifested due to one or more physical conditions that affect a characteristic or a quality of voice of the subject.
- the automated module can be a mental health screening module that can be configured to dynamically formulate the at least one query based in part on the one or more target mental states to be assessed.
- the one or more individual, joint, or fused models can comprise a composite model that can be an aggregate of two or more different models.
- Another aspect of the present disclosure provides a non-transitory computer-readable medium comprising machine-executable instructions that, upon execution by one or more computer processors, implement any of the foregoing methods described above or elsewhere herein.
- Another aspect of the present disclosure provides a system comprising one or more computer processors and memory comprising machine-executable instructions that, upon execution by the one or more computer processors, implement any of the foregoing methods described above or elsewhere herein.
- Another aspect of the present disclosure provides a method for screening, monitoring, or diagnosing a mental health disorder of a subject.
- the method can comprise: transmitting at least one query to the subject.
- the at least one query can be configured to elicit at least one response from the subject.
- the method can further comprise receiving data comprising the at least one response from the subject in response to transmitting the at least one query.
- the data can comprise speech data.
- the method can further comprise processing the data using one or more individual, joint, or fused models comprising a natural language processing (NLP) model, an acoustic model, and/or a visual model to generate an output.
- the method can further comprise using at least the output to generate a score and a confidence level of the score.
- the score can comprise an estimate that the subject has the mental health disorder.
- the confidence level can be based at least in part on a quality of the speech data and can represent a degree to which the estimate can be trusted.
- the one or more individual, joint, or fused models can comprise a metadata model.
- the metadata model can be configured to use demographic information and/or a medical history of the subject to generate the one or more assessments of the mental state associated with the subject.
- the output can comprise an NLP output, an acoustic output, and a visual output.
- the NLP output, the acoustic output, and the visual output can each comprise a plurality of outputs corresponding to different time ranges of the data.
- generating the score can comprise: (i) segmenting the NLP output, the acoustic output, and the visual output into discrete time segments, (ii) assigning a weight to each discrete time segment, and (iii) computing a weighted average of the NLP output, the acoustic output, and the visual output using the assigned weights.
- the weights can be based at least on (i) base weights of the one or more individual, joint, or fused models and (ii) a confidence level of each discrete time segment of the NLP output, the acoustic output, and the visual output.
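A sketch of this segment-wise weighted fusion under the stated recipe (effective weight = base weight × per-segment confidence); all numbers are invented:

```python
import numpy as np

# Per-model outputs for three discrete time segments of one session.
outputs = {
    "nlp":      np.array([0.72, 0.65, 0.80]),
    "acoustic": np.array([0.60, 0.58, 0.70]),
    "visual":   np.array([0.55, 0.75, 0.50]),
}
# Per-model, per-segment confidence levels.
confidences = {
    "nlp":      np.array([0.90, 0.80, 0.95]),
    "acoustic": np.array([0.70, 0.90, 0.60]),
    "visual":   np.array([0.50, 0.90, 0.40]),
}
# Base weight of each model (hypothetical values).
base = {"nlp": 0.5, "acoustic": 0.3, "visual": 0.2}

# Effective weight of each (model, segment) = base weight x confidence.
weights = {m: base[m] * confidences[m] for m in outputs}
total = sum(weights.values())                         # per-segment normalizer
fused = sum(weights[m] * outputs[m] for m in outputs) / total
print("per-segment fused scores:", np.round(fused, 3))
print("session score:", round(float(fused.mean()), 3))
```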
- the one or more individual, joint, or fused models can be interdependent such that each of the one or more individual, joint, or fused models is conditioned on an output of at least one other of the one or more individual, joint, or fused models.
- generating the score can comprise fusing the NLP output, the acoustic output, and the visual output.
- generating the confidence level of the score can comprise fusing (i) a confidence level of the NLP output with (ii) a confidence level of the acoustic output.
- the method can further comprise converting the score into one or more scores with a clinical value.
- the method can further comprise transmitting the one or more scores with a clinical value to the subject and/or a contact for the subject. In some embodiments, the method can further comprise transmitting the one or more scores with a clinical value to a healthcare provider for use in evaluating and/or providing care for a mental health of the subject. In some embodiments, the transmitting can comprise transmitting the one or more scores with a clinical value to the healthcare provider during the screening, monitoring, or diagnosing. In some embodiments, the transmitting can comprise transmitting the one or more scores with a clinical value to the healthcare provider or a payer after the screening, monitoring, or diagnosing has been completed.
- the at least one query can comprise a plurality of queries.
- the at least one response can comprise a plurality of responses.
- generating the score can comprise updating the score after receiving each of the plurality of responses.
- the method can further comprise: converting the score to one or more scores with a clinical value after each of the updates.
- the method can further comprise transmitting the one or more scores with a clinical value to a healthcare provider after the converting.
- the method can further comprise: determining that the confidence level does not satisfy a predetermined criterion, in real time and based at least in part on the at least one response, generating at least one additional query, and using the at least one additional query, repeating steps (a)-(d) until the confidence level satisfies the predetermined criterion.
- the confidence level can be based on a length of the at least one response. In some embodiments, the confidence level can be based on an evaluated truthfulness of the one or more responses of the subject.
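A compact, hypothetical sketch of the repeat-until-confident loop, using response length as the confidence signal mentioned above; `ask`, `assess`, and the threshold are stand-ins:

```python
CANNED = [                                   # scripted responses for the demo
    "Not great.",
    "Work has been stressful and I barely sleep.",
    "Maybe four or five hours, and I wake up a lot.",
]

def ask(query: str) -> str:
    """Transmit the query and collect the subject's response (stubbed)."""
    print("Q:", query)
    return CANNED.pop(0)

def assess(responses: list[str]) -> tuple[float, float]:
    """Return (score, confidence); confidence grows with speech collected."""
    words = sum(len(r.split()) for r in responses)
    return 0.5, min(1.0, words / 15.0)       # 0.5 is a placeholder score

CONFIDENCE_THRESHOLD = 0.8
responses = []
for query in ["How have you been feeling lately?",
              "Can you tell me more about that?",
              "How has this affected your sleep?"]:
    responses.append(ask(query))
    score, conf = assess(responses)
    if conf >= CONFIDENCE_THRESHOLD:         # stop once the estimate is trustworthy
        break
print(f"score={score:.2f} confidence={conf:.2f}")
```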
- the one or more individual, joint, or fused models can be trained on speech data from a plurality of test subjects, wherein each of the plurality of test subjects has completed a survey or questionnaire that indicates whether the test subject has the mental health disorder.
- the confidence level can be based on an evaluated truthfulness of responses in the survey or questionnaire.
- the method can further comprise extracting from the speech data one or more topics of concern of the subject using a topic model.
- the method can further comprise generating a word cloud from the one or more topics of concern.
- the word cloud reflects changes in the one or more topics of concern of the subject over time.
- the method can further comprise transmitting the one or more topics of concern to a healthcare provider, the subject, or both.
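An illustrative topic-extraction pipeline, substituting scikit-learn's LDA for the patent's unspecified topic model; the word-cloud step simply accumulates weights for each topic's top terms:

```python
from collections import Counter
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

transcripts = [                       # invented session transcripts
    "I worry about work all the time and cannot relax",
    "sleep has been bad, I wake up at night worrying about work",
    "my family helps but I still feel tired and sad most days",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(transcripts)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
cloud = Counter()
for topic in lda.components_:         # one row of term weights per topic
    for i in topic.argsort()[-5:]:    # top 5 terms for this topic
        cloud[terms[i]] += topic[i]   # weight drives the word-cloud size
print(cloud.most_common(8))
```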
- the video output can be assigned a higher weight than the NLP output and the acoustic output in generating the score when the subject is not speaking. In some embodiments, a weight of the video output in generating the score can be increased when the NLP output and the acoustic output indicate that a truthfulness level of the subject is below a threshold.
- the video model can comprise one or more of a facial cue model, a body movement/motion model, and a gaze model.
- the at least one query can comprise a plurality of queries and the at least one response can comprise a plurality of responses.
- the plurality of queries can be configured to sequentially and systematically elicit the plurality of responses from the subject.
- the plurality of queries can be structured in a hierarchical manner such that each subsequent query of the plurality of queries can be a logical follow-on to the subject's response to a preceding query and can be designed to assess or draw inferences about different aspects of the mental state of the subject.
- the at least one query can include subject matter that has been adapted or modified from a clinically-validated survey, test or questionnaire.
- the acoustic model can comprise one or more of an acoustic embedding model, a spectral-temporal model, a supervector model, an acoustic affect model, a speaker personality model, an intonation model, a speaking rate model, a pronunciation model, a non-verbal model, or a fluency model.
- the NLP model can comprise one or more of a sentiment model, a statistical language model, a topic model, a syntactic model, an embedding model, a dialog or discourse model, an emotion or affect model, or a speaker personality model.
- the mental health disorder can comprise depression, anxiety, post-traumatic stress disorder, bipolar disorder, suicidality or schizophrenia.
- the mental health disorder can comprise one or more medical, psychological, or psychiatric conditions or symptoms.
- the score can comprise a score selected from a range.
- the range can be normalized with respect to a general population or to a specific population of interest.
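One common way to normalize a raw score against a reference population is a percentile rank, sketched below with an invented population sample:

```python
import numpy as np

# Hypothetical reference population of raw scores.
population_scores = np.random.default_rng(0).normal(50, 10, size=10_000)

def normalized(raw: float, population: np.ndarray) -> float:
    """Percentile rank (0-100) of `raw` within the reference population."""
    return 100.0 * float((population < raw).mean())

print(f"{normalized(63.0, population_scores):.1f}th percentile")
```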
- the one or more scores with a clinical value can comprise one or more descriptors associated with the mental health disorder.
- steps (a)-(d) as described above can be repeated at a plurality of different times to generate a plurality of scores.
- the method can further comprise: transmitting the plurality of scores and confidences to a computing device and graphically displaying, on the computing device, the plurality of scores and confidences as a function of time on a dashboard or other representation for one or more end users.
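A minimal matplotlib sketch of such a dashboard view, with fabricated session scores and confidences plotted over time:

```python
import matplotlib.pyplot as plt

sessions = [1, 2, 3, 4, 5]               # session index (time)
scores = [0.62, 0.58, 0.49, 0.44, 0.40]  # per-session scores
confs = [0.05, 0.08, 0.04, 0.06, 0.03]   # confidence shown as an error band

plt.errorbar(sessions, scores, yerr=confs, fmt="-o", capsize=3)
plt.xlabel("Session")
plt.ylabel("Mental-state score")
plt.title("Scores and confidences over time")
plt.show()
```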
- the quality of the speech data can comprise a quality of an audio signal of the speech data.
- the quality of the speech data can comprise a measure of confidence of a speech recognition process performed on an audio signal of the speech data.
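As one hypothetical quality signal, a crude SNR-style estimate can be computed from the waveform itself; a real system would also use the speech recognizer's own confidence output:

```python
import numpy as np

def snr_estimate(signal: np.ndarray, frame: int = 400) -> float:
    """Crude SNR proxy: loud-frame vs. quiet-frame energy, in dB."""
    frames = signal[: len(signal) // frame * frame].reshape(-1, frame)
    energy = (frames ** 2).mean(axis=1)
    noise = np.percentile(energy, 10) + 1e-12   # quietest 10% ~ noise floor
    speech = np.percentile(energy, 90)          # loudest frames ~ speech
    return float(10 * np.log10(speech / noise))

rng = np.random.default_rng(0)
audio = rng.normal(0, 0.01, 16000)                                   # noise
audio[4000:8000] += 0.2 * np.sin(np.linspace(0, 300 * np.pi, 4000))  # "speech"
print(f"estimated SNR: {snr_estimate(audio):.1f} dB")
```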
- the method can be implemented for a single session.
- the score and the confidence level of the score can be generated for the single session.
- the method can be implemented for and over multiple different sessions, and the score and the confidence level of the score can be generated for each of the multiple different sessions, or upon completion of one or more sessions of the multiple different sessions.
- Another aspect of the present disclosure provides a non-transitory computer-readable medium comprising machine-executable instructions that, upon execution by one or more computer processors, implement any of the methods described above or elsewhere herein.
- Another aspect of the present disclosure provides a system comprising one or more computer processors and memory comprising machine-executable instructions that, upon execution by the one or more computer processors, implement any of the methods described above or elsewhere herein.
- Another aspect of the present disclosure provides a method for processing speech and/or video data of a subject to identify a mental state of the subject.
- the method can comprise: receiving the speech and/or video data of the subject and using at least one processing technique to process the speech and/or video data to identify the mental state at (i) an error rate at least 10% lower, or (ii) an accuracy at least 10% higher, than that of a standardized mental health questionnaire or testing tool usable for identifying the mental state.
- the reduced error rate or the accuracy can be established relative to at least one or more benchmark standards usable by an entity for identifying or assessing one or more medical conditions comprising the mental state.
- the entity can comprise one or more of the following: clinicians, healthcare providers, insurance companies, and government-regulated bodies.
- the one or more benchmark standards can comprise at least one clinical diagnosis that has been independently verified to be accurate in identifying the mental state.
- the speech data can be received substantially in real-time as the subject is speaking. In some embodiments, the speech data can be produced in an offline mode from a stored recording of the subject's speech.
- Another aspect of the present disclosure provides a method for processing speech data of a subject to identify a mental state of the subject.
- the method can comprise: receiving the speech data of the subject and using at least one processing technique to process the speech data to identify the mental state.
- the identification of the mental state is better according to one or more performance metrics as compared to a standardized mental health questionnaire or testing tool usable for identifying the mental state.
- the one or more performance metrics can comprise a sensitivity or specificity.
- the speech data can be processed according to a desired level of sensitivity or a desired level of specificity.
- the desired level of sensitivity or the desired level of specificity can be defined based on criteria established by an entity.
- the entity can comprise one or more of the following: clinicians, healthcare providers, personal caregivers, insurance companies, and government-regulated bodies.
- Another aspect of the present disclosure provides a method for processing speech data of a subject to identify or assess a mental state of the subject.
- the method can comprise: receiving the speech data of the subject, using one or more processing techniques to process the speech data to generate one or more descriptors indicative of the mental state, and generating a plurality of visual elements of the one or more descriptors.
- the plurality of visual elements can be configured to be displayed on a graphical user interface of an electronic device of a user and usable by the user to identify or assess the mental state.
- the user can be the subject. In some embodiments, the user can be a clinician or healthcare provider. In some embodiments, the one or more descriptors can comprise a calibrated or normalized score indicative of the mental state. In some embodiments, the one or more descriptors further can comprise a confidence associated with the calibrated or normalized score.
- Another aspect of the present disclosure provides a method for identifying, assessing, or monitoring a mental state of a subject.
- the method can comprise using a natural language processing algorithm, an acoustic processing algorithm, or a video processing algorithm to process data of the subject, the data comprising speech or video data of the subject, to identify or assess the mental state of the subject, and outputting a report indicative of the mental state of the subject.
- the report can be transmitted to a user to be used for identifying, assessing, or monitoring the mental state.
- the user can be the subject. In some embodiments, the user can be a clinician or healthcare provider. In some embodiments, the report can comprise a plurality of graphical visual elements. In some embodiments, the report can be configured to be displayed on a graphical user interface of an electronic device of the user. In some embodiments, the method can further comprise: updating the report in response to one or more detected changes in the mental state of the subject. In some embodiments, the report can be updated substantially in real time as the one or more detected changes in the mental state are occurring in the subject.
- Another aspect of the present disclosure provides a method for identifying whether a subject is at risk of a mental or physiological condition.
- the method can comprise: obtaining speech data from the subject and storing the speech data in computer memory, processing the speech data using, in part, natural language processing to identify one or more features indicative of the mental or physiological condition, and outputting an electronic report identifying whether the subject is at risk of the mental or physiological condition, where the risk can be quantified in the form of a normalized score with a confidence level.
- the normalized score with the confidence level can be usable by a user to identify whether the subject is at risk of the mental or physiological condition.
- the user can be the subject. In some embodiments, the user can be a clinician or healthcare provider. In some embodiments, the report can comprise a plurality of graphical visual elements. In some embodiments, the report can be configured to be displayed on a graphical user interface of an electronic device of the user.
- Another aspect of the present disclosure provides a method for identifying, assessing, or monitoring a mental state or disorder of a subject.
- the method can comprise: receiving audio or audio-visual data comprising speech of the subject in computer memory and processing the audio or audio-visual data to identify, assess, monitor, or diagnose the mental state or disorder of the subject, which processing can comprise performing natural language processing on the speech of the subject.
- the audio or audio-visual data can be received in response to a query directed to the subject.
- the audio or audio-visual data can be from a prerecording of a conversation to which the subject can be a party.
- the audio or audio-visual data can be from a prerecording of a clinical session involving the subject and a healthcare provider.
- the mental state or disorder can be identified at a higher performance level compared to a standardized mental health questionnaire or testing tool.
- the processing further can comprise using a trained algorithm to perform acoustic analysis on the speech of the subject.
- Another aspect of the present disclosure provides a method for providing, to a stakeholder, an estimate of whether a subject has a mental condition.
- the method can comprise: obtaining speech data from the subject and storing the speech data in computer memory.
- the speech data can comprise responses to a plurality of queries transmitted in an audio and/or visual format to the subject.
- the method can further comprise selecting (1) a first model optimized for sensitivity in estimating whether the subject has the mental condition or (2) a second model optimized for specificity in estimating whether the subject has the mental condition.
- the method can further comprise processing the speech data using the selected first model or the second model to generate the estimate.
- the method can further comprise transmitting the estimate to the stakeholder.
- the first model can be selected and the stakeholder can be a healthcare payer. In some embodiments, the second model can be selected and the stakeholder can be a healthcare provider.
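A sketch of how sensitivity- and specificity-optimized variants can be realized as two operating points of the same scorer, choosing decision thresholds from the ROC curve; the data and targets are invented:

```python
import numpy as np
from sklearn.metrics import roc_curve

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])
y_score = np.array([0.2, 0.35, 0.6, 0.8, 0.55, 0.4, 0.9, 0.1, 0.7, 0.5])
fpr, tpr, thresholds = roc_curve(y_true, y_score)  # thresholds are decreasing

def threshold_for_sensitivity(target: float) -> float:
    """Highest threshold whose sensitivity (TPR) meets the target."""
    ok = np.where(tpr >= target)[0]
    return float(thresholds[ok[0]])

def threshold_for_specificity(target: float) -> float:
    """Lowest threshold whose specificity (1 - FPR) still meets the target."""
    ok = np.where(1 - fpr >= target)[0]
    return float(thresholds[ok[-1]])

print("sensitivity-first (e.g., payer) threshold:", threshold_for_sensitivity(0.9))
print("specificity-first (e.g., provider) threshold:", threshold_for_specificity(0.9))
```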
- Another aspect of the present disclosure provides a system for determining whether a subject is at risk of having a mental condition, the system comprising a memory configured to store speech data of the subject.
- the system can be configured to (i) receive the speech data from the memory and (ii) process the speech data using at least one model to determine that the subject is at risk of having the mental condition.
- the at least one model can be trained on speech data from a plurality of other test subjects who have a clinical determination of the mental condition.
- the clinical determinations may serve as labels for the speech data.
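An illustrative training setup along these lines, with random stand-ins for the speech-derived features and clinically determined labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 16))    # per-subject speech feature vectors (stub)
y = rng.integers(0, 2, size=200)  # clinical determination used as the label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
risk = clf.predict_proba(X_te)[:, 1]  # estimated probability of the condition
print("held-out mean estimated risk:", round(float(risk.mean()), 3))
```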
- the system can be configured to generate the estimate of the mental condition that is better according to one or more performance metrics as compared to a clinically-validated survey, test or questionnaire.
- the system can be configured to generate the estimate of the mental condition with a higher specificity compared to the clinically-validated survey, test or questionnaire. In some embodiments, the system can be configured to generate the estimate of the mental condition with a higher sensitivity compared to the clinically-validated survey, test, or questionnaire.
- the identification can be output while the subject is speaking. In some embodiments, the identification can be output via streaming or a periodically updated signal.
- Another aspect of the present disclosure provides a method for screening or assessing a mental state of a subject.
- the method can comprise using an automated screening module to dynamically formulate at least one query based in part on one or more target mental states to be assessed.
- the at least one query can be configured to elicit at least one response from the subject.
- the method can further comprise transmitting the at least one query in an audio and/or visual format to the subject to elicit the at least one response.
- the method can further comprise receiving data comprising the at least one response from the subject in response to transmitting the at least one query.
- the data can comprise speech data.
- the method can further comprise processing the data using a composite model comprising at least one or more semantic models to generate an assessment of the mental state of the subject.
- Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
- Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto.
- the computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
- FIG. 1A shows a health screening or monitoring system in which a health screening or monitoring server and a clinical data server computer system and a social data server cooperate to estimate a health state of a patient in accordance with the present disclosure
- FIG. 1B shows an additional embodiment of the health screening or monitoring system from FIG. 1A ;
- FIG. 2 shows a patient screening or monitoring system in which a web server and modeling server(s) cooperate to assess a state of a patient through a wide area network, in accordance with some embodiments;
- FIG. 3 shows a patient assessment system in which a real-time computer system, a modeling computer system, and a clinical and demographic data server computer system that cooperate to assess a state of a patient and report the assessed state to a clinician using a clinician device through a wide area network in accordance with the present disclosure.
- FIG. 4 is a block diagram of the health screening or monitoring server of FIG. 1A in greater detail
- FIG. 5 is a block diagram of interactive health screening or monitoring logic of the health screening or monitoring server of FIG. 4 in greater detail;
- FIG. 6 is a block diagram of interactive screening or monitoring server logic of the interactive health screening or monitoring logic of FIG. 5 in greater detail;
- FIG. 7 is a block diagram of generalized dialogue flow logic of the interactive screening or monitoring server logic of FIG. 6 in greater detail;
- FIG. 8 is a logic flow diagram illustrating the control of an interactive spoken conversation with the patient by the generalized dialogue flow logic in accordance with the present disclosure
- FIG. 9 is a block diagram of a question and adaptive action bank of the generalized dialogue flow logic of FIG. 7 in greater detail;
- FIG. 10 is a logic flow diagram of a step of FIG. 8 in greater detail
- FIG. 11 is a block diagram of question management logic of the question and adaptive action bank of FIG. 9 in greater detail;
- FIG. 12 is a logic flow diagram of determination of the quality of a question in accordance with the present disclosure.
- FIG. 13 is a logic flow diagram of determination of the equivalence of two questions in accordance with the present disclosure
- FIG. 14 is a logic flow diagram illustrating the control of an interactive spoken conversation with the patient by the real-time system in accordance with the present disclosure
- FIGS. 15 and 16 are each a logic flow diagram of a respective step of FIG. 14 in greater detail.
- FIG. 17 is a transaction flow diagram showing an illustrative example of a spoken conversation with, and controlled by, the real-time system of FIG. 3 .
- FIG. 18 is a block diagram of runtime model server logic of the interactive health screening or monitoring logic of FIG. 3 in greater detail;
- FIG. 19 is a block diagram of model training logic of the interactive health screening or monitoring logic of FIG. 1A in greater detail;
- FIG. 20A shows a greater detailed block diagram of the patient screening or monitoring system, in accordance with some embodiments.
- FIG. 20B provides a block diagram of the runtime model server(s), in accordance with some embodiments.
- FIG. 21 provides a block diagram of the model training server(s), in accordance with some embodiments.
- FIG. 22 shows the real-time computer system and the modeling computer system of FIG. 3 in greater detail, including a general flow of data.
- FIG. 23A provides a block diagram of the acoustic model, in accordance with some embodiments.
- FIG. 23B shows an embodiment of FIG. 23A including an acoustic modeling block
- FIG. 23C shows a score calibration and confidence module
- FIG. 24 provides a simplified example of the high level feature representor of the acoustic model, for illustrative purposes
- FIG. 25 provides a block diagram of the Natural Language Processing (NLP) model, in accordance with some embodiments.
- FIG. 26 provides a block diagram of the visual model, in accordance with some embodiments.
- FIG. 27 provides a block diagram of the descriptive features, in accordance with some embodiments.
- FIG. 28 provides a block diagram of the interaction engine, in accordance with some embodiments.
- FIG. 29 is a logic flow diagram of the example process of testing a patient for a mental health condition, in accordance with some embodiments.
- FIG. 30 is a logic flow diagram of the example process of model training, in accordance with some embodiments.
- FIG. 31 is a logic flow diagram of the example process of model personalization, in accordance with some embodiments.
- FIG. 32 is a logic flow diagram of the example process of client interaction, in accordance with some embodiments.
- FIG. 33 is a logic flow diagram of the example process of classifying the mental state of the client, in accordance with some embodiments.
- FIG. 34 is a logic flow diagram of the example process of model conditioning, in accordance with some embodiments.
- FIG. 35 is a logic flow diagram of the example process of model weighting and fusion, in accordance with some embodiments.
- FIG. 36 is a logic flow diagram of the example simplified process of acoustic analysis, provided for illustrative purposes only;
- FIG. 37 is a block diagram showing speech recognition logic of the modeling computer system in greater detail
- FIG. 38 is a block diagram showing language model training logic of the modeling computer system in greater detail
- FIG. 39 is a block diagram showing language model logic of the modeling computer system in greater detail.
- FIG. 40 is a block diagram showing acoustic model training logic of the modeling computer system in greater detail
- FIG. 41 is a block diagram showing acoustic model logic of the modeling computer system in greater detail
- FIG. 42 is a block diagram showing visual model training logic of the modeling computer system in greater detail;
- FIG. 43 is a block diagram showing visual model logic of the modeling computer system in greater detail.
- FIG. 44 is a block diagram of a screening or monitoring system data store of the interactive health screening or monitoring logic of FIG. 1A in greater detail;
- FIG. 45 shows a health screening or monitoring system in which a health screening or monitoring server estimates a health state of a patient by passively listening to ambient speech in accordance with the present disclosure
- FIG. 46 is a logic flow diagram illustrating the estimation of a health state of a patient by passively listening to ambient speech in accordance with the present disclosure;
- FIG. 47 is a logic flow diagram illustrating the estimation of a health state of a patient by passively listening to ambient speech in accordance with the present disclosure.
- FIG. 48 is a block diagram of health care management logic of the health screening or monitoring server of FIG. 4 in greater detail.
- FIGS. 49 and 50 are respective block diagrams of component conditions and actions of work-flows of the health care management logic of FIG. 48 .
- FIG. 51 is a logic flow diagram of the automatic formulation of a work-flow of the health care management logic of FIG. 48 in accordance with the present disclosure
- FIG. 52 is a block diagram of the real-time computer system of FIG. 3 in greater detail
- FIG. 53 is a block diagram of the modeling computer system of FIG. 3 in greater detail
- FIG. 54 is a block diagram of the health screening or monitoring server of FIG. 1A in greater detail.
- FIGS. 55 and 56 provide example illustrations of spectrograms of an acoustic signal used for analysis, in accordance with some embodiments
- FIGS. 57 and 58 are example illustrations of a computer system capable of embodying the current disclosure
- FIG. 59 shows a precision case management use case for the system
- FIG. 60 shows a primary care screening or monitoring use case for the system
- FIG. 61 shows a system for enhanced employee assistance plan (EAP) navigation and triage
- FIG. 62 shows a computer system that is programmed or otherwise configured to assess a mental state of a subject in a single session or over multiple different sessions.
- the present invention relates to health screening or monitoring systems, and, more particularly, to a computer-implemented mental health screening or monitoring tool with significantly improved accuracy and efficacy by leveraging language analysis, visual cues and acoustic analysis.
- the specifics of improved acoustic, visual and speech analysis techniques are described as they pertain to the classification of a respondent as being depressed, or other mental state of interest. While much of the following disclosure will focus largely on assessing depression in a patient, the systems and methods described herein may be equally adept at screening or monitoring a user for a myriad of mental and physical ailments. For example, bipolar disorder, anxiety, and schizophrenia are examples of mental ailments that such a system may be adept at screening or monitoring for. It is also possible that physical ailments may be assessed utilizing such systems. It should be understood that while this disclosure may focus heavily upon depression screening or monitoring, this is not limiting. Any suitable mental or physical ailment may be screened using the disclosed systems and methods.
- the systems and methods disclosed herein may use natural language processing (NLP) to perform semantic analysis on patient speech utterances.
- Semantic analysis may refer to analysis of spoken language from patient responses to assessment questions or captured conversations, in order to determine the meaning of the spoken language for the purpose of conducting a mental health screening or monitoring of the patient.
- the analysis may be of words or phrases, and may be configured to account for primary queries or follow-up queries. In the case of captured human-human conversations, the analysis may also apply to the speech of the other party.
- the terms “semantic analysis” and “natural language processing (NLP)” may be used interchangeably. Semantic analysis may be used to determine the meanings of utterances by patients, in context. It may also be used to determine topics patients are speaking about.
- a mental state may be distinguished from an emotion or feeling, such as happiness, sadness, or anger.
- a mental state may include one or more feelings in combination with a philosophy of mind, including how a person perceives objects of his or her environment and the actions of other people toward him or her. While feelings may be transient, a mental state may describe a person's overarching disposition or mood, even in situations where the person's feelings may change. For example, a depressed person may variously feel, at different times, happy, sad, or angry.
- a server computer system may apply a health state screening or monitoring test to a human patient using a client device (patient device 112 ), by engaging the patient in an interactive spoken conversation and applying a composite model, that may combine language, acoustic, metadata, and visual models, to a captured audiovisual signal of the patient engaged in the dialogue.
- a composite model may analyze, in real time, the audiovisual signal of the patient (i) to make the conversation more engaging for the patient and (ii) estimate the patient's health.
- While the latter goal (estimating the patient's health) is primary, the former (keeping the conversation engaging) is a significant factor in achieving it. Appendix A illustrates an exemplary implementation that includes Calendaring, SMS, Dialog, Calling and User Management Services.
- Truthfulness of the patient in answering questions posed by the screening or monitoring test is critical in assessing the patient's mood. Health screening or monitoring server 102 encourages patient honesty.
- the spoken conversation may provide the patient with less time to compose a disingenuous response to a question than it would take to simply respond honestly to the question.
- the conversation may feel, to the patient, more spontaneous and personal and may be less annoying to the patient than a generic questionnaire, as would be provided by, for example, simply administering the PHQ-9. Accordingly, the spoken conversation may not induce or exacerbate resentment in the patient for having to answer a questionnaire for the benefit of a doctor or other clinician.
- the spoken conversation may be adapted in progress to be responsive to the patient, reducing the patient's annoyance with the screening or monitoring test and, in some situations, shortening the screening or monitoring test.
- health screening or monitoring system 100 may include health screening or monitoring server 102 , a call center system 104 , a clinical data server 106 , a social data server 108 , a patient device 112 , and a clinician device 114 that are connected to one another through a wide area network (WAN) 110 , which is the Internet in this illustrative embodiment.
- patient device 112 may also be reachable by call center system 104 through a public-switched telephone network (PSTN) 120 or directly.
- Health screening or monitoring server 102 may be a server computer system that administers the health screening or monitoring test with the patient through patient device 112 and combines a number of language, acoustic, and visual models to produce results 1820 ( FIG. 18 ), using clinical data retrieved from clinical data server 106 , social data retrieved from social data server 108 , and patient data collected from past screenings or monitoring to train the models of runtime model server 304 ( FIG. 18 ).
- Clinical data server 106 ( FIG. 1A ) may be a server computer system that makes clinical or demographic data of the patient, including diagnoses, medication information, etc., available, e.g., to health screening or monitoring server 102 , in a manner that is compliant with HIPAA (Health Insurance Portability and Accountability Act of 1996) and/or any other privacy and security policies and regulations such as GDPR and SOC 2.
- Social data server 108 may be a server computer system that makes social data of the patient, including social media posts, online purchases, searches, etc., available, e.g., to health screening or monitoring server 102 .
- Clinician device 114 may be a client device that receives data representing results of the screening or monitoring regarding the patient's health from health screening or monitoring server 102 .
- the system may be used to assess the mental state of the subject in a single session or over multiple sessions. Subsequent sessions may be informed by assessment results from prior assessments. This may be done by providing assessment data as inputs to machine learning algorithms or other analysis methods for the subsequent assessments. Each session may generate one or more assessments. Individual assessments may also compile data from multiple sessions.
- FIG. 1B shows an additional embodiment of the health screening or monitoring system from FIG. 1A .
- FIG. 1B illustrates a conversation between patient 120 and clinician 130 .
- the clinician 130 may record one or more speech samples from the patient 120 and upload them to the wide area network 110 , with the consent of the patient 120 .
- the speech samples may be analyzed by one or more machine learning algorithms, described elsewhere herein.
- FIG. 2 provides an additional embodiment of a health screening or monitoring system.
- Health screening or monitoring system 200 may apply a health state screening or monitoring test to a human patient using a client device (clients 260 a - n ), by engaging the patient in an interaction and applying a composite model that combines language, acoustic, and visual models, to a captured audiovisual signal of the patient engaged in the dialogue.
- the composite model can be configured to analyze, in real time, the audiovisual signal of the patient (i) to make the conversation more engaging for the patient, (ii) estimate the patient's mental health, and (iii) provide a judgment free and less embarrassing experience for the patient, who may already be suffering from anxiety and other mental barriers to receiving proper screening or monitoring from a clinician.
- the terms “patient”, “client”, “subject”, “respondent” and “user” may all be employed interchangeably to refer to the individual being screened for the mental health conditions and/or the device being utilized by this individual to collect and transmit the audio and visual data that is used to screen them.
- “semantic analysis” and “NLP” may be used interchangeably to reference natural language processing models and elements.
- the term "stakeholders" is employed to refer to a wide variety of interested third parties who are not the patient being screened.
- These stakeholders may include physicians, health care providers, care team members, insurance companies, research organizations, family/relatives of the patient, hospitals, crisis centers, and the like. It should thus be understood that when another label is employed, such as "physician", the intention in this disclosure is to reference any number of stakeholders.
- the health screening or monitoring system 200 includes a backend infrastructure designed to administer the screening or monitoring interaction and analyze the results.
- This includes one or more model servers 230 coupled to a web server 240 .
- the web server 240 and model server(s) 230 leverage user data 220 which is additionally populated by clinical and social data 210 .
- the clinical data portion may be compiled from the healthcare providers, and may include diagnoses, vital information (age, weight, height, blood chemistry, etc.), diseases, medications, lists of clinical encounters (hospitalizations, clinic visits, Emergency Department visits), clinician records, and the like.
- This clinical data may be compiled from one or more electronic health record (EHR) systems or Health Information Exchanges (HIE) by way of a secure application protocol, extension or socket.
- Social data may include information collected from a patient's social networks, including social media postings, from databases detailing the patient's purchases, and from databases containing the patient's economic, educational, residential, legal and other social determinants. This information may be compiled together with additional preference data, metadata, annotations, and voluntarily supplied information, to populate the user database 220 .
- the model server 230 and web server 240 are additionally capable of populating and/or augmenting the user data 220 with preferences, extracted features and the like.
- the backend infrastructure communicates with the clients 260 a - n via a network infrastructure 250 .
- this network may include the internet, a corporate local area network, private intranet, cellular network, or some combination of these.
- the clients 260 a - n include a client device of a person being screened, which accesses the backend screening or monitoring system and includes a microphone and camera for audio and video capture, respectively.
- the client device may be a cellular phone, tablet, laptop or desktop equipped with a microphone and optional camera, smart speaker in the home or other location, smart watch with a microphone and optional camera, or a similar device.
- a client device may collect additional data, such as biometric data.
- For example, smart watches and fitness trackers already have the capability of measuring motion, heart rate and sometimes respiratory rate, blood oxygenation levels and other physiologic parameters. Future smart devices may record conductivity measurements for tracking perspiration, pH changes in the skin, and other chemical or hormonal changes. Client devices may operate in concert to collect data. For example, a phone may capture the audio and visual data while a Bluetooth-paired fitness tracker may provide body temperature, pulse rate, respiratory rate and movement data simultaneously.
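- A minimal sketch of a container for one such synchronized, multi-device capture follows; all field names are hypothetical and chosen only to mirror the modalities listed above.
```python
# Illustrative sketch: one synchronized capture combining phone-sourced
# audio/video with wearable-sourced physiology. Field names are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ClientSample:
    timestamp: float                        # seconds since epoch
    audio: bytes                            # microphone capture
    video: Optional[bytes] = None           # camera capture, if available
    heart_rate_bpm: Optional[float] = None  # from paired fitness tracker
    respiratory_rate: Optional[float] = None
    body_temp_c: Optional[float] = None
    motion: Optional[list] = None           # accelerometer frames
```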
- All of the collected data for each client 260 a - n is provided back to the web server 240 via the network infrastructure 250 .
- results are provided back to the client 260 a - n for consumption, and when desired for sharing with one or more stakeholders 270 a - n associated with the given client 260 a - n, respectively.
- the stakeholders 270 a - n are illustrated as being in direct communication with their respective clients 260 a - n. While in practice this may indeed be possible, often the stakeholder 270 a - n will be capable of direct access to the backend screening or monitoring system via the network infrastructure 250 and web server 240 , without the need to use the client 260 a - n as an intermediary.
- each client 260 a - n may be associated with one or more stakeholders 270 a - n, which may differ from any other client's 260 a - n stakeholders 270 a - n.
- a server computer system applies a depression assessment test to a human patient using a client device (portable device 312 ), by engaging the patient in an interactive spoken conversation and applying a composite model, that combines language, acoustic, and visual models, to a captured audiovisual signal of the patient engaged in the dialogue.
- while the general subject matter of the assessment test may incorporate queries with subject matter similar to questions asked in standardized depression assessment tests such as the PHQ-9, the assessment does not merely include analysis of answers to survey questions.
- the screening or monitoring system's composite model analyzes, in real time, the audiovisual signal of the patient (i) to make the conversation more engaging for the patient and (ii) assess the patient's mental health.
- Real-time system 302 encourages honesty of the patient in a number of ways. First, the spoken conversation provides the patient with less time to compose a disingenuous response to a question than it would take to simply respond honestly to the question.
- the conversation feels, to the patient, more spontaneous and personal and is less annoying to the patient than an obviously generic questionnaire. Accordingly, the spoken conversation does not induce or exacerbate resentment in the patient for having to answer a questionnaire before seeing a doctor or other clinician.
- the spoken conversation is adapted in progress to be responsive to the patient, reducing the patient's annoyance with the assessment test and, in some situations, shortening the assessment test.
- the assessment test as administered by real-time system 302 may rely more on non-verbal aspects of the conversation and the patient than on the verbal content of the conversation to assess depression in the patient.
- patient assessment system 300 includes real-time system 302 , a modeling system 304 , a clinical data server 306 , a patient device 312 , and a clinician device 314 that are connected to one another through a wide area network (WAN) 310 , which is the Internet in this illustrative embodiment.
- Real-time system 302 is a server computer system that administers the depression assessment test with the patient through patient device 312 .
- Modeling system 304 is a server computer system that combines a number of language, acoustic, and visual models to produce a composite model 2204 ( FIG. 22 ), using clinical data retrieved from clinical data server 306 and patient data collected from past assessments to train composite model 2204 .
- Clinical data server 306 ( FIG. 3 ) may be a server computer system that, like clinical data server 106 of FIG. 1A , makes clinical or demographic data of the patient available for use in the assessment.
- Clinician device 314 is a client device that receives data representing a resulting assessment regarding depression from real-time system 302 .
- the systems disclosed herein may provide medical care professionals with a prediction of a mental state of a patient.
- the mental state may be depression, anxiety, or another mental condition.
- the systems may provide the medical care professionals with additional information, outside of the mental state prediction.
- the system may provide demographic information, such as age, weight, occupation, height, ethnicity, medical history, psychological history, and gender to medical care professionals via client devices, such as the client devices 260 a - n of FIG. 2 .
- the system may provide information from online systems or social networks to which the patient may be registered. The patient may opt in, by setting permissions on his or her client device to provide this information before the screening or monitoring process begins. The patient may also be prompted to enter demographic information during the screening or monitoring process.
- Patients may also choose to provide information from their electronic health records to medical care professionals.
- medical care professionals may interview patients during or after a screening or monitoring event to obtain the demographic information.
- patients may also enter information that specifies or constrains their interests. For example, they may enter topics that they do and/or do not wish to speak about.
- the terms “medical care provider” and “clinician” are used interchangeably.
- Medical care providers may be doctors, nurses, physician assistants, nursing assistants, clinical psychologists, social workers, technicians, or other health care providers.
- a clinician may set up the mental health assessment with the patient. This may include choosing a list of questions for the system to ask the patient, including follow-up questions.
- the clinician may add or remove specific questions from the assessment, or change an order in which the questions are administered to the patient.
- the clinician may be available during the assessment as a proctor, in order to answer any clarifying questions the patient may have.
- the system may provide the clinician with the dialogue between itself and the patient.
- This dialogue may be a recording of the screening or monitoring process, or a text transcript of the dialogue.
- the system may provide a summary of the dialogue between itself and the patient, using semantic analysis to choose segments of speech that were most important to predicting the mental state of the patient. These segments may be selected because they might be highly weighted in a calculation of a binary or scaled score indicating a mental state prediction, for example.
- the system may incorporate such a score into a summary report for the patient, along with semantic context taken from a transcript of the interview with the patient.
- the system may additionally provide the clinician with a “word cloud” or “topic cloud” extracted from a text transcript of the patient's speech.
- a word cloud may be a visual representation of individual words or phrases, with words and phrases used most frequently designated using larger font sizes, different colors, different fonts, different typefaces, or any combination thereof. Depicting word or phrase frequency in such a way may be helpful as depressed patients commonly say particular words or phrases with larger frequencies than non-depressed patients. For example, depressed patients may use words or phrases that indicate dark, black, or morbid humor.
- Depressed patients may talk about feeling worthless or feeling like failures, or use absolutist language, such as "always", "never", or "completely."
- Depressed patients may also use a higher frequency of first-person singular pronouns (e.g., “I”, “me”) and a lower frequency of second- or third-person pronouns when compared to the general population.
- the system may be able to train a machine learning algorithm to perform semantic analysis of word clouds of depressed and non-depressed people, in order to be able to classify people as depressed or not depressed based on their word clouds.
- Word cloud analysis may also be performed using unsupervised learning. For example, the system may analyze unlabeled word clouds and search for patterns, in order to separate people into groups based on their mental states.
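- The word-cloud and language-marker analysis above can be illustrated with a short sketch that counts word frequencies (the input to a word cloud) and computes two of the markers mentioned: the rate of absolutist terms and the rate of first-person singular pronouns. The marker word lists are assumptions for illustration.
```python
# Illustrative sketch: word frequencies plus two linguistic markers discussed
# above. The ABSOLUTIST and FIRST_PERSON word lists are assumptions.
from collections import Counter
import re

ABSOLUTIST = {"always", "never", "completely", "totally", "nothing"}
FIRST_PERSON = {"i", "me", "my", "mine", "myself"}

def linguistic_markers(transcript):
    words = re.findall(r"[a-z']+", transcript.lower())
    counts = Counter(words)
    n = max(len(words), 1)
    return {
        "word_frequencies": counts,   # feeds word-cloud rendering
        "absolutist_rate": sum(counts[w] for w in ABSOLUTIST) / n,
        "first_person_rate": sum(counts[w] for w in FIRST_PERSON) / n,
    }
```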
- the systems described herein can output an electronic report identifying whether a patient is at risk of a mental or physiological condition.
- the electronic report can be configured to be displayed on a graphical user interface of a user's electronic device.
- the electronic report can include a quantification of the risk of the mental or physiological condition, e.g., a normalized score.
- the score can be normalized with respect to the entire population or with respect to a sub-population of interest.
- the electronic report can also include a confidence level of the normalized score. The confidence level can indicate the reliability of the normalized score (i.e., the degree to which the normalized score can be trusted).
- the electronic report can include visual graphical elements.
- the visual graphical element may be a graph that shows the progression of the patient's scores over time.
- the electronic report may be output to the patient or a contact person associated with the patient, a healthcare provider, a healthcare payer, or another third-party.
- the electronic report can be output substantially in real-time, even while the screening, monitoring, or diagnosis is ongoing.
- the electronic report can be updated substantially in real-time and be re-transmitted to the user.
- the electronic report may include one or more descriptors about the patient's mental state.
- the descriptors can be a qualitative measure of the patient's mental state (e.g., “mild depression”).
- the descriptors can be topics that the patient mentioned during the screening.
- the descriptors can be displayed in a graphic, e.g., a word cloud.
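- A hedged sketch of assembling such an electronic report follows, with a population-normalized score expressed as a percentile, a confidence level, and descriptors; the z-score-to-percentile normalization and the field names are assumptions, as the disclosure does not fix a normalization scheme.
```python
# Illustrative sketch: an electronic report with a population-normalized
# score, a confidence level, and descriptors. Normalization is assumed.
from statistics import NormalDist

def electronic_report(raw_score, pop_mean, pop_std, confidence, descriptors):
    percentile = NormalDist(pop_mean, pop_std).cdf(raw_score) * 100
    return {
        "normalized_score": round(percentile, 1),  # percentile vs. population
        "confidence": confidence,                  # reliability of the score
        "descriptors": descriptors,                # e.g., ["mild depression"]
    }
```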
- the models described herein may be optimized for a particular purpose or based on the entity that may receive the output of the system.
- the models may be optimized for sensitivity in estimating whether a patient has a mental condition.
- Healthcare payers such as insurance companies may prefer such models because high sensitivity minimizes false negatives, i.e., members whose conditions would otherwise go undetected and lead to costlier claims later.
- the models may be optimized for specificity in estimating whether a patient has a mental condition.
- Healthcare providers may prefer such models.
- the system may select the appropriate model based on the stakeholder to which the output will be transmitted. After processing, the system can transmit the output to the stakeholder.
- the models described herein can alternatively be tuned or configured to process speech and other data according to a desired level of sensitivity or a desired level of specificity determined by a clinician, healthcare provider, insurance company, or government regulated body.
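- One way to realize such tuning, shown here as a sketch only, is to keep a single scoring model and move its decision threshold to meet a desired sensitivity or specificity on validation data using an ROC curve; the helper function below is hypothetical and uses scikit-learn utilities.
```python
# Illustrative sketch: tuning a single model's decision threshold to a
# desired sensitivity or specificity on validation data.
import numpy as np
from sklearn.metrics import roc_curve

def threshold_for(y_true, y_prob, target, mode="sensitivity"):
    fpr, tpr, thresholds = roc_curve(y_true, y_prob)
    if mode == "sensitivity":
        idx = int(np.argmax(tpr >= target))        # first point meeting target
    else:
        ok = np.where(1 - fpr >= target)[0]        # specificity = 1 - FPR
        idx = int(ok[-1]) if len(ok) else 0        # best TPR meeting target
    return thresholds[idx]
```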
- the system may be used to monitor teenagers for depression.
- the system may perform machine learning analysis on groups of teenagers in order to determine voice-based biomarkers that may uniquely classify teenagers as being at risk for depression. Depression in teenagers may have different causes than in adults. Hormonal changes may also introduce behaviors in teenagers that would be atypical for adults. A system for screening or monitoring teenagers would need to employ a model tuned to recognize these unique behaviors. For example, depressed or upset teenagers may be more prone to anger and irritability than adults, who may withdraw when upset. Thus, questions from assessments may elicit different voice-based biomarkers from teenagers than adults. Different screening or monitoring methods may be employed when testing teenagers for depression, or studying teenagers' mental states, than are employed for screening or monitoring adults.
- Clinicians may modify assessments to particularly elicit voice-based biomarkers specific to depression in teenagers.
- the system may be trained using these assessments, and determine a teenager-specific model for predicting mental states. Teenagers may further be segmented by household (foster care, adoptive parent(s), two biological parents, one biological parent, care by guardian/relative, etc.), medical history, gender, age (old vs. young teenager), and socioeconomic status, and these segments may be incorporated into the model's predictions.
- the system may also be used to monitor the elderly for depression and dementia.
- the elderly may also have particular voice-based biomarkers that younger adults may not have.
- the elderly may have strained or thin voices, owing to aging.
- Elderly people may exhibit aphasia or dysarthria, have trouble understanding survey questions, follow-ups, or conversational speech, and may use repetitive language.
- Clinicians may develop, or algorithms may be used to develop, surveys for eliciting particular voice-based biomarkers from elderly patients.
- Machine learning algorithms may be developed to predict mental states in elderly patients, specifically, by segmenting patients by age. Differences may be present in elderly patients from different generations (e.g., Greatest, Silent, Boomer), who may have different views on gender roles, morality, and cultural norms.
- Models may be trained to incorporate elder age brackets, gender, race, socioeconomic status, physical medical conditions, and family involvement.
- the system may be used to test airline pilots for mental fitness. Airline pilots have taxing jobs, and may experience large amounts of stress and fatigue on long flights. Clinicians or algorithms may be used to develop screening or monitoring methods for these conditions. For example, the system may base an assessment on queries similar to those in the Minnesota Multiphasic Personality Inventory (MMPI) and MMPI-2.
- the system may also be used to screen military personnel for mental fitness.
- the system may implement an assessment that uses queries with similar subject matter to those asked on the Primary Care PTSD Screen for DSM-5 (PC-PTSD-5) to test for post-traumatic stress disorder (PTSD).
- the system may screen military personnel for depression, panic disorder, phobic disorder, anxiety, and hostility.
- the system may employ different surveys to screen military personnel pre- and post-deployment.
- the system may segment military personnel by occupation, branch, officer or enlisted status, gender, age, ethnicity, number of tours/deployments, marital status, medical history, and other factors.
- the system may be used to evaluate prospective gun buyers, e.g., by implementing background checks. Assessments may be designed, by clinicians or algorithmically, to evaluate prospective buyers for mental fitness for owning a firearm.
- the survey may be required to determine, using questions and follow-up questions, whether a prospective gun buyer could be certified, by a court or other authority, as a danger to himself or herself or to others.
- Health screening or monitoring server 102 ( FIG. 1A ) is shown in greater detail in FIG. 4 and in even greater detail in FIG. 22 .
- health screening or monitoring server 102 includes interactive health screening or monitoring logic 402 and health care management logic 408 .
- health screening or monitoring server 102 includes screening or monitoring system data store 410 and model repository 416 .
- interactive health screening or monitoring logic 402 conducts an interactive conversation with the subject patient and estimates one or more health states of the patient by application of the models of runtime model server 504 ( FIG. 18 ) to audiovisual signals representing responses by the patient.
- interactive health screening or monitoring logic 402 may also operate in a passive listening mode, observing the patient outside the context of an interactive conversation with health screening or monitoring server 102 , e.g., during a session with a health care clinician, and estimating a health state of the patient from such observation.
- Health care management logic 408 makes expert recommendations in response to health state estimations of interactive health screening or monitoring logic 402 .
- Screening or monitoring system data store 410 stores and maintains all user and patient data needed for, and collected by, screening or monitoring in the manner described herein.
- the conversational context of the health screening or monitoring system may improve one or more performance metrics associated with one or more machine learning algorithms used by the system.
- These metrics may include metrics such as an F1 score, an area under the curve (AUC), a sensitivity, a specificity, a positive predictive value (PPV), and an equal error rate.
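- For illustration, these metrics can be computed for a binary screening model as in the following sketch, with the equal error rate read off the ROC curve; the helper function and threshold default are assumptions.
```python
# Illustrative sketch: computing the listed performance metrics for a binary
# screening model; the equal error rate is the point where FPR equals FNR.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, confusion_matrix, roc_curve

def screening_metrics(y_true, y_prob, threshold=0.5):
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    fpr, tpr, _ = roc_curve(y_true, y_prob)
    eer_idx = int(np.argmin(np.abs(fpr - (1 - tpr))))   # FPR == FNR point
    return {
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if (tp + fp) else 0.0,
        "equal_error_rate": float(fpr[eer_idx]),
    }
```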
- health screening or monitoring server 102 may be distributed across multiple computer systems.
- real-time, interactive behavior of health screening or monitoring server 102 (e.g., interactive screening or monitoring server logic 502 and runtime model server logic 504 described below) is implemented in one or more servers configured to handle large amounts of traffic through WAN 110 ( FIG. 1A ), and computationally intensive behavior of health screening or monitoring server 102 (e.g., health care management logic 408 and model training logic 506 ) is implemented in one or more other servers configured to efficiently perform highly complex computation.
- the various loads carried by health screening or monitoring server 102 may be distributed among multiple computer systems.
- Interactive health screening or monitoring logic 402 is shown in greater detail in FIG. 5 .
- Interactive health screening or monitoring logic 402 includes interactive screening or monitoring server logic 502 , runtime model server logic 504 , and model training logic 506 .
- Interactive screening or monitoring server logic 502 conducts an interactive screening or monitoring conversation with the human patient; runtime model server logic 504 uses and adjusts a number of machine learning models to concurrently evaluate responsive audiovisual signals of the patient; and model training logic 506 trains models of runtime model server logic 504 .
- Interactive screening or monitoring server logic 502 is shown in greater detail in FIG. 6 and includes generalized dialogue flow logic 602 and input/output (I/O) logic 604 .
- I/O logic 604 affects the interactive screening or monitoring conversation by sending audiovisual signals to, and receiving audiovisual signals from, patient device 112 .
- I/O logic 604 receives data from generalized dialogue flow logic 602 that specifies questions to be asked of the patient and sends audiovisual data representing those questions to patient device 112 .
- in embodiments in which the interactive screening or monitoring conversation is effected through PSTN 120 ( FIG. 1A ), I/O logic 604 (i) sends an audiovisual signal to patient device 112 by sending data to a human, or automated, operator of call center 104 prompting the operator to ask a question in a telephone call with patient device 112 (or alternatively by sending data to a backend automated dialog system destined for patients) and (ii) receives an audiovisual signal from patient device 112 by receiving an audiovisual signal of the interactive screening or monitoring conversation forwarded by call center 104 .
- I/O logic 604 also sends at least portions of the received audiovisual signal of the interactive screening or monitoring conversation to runtime model server logic 504 ( FIG. 18 ) and model training logic 506 ( FIG. 19 ).
- the queries asked of patients, or questions, may be stored as nodes, while patient responses, collected as audiovisual signals, may be stored as edges.
- a screening or monitoring event, or set of screening or monitoring events, for a particular patient may be therefore represented as a graph.
- different answers to different follow-up questions may be represented as multiple spokes connecting a particular node to a plurality of other nodes.
- Different graph structures for different patients may be used as training examples for a machine learning algorithm as another method of determining a mental state classification for a patient. Classification may be performed by determining similarities between graphs of, for example, depressed patients. Equivalent questions, as discussed herein, may be labeled as such within the graph.
- the graphs may also be studied and analyzed to determine idiosyncrasies in interpretations of different versions of questions by patients.
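- A minimal sketch of the graph representation described above follows, with queries as nodes and captured responses stored on edges; networkx is used for convenience, and the attribute names are hypothetical.
```python
# Illustrative sketch: a screening session as a directed graph with queries
# as nodes and captured responses stored on edges. Attribute names assumed.
import networkx as nx

def add_exchange(g, question_id, follow_up_id, response_ref, equivalent_to=None):
    g.add_node(question_id, equivalent_to=equivalent_to)  # label equivalents
    g.add_node(follow_up_id)
    g.add_edge(question_id, follow_up_id, response=response_ref)

session = nx.DiGraph()
add_exchange(session, "q_sleep_1", "q_sleep_followup_a", "resp_0017.wav")
```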
- I/O logic 604 also receives results 1820 ( FIG. 18 ) from runtime model server logic 504 that represent evaluation of the audiovisual signal.
- Generalized dialogue flow logic 602 conducts the interactive screening or monitoring conversation with the human patient.
- Generalized dialogue flow logic 602 determines what questions I/O logic 604 should ask of the patient and monitors the reaction of the patient as represented in results 1820 .
- generalized dialogue flow logic 602 determines when to politely conclude the interactive screening or monitoring conversation.
- Generalized dialogue flow logic 602 is shown in greater detail in FIG. 7 .
- Generalized dialogue flow logic 602 includes interaction control logic generator 702 .
- Interaction control logic generator 702 manages the interactive screening or monitoring conversation with the patient by sending data representing dialogue actions to I/O logic 604 ( FIG. 6 ) that direct the behavior of I/O logic 604 in carrying out the interactive screening or monitoring conversation.
- Examples of dialogue actions include asking a question of the patient, repeating the question, instructing the patient, politely concluding the conversation, changing aspects of a display of patient device 112 , and modifying characteristics of the speech presented to the patient by I/O logic 604 , i.e., pace, volume, apparent gender of the voice, etc.
- Interaction control logic generator 702 customizes the dialogue actions for the patient.
- Interaction control logic generator 702 receives data from screening or monitoring system data store 410 that represents subjective preferences of the patient and a clinical and social history of the patient.
- the subjective preferences are explicitly specified by the patient, generally prior to any interactive screening or monitoring conversation, and include such things as the particular voice to be presented to the patient through I/O logic 604 , default volume and pace of the speech generated by I/O logic 604 , and display schemes to be used within patient device 112 .
- the clinical and social history of the patient may indicate that questions related to certain topics should be asked of the patient.
- Interaction control logic generator 702 uses the patient's preferences and medical history to set attributes of the questions to ask the patient.
- Interaction control logic generator 702 receives data from runtime model server logic 504 that represents analytical results of responses of the patient in the current screening or monitoring conversation.
- interaction control logic generator 702 receives data representing analytical results of responses, i.e., results 1820 ( FIG. 18 ) of runtime model server logic 504 and patient and results metadata from descriptive model and analytics 1812 that facilitates proper interpretation of the analytical results.
- Interaction control logic generator 702 interprets the analytical results in the context of the results metadata to determine the patient's current status.
- History and state machine 720 tracks the progress of the screening or monitoring conversation, i.e., which questions have been asked and which questions are yet to be asked.
- Question and dialogue action bank 710 is a data store that stores all dialogue actions that may be taken by interaction control logic generator 702 , including all questions that may be asked of the patient.
- history and state machine 720 informs question and dialogue action bank 710 as to which question is to be asked next in the screening or monitoring conversation.
- Interaction control logic generator 702 receives data representing the current state of the conversation and what questions are queued to be asked from history and state machine 720 . Interaction control logic generator 702 processes the received data to determine the next action to be taken by interactive screening or monitoring server logic 502 in furtherance of the screening or monitoring conversation. Once the next action is determined, interaction control logic generator 702 retrieves data representing the action from question and dialogue action bank 710 and sends a request to I/O logic 604 to perform the next action.
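- The select/act/evaluate loop just described can be summarized in the following sketch; the component objects stand in for question and dialogue action bank 710 , history and state machine 720 , I/O logic 604 , and runtime model server logic 504 , and all method names are hypothetical.
```python
# Illustrative sketch of the select/act/evaluate conversation loop. The
# parameter objects are placeholders; their method names are assumptions.
def run_conversation(action_bank, state, io_logic, model_server,
                     confidence_threshold=0.8):
    action = action_bank.initial_action(state)
    while True:
        io_logic.perform(action)                      # ask question / act
        response = io_logic.receive_response()        # audiovisual signal
        results = model_server.evaluate(response)     # intermediate scoring
        state.record(action, results)
        if state.mandatory_done() and results.confidence >= confidence_threshold:
            io_logic.perform(action_bank.closing_action())
            return state
        action = action_bank.next_action(state, results)
```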
- in step 802 , generalized dialogue flow logic 602 selects a question or other dialogue action to initiate the conversation with the patient.
- Interaction control logic generator 702 receives data from history and state machine 720 that indicates that the current screening or monitoring conversation is in its initial state.
- Interaction control logic generator 702 receives data that indicates (i) subjective preferences of the patient and (ii) topics of relatively high pertinence to the patient.
- interaction control logic generator 702 selects an initial dialogue action with which to initiate the screening or monitoring conversation.
- the initial dialogue action may include (i) asking a common conversation-starting question such as “can you hear me?” or “are you ready to begin?”; (ii) asking a question from a predetermined script used for all patients; (iii) reminding the patient of a topic discussed in a previous screening or monitoring conversation with the patient and asking the patient a follow-up question on that topic; or (iv) presenting the patient with a number of topics from which to select using a user-interface technique on patient device 112 .
- interaction control logic generator 702 causes I/O logic 604 to carry out the initial dialogue action.
- Loop step 804 and next step 816 define a loop in which generalized dialogue flow logic 602 conducts the screening or monitoring conversation according to steps 806 - 814 until generalized dialogue flow logic 602 determines that the screening or monitoring conversation is completed.
- in step 806 , interaction control logic generator 702 causes I/O logic 604 to carry out the selected dialogue action.
- in the first iteration of the loop, the dialogue action is the one selected in step 802 .
- in subsequent iterations, the dialogue action is selected in step 814 as described below.
- in step 808 , generalized dialogue flow logic 602 receives an audiovisual signal of the patient's response to the question. While processing according to logic flow diagram 800 is shown in a manner that suggests synchronous processing, generalized dialogue flow logic 602 performs step 808 effectively continuously during performance of steps 802 - 816 and processes the conversation asynchronously. The same is true for steps 810 - 814 .
- in step 810 , I/O logic 604 sends the audiovisual signal received in step 808 to runtime model server logic 504 , which processes the audiovisual signal in a manner described below.
- in step 812 , I/O logic 604 receives results data from runtime model server logic 504 and produces therefrom an intermediate score for the screening or monitoring conversation so far.
- the results data include analytical results data and results metadata.
- I/O logic 604 (i) determines to what degree the screening or monitoring conversation has completed screening or monitoring for the target health state(s) of the patient, (ii) identifies any topics in the patient's response that warrant follow-up questions, and (iii) identifies any explicit instructions from the patient for modifying the screening or monitoring conversation. Examples of the last include patient statements such as "can you speak louder?", "can you repeat that?" or "what?", and "please speak more slowly."
- in step 814 , generalized dialogue flow logic 602 selects the next question to ask the subject patient, along with other dialogue actions to be performed by I/O logic 604 , in the next performance of step 806 .
- interaction control logic generator 702 (i) receives dialogue state data from history and state machine 720 regarding the question to be asked next, (ii) receives intermediate results data from I/O logic 604 representing evaluation of the patient's health state so far, and (iii) receives patient preferences and pertinent topics.
- Generalized dialogue flow logic 602 repeats the loop of steps 804 - 816 until interaction control logic generator 702 determines that the screening or monitoring conversation is complete, at which point generalized dialogue flow logic 602 politely terminates the screening or monitoring conversation.
- the screening or monitoring conversation is complete when (i) all mandatory questions have been asked and answered by the patient and (ii) the measure of confidence in the score resulting from screening or monitoring determined in step 812 is at least a predetermined threshold. It should be noted that confidence in the screening or monitoring is not symmetrical.
- the screening or monitoring conversation seeks to detect specific health states in the patient, e.g., depression and anxiety. If such states are detected early, the detection stands. However, failing to detect them early does not assure their absence. More generally, absence of proof is not proof of absence. Thus, generalized dialogue flow logic 602 finds confidence in early detection but not in early failure to detect.
- health screening or monitoring server 102 ( FIG. 4 ) estimates the current health state, e.g., mood, of the patient using a spoken conversation with the patient through patient device 112 .
- Interactive screening or monitoring server logic 502 sends data representing the resulting screening or monitoring of the patient to the patient's doctor or other clinicians by sending the data to clinician device 114 .
- interactive screening or monitoring server logic 502 records the resulting screening or monitoring in screening or monitoring system data store 410 .
- a top priority of generalized dialogue flow logic 602 is to elicit speech from the patient that is highly informative with respect to the health state attributes for which health screening or monitoring server 102 screens the patient. For example, in this illustrative embodiment, health screening or monitoring server 102 screens most patients for depression and anxiety.
- the analysis performed by runtime model server logic 504 is most accurate when presented with patient speech of a particular quality.
- speech quality refers to the sincerity with which the patient is speaking. Generally speaking, high quality speech is genuine and sincere, while poor quality speech is from a patient not engaged in the conversation or being intentionally dishonest.
- generalized dialogue flow logic 602 may invite the patient to engage interactive screening or monitoring server logic 502 as an audio diary whenever the patient is so inclined. Voluntary speech by the patient whenever motivated tends to be genuine and sincere and therefore highly informative.
- Generalized dialogue flow logic 602 may also select topics that are pertinent to the patient. These topics may include topics specific to clinical and social records of the patient and topics specific to interests of the patient. Using topics of interest to the patient may have the negative effect of influencing the patient's mood. For example, asking the patient about her favorite sports team may cause the patient's mood to rise or fall with the most recent news of the team. Accordingly, generalized dialogue flow logic 602 distinguishes health-relevant topics of interest to the patient from health-irrelevant topics of interest to the patient. For example, questions related to an estranged relative of the patient may be health-relevant while questions related to the patient's favorite television series are typically not.
- patient device 112 displays a video representation of a speaker, i.e., an avatar, to the patient
- patient preferences include, in addition to the preferred voice, physical attributes of the appearance of the avatar.
- generalized dialogue flow logic 602 may use a synthetic voice and avatar chosen for the first screening or monitoring conversation and, in subsequent screening or monitoring conversations, change the synthetic voice and avatar and compare the degree of informativeness of the patient's responses to determine which voice and avatar elicit the most informative responses.
- the voice and avatar chosen for the initial screening or monitoring conversation may be chosen according to which voice and avatar tends to elicit the most informative speech among the general population or among portions of the general population sharing one or more phenotypes with the patient. The manner in which the informativeness of responses elicited by a question is determined is described below.
- generalized dialogue flow logic 602 inserts a synthetic backchannel in the conversation.
- generalized dialogue flow logic 602 may utter “uh-huh” during short pauses in the patient's speech to indicate that generalized dialogue flow logic 602 is listening and interested in what the patient has to say.
- generalized dialogue flow logic 602 may cause the video avatar to exhibit non-verbal behavior (sometimes referred to as “body language”) to indicate attentiveness and interest in the patient.
- Generalized dialogue flow logic 602 also selects questions that are of high quality. Question quality is measured by the informativeness of responses elicited by the question. In addition, generalized dialogue flow logic 602 avoids repetition of identical questions in subsequent screening or monitoring conversations, substituting equivalent questions when possible. The manner in which questions are determined to be equivalent to one another is described more completely below.
- question and adaptive action bank 710 ( FIG. 7 ) is a data store that stores all dialogue actions that may be taken by interaction control logic generator 702 , including all questions that may be asked of the patient.
- Question and adaptive action bank 710 is shown in greater detail in FIG. 9 and includes a number of question records 902 and a dialogue 912 .
- Each of question records 902 includes data representing a single question that may be asked of a patient.
- Dialogue 912 is a series of questions to ask a patient in a spoken conversation with the patient.
- Each of question records 902 includes a question body 904 , a topic 906 , a quality 908 , and an equivalence 910 .
- Question body 904 includes data specifying the substantive content of the question, i.e., the sequence of words to be spoken to the patient to effect asking of the question.
- Topic 906 includes data specifying a hierarchical topic category to which the question belongs. Categories may correlate to (i) specific health diagnoses such as depression, anxiety, etc.; (ii) specific symptoms such as insomnia, lethargy, general disinterest, etc.; and/or (iii) aspects of a patient's treatment such as medication, exercise, etc.
- Quality 908 includes data representing the quality of the question. The quality of the question is a measure of informativeness of responses elicited by the question.
- Equivalence 910 is data identifying one or more other questions in question records 902 that are equivalent to the question represented by this particular one of question records 902 . In this illustrative embodiment, only questions of the same topic 906 may be considered equivalent.
- Dialogue 912 includes an ordered sequence of questions 914 A-N, each of which identifies a respective one of question records 902 to ask in a spoken conversation with the patient.
- the spoken conversation begins with twenty (20) preselected questions and may include additional questions as necessary to produce a threshold degree of confidence to conclude the conversation of logic flow diagram 600 ( FIG. 6 ).
- the preselected questions include, in order, five (5) open-ended questions of high quality, eight (8) questions of the standard and known PHQ-8 screening or monitoring tool for depression, and the seven (7) questions of the standard and known GAD-7 screening or monitoring tool for anxiety. In other examples, the questions may be generated algorithmically.
- Dialogue 912 specifies these twenty (20) questions in this illustrative embodiment.
- interaction control logic generator 702 determines the next question to ask the patient in step 814 .
- One embodiment of step 814 is shown as logic flow diagram 1014 ( FIG. 8 ).
- interaction control logic generator 702 dequeues a question from dialogue 912 , treating the ordered sequence of questions 914 A-N as a queue.
- History and state machine 720 keeps track of which of questions 914 A-N is next.
- interaction control logic generator 702 selects questions from those of question records 902 with the highest quality 908 and pertaining to topics selected for the patient.
- interaction control logic generator 702 may select one as the dequeued question randomly with each question weighted by its quality 908 and its closeness to suggested topics.
- interaction control logic generator 702 collects all equivalent questions identified by equivalence 910 ( FIG. 9 ) for the question dequeued in step 1002 .
- interaction control logic generator 702 selects a question from the collection of equivalent questions collected in step 1004 , including the question dequeued in step 1002 itself.
- Interaction control logic generator 702 may select one of the equivalent questions randomly or using information about prior interactions with the patient, e.g., to select the one of the equivalent questions least recently asked of the patient.
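- A minimal sketch of this selection logic, assuming the representation sketched earlier and a hypothetical record of when each question was last asked of the patient:

```python
from typing import Dict

def select_next_question(dialogue: Dialogue,
                         records: Dict[str, QuestionRecord],
                         last_asked: Dict[str, int]) -> QuestionRecord:
    """Dequeue the next question and substitute the least recently asked
    equivalent question, per steps 1002-1006."""
    dequeued_id = dialogue.question_ids.pop(0)  # treat questions 914A-N as a queue
    candidates = [dequeued_id] + records[dequeued_id].equivalents
    # Prefer the candidate least recently asked of this patient; questions
    # never asked (absent from last_asked) are preferred most of all.
    chosen_id = min(candidates, key=lambda qid: last_asked.get(qid, -1))
    return records[chosen_id]
```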
- Interaction control logic generator 702 processes the selected question as the next question in the next iteration of the loop of steps 804 - 816 ( FIG. 8 ). The use of equivalent questions is important.
- the quality of a question, i.e., the degree to which responses the question elicits are informative in runtime model server logic 504 , decreases for a given patient over time. In other words, if a given question is asked of a given patient repeatedly, each successive response by the patient becomes less informative than it was in all prior askings of the question. In a sense, questions become stale over time. To keep questions fresh, i.e., eliciting consistently informative responses over time, a given question is replaced with an equivalent, but different, question in a subsequent conversation. However, the measurement of equivalence must be accurate for comparison of responses to equivalent questions over time to be consistent.
- question management logic 916 includes question quality logic 1102 , which measures a question's quality, and question equivalence logic 1104 , which determines whether two (2) questions are equivalent in the context of health screening or monitoring server 102 .
- Question quality logic 1102 includes a number of metric records 1106 and metric aggregation logic 1112 .
- question quality logic 1102 uses a number of metrics to be applied to a question, each of which results in a numeric quality score for the question and each of which is represented by one of metric records 1106 .
- Each of metric records 1106 represents a single metric for measuring question quality and includes metric metadata 1108 and quantification logic 1110 .
- Metric metadata 1108 represents information about the metric of metric record 1106 .
- Quantification logic 1110 defines the behavior of question quality logic 1102 in evaluating a question's quality according to the metric of metric record 1106 .
- quantification logic 1110 retrieves all responses to a given question from screening or monitoring system data store 410 ( FIG. 4 ) and uses associated results data from screening or monitoring system data store 410 to determine the number of words in each of the responses. Quantification logic 1110 quantifies the quality of the question as a statistical measure of the number of words in the responses, e.g., a statistical mean thereof.
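- A sketch of such a word-count metric, assuming responses have already been transcribed to text:

```python
from statistics import mean

def word_count_quality(responses: list) -> float:
    """Quantify question quality as the statistical mean of the number of
    words in the transcribed responses the question elicited."""
    if not responses:
        return 0.0
    return mean(len(transcript.split()) for transcript in responses)
```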
- the duration of elicited responses may be measured in a number of ways.
- the duration of the elicited response is simply the elapsed duration, i.e., the entire duration of the response as recorded in screening or monitoring system data store 410 .
- the duration of the elicited response is the elapsed duration less pauses in speech.
- the duration of the elicited response is the elapsed duration less any pause in speech at the end of the response.
- quantification logic 1110 retrieves all responses to a given question from screening or monitoring system data store 410 ( FIG. 4 ) and determines the duration of those responses. Quantification logic 1110 ( FIG. 11 ) quantifies the quality of the question as a statistical measure of the duration of the responses, e.g., a statistical mean thereof.
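- The three duration measures above might be computed as follows; the pause annotations are assumed to be (start, end) pairs in seconds produced by upstream speech processing:

```python
def response_duration(total_seconds: float, pauses: list, mode: str = "elapsed") -> float:
    """Measure the duration of an elicited response in one of three ways."""
    if mode == "elapsed":              # the entire recorded duration
        return total_seconds
    if mode == "less_pauses":          # elapsed duration less all pauses in speech
        return total_seconds - sum(end - start for start, end in pauses)
    if mode == "less_trailing_pause":  # elapsed duration less any pause at the end
        if pauses and pauses[-1][1] >= total_seconds:
            return pauses[-1][0]       # cut at the start of the trailing pause
        return total_seconds
    raise ValueError("unknown mode: " + mode)
```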
- semantic models of NLP model 1806 estimate a patient's health state from positive and/or negative content of the patient's speech.
- the semantic models correlate individual words and phrases to specific health states the semantic models are designed to detect.
- quantification logic 1110 retrieves all responses to a given question from screening or monitoring system data store 410 ( FIG. 4 ) and uses the semantic models to determine correlation of each word of each response to one or more health states.
- An individual response's weighted word score is the statistical mean of the correlations of the words of the response.
- Quantification logic 1110 quantifies the quality of the question as a statistical measure of the weighted word scores of the responses, e.g., a statistical mean thereof.
- runtime model server logic 504 estimates a patient's health state from pitch and energy of the patient's speech as described below. How informative speech is to the various models of runtime model server logic 504 is directly related to how emotional the speech is.
- quantification logic 1110 retrieves all responses to a given question from screening or monitoring system data store 410 ( FIG. 4 ) and uses response data from runtime model server logic 504 to determine an amount of energy present in each response.
- Quantification logic 1110 quantifies the quality of the question as a statistical measure of the measured acoustic energy of the responses, e.g., a statistical mean thereof.
- the quality of a question is a measure of how similar responses to the question are to utterances recognized by runtime models 1802 ( FIG. 18 ) as highly indicative of a health state that runtime models 1802 are trained to recognize.
- quantification logic 1110 determines how similar deep learning machine features for all responses to a given question are to deep learning machine features for health screening or monitoring server 102 as a whole.
- Deep learning machine features are known but are described herein briefly to facilitate understanding and appreciation of the present invention. Deep learning is a sub-field of machine learning: a deep learning machine is a learning machine that learns for itself how to distinguish one thing represented in data from another thing represented in data. The following is a simple example to illustrate the distinction.
- Such a learning machine is typically a computer process with multiple layers of logic.
- One layer is manually configured to recognize contiguous portions of an image with transitions from one color to another (e.g., light to dark, red to green, etc.). This is commonly referred to as edge detection.
- a subsequent layer receives data representing the recognized edges and is manually configured to recognize edges that join together to define shapes.
- a final layer receives data representing shapes and is manually configured to recognize a symmetrical grouping of triangles (cat's ears) and dark regions (eyes and nose). Other layers may be used between those mentioned here.
- the data received as input to any step in the computation, including intermediate results from other steps in the computation, are called features.
- the results of the learning machine are called labels.
- the labels are “cat” and “no cat”.
- This manually configured learning machine may work reasonably well but may have significant shortcomings. For example, a machine that recognizes only the symmetrical grouping of shapes might not recognize an image in which a cat is represented in profile. In a deep learning machine, the machine is trained to recognize cats without manually specifying what groups of shapes represent a cat. The deep learning machine may utilize manually configured features to recognize edges, shapes, and groups of shapes; however, these are not a required component of a deep learning system. Features in a deep learning system may be learned entirely automatically by the algorithm based on the labeled training data alone.
- Training a deep learning machine to recognize cats in image data can, for example, involve presenting the deep learning machine with numerous, preferably many millions of, images and associated knowledge as to whether each image includes a cat, i.e., associated labels of “cat” or “no cat”. For each image received in training, the last, automatically configured layer of the deep learning machine receives data representing numerous groupings of shapes and the associated label of “cat” or “no cat”. Using statistical analysis and conventional techniques, the deep learning machine determines statistical weights to be given each type of shape grouping, i.e., each feature, in determining whether a previously unseen image includes a cat.
- features of the deep learning machine will likely include the symmetrical grouping of shapes manually configured into the learning machine as described above. However, these features will also likely include shape groupings and combinations of shape groupings not thought of by human programmers.
- the features of the constituent models of runtime model server logic 504 ( FIG. 18 ) specify precisely the type of responses that indicate a health state that the constituent models of runtime model server logic 504 are configured to recognize.
- these features represent an exemplary feature set.
- quantification logic 1110 retrieves all responses to the question from screening or monitoring system data store 410 and data representing the diagnoses associated with those responses and trains runtime models 1802 , which are stored in model repository 416 , using those responses and associated data.
- Quantification logic 1110 measures similarity between the feature set specific to the question and the exemplary feature set in a manner described below with respect to question equivalence logic 1104 .
- interaction control logic generator 702 uses quality 908 ( FIG. 9 ) of various questions in determining which question(s) to ask a particular patient.
- The manner in which metric aggregation logic 1112 ( FIG. 11 ) aggregates the measures of quality for a given question into quality 908 ( FIG. 9 ) is illustrated by logic flow diagram 1200 ( FIG. 12 ).
- Loop step 1202 and next step 1210 define a loop in which metric aggregation logic 1112 processes each of metric records 1106 according to steps 1204 - 1208 .
- the particular one of metric records 1106 processed in an iteration of the loop of steps 1202 - 1210 is sometimes referred to as “the subject metric record”, and the metric represented by the subject metric record is sometimes referred to as “the subject metric.”
- metric aggregation logic 1112 evaluates the subject metric, using quantification logic 1110 of the subject metric record and all responses to the subject question in screening or monitoring system data store 410 ( FIG. 4 ).
- In test step 1206 ( FIG. 12 ), metric aggregation logic 1112 determines whether screening or monitoring system data store 410 includes a statistically significant sample of responses to the subject question by the subject patient. If so, metric aggregation logic 1112 evaluates the subject metric using quantification logic 1110 and only data corresponding to the subject patient in screening or monitoring system data store 410 in step 1208 . Conversely, if screening or monitoring system data store 410 does not include a statistically significant sample of responses to the subject question by the subject patient, metric aggregation logic 1112 skips step 1208 . Thus, metric aggregation logic 1112 evaluates the quality of a question in the context of the subject patient to the extent screening or monitoring system data store 410 contains sufficient data corresponding to the subject patient.
- metric metadata 1108 stores data specifying how metric aggregation logic 1112 is to include the associated metric in the aggregate measure in step 1212 .
- metric metadata 1108 may specify a weight to be attributed to the associated metric relative to other metrics.
- Following step 1212 ( FIG. 12 ), processing according to logic flow diagram 1200 completes.
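- As an illustration, the weighted aggregation of step 1212 might look like the following, with per-metric weights drawn from metric metadata 1108 ; the dictionary-based interface is hypothetical:

```python
def aggregate_quality(metric_scores: dict, metric_weights: dict) -> float:
    """Combine per-metric quality scores into a single quality 908 as a
    weighted mean, defaulting any unspecified weight to 1.0."""
    total_weight = sum(metric_weights.get(name, 1.0) for name in metric_scores)
    if total_weight == 0:
        return 0.0
    weighted_sum = sum(score * metric_weights.get(name, 1.0)
                       for name, score in metric_scores.items())
    return weighted_sum / total_weight
```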
- equivalence 910 for a given question identifies one or more other questions in question records 902 that are equivalent to the given question. Whether two questions are equivalent is determined by question equivalence logic 1104 ( FIG. 11 ) by comparing the similarity between the two questions to a predetermined threshold. The similarity here is not how similar the words and phrasing of the sentences are but instead how similarly the models of runtime model server logic 504 and model repository 416 see them. The predetermined threshold is determined empirically.
- Question equivalence logic 1104 measures the similarity between two questions in a manner illustrated by logic flow diagram 1300 ( FIG. 13 ).
- Loop step 1302 and next step 1306 define a loop in which question equivalence logic 1104 processes each of metric records 1106 according to step 1304 .
- the particular one of metric records 1106 processed in an iteration of the loop of steps 1302 - 1306 is sometimes referred to as “the subject metric record”, and the metric represented by the subject metric record is sometimes referred to as “the subject metric.”
- question equivalence logic 1104 evaluates the subject metric for each of the two questions. Once all metrics have been processed according to the loop of steps 1302 - 1306 , processing by question equivalence logic 1104 transfers to step 1308 .
- question equivalence logic 1104 combines the evaluated metrics for each question into a respective multi-dimensional vector for each question.
- In step 1310 , question equivalence logic 1104 normalizes both vectors to have a length of 1.0.
- In step 1312 , question equivalence logic 1104 determines an angle between the two normalized vectors.
- In step 1314 , the cosine of the angle determined in step 1312 is determined by question equivalence logic 1104 to be the measured similarity between the two questions.
- the similarity between two questions ranges from −1.0 to 1.0, with 1.0 being perfectly equivalent.
- the predetermined threshold is 0.98 such that two questions having a measured similarity of at least 0.98 are considered equivalent and are so represented in equivalence 910 ( FIG. 9 ) for both questions.
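- Steps 1308 - 1314 amount to computing the cosine similarity of the two metric vectors, sketched below; normalizing both vectors to length 1.0 makes the dot product equal the cosine of the angle between them:

```python
import math

EQUIVALENCE_THRESHOLD = 0.98  # empirically determined, per this embodiment

def question_similarity(metrics_a: list, metrics_b: list) -> float:
    """Cosine of the angle between two per-question metric vectors."""
    dot = sum(a * b for a, b in zip(metrics_a, metrics_b))
    norm_a = math.sqrt(sum(a * a for a in metrics_a))
    norm_b = math.sqrt(sum(b * b for b in metrics_b))
    return dot / (norm_a * norm_b)

def are_equivalent(metrics_a: list, metrics_b: list) -> bool:
    return question_similarity(metrics_a, metrics_b) >= EQUIVALENCE_THRESHOLD
```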
- assessment test administrator 2202 administers a depression assessment test to the subject patient by conducting an interactive spoken conversation with the subject patient through patient device 312 .
- the manner in which assessment test administrator 2202 does so is illustrated in logic flow diagram 1400 ( FIG. 14 ).
- the test administrator 2202 may be a computer program configured to pose questions to the patient.
- the questions may be algorithmically generated questions.
- the questions may be generated by, for example, a natural language processing (NLP) algorithm. Examples of NLP algorithms are semantic parsing, sentiment analysis, vector-space semantics, and relation extraction.
- the methods described herein may be able to generate an assessment without requiring the presence or intervention of a human clinician.
- the methods described herein may be able to be used to augment or enhance clinician-provided assessments, or aid a clinician in providing an assessment.
- the assessment may include queries containing subject matter that has been adapted or modified from screening or monitoring methods, such as the PHQ-9 and GAD-7 assessments.
- the assessment herein may not merely use the questions from such surveys verbatim, but may adaptively modify the queries based at least in part on responses from subject patients.
- In step 1402 , assessment test administrator 2202 optimizes the testing environment. Step 1402 is shown in greater detail in logic flow diagram 1402 ( FIG. 15 ).
- assessment test administrator 2202 initiates the spoken conversation with the subject patient.
- assessment test administrator 2202 initiates a conversation by asking the patient the initial question of the assessment test.
- the initial question is selected in a manner described more completely below.
- the exact question asked isn't particularly important. What is important is that the patient responds with enough speech that assessment test administrator 2202 may evaluate the quality of the video and audio signal received from patient device 312 .
- Assessment test administrator 2202 receives and processes audiovisual data from patient device 312 throughout the conversation.
- Loop step 1504 and next step 1510 define a loop in which assessment test administrator 2202 processes the audiovisual signal according to steps 1506 - 1508 until assessment test administrator 2202 determines that the audiovisual signal is of high quality or at least of adequate quality to provide accurate assessment.
- assessment test administrator 2202 evaluates the quality of the audiovisual signal received from patient device 312 .
- assessment test administrator 2202 measures the volume of speech, the clarity of the speech, and to what degree the patient's face and, when available, body is visible.
- assessment test administrator 2202 reports the evaluation to the patient.
- assessment test administrator 2202 generates an audiovisual signal that represents a message to be played to the patient through patient device 312 . If the audiovisual signal received from patient device 312 is determined by assessment test administrator 2202 to be of inadequate quality, the message asks the patient to adjust her environment to improve the signal quality. For example, if the audio portion of the signal is poor, the message may be “I'm having trouble hearing you. Can you move the microphone closer to you or find a quieter place?” If the patient's face and, when available, body isn't clearly visible, the message may be “I can't see your face (and body).”
- After step 1508 , processing by assessment test administrator 2202 transfers through next step 1510 to loop step 1504 , and assessment test administrator 2202 continues processing according to the loop of steps 1504 - 1510 until the received audiovisual signal is adequate or is determined to be as good as it will get for the current assessment test. It is preferred that subsequent performances of step 1508 are responsive to any speech by the patient. For example, the patient may attempt to comply with a message to improve the environment with the question, “Is this better?” The next message sent in reporting of step 1508 should include an answer to the patient's question.
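- A sketch of the environment-optimization loop of steps 1504 - 1510 follows; the signal-quality checks, coaching messages, and thresholds are simplified and hypothetical, and the evaluate_signal and play_message callables are assumed:

```python
def optimize_testing_environment(evaluate_signal, play_message):
    """Loop until the audiovisual signal from the patient device is adequate.
    evaluate_signal() is assumed to return an object with speech_volume
    (0.0-1.0) and face_visible attributes."""
    while True:
        signal = evaluate_signal()   # step 1506: evaluate signal quality
        messages = []
        if signal.speech_volume < 0.3:
            messages.append("I'm having trouble hearing you. "
                            "Can you move the microphone closer or find a quieter place?")
        if not signal.face_visible:
            messages.append("I can't see your face. Can you adjust the camera?")
        if not messages:
            return                   # signal is adequate; exit the loop
        for message in messages:     # step 1508: report to the patient
            play_message(message)
```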
- composite model 2204 includes a language model component, so assessment test administrator 2202 necessarily performs speech recognition.
- processing by assessment test administrator 2202 according to the loop of steps 1504 - 1510 completes.
- Loop step 1404 and next step 1416 define a loop in which assessment test administrator 2202 conducts the spoken conversation of the assessment test according to steps 1406 - 1414 until assessment test administrator 2202 determines that the assessment test is completed.
- assessment test administrator 2202 asks a question of the patient in furtherance of the spoken conversation.
- assessment test administrator 2202 uses a queue of questions to ask the patient, and that queue is sometimes referred to herein as the conversation queue.
- the queue may be prepopulated with questions to be covered during the assessment test.
- these questions cover the same general subject matter covered by currently used written assessment tests such as the PHQ-9 and GAD-7.
- assessment test administrator 2202 may require more audio and video than provided by one-word answers. Accordingly, it is preferred that the initially queued questions be more open-ended.
- the initial questions pertain to the topics of general mood, sleep, and appetite.
- An example of an initial question pertaining to sleep is question 1702 ( FIG. 17 ): “How have you been sleeping recently?” This question is intended to elicit a sentence or two from the patient to thereby provide more audio and video of the patient than would ordinarily be elicited by a highly directed question.
- In step 1408 , assessment test administrator 2202 receives an audiovisual signal of the patient's response to the question. While processing according to logic flow diagram 1400 is shown in a manner that suggests synchronous processing, assessment test administrator 2202 performs step 1408 effectively continuously during performance of steps 1402 - 1416 and processes the conversation asynchronously. The same is true for steps 1410 - 1414 .
- assessment test administrator 2202 processes the audiovisual signal received in step 1408 using composite model 2204 .
- assessment test administrator 2202 produces an intermediate score for the assessment test according to the audiovisual signal received so far.
- In step 1414 , assessment test administrator 2202 selects the next question to ask the subject patient in the next performance of step 1406 , and processing transfers through next step 1416 to loop step 1404 .
- Step 1414 is shown in greater detail as logic flow diagram 1414 ( FIG. 16 ).
- FIG. 16 may be construed to follow from step 814 of FIG. 8 .
- assessment test administrator 2202 identifies significant elements in the patient's speech.
- assessment test administrator 2202 uses language portions of composite model 2204 to identify distinct assertions in the portion of the audiovisual signal received after the last question asked in step 1406 ( FIG. 14 ). That portion of the audiovisual signal is sometimes referred to herein as “the patient's response” in the context of a particular iteration of the loop of steps 1604 - 1610 .
- An example of a conversation 1700 conducted by assessment test administrator 2202 of real-time system 302 and patient device 312 is shown in FIG. 17 . It should be appreciated that conversation 1700 is illustrative only. The particular questions to ask, which parts of the patient's response are significant, and the depth to which any topic is followed are determined by the type of information to be gathered by assessment test administrator 2202 and are configured therein. In step 1702 , assessment test administrator 2202 asks the question, “How have you been sleeping recently?” The patient's response is “Okay . . . I've been having trouble sleeping lately. I have meds for that. They seem to help.”
- assessment test administrator 2202 identifies three (3) significant elements in the patient's response: (i) “trouble sleeping” suggests that the patient has some form of insomnia or at least that sleep is poor; (ii) “I have meds” suggests that the user is taking medication; and (iii) “They seem to help” suggests that the medication taken by the user is effective.
- each of these significant elements is processed by assessment test administrator 2202 in the loop of steps 1604 - 1610 .
- Loop step 1604 and next step 1610 define a loop in which assessment test administrator 2202 processes each significant element of the patient's answer identified in step 1602 according to steps 1606 - 1608 .
- the particular significant element processed is sometimes referred to as “the subject element.”
- assessment test administrator 2202 processes the subject element, recording details included in the element and identifying follow-up questions. For example, in conversation 1700 ( FIG. 17 ), assessment test administrator 2202 identifies three (3) topics for follow-up questions for the element of insomnia: (i) the type of insomnia (initial, middle, or late), (ii) the frequency of insomnia experienced by the patient, and (iii) what medication, if any, the patient is taking for the insomnia.
- In step 1608 , assessment test administrator 2202 enqueues any follow-up questions identified in step 1606 .
- processing by assessment test administrator 2202 transfers through next step 1610 to loop step 1604 until assessment test administrator 2202 has processed all significant elements of the patient's response according to the loop of steps 1604 - 1610 .
- processing transfers from loop step 1604 to step 1612 .
- FIG. 17 shows a particular instantiation of a conversation proceeding between the system and a patient.
- the queries and replies disclosed herein are exemplary and should not be construed as being required to follow the sequence disclosed in FIG. 17 .
- assessment test administrator 2202 identifies and enqueues follow-up topics regarding the type of insomnia and any medication taken for the insomnia.
- In processing the response element of medication in the patient's response, assessment test administrator 2202 observes that the patient is taking medication. In step 1606 , assessment test administrator 2202 records that fact and, identifying a queued follow-up question regarding medication for insomnia, processes the medication element as responsive to the queued question.
- In step 1608 , for the medication element, assessment test administrator 2202 enqueues follow-up questions regarding the particular medicine and dosage used by the patient and its efficacy, as shown in step 1708 .
- questions in the conversation queue are hierarchical. In the hierarchy, each follow-up question is a child of the question for which the follow-up question follows up. The latter question is the parent of the follow-up question.
- assessment test administrator 2202 implements a pre-order depth-first walk of the conversation queue hierarchy. In other words, all child questions of a given question are processed before processing the next sibling question. In conversational terms, all follow-up questions of a given question are processed before processing the next question at the same level, recursively.
- assessment test administrator 2202 processes all follow-up questions of the type of insomnia before processing the questions of frequency and medication and any of their follow-up questions. This is the way conversations happen naturally—staying with the most recently discussed topic until complete before returning to a previously discussed topic.
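- A sketch of this hierarchical conversation queue and its pre-order depth-first walk; the node structure is hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class QueuedQuestion:
    text: str
    children: list = field(default_factory=list)  # follow-up questions

def walk_conversation_queue(questions: list):
    """Yield questions in pre-order: each question's follow-ups are exhausted
    before moving on to the next sibling question, recursively."""
    for question in questions:
        yield question.text
        yield from walk_conversation_queue(question.children)

# For example, all follow-ups about sleep are asked before any later
# top-level question at the same level:
queue = [QueuedQuestion("How have you been sleeping recently?", children=[
    QueuedQuestion("Have you been waking up in the middle of the night?"),
    QueuedQuestion("How often do you have trouble sleeping?"),
    QueuedQuestion("What medication, if any, are you taking for it?"),
])]
```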
- assessment test administrator 2202 may be influenced by the responses of the patient.
- a follow-up question regarding the frequency of insomnia precedes the follow-up question regarding medication.
- assessment test administrator 2202 changes the sequence of follow-up questions such that the follow-up question regarding medication is processed prior to processing the follow-up question regarding insomnia frequency. Since medication was mentioned by the patient, that topic is discussed before adding new subtopics to the conversation. This is another way in which assessment test administrator 2202 is responsive to the patient.
- In processing the response element of medication efficacy (i.e., “They seem to help.”), assessment test administrator 2202 records that the medication is moderately effective. Seeing that the conversation queue includes a question regarding the efficacy of medication, assessment test administrator 2202 applies this portion of the patient's response as responsive to the queued follow-up question in step 1710 .
- In step 1612 , assessment test administrator 2202 dequeues the next question from the conversation queue, and processing according to logic flow diagram 1414 , and therefore step 1414 , completes and the conversation continues.
- Prior to returning to the discussion of FIG. 14 , it is helpful to consider additional performances of step 1414 , and therefore logic flow diagram 1414 , in the context of illustrative conversation 1700 .
- the question dequeued as the next question in this illustrative embodiment asks about the patient's insomnia, trying to discern the type of insomnia. It is appreciated that conventional thinking as reflected in the PHQ-9 and GAD-7 is that the particular type of sleep difficulties experienced by a test subject isn't as strong an indicator of depression as the mere fact that sleep is difficult.
- the next question is related to the type of insomnia.
- the question is intentionally as open-ended as possible while still targeted at specific information: “Have you been waking up in the middle of the night?” See question 1712 . While this question may elicit a “Yes” or “No” answer, it may also elicit a longer response, such as response 1714 : “No. I just have trouble falling asleep.”
- processing according to logic flow diagram 1414 , and therefore step 1414 ( FIG. 14 ) completes.
- assessment test administrator 2202 continues the illustrative example of conversation 1700 .
- assessment test administrator 2202 asks question 1712 ( FIG. 17 ).
- assessment test administrator 2202 receives response 1714 .
- Assessment test administrator 2202 processes response 1714 in the next performance of step 1414 .
- assessment test administrator 2202 identifies a single significant element, namely, that the patient has trouble falling asleep and doesn't wake in the middle of the night.
- assessment test administrator 2202 records the type of insomnia (see step 1716 ) and, in this illustrative embodiment, there are no follow-up questions related to that.
- assessment test administrator 2202 dequeues the next question from the conversation queue. Since there are no follow-up questions for the type of insomnia, and the question of whether the patient is treating the insomnia with medication has already been answered, the next question is the first child question related to medication, namely, the particular medication taken by the patient.
- assessment test administrator 2202 forms the question, namely, which particular medication the patient is taking for insomnia.
- assessment test administrator 2202 asks that question in the most straightforward way, e.g., “You said you're taking medication for your insomnia. Which drug are you taking?” This has the advantage of being open-ended and eliciting more speech than would a simple yes/no question.
- assessment test administrator 2202 accesses clinical data related to the patient to help identify the particular drug used by the patient.
- the clinical data may be received from modeling system 302 ( FIG. 22 ), using clinical data 2220 , or from clinical data server 306 ( FIG. 3 ).
- assessment test administrator 2202 may ask a more directed question using the assumed drug's most common name and generic name. For example, if the patient's data indicates that the patient has been prescribed Zolpidem (the generic name of the drug sold under the brand name, Ambien), question 1720 ( FIG. 17 ) may be, “You said you're taking medication for insomnia.
- assessment test administrator 2202 determines whether to ask a highly directed question rather than a more open-ended question based on whether requisite clinical data for the patient is available and to what degree additional speech is needed to achieve an adequate degree of accuracy in assessing the state of the patient.
- conversation 1700 continues with assessment test administrator 2202 recording the substance of response 1722 in step 1724 .
- Assessment test administrator 2202 also determines responsiveness to the patient in the manner in which assessment test administrator 2202 determines whether the patient has completed her response to the most recently asked question, e.g., in determining when an answer received in step 1408 is complete and selection of the next question in step 1414 may begin.
- assessment test administrator 2202 avoids interrupting the patient as much as possible. It is helpful to consider response 1704 : “Okay . . . I've been having trouble sleeping lately. I have meds for that. They seem to help.” The ellipsis after “Okay” indicates a pause in replying by the patient. To this end, assessment test administrator 2202 waits long enough to permit the patient to pause briefly without interruption but not so long as to cause the patient to believe that assessment test administrator 2202 has become unresponsive, e.g., due to a failure of assessment test administrator 2202 or the communications links therewith. Moreover, pauses in speech are used in assessment as described more completely below, and assessment test administrator 2202 should avoid interfering with the patient's speech fluency.
- assessment test administrator 2202 uses two pause durations, a short one and a long one. After a pause for the short duration, assessment test administrator 2202 indicates that assessment test administrator 2202 continues to listen by playing a very brief sound that acknowledges an understanding and a continuation of listening, e.g., “uh-huh” or “mmm-hmmm”. After playing the message, assessment test administrator 2202 waits during any continued pause for the long duration. If the pause continues that long, assessment test administrator 2202 determines that the patient has completed her response.
- assessment test administrator 2202 continues to adjust these durations for the patient whenever interacting with the patient.
- Assessment test administrator 2202 recognizes durations that are too short when observing cross-talk, i.e., when speech is being received from the patient while assessment test administrator 2202 concurrently plays any sound.
- Assessment test administrator 2202 recognizes durations that are too long when (i) the patient explicitly indicates so (e.g., saying “Hello?” or “Are you still there?”) and/or (ii) the patient's response indicates increased frustration or agitation relative to the patient's speech earlier in the same conversation.
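- The two-duration pause handling described above might be sketched as follows; the durations and polling interval are hypothetical starting values that would be adapted per patient over time, and the patient_is_speaking and play_backchannel callables are assumed:

```python
import time

def wait_for_end_of_response(patient_is_speaking, play_backchannel,
                             short_pause=1.5, long_pause=4.0):
    """Return once the patient has paused long enough to be considered done.
    After a short pause, play a brief acknowledgement ("uh-huh"); if the
    pause then continues for the long duration, treat the response as complete."""
    pause_started = time.monotonic()
    acknowledged = False
    while True:
        if patient_is_speaking():            # any speech resets the pause clock
            pause_started = time.monotonic()
            acknowledged = False
        else:
            pause = time.monotonic() - pause_started
            if not acknowledged and pause >= short_pause:
                play_backchannel("uh-huh")   # signal continued listening
                acknowledged = True
            elif acknowledged and pause >= short_pause + long_pause:
                return                       # patient has completed her response
        time.sleep(0.05)
```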
- the conversation is terminated politely by assessment test administrator 2202 when the assessment test is complete.
- the assessment test is complete when (i) the initial questions in the conversation queue and all of their descendant questions have been answered by the patient or (ii) the measure of confidence in the score resulting from assessment determined in step 1412 is at least a predetermined threshold. It should be noted that confidence in the assessment is not symmetrical.
- the assessment test seeks depression, or other behavioral health conditions, in the patient. If it's found quickly, it's found. However, its absence is not assured by failing to find it immediately. Thus, assessment test administrator 2202 finds confidence in early detection but not in early failure to detect.
- real-time system 302 assesses the current mental state of the patient using an interactive spoken conversation with the patient through patient device 312 .
- Assessment test administrator 2202 sends data representing the resulting assessment of the patient to the patient's doctor or other clinician by sending the data to clinician device 314 .
- assessment test administrator 2202 records the resulting assessment in clinical data 2220 .
- While assessment test administrator 2202 is described as conducting an interactive spoken conversation with the patient to assess the mental state of the patient, in other embodiments, assessment test administrator 2202 passively listens to the patient speaking with the clinician and assesses the patient's speech in the manner described herein.
- the clinician may be a mental health professional, a general practitioner or a specialist such as a dentist, cardiac surgeon, or an ophthalmologist.
- assessment test administrator 2202 passively listens to the conversation between the patient and clinician through patient device 312 upon determining that the patient is in conversation with the clinician, e.g., by a “START” control on the clinician's iPad.
- Upon determining that the conversation between the patient and clinician is completed, e.g., by a “STOP” control on the clinician's iPad, assessment test administrator 2202 ceases passively listening and assessing speech in the manner described above. In addition, since patient device 312 is listening passively and not prompting the patient, assessment test administrator 2202 makes no attempt to optimize the audiovisual signal received through patient device 312 and makes no assumption that faces in any received video signal are that of the patient.
- the clinician asks the patient to initiate listening by assessment test administrator 2202 and the patient does so by issuing a command through patient device 312 that directs assessment test administrator 2202 to begin listening.
- the clinician asks the patient to terminate listening by assessment test administrator 2202 and the patient does so by issuing a command through patient device 312 that directs assessment test administrator 2202 to cease listening.
- assessment test administrator 2202 listens to the conversation between the patient and the clinician through clinician device 314 .
- the clinician may manually start and stop listening by assessment test administrator 2202 through clinician device 314 using conventional user-interface techniques.
- assessment test administrator 2202 assesses the patient's speech and not the clinician's speech. Assessment test administrator 2202 may distinguish the voices in any of a number of ways, e.g., by a “MUTE” control on the clinician's iPad. In embodiments in which assessment test administrator 2202 listens through patient device 312 , assessment test administrator 2202 uses acoustic models (e.g., acoustic models 2218 ) to distinguish the two voices. Assessment test administrator 2202 identifies the louder voice as that of the patient, assuming patient device 312 is closer to the patient than to the clinician. This may also be the case in embodiments in which clinician device 314 is set up to hear the patient more loudly.
- acoustic models e.g., acoustic models 2218
- clinician device 314 may be configured to listen through a highly directional microphone that the clinician directs toward the patient such that any captured audio signal represents the patient's voice much more loudly than other, ambient sounds such as the clinician's voice.
- Assessment test administrator 2202 may further distinguish the patient's voice from the clinician's voice using language models 2214 , particularly, semantic pattern models such as semantic pattern modules 4004 , to identify which of the two distinguished voices more frequently asks questions.
- Assessment test administrator 2202 may further distinguish the patient's voice from the clinician's voice using acoustic models 2016 , which may identify and segment out the clinician's voice from an acoustic analysis of the clinician's voice performed prior to the clinical encounter.
- assessment test administrator 2202 assesses the mental state of the patient from the patient's speech in the manner described herein and finalizes the assessment upon detecting the conclusion of the conversation.
- Runtime model server logic 504 processes audiovisual signals representing the patient's responses in the interactive screening or monitoring conversation and, while the conversation is ongoing, estimates the current health of the patient from the audiovisual signals.
- ASR logic 1804 is logic that processes speech represented in the audiovisual data from I/O logic 604 ( FIG. 6 ) to identify words spoken in the audiovisual signal. The results of ASR logic 1804 ( FIG. 18 ) are sent to runtime models 1802 .
- Runtime models 1802 also receive the audiovisual signals directly from I/O logic 604 . In a manner described more completely below, runtime models 1802 combine language, acoustic, and visual models to produce results 1820 from the received audiovisual signal. In turn, interactive screening or monitoring server logic 502 uses results 1820 in real time as described above to estimate the current state of the patient and to accordingly make the spoken conversation responsive to the patient as described above.
- ASR logic 1804 In addition to identifying words in the audiovisual signal, ASR logic 1804 also identifies where in the audiovisual signal each word appears and a degree of confidence in the accuracy of each identified word in this illustrative embodiment. ASR logic 1804 may also identify non-verbal content of the audiovisual signals, such as laughter and fillers for example, along with location and confidence information. ASR logic 1804 makes such information available to runtime models 1802 .
- Runtime models 1802 include descriptive model and analytics 1812 , natural language processing (NLP) model 1806 , acoustic model 1808 , and visual model 1810 .
- NLP model 1806 includes a number of text-based machine learning models to (i) predict depression, anxiety, and perhaps other health states directly from the words spoken by the patient and (ii) model factors that correlate with such health states.
- Examples of machine learning that models health states directly include sentiment analysis, semantic analysis, language modeling, word/document embeddings and clustering, topic modeling, discourse analysis, syntactic analysis, and dialogue analysis. Models do not need to be constrained to one type of information.
- a model may contain information for example from both sentiment and topic based features.
- NLP information includes the score output of specific modules, for example, the score from a sentiment detector trained for sentiment rather than for mental health state.
- NLP information includes that obtained via transfer learning based systems.
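- As a sketch of how a single NLP model might mix sentiment- and topic-based information, including scores transferred from modules trained for other tasks, the following assembles text features; the sentiment_model and topic_model objects and their methods are hypothetical:

```python
def nlp_features(transcript: str, sentiment_model, topic_model) -> dict:
    """Assemble text-based features for downstream health-state prediction."""
    return {
        # Score from a detector trained for sentiment rather than for
        # mental health state, used here via transfer learning.
        "sentiment_score": sentiment_model.score(transcript),
        # Topic distribution inferred from the same transcript.
        "topic_distribution": topic_model.infer(transcript),
        # A simple surface feature alongside the learned ones.
        "word_count": len(transcript.split()),
    }
```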
- NLP model 1806 stores text metadata and modeling dynamics and shares that data with acoustic model 1808 , visual model 1810 , and descriptive model and analytics 1812 .
- Text data may be received directly from ASR logic 1804 as described above or may be received as text data from NLP model 1806 .
- Text metadata may include, for example, data identifying, for each word or phrase, parts of speech (syntactic analysis), sentiment analysis, semantic analysis, topic analysis, etc.
- Modeling dynamics includes data representing components of constituent models of NLP model 1806 .
- Such components include machine learning features of NLP model 1806 and other components such as long short-term memory (LSTM) units, gated recurrent units (GRUs), hidden Markov model (HMM), and sequence-to-sequence (seq2seq) translation information.
- Using such data, e.g., NLP metadata, acoustic model 1808 , visual model 1810 , and descriptive model and analytics 1812 may more accurately model the audiovisual signal.
- Runtime models 1802 include acoustic model 1808 , which analyzes the audio portion of the audiovisual signal to find patterns associated with various health states, e.g., depression. Associations between acoustic patterns in speech and health are in some cases applicable across different languages without retraining, regardless of the particular language spoken; the models may also be retrained on data from a particular language. Accordingly, acoustic model 1808 analyzes the audiovisual signal in a language-agnostic fashion. In this illustrative embodiment, acoustic model 1808 uses machine learning approaches such as convolutional neural networks (CNNs), long short-term memory (LSTM) units, hidden Markov models (HMMs), etc. for learning high-level representations and for modeling the temporal dynamics of the audiovisual signals.
- Acoustic model 1808 stores data representing attributes of the audiovisual signal and machine learning features of acoustic model 1808 as acoustic model metadata and shares that data with NLP model 1806 , visual model 1810 , and descriptive model and analytics 1812 .
- the acoustic model metadata may include, for example, data representing a spectrogram of the audiovisual signal of the patient's response.
- the acoustic model metadata may include both basic features and high-level feature representations of machine learning features. More basic features may include, for example, Mel-frequency cepstral coefficients (MFCCs) and various log filter banks of acoustic model 1808 .
- High-level feature representations may include, for example, convolutional neural networks (CNNs), autoencoders, variational autoencoders, deep neural networks, and support vector machines of acoustic model 1808 .
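- The basic acoustic features mentioned above might be computed as follows, assuming the librosa library is available; the sampling rate and number of coefficients are illustrative choices:

```python
import librosa

def basic_acoustic_features(wav_path: str):
    """Compute MFCCs and log mel filter bank energies for one response."""
    audio, sample_rate = librosa.load(wav_path, sr=16000)  # mono, 16 kHz
    mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=13)
    mel = librosa.feature.melspectrogram(y=audio, sr=sample_rate)
    log_mel = librosa.power_to_db(mel)                     # log filter banks
    return mfccs, log_mel
```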
- the acoustic model metadata allows NLP model 1806 to, for example, use acoustic analysis of the audiovisual signal to improve sentiment analysis of words and phrases.
- the acoustic model metadata allows visual model 1810 and descriptive model and analytics 1812 to, for example, use acoustic analysis of the audiovisual signal to more accurately model the audiovisual signal.
- Runtime model server logic 504 ( FIG. 18 ) includes visual model 1810 , which infers various health states of the patient from face, gaze and pose behaviors.
- Visual model 1810 may include facial cue modeling, eye/gaze modeling, pose tracking and modeling, etc. These are merely examples.
- Visual model 1810 stores data representing attributes of the audiovisual signal and machine learning features of visual model 1810 as visual model metadata and shares that data with NLP model 1806 , acoustic model 1808 , and descriptive model and analytics 1812 .
- the visual model metadata may include data representing face locations, pose tracking information, and gaze tracking information of the audiovisual signal of the patient's response.
- the visual model metadata may include both basic features and high-level feature representations of machine learning features. More basic features may include image processing features of visual model 1810 . High-level feature representations may include, for example, CNNs, autoencoders, variational autoencoders, deep neural networks, and support vector machines of visual model 1810 .
- the visual model metadata allows descriptive model and analytics 1812 to, for example, use video analysis of the audiovisual signal to improve sentiment analysis of words and phrases. Descriptive model and analytics 1812 may even use the visual model metadata in combination with the acoustic model metadata to estimate the veracity of the patient in speaking words and phrases for more accurate sentiment analysis.
- the visual model metadata allows acoustic model 1808 to, for example, use video analysis of the audiovisual signal to better interpret acoustic signals associated with various gazes, poses, and gestures represented in the video portion of the audiovisual signal.
- Descriptive features or descriptive analytics are interpretable descriptions that may be computed based on features in the speech, language, video, and metadata that convey information about a speaker's speech patterns in a way that a stakeholder may understand.
- descriptive features may include a speaker sounding nervous or anxious, having a shrill or deep voice, or speaking quickly or slowly. Humans can interpret “features” of voices, such as pitch, rate of speaking, and semantics, in order to mentally determine emotions.
- A descriptive analytics module, by applying interpretable labels to speech utterances based on their features, differs from a machine learning module. Machine learning models also make predictions by analyzing features, but the methods by which machine learning algorithms process the features, and determine representations of those features, differ from how humans interpret them. Thus, labels that machine learning algorithms may “apply” to data, in the context of analyzing features, may not be labels that humans are able to interpret.
- Descriptive model and analytics 1812 may generate analytics and labels for numerous health states, not just depression. Examples of such labels include emotion, anxiety, how engaged the patient is, patient energy, sentiment, speech rate, and dialogue topics.
- descriptive model and analytics 1812 applies these labels to each word of the patient's response and determines how significant each word is in the patient's response. While the significance of any given word in a spoken response may be inferred from the part of speech, e.g., articles and filler words as relatively insignificant, descriptive model and analytics 1812 infers a word's significance from additional qualities of the word, such as emotion in the manner in which the word is spoken as indicated by acoustic model 1808 .
- Descriptive model and analytics 1812 also analyzes trends over time and uses such trends, at least in part, to normalize analysis of the patient's responses. For example, a given patient might typically speak with less energy than others. Normalizing analysis for this patient might set a lower level of energy as “normal” than would be used for the general population. In addition, a given patient may use certain words more frequently than the general population and use of such words by this patient might not be as notable as such use would be by a different patient. Descriptive model and analytics 1812 may analyze trends in real-time, i.e., while a screening or monitoring conversation is ongoing, and in non-real-time contexts.
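- A sketch of such per-patient normalization against the patient's own history, so that a patient who typically speaks with low energy is compared to her own baseline rather than to the general population:

```python
from statistics import mean, stdev

def normalized_measure(current_value: float, patient_history: list) -> float:
    """Z-score of a measurement (e.g., speech energy) against the patient's
    own prior measurements."""
    if len(patient_history) < 2:
        return 0.0                   # not enough history for a baseline
    baseline = mean(patient_history)
    spread = stdev(patient_history)
    if spread == 0:
        return 0.0
    return (current_value - baseline) / spread
```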
- Descriptive model and analytics 1812 stores data representing the speech analysis and trend analysis described above, as well as metadata of constituent models of descriptive model and analytics 1812 , as descriptive model metadata and shares that data with NLP model 1806 , acoustic model 1808 , and visual model 1810 .
- the descriptive model metadata allows NLP model 1806 , acoustic model 1808 , and visual model 1810 to more accurately model the audiovisual signal.
- runtime model server logic 504 estimates a health state of a patient using what the patient says, how the patient says it, and contemporaneous facial expressions, eye expressions, and poses in combination, and stores resulting data representing such estimation as results 1820 . This combination provides a particularly accurate and effective tool for estimating the patient's health state.
- Runtime model server logic 504 sends results 1820 to I/O logic 604 ( FIG. 6 ) to enable interactive screening or monitoring server logic 502 to respond to the patient's responses, thereby making the screening or monitoring dialogue interactive in the manner described above.
- Runtime model server logic 504 ( FIG. 18 ) also sends results 1820 to screening or monitoring system data store 410 to be included in the history of the subject.
- Model training logic 506 trains the models used by runtime model server logic 504 ( FIG. 18 ).
- Model training logic 506 ( FIG. 19 ) includes runtime models 1802 and ASR logic 1804 and trains runtime models 1802 .
- Model training logic 506 sends the trained models to model repository 416 to make runtime models 1802 , as trained, available to runtime model server logic 504 .
- FIG. 20A provides a more detailed example illustration of the backend screening or monitoring system of the embodiment of FIG. 2 .
- the web server 240 is expanded to illustrate that it includes a collection of functional modules.
- the primary component of the web server 240 is an input/output (IO) module 2041 for accessing the system via the network infrastructure 250 .
- This IO 2041 enables the collection of response data (in the form of at least speech and video data) and labels from the clients 260 a - n, and the presentation of prompting information (such as a question or topic) and feedback to the clients 260 a - n.
- the prompting material is driven by the interaction engine 2043 , which is responsive to the needs of the system and to user commands and preferences to fashion an interaction that maintains the clients' 260 a - n engagement and generates meaningful response data.
- the interaction engine will be discussed in greater detail below.
- Truthfulness of the patient in answering questions (or other forms of interaction) posed by the screening or monitoring test is critical in assessing the patient's mental state, as is having a system that is approachable and that will be sought out and used by a prospective patient.
- the health screening or monitoring system 200 encourages honesty of the patient in a number of ways. First, a spoken conversation provides the patient with less time to compose a response to a question, or discuss a topic, than a written response may take. This truncated time generally results in a more honest and “raw” answer. Second, the conversation feels, to the patient, more spontaneous and personal and is less annoying than an obviously generic questionnaire, especially when user preferences are factored into the interaction, as will be discussed below.
- the spoken interaction does not induce or exacerbate resentment in the patient for having to answer a questionnaire before seeing a doctor or other clinician.
- the spoken interaction is adapted in progress to be responsive to the patient, reducing the patient's annoyance with the screening or monitoring test and, in some situations, shortening the screening or monitoring test.
- the screening or monitoring test as administered by health screening or monitoring system 200 relies on more than mere verbal components of the interaction. Non-verbal aspects of the interaction are leveraged synergistically with the verbal content to assess depression in the patient. In effect, ‘what is said’ is not nearly as reliably accurate in assessing depression as is ‘how it's said’.
- the final component of the web server 240 is a results and presentation module 2045 , which collates the results from the model server(s) 230 and provides them to the clients 260 a - n via the IO 2041 , as well as providing feedback information to the interaction engine 2043 for dynamically adapting the course of the interaction to achieve the system's goals. Additionally, the results and presentation module 2045 supplies filtered results to stakeholders 270 a - n via a stakeholder communication module 2003 .
- the communication module 2003 encompasses a process engine, routing engine and rules engine.
- the rules engine embodies conditional logic that determines what, when, and to whom to send communications; the process engine embodies clinical and operational protocol logic to pass messages through a communications chain that may be based on serial completion of tasks; and the routing engine gives the ability to send any messages to the user's platform of choice (e.g., cellphone, computer, landline, tablet, etc.).
- the filtering and/or alteration of the results by the results and presentation module 2045 is performed when necessary to maintain compliance with HIPAA (Health Insurance Portability and Accountability Act of 1996) and other privacy and security regulations and policies, such as GDPR and SOC 2, and to present the relevant stakeholder 270 a - n with information of the greatest use.
- a clinician may desire to receive not only the screening or monitoring classification (e.g., depressed or neurotypical) but additional descriptive features, such as suicidal thoughts, anxiety around another topic, etc.
- an insurance provider may not need or desire many of these additional features, and may only be concerned with a diagnosis/screening or monitoring result.
- a researcher may be provided only aggregated data that is not personally identifiable, in order to avoid transgression of privacy laws and regulations.
- the IO 2041 in addition to connecting to the clients 260 a - n, provides connectivity to the user data 220 and the model server(s) 230 .
- the collected speech and video data (raw audio and video files in some embodiments) are provided by the IO 2041 to the user data 220 , runtime model server(s) 2010 and a training data filter 2001 .
- Label data from the clients 260 a - n is provided to a label data set 2021 in the user data 220 . This may be stored in various databases 2023 . Label data includes not only verified diagnosed patients, but inferred labels collected from particular user attributes or human annotation. Client ID information and logs may likewise be supplied from the IO 2041 to the user data 220 .
- the user data 220 may be further enriched with clinical and social records 210 sourced from any number of third party feeds. This may include social media information obtained from web crawlers, EHR databases from healthcare providers, public health data sources, and the like.
- the training data filter 2001 may consume speech and video data and append label data 2021 to it to generate a training dataset.
- This training dataset is provided to model training server(s) 2030 for the generation of a set of machine learned models.
- the models are stored in a model repository 2050 and are utilized by the runtime model server(s) 2010 to make a determination of the screening or monitoring results, in addition to generating other descriptors for the clients 260 a - n.
- the model repository 2050 together with the model training server(s) 2030 and runtime model server(s) 2010 make up the model server(s) 230 .
- the runtime model server(s) 2010 and model training server(s) 2030 are described in greater detail below in relation to FIGS. 20B and 21 , respectively.
- In FIG. 20B , the runtime model server(s) 2010 is provided in greater detail.
- the server receives speech and video inputs that originated from the clients 260 a - n.
- a signal preprocessor and multiplexer 2011 performs conditioning on the inputted data, such as removal of noise or other artifacts in the signal that may cause modeling errors. These signal processing and data preparation tasks include diarization, segmentation and noise reduction for both the speech and video signals. Additionally, metadata may be layered into the speech and video data. This data may be supplied in this preprocessed form to a bus 2014 for consumption by the modelers 2020 , and may also be provided to any number of third-party, off-the-shelf Automatic Speech Recognition (ASR) systems 2012 .
- the ASR 2012 output includes a machine readable transcription of the speech portion of the audio data.
- This ASR 2012 output is likewise supplied to the bus 2014 for consumption by later components.
- the signal preprocessor and multiplexer 2011 may be provided with confidence values, such as audio quality (signal quality, length of sample) and transcription confidence (how accurate the transcription is) values 2090 and 2091 .
- FIG. 20B also includes a metadata model 2018 .
- the metadata model may analyze patient data, such as demographic data, medical history data, and patient-provided data.
- modeler 2020 consumes the models, preprocessed audio and visual data, and ASR 2012 output to analyze the clients' 260 a - n responses for the health state in question.
- the present system includes a natural language processing (NLP) model 2015 , acoustic model 2016 , and video model 2017 that all operate in concert to generate classifications for the clients' 260 a - n health state.
- modelers not only operate in tandem, but consume outputs from one another to refine the model outputs.
- the output for each of these modelers 2020 is provided, individually, to a calibration, confidence, and desired descriptors module 2092 .
- This module calibrates the outputs in order to produce scaled scores, as well as provides confidence measures for the scores.
- the desired descriptors module may assign human-readable labels to scores.
- the output of the desired descriptors module 2092 is provided to a model weight and fusion engine 2019 .
- This model weight and fusion engine 2019 combines the model outputs into a single consolidated classification for the health state of each client 260 a - n.
- Model weighting may be done using static weights, such as weighting the output of the NLP model 2015 more than either the acoustic model 2016 or video model 2017 outputs. However, more robust and dynamic weighting methodologies may likewise be applied.
- weights for a given model output may, in some embodiments, be modified based upon the confidence level of the classification by the model. For example, if the NLP model 2015 classifies an individual as being not depressed, with a confidence of 0.56 (out of 0.00-1.00), but the acoustic model 2016 renders a depressed classification with a confidence of 0.97, in some cases the models' outputs may be weighted such that the acoustic model 2016 is provided a greater weight. In some embodiments, the weight of a given model may be linearly scaled by the confidence level, multiplied by a base weight for the model. In yet other embodiments, model output weights are temporally based.
- for example, when the user is speaking, the NLP model 2015 may be afforded a greater weight than other models; however, when the user isn't speaking, the video model 2017 may be afforded a greater weight for that time domain.
- when the video model 2017 and acoustic model 2016 independently suggest the person is nervous and untruthful (frequent gaze shifting, increased perspiration, upward pitch modulation, increased speech rate, etc.), the weight of the NLP model 2015 may be minimized, since it is likely the individual is not answering the question truthfully.
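- A minimal Python sketch of the confidence-scaled weighting described above follows; the base weights, model names, and scores are illustrative assumptions rather than values disclosed by the system.

    # Assumed base weights; confidence linearly scales each model's weight.
    BASE_WEIGHTS = {"nlp": 0.4, "acoustic": 0.3, "video": 0.3}

    def fuse(outputs):
        """outputs: {model: (score, confidence)}, score in [0, 1] where 1.0
        denotes a depressed classification, confidence in [0, 1]."""
        weights = {m: BASE_WEIGHTS[m] * conf for m, (_, conf) in outputs.items()}
        total = sum(weights.values())
        return sum(weights[m] * score for m, (score, _) in outputs.items()) / total

    # The NLP model's low-confidence 'not depressed' call (0.56) is outweighed
    # by the acoustic model's high-confidence 'depressed' call (0.97).
    print(fuse({"nlp": (0.0, 0.56), "acoustic": (1.0, 0.97), "video": (0.8, 0.70)}))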
- model output fusion and weighting may be combined with features and other user information in a multiplex output module 2051 in order to generate the final results.
- these results are provided back to the user data 220 for storage and potentially as future training materials, and also to the results and presentation module 2045 of the webserver 240 for display, at least in part, to the clients 260 a - n and the stakeholders 270 a - n.
- These results are likewise used by the interaction engine 2043 to adapt the interaction with the client 260 a - n moving forward.
- In FIG. 21 , the model training server(s) 2030 is provided in greater detail. Like the runtime model server(s) 2010 , the model training server(s) 2030 consume a collection of data sources. However, these data sources have been filtered by the training data filter 2001 to provide only data for which label information is known or imputable.
- the model training server additionally takes as inputs audio quality confidence values 2095 (which may include bit rate, noise, and length of the audio signal) and transcription confidence values 2096 . These confidence values may include the same types of data as those of FIG. 20B .
- the filtered social, demographic, and clinical data, speech and video data, and label data are all provided to a preprocessor 2031 for cleaning and normalization of the filtered data sources.
- the processed data is then provided to a bus 2040 for consumption by various trainers 2039 , and also to one or more third party ASR systems 2032 for the generation of ASR outputs, which are likewise supplied to the bus 2040 .
- the signal preprocessor and multiplexer 2011 may be provided with confidence values, such as audio quality (signal quality, length of sample) and transcription confidence (how accurate the transcription is) values 2095 and 2096 .
- the model trainers 2039 consume the processed audio, visual, metadata, and ASR output data in a NLP trainer 2033 , an acoustic trainer 2034 , a video trainer 2035 , and a metadata trainer 2036 .
- the trained models are provided, individually, to a calibration, confidence, and desired descriptors module 2097 . This module calibrates the outputs in order to produce scaled scores, as well as provides confidence measures for the scores.
- the desired descriptors module may assign human-readable labels to scores.
- the trained and calibrated models are provided to a fused model trainer 2037 for combining the trained models into a trained combinational model. Each individual model and the combined model may be stored in the model repository 2050 . Additionally and optionally, the trained models may be provided to a personalizer 2038 , which leverages metadata (such as demographic information and data collated from social media streams) to tailor the models specifically for a given client 260 a - n.
- a particular model x_o may be generated for classifying acoustic signals as either representing someone who is depressed, or not.
- the tenor, pitch and cadence of an audio input may vary significantly between a younger individual versus an elderly individual.
- specific models are developed based upon whether the patient being screened is younger or elderly (models x_y and x_e , respectively).
- women generally have variances in their acoustic signals as compared to men, suggesting that yet another set of acoustic models is needed (models x_f and x_m , respectively).
- the metadata for an individual provides insight into that person's age, gender, ethnicity, educational background, accent/region they grew up in, etc. This information may be utilized to select the most appropriate model to use in future interactions with this given patient, and may likewise be used to train models that apply to individuals that share similar attributes.
- the personalizer 2038 may personalize a model, or set of models, for a particular individual based upon their past history and label data known for the individual. This activity is more computationally expensive than relying upon population wide, or segment wide, modeling, but produces more accurate and granular results. All personalized models are provided from the personalizer 2038 to the model repository 2050 for retention until needed for patient assessment.
- a client 260 a - n is initially identified, and when available, a personalized model may be employed for their screening or monitoring. If a personalized model is not available, but metadata is known for the individual, the most specific model for the most specific segment is employed in their screening or monitoring. If no metadata is available, then the model selected is the generic, population-wide model. Utilizing such a tiered modeling structure, the more information that is known regarding the client 260 a - n , the more specific and accurate the models that may be employed. Thus, for each client 260 a - n , the 'best' model is leveraged given the data available for them.
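- The tiered selection just described can be sketched as follows; the repository layout and segment naming (attribute tags joined by '+') are assumptions made for illustration only.

    def select_model(repo, client_id, metadata):
        """Prefer a personalized model, then the most specific matching
        segment model, then the generic population-wide model."""
        personalized = repo["personalized"].get(client_id)
        if personalized is not None:
            return personalized
        # Try segments from most to least specific.
        for segment in sorted(repo["segments"], key=lambda s: -len(s.split("+"))):
            if set(segment.split("+")) <= set(metadata.get("tags", [])):
                return repo["segments"][segment]
        return repo["generic"]

    repo = {"personalized": {},
            "segments": {"elderly+female": "x_ef", "female": "x_f"},
            "generic": "x_o"}
    print(select_model(repo, "client-42", {"tags": ["female", "elderly"]}))  # -> x_ef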
- Assessment test administrator 2202 of real-time system 302 conducts an interactive conversation with the patient through patient device 312 .
- the responsive audiovisual signal of the patient is received by real-time system 302 from patient device 312 .
- the exchange of information between real-time system 302 and patient device 312 may be through a purpose-built app executing in patient device 312 or through a conventional video call between patient device 312 and video call logic of assessment test administrator 2202 . While this illustrative embodiment uses an audiovisual signal to assess the state of the patient, it should be appreciated that, in alternative embodiments, an audio-only signal may be used with good results. In such alternative embodiments, an ordinary, audio-only telephone conversation may serve as the vehicle for assessment by assessment test administrator 2202 .
- assessment test administrator 2202 uses composite model 2204 to assess the state of the patient in real-time, i.e., as the spoken conversation transpires.
- Such intermediate assessment is used, in a manner described more completely below, to control the conversation, making the conversation more responsive, and therefore more engaging, to the patient and to help make the conversation as brief as possible while maintaining the accuracy of the final assessment.
- Modeling system 304 receives collected patient data 2206 that includes the audiovisual signal of the patient during the assessment test.
- when the assessment test involves a purpose-built app executing in patient device 312 , modeling system 304 may receive collected patient data 2206 from patient device 312 .
- modeling system 304 receives collected patient data 2206 from real-time system 302 .
- Modeling system 304 retrieves clinical data 2220 from clinical data server 306 .
- Clinical data 2220 includes generally any available clinical data related to the patient, other patients assessed by assessment test administrator 2202 , and the general public that may be helpful in training any of the various models described herein.
- Preprocessing 2208 conditions any audiovisual data for optimum analysis. Having a high-quality signal to start is very helpful in providing accurate analysis. Preprocessing 2208 is shown within modeling system 304 . In alternative embodiments, preprocessing is included in real-time system 302 to improve accuracy in application of composite model 2204 .
- Speech recognition 2210 processes speech represented in the audiovisual data after preprocessing 2208 , including automatic speech recognition (ASR).
- ASR may be conventional.
- Language model training 2212 uses the results of speech recognition 2210 to train language models 2214 .
- Acoustic model training 2216 uses the audiovisual data after preprocessing 2208 to train acoustic models 2218 .
- Visual model training 2224 uses the audiovisual data after preprocessing 2208 to train visual models 2226 .
- language model training 2212 , acoustic model training 2216 , and visual model training 2224 train language models 2214 , acoustic models 2218 , and visual models 2226 , respectively, specifically for the subject patient. Training may also use clinical data 2220 for patients that share one or more phenotypes with the subject patient.
- composite model builder 2222 uses language models 2214 , acoustic models 2218 , and visual models 2226 , in combination with clinical data 2220 , to combine language, acoustic, and visual models into composite model 2204 .
- assessment test administrator 2202 uses composite model 2204 in real time to assess the current state of the subject patient and to accordingly make the spoken conversation responsive to the subject patient as described more completely below.
- assessment test administrator 2202 administers a depression assessment test to the subject patient by conducting an interactive spoken conversation with the subject patient through patient device 312 .
- In FIG. 23A , a general block diagram for one example substantiation of the acoustic model 2016 is provided.
- the speech and video data is provided to a high level feature representor 2320 that operates in concert with a temporal dynamics modeler 2330 . Influencing the operation of these components is a model conditioner 2340 that consumes features from the descriptive features 2018 , results generated from the speech and video models 2015 and 2017 , respectively, and clinical and social data.
- the high level feature representor 2320 and temporal dynamics modeler 2330 also receive the outputs of a raw and higher level feature extractor 2310 , which identifies features within the incoming acoustic signals and feeds them to the models.
- the high level feature representor 2320 and temporal dynamics modeler 2330 generate the acoustic model results, which may be fused into a final result that classifies the health state of the individual, and may also be consumed by the other models for conditioning purposes.
- the high level feature representor 2320 leverages existing models for frequency, pitch, amplitude and other acoustic features that provide valuable insights into feature classification.
- a number of off-the-shelf “black box” algorithms accept acoustic signal inputs and provide a classification of an emotional state with an accompanying degree of accuracy. For example, emotions such as sadness, happiness, anger and surprise are already able to be identified in acoustic samples using existing solutions. Additional emotions such as envy, nervousness, excited-ness, mirth, fear, disgust, trust and anticipation will also be leveraged as they are developed. However, the present systems and methods go further by matching these emotions, strength of the emotion, and confidence in the emotion, to patterns of emotional profiles that signify a particular mental health state. For example, pattern recognition may be trained, based upon patients that are known to be suffering from depression, to identify the emotional state of a respondent that is indicative of depression.
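- One way such pattern matching might be realized is by comparing the emotion vector returned by a 'black box' detector against an emotional profile learned from diagnosed patients. The profile values, the cosine-similarity matcher, and the threshold below are assumptions, not the patent's trained parameters.

    import math

    # Hypothetical profile learned from patients known to suffer from depression.
    DEPRESSION_PROFILE = {"sadness": 0.7, "happiness": 0.05, "anger": 0.15, "surprise": 0.1}

    def cosine(a, b):
        keys = set(a) | set(b)
        dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb)

    def matches_depression(emotions, threshold=0.9):
        """emotions: {emotion: strength x confidence} from an emotion detector."""
        return cosine(emotions, DEPRESSION_PROFILE) >= threshold

    print(matches_depression({"sadness": 0.8, "happiness": 0.02,
                              "anger": 0.1, "surprise": 0.05}))  # -> True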
- FIG. 23B shows an embodiment of FIG. 23A including an acoustic modeling block 2341 .
- the acoustic modeling block 2341 includes a number of acoustic models.
- the acoustic models may be separate models that use machine learning algorithms.
- the illustrated listing of models shown in FIG. 23B is not necessarily an exhaustive listing of possible models. These models may include a combination of existing third party models and internally derived models.
- FIG. 23B includes acoustic embedding model 2342 , spectral temporal model 2343 , acoustic effect model 2345 , speaker personality model 2346 , intonation model 2347 , temporal/speaking rate model 2348 , pronunciation models 2349 , and fluency models 2361 .
- the machine learning algorithms used by these models may include neural networks, deep neural networks, support vector machines, decision trees, hidden Markov models, and Gaussian mixture models.
- FIG. 23C shows a score calibration and confidence module 2370 .
- the score calibration and confidence module 2370 includes a score calibration module 2371 and a performance estimation module 2374 .
- the score calibration module 2371 includes a classification module 2372 and a mapping module 2373 .
- the score calibration and confidence module 2370 may accept as inputs a raw score, produced by a machine learning algorithm, such as a neural network or deep learning network, that may be analyzing audiovisual data.
- the score calibration and confidence module 2370 may also accept a set of labels, with which to classify data.
- the labels may be provided by clinicians.
- the classification module 2372 may apply one or more labels to the raw score, based on the value of the score. For example, if the score is a probability near 1, the classification module 2372 may apply a “severe” label to the score.
- the classification module 2372 may apply labels based on criteria set by clinicians, or may algorithmically determine labels for scores, e.g., using a machine learning algorithm.
- the mapping module 2373 may scale the raw score to fit within a range of numbers, such as 120-180 or 0-700. The classification module 2372 may operate before or after the mapping module 2373 .
- the score calibration and confidence module 2370 may determine a confidence measure 2376 by estimating a performance for the labeled, scaled score.
- the performance may be estimated by analyzing features of the collected data, such as duration, sound quality, accent, and other features.
- the estimated performance may be a weighted parameter that is applied to the score. This weighted parameter may comprise the score confidence.
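- A hedged sketch of this calibration flow appears below: a raw probability is mapped into a scaled range, given a severity label, and paired with a confidence estimated from features of the collected data. The range, label cutoffs, and quality heuristic are all assumptions for illustration.

    def calibrate(raw_score, lo=0, hi=700):
        scaled = lo + raw_score * (hi - lo)   # mapping module: scale into a range
        if raw_score >= 0.85:                 # classification module: label by value
            label = "severe"
        elif raw_score >= 0.5:
            label = "moderate"
        else:
            label = "mild"
        return scaled, label

    def estimate_confidence(duration_s, snr_db):
        # Performance estimation from duration and sound quality; a toy
        # heuristic standing in for the module's actual estimator.
        return min(duration_s / 30.0, 1.0) * min(max(snr_db / 30.0, 0.0), 1.0)

    score, label = calibrate(0.91)
    print(score, label, estimate_confidence(duration_s=25, snr_db=24))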
- To provide greater context and clarification around the acoustic model's 2016 operation, a highly simplified, single substantiation of one possible version of the high level feature representor 2320 is provided in relation to FIG. 24 . It should be noted that this example is provided for illustrative purposes only, and is not intended to limit the embodiments of the high level feature representor 2320 in any way.
- the raw and high level feature extractor 2310 takes the acoustic data signal and converts it into a spectrogram image 2321 .
- FIG. 55 provides an example image of such a spectrogram 5500 of a human speaking.
- a spectrogram of this sort provides information regarding the audio signal frequency along one axis, time along the other, and the amplitude of the signal (here presented in terms of intensity/how dark the frequency is labeled).
- Such a spectrogram 5500 is considered a raw feature of the acoustic signal, as would pitch, cadence, energy level, etc.
- a spectrogram sampler 2323 selects a portion of the image covering a constant timeframe; for example, the span between time zero and 10 seconds is one standard sample size, but other sample time lengths are possible.
- FIG. 56 provides an example of a sampled portion 5502 of the spectrogram 5600 .
- This image data is then represented as an M×N matrix (x), in this particular non-limiting example.
- This matrix is supplied to a machine-learned equation whose approximate output value is an abstraction of the mental state being tested, dependent upon the input equation.
- the system may have previously determined threshold, or cutoff values 2322 , for the variables which indicate if the response is indicative of the mental state or not. These cutoff values are trained for by analyzing responses from individuals for which the mental state is already known.
- Equation determination may leverage deep learning techniques, as previously discussed. This may include recurrent neural networks 2324 and/or convolutional neural networks 2325 . In some cases, long short-term memory (LSTM) or gated recurrent unit (GRU) may be employed, for example. In this manner, depression, or alternate mental states may be directly analyzed for in the acoustic portion of the response. This, in combination with using off-the-shelf emotion detection ‘black box’ systems, with pattern recognition, may provide a robust classification by a classifier 2326 of the mental state based upon the acoustic signal which, in this example, is provided as acoustic analysis output 2327 .
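- A minimal PyTorch sketch of the convolutional-plus-recurrent approach just described is shown below. It assumes PyTorch is available; the layer sizes, mel-bin count, window length, and binary output head are illustrative choices, not the patent's architecture.

    import torch
    import torch.nn as nn

    class SpectrogramClassifier(nn.Module):
        def __init__(self, n_mels=128, hidden=64):
            super().__init__()
            # Convolutional layer extracts local time-frequency patterns.
            self.conv = nn.Sequential(
                nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),  # halves both the frequency and time axes
            )
            # A GRU models the temporal dynamics across the sampled window.
            self.gru = nn.GRU(16 * (n_mels // 2), hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)  # mental state present vs. absent

        def forward(self, spec):                  # spec: (batch, 1, n_mels, time)
            x = self.conv(spec)                   # (batch, 16, n_mels/2, time/2)
            x = x.permute(0, 3, 1, 2).flatten(2)  # (batch, time/2, 16 * n_mels/2)
            _, h = self.gru(x)
            return torch.sigmoid(self.head(h[-1]))  # probability of the state

    # A ten-second sample at roughly 100 frames per second -> ~1000 time frames.
    probs = SpectrogramClassifier()(torch.randn(2, 1, 128, 1000))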
- this example of using a spectrogram as a feature for analysis is but one of many possible substantiations of the high level feature representor's 2320 activity.
- Other features and mechanisms for processing these features may likewise be analyzed. For example pitch levels, isolated breathing patterns, total energy of the acoustic signal, or the like may all be subject to similar temporally based analysis to classify the feature as indicative of a health condition.
- Turning to the NLP model 2015 of FIG. 25 , this system consumes the output from the ASR system 2012 and performs post-processing on it via an ASR output post processor 2510 .
- This post processing includes reconciling the ASR outputs (when multiple outputs are present).
- Post processing may likewise include n-gram generation, parsing activities and the like.
- the results from the video and acoustic models 2016 and 2017 respectively, as well as clinical and social data are consumed by a model conditioner 2540 for altering the functioning of the language models 2550 .
- the language models 2550 operate in concert with a temporal dynamics modeler 2520 to generate the NLP model results.
- the language models 2550 include a number of separate models.
- the illustrated listing of models shown in FIG. 25 is not necessarily an exhaustive listing of possible models. These models may include a combination of existing third party models and internally derived models.
- Language models may use standard machine learning or deep learning algorithms, as well as language modeling algorithms such as n-grams.
- sentiment model 2551 is a readily available third party model that uses either original text samples or spoken samples that have been transcribed by a human or machine speech recognizer to determine whether the sentiment of the discussion is generally positive or negative. In general, a positive sentiment is inversely correlated with depression, whereas a negative sentiment is correlated with a depression classification.
- Statistical language model 2552 utilizes n-grams and pattern recognition within the ASR output to statistically match patterns and n-gram frequency to known indicators of depression. For example, particular sequences of words may be statistically indicative of depression. Likewise, particular vocabulary and word types used by a speaker may indicate depression or not having depression.
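- A toy sketch of this kind of n-gram matching follows; the indicator bigrams and the frequency score are invented for illustration and are not the patent's trained statistics.

    from collections import Counter

    # Hypothetical bigrams statistically associated with depressed speech.
    DEPRESSION_NGRAMS = {("no", "point"), ("so", "tired"), ("by", "myself")}

    def ngrams(tokens, n=2):
        return zip(*(tokens[i:] for i in range(n)))

    def ngram_score(transcript):
        tokens = transcript.lower().split()
        counts = Counter(ngrams(tokens))
        hits = sum(c for g, c in counts.items() if g in DEPRESSION_NGRAMS)
        return hits / max(len(tokens) - 1, 1)  # indicator frequency per bigram

    print(ngram_score("there is no point i am just so tired all the time"))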
- a topic model 2553 identifies types of topics within the ASR output. Particular topics, such as death, suicide, hopelessness and worth (or lack thereof) may all be positively correlated with a classification of depression. Additionally, there is a latent negative correlation between activity (signified by verb usage) and depression. Thus, ASR outputs that are high in verb usage may indicate that the client 260 a - n is not depressed. Furthermore, topic modeling based on the known question or prompt given to the subject can produce better performance by using pre-trained topic-specific models for processing the answer for mental health state.
- Syntactic model 2554 identifies situations where the focus of the ASR output is internal versus external. The usage of terms like ‘I’ and ‘me’ is indicative of internal focus, while terms such as ‘you’ and ‘they’ are indicative of a less internalized focus. More internal focus has been identified as generally correlated with an increased chance of depression. Syntactic model 2554 may additionally look at speech complexity. Depressed individuals tend to have a reduction in sentence complexity. Additionally, energy levels, indicated by language that is strong or polarized, are negatively correlated with depression. Thus, someone with very simple sentences, an internal focus, and low-energy descriptive language would indicate a depressed classification.
- Embedding and clustering model 2556 maps words to prototypical words or word categories. For example, the terms “kitten”, “feline” and “kitty” may all be mapped to the term “cat”. Unlike the other models, the embedding and clustering model 2556 does not generate a direct indication of whether the patient is depressed or not, rather this model's output is consumed by the other language models 2550 .
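- A minimal sketch of this word-to-prototype mapping, using toy two-dimensional embeddings and a nearest-prototype rule (both assumptions; production systems would use pre-trained embeddings):

    import math

    EMBEDDINGS = {  # toy vectors standing in for pre-trained embeddings
        "cat": (1.0, 0.1), "kitten": (0.95, 0.2), "feline": (0.9, 0.15),
        "dog": (0.1, 1.0), "puppy": (0.2, 0.95),
    }
    PROTOTYPES = ["cat", "dog"]

    def nearest_prototype(word):
        v = EMBEDDINGS[word]
        return min(PROTOTYPES, key=lambda p: math.dist(v, EMBEDDINGS[p]))

    # Downstream language models consume the mapped prototypical terms.
    print([nearest_prototype(w) for w in ["kitten", "feline", "puppy"]])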
- a dialogue and discourse model 2557 identifies latency and usage of spacer words (“like”, “umm”, etc.). Additionally, the dialogue and discourse model 2557 identifies dialogue acts, such as questions versus statements.
- An emotion or affect model 2558 provides a score, typically a posterior probability over a set of predetermined emotions (for example happy, sad) that describes how well the sample matches pre-trained models for each of the said emotions. These probabilities can then be used in various forms as input to the mental health state models, and/or in a transfer learning set up.
- a speaker personality model 2559 provides a score, typically a posterior probability over a set of predetermined speaker personality traits (for example agreeableness, openness) that describes how well the sample matches pre-trained models for each of the said traits. These probabilities can then be used in various forms as input to the mental health state models, and/or in a transfer learning set up.
- the non-verbal model 2561 , using ASR events, may provide a score based on non-lexical speech utterances of patients, which may nonetheless be indicative of mental state. These utterances may be laughter, sighs, or deep breaths, which may be picked up and transcribed by an ASR.
- the text quality confidence module 2560 determines a confidence measure for the output of the ASR output post processor 2510 .
- the confidence measure may be determined based on text metadata (demographic information about the patient, environmental conditions, method of recording, etc.) as well as context (e.g., length of speech sample, question asked).
- each of these models may impact one another and influence the results and/or how these results are classified. For example, a low energy language response typically is indicative of depression, whereas high energy verbiage would negatively correlate with depression.
- Turning to FIG. 26 , we again see a collection of feature extractors 2610 that consume the video data.
- One such extractor is a face bounder 2611 , which recognizes the edges of a person's face and extracts this region of the image for processing.
- facial features provide significant input on how an individual is feeling. Sadness, exhaustion, worry, and the like, are all associated with a depressive state, whereas jubilation, excitation, and mirth are all negatively correlated with depression.
- the region around the eyes may be analyzed separately from regions around the mouth. This allows greater emphasis to be placed upon differing image regions based upon context.
- the region around the mouth generally provides a large amount of information regarding an individual's mood; however, when a person is speaking, this data is more likely to be inaccurate due to movements associated with speech formation.
- the acoustic and language models may provide insight as to when the user is speaking in order to reduce reliance on the analysis of a mouth region extraction.
- the region around the eyes is generally very expressive when someone is speaking, so this feature is relied upon more heavily during times when the individual is speaking.
- a pose tracker 2612 is capable of looking at larger body movements or positions.
- a slouched position indicates unease, sadness, and other features that indicate depression.
- the presence of excessive fidgeting, or conversely unusual stillness, is likewise indicative of depression.
- Moderate movement and fidgeting are not associated with depression.
- Upright posture and relaxed movement likewise are inversely related to a depressive classification.
- even the direction that the individual sits or stands is an indicator of depression.
- a user who directly faces the camera is less likely to be depressed.
- an individual that positions their body oblique to the camera, or otherwise covers themselves (by crossing their arms for example) is more likely to be depressed.
- a gaze tracker 2613 is particularly useful in determining where the user is looking, and when (in response to what stimulus) the person's gaze shifts. Looking at the screen or camera of the client device 260 a - n indicates engagement, confidence and honesty—all hallmarks of a non-depressed state. Looking down constantly, on the other hand, is suggestive of depression. Constantly shifting gaze indicates nervousness and dishonesty. Such feedback may be used by the NLP model 2015 to reduce the value of analysis based on semantics during this time period as the individual is more likely to be hedging their answers and/or outright lying. This is particularly true if the gaze pattern alters dramatically in response to a stimulus.
- the image processing features extractor 2614 may take the form of any number of specific feature extractions, such as emotion identifiers, speaking identifiers (from the video as opposed to the auditory data), and the above disclosed specific bounder extractors (region around the eyes for example). All of the extracted features are provided to a high-level feature representor 2620 and classifier and/or regressor 2630 that operate in tandem to generate the video model results. As with the other models, the video model 2017 is influenced by the outputs of the NLP model 2015 and the acoustic model 2016 , as well as clinical and social data. The model conditioner 2640 utilizes this information to modify what analysis is performed, or the weight afforded to any specific findings.
- the descriptive features module 2018 of FIG. 27 includes direct measurements 2710 and model outputs 2720 that result from the analysis of the speech and video data.
- the descriptive features module may not be included in either the runtime model servers 2010 or model training servers 2030 . Instead, descriptive features may be incorporated in the acoustic and NLP models. Disclosed in the description of FIG. 27 are examples of descriptive features.
- Many different measurements 2710 and model outputs 2720 are collected by the descriptive features 2018 module. For example, measurements include at least a speech rate analyzer 2711 , which tracks a speaker's words per minute. Faster speech generally indicates excitement, energy and/or nervousness.
- a temporal analyzer 2715 determines the time of the day, week and year, in order to provide context around the interaction. For example, people are generally more depressed in the winter months, around particular holidays, and at certain days of the week and times of the day. All this timing information is usable to alter the interaction (by providing topicality) or by enabling classification thresholds to be marginally altered to reflect these trends.
- the model outputs 2720 may include a topic analyzer 2721 , various emotion analyzers 2723 (anxiety, joy, sadness, etc.), sentiment analyzer 2725 , engagement analyzer 2727 , and arousal analyzer 2729 . Some of these analyzers may function similarly in the other models; for example the NLP model 2015 already includes a sentiment model 2551 , however the sentiment analyzer 2725 in the descriptive features 2018 module operates independently from the other models, and includes different input variables, even if the output is similar.
- the engagement analyzer 2727 operates to determine how engaged a client 260 a - n is in the interaction. High levels of engagement tend to indicate honesty and eagerness. Arousal analyzer 2729 provides insights into how energetic or lethargic the user is.
- a key feature of the descriptive features 2018 module is that each of these features, whether measured or the result of model outputs, is normalized for the individual by a normalizer 2730 . For example, some people just speak faster than others, and a higher words-per-minute measurement for this individual versus another person may not indicate anything unusual. The degree of any of these features is adjusted for the baseline level of the particular individual by the normalizer 2730 . Obviously, the normalizer 2730 operates more accurately the more data that is collected for any given individual.
- a first time interaction with a client 260 a - n cannot be effectively normalized immediately, however as the interaction progresses, the ability to determine a baseline for this person's speech rate, energy levels, engagement, general sentiment/demeanor, etc. may be more readily ascertained using standard statistical analysis of variation of these features over time. This becomes especially true after more than one interaction with any given individual.
- the system may identify trends in these features for the individual by analysis by a trend tracker 2740 .
- the trend tracker splits the interaction by time domains and looks for changes in values between the various time periods. Statistically significant changes, and especially changes that continue over multiple time periods, are identified as trends for the feature for this individual.
- the features, both in raw and normalized form, and any trends are all output as the descriptive results.
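- The normalization and trend tracking described above might be sketched as follows; the baseline window, z-score normalization, and trend threshold are assumptions for illustration.

    from statistics import mean, stdev

    def normalize(values, baseline):
        """Adjust a feature (e.g., words per minute) for the individual's own baseline."""
        mu, sigma = mean(baseline), stdev(baseline)
        return [(v - mu) / sigma for v in values]

    def trends(per_window_values, threshold=1.0):
        """Flag statistically notable shifts between consecutive time domains."""
        flagged = []
        for i in range(1, len(per_window_values)):
            delta = per_window_values[i] - per_window_values[i - 1]
            if abs(delta) >= threshold:
                flagged.append((i, delta))
        return flagged

    baseline = [150, 155, 148, 152, 151]   # this user's historical speech rates
    session = normalize([150, 149, 139, 137], baseline)
    print(trends(session))                  # the mid-session drop registers as a trend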
- the client devices may be capable of collecting biometric data (temperature, skin chemistry data, pulse rate, movement data, etc.) from the individual during the interaction. Models focused upon these inputs may be leveraged by the runtime model server(s) 2010 to arrive at determinations based upon this data.
- the disclosed systems may identify chemical markers in the skin (cortisol for example), perspiration, temperature shifts (e.g. flushing), and changes in heart rate, etc. for diagnostic purposes.
- A process flow diagram featuring the components of the interaction engine 2043 is provided in FIG. 28 .
- the interaction engine 2043 dictates the interactions between the web server(s) 240 and the clients 260 a - n. These interactions, as noted previously, may consist of a question and answer session, with a set number and order of questions. In such embodiments, this type of assessment is virtually an automated version of what has previously been leveraged for depression diagnosis, except with audio and video capture for improved screening or monitoring accuracy. Such question and answer may be done with text questions displayed on the client device, or through a verbal recording of a question.
- This interaction engine 2043 includes the ability to take a number of actions, including different prompts, questions, and other interactions. These are stored in a question and action bank 2810 .
- the interaction engine 2043 also includes a history and state machine 2820 which tracks what has already occurred in the interaction, and the current state of the interaction.
- the state and history information, database of possible questions and actions, and additional data is consumed by an interaction modeler 2830 for determining next steps in the interaction.
- the other information consumed consists of user data, clinical data and social data for the client being interacted with, as well as model results, NLP outputs and descriptive feature results.
- the user data, clinical data and social media data are all consumed by a user preference analyzer 2832 for uncovering the preferences of a user.
- appealing to the user is one of the large hurdles to successful screening or monitoring. If a user doesn't want to use the system they will not engage it in the first place, or may terminate the interaction prematurely. Alternatively, an unpleasant interaction may cause the user to be less honest and open with the system. Not being able to properly screen individuals for depression, or health states generally, is a serious problem, as these individuals are likely to continue struggling with their disease without assistance, or even worse die prematurely. Thus, having a high degree of engagement with a user may literally save lives.
- the interactions are tailored in a manner that appeals to the user's interests and desires.
- Topics identified within social media feeds are incorporated into the interaction to pique interest of the user.
- Collected preference data from the user modulates the interaction to be more user friendly, and particular needs or limitations of the user revealed in clinical data are likewise leveraged to make the interaction experience user-friendly. For example, if the clinical data includes information that the user experiences hearing loss, the volume of the interaction may be proportionally increased to make the interaction easier. Likewise, if the user indicates their preferred language is Spanish, the system may automatically administer the interaction in this language.
- the descriptive features and model results are used by a user response analyzer 2831 to determine if the user has answered the question (when the interaction is in a question-answer format), or when sufficient data has been collected to generate an appropriate classification if the interaction is more of a ‘free-form’ conversation, or even a monologue by the client about a topic of interest.
- a navigation module 2834 receives NLP outputs and semantically analyzes the NLP results for command language in near real time.
- commands may include statements such as “Can you repeat that?”, “Please speak up”, “I don't want to talk about that”, etc. These types of ‘command’ phrases indicate to the system that an immediate action is being requested by the user.
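- A simple sketch of this near-real-time command detection; the phrase table and action names are hypothetical.

    COMMANDS = {
        "can you repeat that": "REPEAT_LAST",
        "please speak up": "INCREASE_VOLUME",
        "i don't want to talk about that": "SKIP_TOPIC",
    }

    def detect_command(nlp_text):
        text = nlp_text.lower().strip(" ?!.")
        for phrase, action in COMMANDS.items():
            if phrase in text:
                return action  # an immediate action is being requested
        return None

    print(detect_command("Um, can you repeat that?"))  # -> REPEAT_LAST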
- Output from each of the navigation module 2834 , user response analyzer 2831 and user preference analyzer 2832 are provided to an action generator 2833 , in addition to access to the question and adaptive action bank 2810 and history and state machine 2820 .
- the action generator 2833 applies a rule based model to determine which action within the question and adaptive action bank 2810 is appropriate.
- a machine learned model is applied in lieu of a rule based decision model.
- the customized action is likewise passed back to the history and state machine 2820 so that the current state, and past actions may be properly logged.
- Customized actions may include, for example, asking a specific question, prompting a topic, switching to another voice or language, ending the interaction, altering the loudness of the interaction, altering speech rates, font sizes and colors, and the like.
- the clinical and social data for the clients are collated and stored within the data store (at step 2910 ). This information may be gathered from social media platforms utilizing crawlers or similar vehicles. Clinical data may be collected from health networks, physicians, insurance companies or the like. In some embodiments, the health screening or monitoring system 2000 may be deployed as an extension of the care provider, which allows the sharing of such clinical data with reduced concerns with violation of privacy laws (such as HIPAA).
- Clinical data may include electronic health records, physician notes, medications, diagnoses and the like.
- the process may require that models are available to analyze a client's interaction.
- Initial datasets include label data of confirmed or imputed diagnoses of depression.
- Such training may also include personalization of models when additional metadata is available.
- FIG. 30 provides a greater detailed illustration of an example process for such model training.
- label data is received (at 3010 ).
- Labels include a confirmed diagnosis of depression (or other health condition being screened for).
- situations where the label may be imputed or otherwise estimated are used to augment the training data sets.
- Imputed label data is generated via a manual review of a medical record and/or interaction record with a given client. For example, in prediction mode, when the label is unknown, the system may determine whether a label can be estimated for a data point given other information, such as patient records, system predictions, clinically-validated surveys and questionnaires, and other clinical data. Due to the relative rarity of label data sets, and the need for large numbers of training samples to generate accurate models, it is often important that the label data includes not just confirmed cases of depression, but also these estimated labels.
- the process includes receiving filtered data (at 3020 ). This data is filtered so that only data for which labels are known (or estimated) is used.
- each of the models is trained.
- Such training includes training of the NLP model (at 3030 ), the acoustic model (at 3040 ), the video model (at 3050 ) and the descriptive features (at 3060 ). It should be noted that these training processes may occur in any order, or in parallel.
- the parallel training includes generating cross dependencies between the various models. These cross dependencies are one of the critical features that render the presently disclosed systems and methods uniquely capable of rendering improved and highly accurate classifications for a health condition.
- the resulting trained models are fused, or aggregated, and the final fused trained model may be stored (at 3070 ).
- the models (both individual and fused models) are stored in a model repository. However, it is also desirable to generate model variants that are customized to different population groups or even specific individuals (at 3080 ).
- training data is received from a known individual. This individual is identified as a black woman in her seventies, in this example. This training data is then used to train for models specific to African American individuals, African American women, women, elderly people, elderly women, elderly African American people, and elderly African American women. Thus, this single piece of training data is used to generate seven different models, each with slightly different scope and level of granularity. In situations where age is further divided out, this number of models being trained off of this data is increased even further (e.g., adult women, women over 50, women over 70, individuals over 70, etc.). The models are then trained on this segment-by-segment basis (at 3084 ).
- the customized models are annotated by which segment(s) they are applicable to (at 3085 ), allowing for easy retrieval when a new response is received for classification where information about the individual is known, and may be utilized to select the most appropriate/tailored model for this person.
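- The segment expansion described above amounts to enumerating every non-empty combination of the individual's attribute tags, as this sketch shows (attribute names are illustrative):

    from itertools import combinations

    def segments(attributes):
        """All non-empty attribute combinations, each naming a model segment."""
        out = []
        for r in range(1, len(attributes) + 1):
            for combo in combinations(sorted(attributes), r):
                out.append("+".join(combo))
        return out

    print(segments(["african_american", "female", "elderly"]))
    # 7 segments -> 7 customized models updated from this one training sample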
- the customized models are also stored in the model repository, along with the original models and fused models (at 3090 ). It should be noted that while model customization generally increases classification accuracy, any such accuracy gains are jeopardized if a low number of training datasets are available for the models.
- the system tracks the number of training data sets that are used to train any given customized model, and only models with sufficiently large enough training sets are labeled as ‘active’ within the model repository. Active models are capable of being used by the runtime model server(s) 2010 for processing newly received response data. Inactive models are merely stored until sufficient data has been collected to properly train these models, at which time they are updated as being active.
- the process may engage in an interaction with a client (at 2930 ).
- This interaction may consist of a question and answer style format, a free-flowing conversation, or even a topic prompt and the client providing a monologue style input.
- FIG. 32 provides an example of this interaction process.
- the system needs to be aware of the current state of the interaction (at 3210 ) as well as the historical actions that have been taken in the interaction.
- a state machine and log of prior actions provides this context.
- the process also receives user, clinical and social data (at 3220 ).
- This data is used to extract user preference information (at 3230 ).
- preferences may be explicitly directed in the user data, such as language preferences, topic of interest, or the like.
- these preferences are distilled from the clinical and social data.
- the social data provides a wealth of information regarding the topics of interest for the user, and clinical data provides insight into any accessibility issues, or the like.
- the model results are received (at 3240 ), which are used to analyze the user's responses (at 3250 ) and make decisions regarding the adequacy of the data that has already been collected. For example, if it is determined via the model results that there is not yet a clear classification, the interaction will be focused on collecting more data moving forward. Alternatively, if sufficient data has been collected to render a confident classification, the interaction may instead be focused on a resolution. Additionally, the interaction management will sometimes receive direct command statements/navigational commands (at 3260 ) from the user. These include actions such as repeating the last dialogue exchange, increasing or decreasing the volume, rephrasing a question, a request for more time, a request to skip a topic, and the like.
- the action is selected from the question and adaptive action bank responsive to the current state (and prior history of the interaction) as well as any commands, preferences, and results already received. This may be completed using a rule based engine, in some embodiments. For example, direct navigational commands may take precedence over alternative actions, but barring a command statement by the user, the model responses may be checked against the current state to determine if the state objective has been met. If so, an action is selected from the repository that meets another objective that has not occurred in the history of the interaction. This action is also modified based on preferences, when possible. Alternatively, the action selection is based on a machine learned model (as opposed to a rule based system).
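- A hedged sketch of that rule-based precedence follows; the data shapes (action bank, history set, preference keys) are assumptions, and a machine-learned policy could replace the rules wholesale.

    def select_action(state, history, command, model_results, preferences, bank):
        # 1. Direct navigational commands take precedence over everything else.
        if command is not None:
            return bank["commands"][command]
        # 2. If the current state objective is met, pick an action whose
        #    objective has not yet occurred in the interaction history.
        if model_results.get("objective_met"):
            for action in bank["questions"]:
                if action["objective"] not in history:
                    return personalize(action, preferences)
        # 3. Otherwise keep collecting data toward the current objective.
        return bank["probe_current_topic"]

    def personalize(action, preferences):
        # Modify the action based on preferences when possible (e.g., language).
        action = dict(action)
        action["language"] = preferences.get("language", "en")
        return action

    bank = {"commands": {"REPEAT_LAST": {"type": "repeat"}},
            "questions": [{"objective": "sleep", "text": "How have you been sleeping?"}],
            "probe_current_topic": {"type": "follow_up"}}
    print(select_action("greeting", {"greeting"}, None,
                        {"objective_met": True}, {"language": "es"}, bank))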
- the customized action is used to manage the interaction with the client, and also is used to update the current state and historical state activity (at 3280 ).
- the process checks if the goals are met, and if the interaction should be concluded (at 3290 ). If not, then the entire process may be repeated for the new state and historical information, as well as any newly received response data, navigational commands, etc.
- the client response data is collected (at 2940 ).
- This data includes video/visual information as well as speech/audio information captured by the client device's camera(s) and microphone(s), respectively.
- the collected data may likewise include biometric results via haptic interfaces or the like.
- the health state is then classified using this collected response data (at 2950 ).
- FIG. 33 provides a greater detail of the example process for classification.
- the models are initially retrieved (at 3310 ) from the model repository.
- the user data, social data, clinical data and speech and visual data are all provided to the runtime model server(s) for processing (at 3330 ).
- the inclusion of the clinical and/or social data sets the present screening or monitoring methodologies apart from prior screening or monitoring methods.
- This data is preprocessed to remove artifacts, noise and the like.
- the preprocessed data is also multiplexed (at 3330 ).
- the preprocessed and multiplexed data is supplied to the models for analysis, as well as to third party ASR systems (at 3340 ).
- the ASR output may be consolidated (when multiple ASR systems are employed in concert), and the resulting machine readable speech data is also provided to the models.
- the data is then processed by the NLP model (at 3350 a), the acoustic model (at 3350 b), the video model (at 3350 c) and for descriptive features (at 3350 d).
- Each of the models operates in parallel, with results from any given model being fed to the others to condition their operations.
- FIG. 34 describes the process of model conditioning in greater detail.
- Model conditioning essentially includes three sub-processes operating in parallel, or otherwise interleaved. These include the configuration of the NLP model using the results of the acoustic model and video model, in addition to the descriptive features (at 3371 ), the configuration of the acoustic model using the results of the NLP model and video model, in addition to the descriptive features (at 3372 ), and configuration of the video model using the results of the acoustic model and NLP model, in addition to the descriptive features (at 3373 ).
- this conditioning is not a clearly ordered process, as intermediate results from the acoustic model for example may be used to condition the NLP model, the output of which may influence the video model, which then in turn conditions the acoustic model, requiring the NLP model to be conditioned based upon updated acoustic model results.
- This may lead to looped computing processes, wherein each iteration the results are refined to be a little more accurate than the previous iteration.
- Artificial cutoffs are imposed in such computational loops to avoid infinite cycling and breakdown of the system due to resource drain. These cutoffs are based upon number of loop cycles, or upon the degree of change in a value between one loop cycle and the next. Over time, the results from one loop cycle to the next become increasingly closer to one another. At some point additional looping cycles are not desired due to the diminishing returns to the model accuracy for the processing resources spent.
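- The looped refinement with artificial cutoffs might look like the following; the update function here is a stand-in for the real models, and the cycle limit and tolerance are assumed values.

    def condition_loop(nlp, acoustic, video, update, max_cycles=10, tol=1e-3):
        """Iterate until results change by less than tol, or max_cycles is hit."""
        for _ in range(max_cycles):
            new_nlp = update(acoustic, video)        # condition NLP on the others
            new_acoustic = update(new_nlp, video)    # then the acoustic model
            new_video = update(new_acoustic, new_nlp)
            change = max(abs(new_nlp - nlp), abs(new_acoustic - acoustic),
                         abs(new_video - video))
            nlp, acoustic, video = new_nlp, new_acoustic, new_video
            if change < tol:  # diminishing returns: stop refining
                break
        return nlp, acoustic, video

    # Toy update pulling each score toward the mean of the other two scores.
    toy = lambda a, b: (a + b) / 2 * 0.5 + 0.25
    print(condition_loop(0.9, 0.4, 0.6, toy))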
- One example of this kind of conditioning is when the NLP model determines that the user is not speaking. This result is used by the video model to process the individual's facial features based upon both mouth bounding and eye bounding. However, when the user is speaking, the video model uses this result to alter the model for emotional recognition to rely less upon the mouth region of the user and rather rely upon the eye regions of the user's face.
- the models' classifications are then combined (fused) by weighting the classification results by time domain (at 3380 ).
- This sub process is described in greater detail in relation to FIG. 35 .
- one model is relied upon more heavily than another model due to the classification confidence, or based upon events in the response. The clearest example of this is that if there is a period of time in which the user is not speaking, then the NLP model classification for this time period should be minimized, whereas the weights for video modeling and acoustic modeling should be afforded a much larger weight.
- likewise, when one model's classification diverges from those of the other models, the odd model's classification may also be weighted lower than the other models' accordingly.
- this weighting process involves starting with a base weight for each model (at 3381 ).
- the response is then divided up into discrete time segments (at 3382 ).
- the length of these time segments is configurable, and in one embodiment, they are set to a three second value, as most spoken concepts are formed in this length of time.
- the base weights for each of the models are then modified based upon model confidence levels, for each time period (at 3383 ). For example, if the NLP model is 96% confident during the first six seconds, but only 80% confident in the following twelve seconds, a higher weight will be applied to the first two time periods, and a lower weight to the following four time periods.
- the system also determines when the user is not speaking, generally by relying upon the ASR outputs (at 3384 ). During these periods the NLP model is not going to be useful in determining the user's classification, and as such the NLP model weights are reduced for these time periods (at 3385 ). The degree of reduction may differ based upon configuration, but in some embodiments, the NLP is afforded no weight for periods when the user is not speaking.
- periods where the patient exhibits voice-based biomarkers associated with being dishonest may also be identified, based upon features and conclusions from the video and acoustic models (at 3386 ). Excessive fidgeting, shifting gaze, higher pitch and mumbling may all be correlated with dishonesty, and when multiple features are simultaneously present, the system flags these periods of the interaction as being suspect. During such time periods the NLP model weights are again reduced (at 3387 ), but only marginally. Even when a user is not being entirely honest, there is still beneficial information contained in the words they speak, especially for depression diagnosis.
- After all the weight adjustments have been made, the system performs a weighted average, over the entire response time period, of the models' classification results (at 3388 ). This condensation of the classifications over time and across the different component models yields the fused model output.
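- A hedged sketch of this time-domain fusion: three-second segments, confidence-modified base weights, zero NLP weight during silence, a marginal NLP reduction during suspect periods, then a weighted average. Segment data and weight factors are illustrative assumptions.

    BASE = {"nlp": 0.4, "acoustic": 0.3, "video": 0.3}

    def fuse_over_time(segments):
        """segments: per-3-second dicts of {model: (score, confidence)} plus
        'speaking' and 'suspect' flags for the window."""
        num = den = 0.0
        for seg in segments:
            for model, (score, conf) in seg["models"].items():
                w = BASE[model] * conf            # modify base weight by confidence
                if model == "nlp" and not seg["speaking"]:
                    w = 0.0                       # no weight while the user is silent
                elif model == "nlp" and seg["suspect"]:
                    w *= 0.8                      # only marginally reduced
                num += w * score
                den += w
        return num / den                          # single consolidated score

    segments = [
        {"speaking": True, "suspect": False,
         "models": {"nlp": (0.2, 0.96), "acoustic": (0.7, 0.9), "video": (0.6, 0.8)}},
        {"speaking": False, "suspect": False,
         "models": {"nlp": (0.5, 0.5), "acoustic": (0.8, 0.9), "video": (0.7, 0.9)}},
    ]
    print(fuse_over_time(segments))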
- this fused model output generates a final classification (at 3390 ) for the interaction.
- This classification, model results, and features are then output in aggregate or in part (at 3399 ).
- these results are then presented to the client and other interested stakeholders (at 2960 ). This may include selecting which results any given entity should receive. For example, a client may be provided only the classification results, whereas a physician for the client will receive features relating to mood, topics of concern, indications of self-harm or suicidal thoughts, and the like. In contrast, an insurance company will receive the classification results, and potentially a sampling of the clinical data as it pertains to the individual's risk factors.
- In FIG. 36 , one example substantiation of an acoustic modeling process 3350 b is presented in greater detail. It should be noted that, despite the enhanced detail in this example process, this is still a significant simplification of but one of the analysis methodologies; it is intended purely as an illustrative process for the sake of clarity and does not limit the analyses that are performed on the response data.
- a variable cutoff value is determined from the training datasets (at 3605 ).
- the acoustic signal that is received, in this particular analysis, is converted into a spectrogram image (at 3610 ), which provides information on the frequency of the audio signal and the amplitude at each of these frequencies. This image also tracks these over time.
- a sample of the spectrogram image is taken that corresponds to a set length of time (at 3615 ). In some cases, this may be a ten second sample of the spectrogram data.
- the image is converted into a matrix.
- This matrix is used in an equation to represent a higher order feature.
- the equation is developed from the training data utilizing machine learning techniques.
- the equation includes unknown variables, in addition to the input matrix of the higher order feature (here the spectrogram image sample). These unknown variables may be multiplied by, divided into, added to, or subtracted from the feature matrix (or any combination thereof).
- the solution to the equation is known from the training data, so values for the unknown variables are randomly selected (at 3620 ) in an attempt to solve the equation (at 3630 ) and arrive at a result similar to the known solution.
- the solved equation value is compared to the known solution value in order to calculate the error (at 3630 ). This process is repeated thousands or even millions of times until a close approximation of the correct variable values is found, as determined by a sufficiently low error calculation (at 3635 ). Once these sufficiently accurate values are found, they are compared against the cutoff values that were originally determined from the training data (at 3640 ). If the values are above or below the cutoffs, this indicates the existence or absence of the classification, based on the equation utilized. In this manner the classification for the spectrogram analysis may be determined (at 3645 ), which may subsequently be output (at 3650 ) for incorporation with the other model results.
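- As a toy illustration of the random-variable search described above (steps 3620 - 3645 ), the sketch below fits unknown weights to flattened spectrogram features against known training solutions by repeated random guessing and error comparison; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: 50 flattened spectrogram samples with 8 features each, and
# their known solutions from the training data.
X = rng.normal(size=(50, 8))
true_w = rng.normal(size=8)
y = X @ true_w

cutoff = 0.5                 # cutoff value determined from the training data

best_w, best_err = None, np.inf
for _ in range(200_000):     # thousands or millions of random guesses (at 3620)
    w = rng.normal(size=8)
    err = np.mean((X @ w - y) ** 2)      # error vs. known solutions (at 3630)
    if err < best_err:
        best_w, best_err = w, err
    if best_err < 1e-3:      # sufficiently low error found (at 3635)
        break

# Compare the fitted model's output on a new sample against the cutoff (at 3640)
new_sample = rng.normal(size=8)
print("classification:", bool(new_sample @ best_w > cutoff))   # (at 3645)
```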
- Modeling system logic 5320 includes speech recognition 2210 ( FIG. 22 ), which is shown in greater detail in FIG. 37 . Speech recognition is specific to the particular language of the speech. Accordingly, speech recognition 2210 includes language-specific speech recognition 3702 , which in turn includes a number of language-specific speech recognition engines 3706 A-Z. The particular languages of language-specific speech recognition engines 3706 A-Z shown in FIG. 37 are merely illustrative examples.
- Speech recognition 2210 also includes a translation engine 3704 .
- Language-specific speech recognition 3702 ( FIG. 37 ) produces text in the language spoken by the patient, i.e., the patient's language, from the audio signal received from the patient.
- translation engine 3704 translates the text from the patient's language to a language that may be processed by language models 2214 , e.g., English.
- while language models 2214 may not be as accurate when relying on translation by translation engine 3704 , accuracy of language models 2214 is quite good with currently available translation techniques.
- the importance of language models 2214 is diluted significantly by the incorporation of acoustic models 2218 , visual models 2226 , and clinical data 2220 in the creation of composite model 2204 .
- composite model 2204 is extremely accurate notwithstanding reliance on translation engine 3704 .
- Modeling system logic 5320 includes language model training 2212 ( FIG. 22 ) and language models 2214 , which are shown in greater detail in FIGS. 38 and 39 , respectively.
- Language model training 2212 ( FIG. 38 ) includes logic for training respective models of language models 2214 .
- language model training 2212 ( FIG. 38 ) includes syntactic language model training 3802 , semantic pattern model training 3804 , speech fluency model training 3806 , and non-verbal model training 3808 which include logic for training syntactic language model 3902 , semantic pattern model 3904 , speech fluency model 3906 , and non-verbal model 3908 , respectively, of language models 2214 .
- Each of models 3902 - 3908 includes deep learning (also known as deep structured learning or hierarchical learning) logic that assesses the patient's depression from text received from speech recognition 2210 .
- Syntactic language model 3902 assesses a patient's depression from syntactic characteristics of the patient's speech. Examples of such syntactic characteristics include sentence length, sentence completion, sentence complexity, and negation. When a patient speaks in shorter sentences, fails to complete sentences, speaks in simple sentences, and/or uses relatively frequent negation (e.g., “no”, “not”, “couldn't”, “won't”, etc.), syntactic language model 3902 determines that the patient is more likely to be depressed.
- Semantic pattern model 3904 assesses a patient's depression from positive and/or negative content of the patient's speech, i.e., from sentiments expressed by the patient. Some research suggests that expression of negative thoughts may indicate depression and expression of positive thoughts may counter-indicate depression. For example, “the commute here was awful” may be interpreted as an indicator for depression while “the commute here was awesome” may be interpreted as a counter-indicator for depression.
- Speech fluency model 3906 assesses a patient's depression from fluency characteristics of, i.e., the flow of, the patient's speech. Fluency characteristics may include, for example, word rates, the frequency and duration of pauses in the speech, the prevalence of filler expressions such as “uh” or “umm”, and packet speech patterns. Some research suggests that lower word rates, frequent and/or long pauses in speech, and high occurrence rates of filler expressions may indicate depression. Perhaps more so than others of language models 2214 , speech fluency model 3906 may be specific to the individual patient. For example, rates of speech (word rates) vary widely across geographic regions. The normal rate of speech for a patient from New York City may be significantly greater than the normal rate of speech for a patient from Minnesota.
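- A minimal sketch of the kind of fluency features such a model might consume, computed from timestamped ASR output; the helper name `fluency_features`, the filler list, and the pause threshold are illustrative assumptions.

```python
FILLERS = {"uh", "um", "umm", "er", "hmm"}

def fluency_features(words, timestamps, pause_threshold=0.5):
    """Compute simple fluency statistics from ASR output.

    words:      list of recognized words
    timestamps: list of (start, end) times in seconds, one per word
    """
    duration = timestamps[-1][1] - timestamps[0][0]
    word_rate = len(words) / (duration / 60.0)            # words per minute

    # Gaps between the end of one word and the start of the next.
    pauses = [timestamps[i + 1][0] - timestamps[i][1] for i in range(len(words) - 1)]
    long_pauses = [p for p in pauses if p > pause_threshold]

    filler_rate = sum(w.lower().strip(".,") in FILLERS for w in words) / len(words)

    return {"word_rate_wpm": word_rate,
            "pause_count": len(long_pauses),
            "mean_pause_s": sum(long_pauses) / len(long_pauses) if long_pauses else 0.0,
            "filler_rate": filler_rate}

print(fluency_features(["I", "um", "feel", "tired"],
                       [(0.0, 0.2), (0.8, 1.0), (2.1, 2.4), (2.5, 2.9)]))
```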
- Non-verbal model 3908 assesses a patient's depression from non-verbal characteristics of the patient's speech, such as laughter, chuckles, and sighs. Some research suggests that sighs may indicate depression while laughter and chuckling (and other forms of partially repressed laughter such as giggling) may counter-indicate depression.
- Modeling system logic 5320 includes acoustic model training 2216 ( FIG. 22 ) and acoustic models 2218 , which are shown in greater detail in FIGS. 40 and 41 , respectively.
- Acoustic model training 2216 ( FIG. 40 ) includes logic for training respective models of acoustic models 2218 ( FIG. 41 ).
- acoustic model training 2216 ( FIG. 40 ) includes pitch/energy model training 4002 , quality/phonation model training 4004 , speaking flow model training 4006 , and articulatory coordination model training 4008 which include logic for training pitch/energy model 4102 , quality/phonation pattern model 4104 , speaking flow model 4106 , and articulatory coordination model 4108 , respectively, of acoustic models 2218 .
- Each of models 4102 - 4108 includes deep learning (also known as deep structured learning or hierarchical learning) logic that assesses the patient's depression from audio signals representing the patient's speech as received from collected patient data 2206 ( FIG. 22 ) and preprocessing 2208 .
- Pitch/energy model 4102 assesses a patient's depression from pitch and energy of the patient's speech. Examples of energy include loudness and syllable rate. When a patient speaks with a lower pitch, more softly, and/or more slowly, pitch/energy model 4102 determines that the patient is more likely to be depressed.
- Quality/phonation model 4104 assesses a patient's depression from voice quality and phonation aspects of the patient's speech. Different voice source modifications may occur in depression and affect the voicing related aspects of speech, both generally and for specific speech sounds.
- Speaking flow model 4106 assesses a patient's depression from the flow of the patient's speech.
- Speaking flow characteristics may include, for example, word rates, the frequency and duration of pauses in the speech, the prevalence of filler expressions such as “uh” or “umm”, and packet speech patterns.
- Articulatory coordination model 4108 assesses a patient's depression from articulatory coordination in the patient's speech. Articulatory coordination refers to micro-coordination in timing, among articulators and source characteristics. This coordination becomes worse when the patient is depressed.
- Modeling system logic 5320 includes visual model training 2224 ( FIG. 22 ) and visual models 2226 , which are shown in greater detail in FIGS. 42 and 43 , respectively.
- Visual model training 2224 includes logic for training respective models of visual models 2226 ( FIG. 43 ).
- visual model training 2224 includes facial cue model training 4202 and eye/gaze model training 4204 which include logic for training facial cue model 4302 and eye/gaze model 4304 , respectively, of visual models 2226 .
- Each of models 4302 - 4304 includes deep learning (also known as deep structured learning or hierarchical learning) logic that assesses the patient's depression from video signals representing the patient's speech as received from collected patient data 2206 ( FIG. 22 ) and preprocessing 2208 .
- Facial cue model 4302 assesses a patient's depression from facial cues recognized in the video of the patient's speech.
- Eye/gaze model 4304 assesses a patient's depression from observed and recognized eye movements in the video of the patient's speech.
- composite model builder 2222 builds composite model 2204 by combining language models 2214 , acoustic models 2218 , and visual models 2226 and training the combined model using both clinical data 2220 and collected patient data 2206 .
- composite model 2204 assesses depression in a patient using what the patient says, how the patient says it, and contemporaneous facial and eye expressions in combination. This combination provides a particularly accurate and effective tool for assessing the patient's depression.
- while assessment test administrator 2202 is described as assessing the mental health of the human subject, who may be a patient, it is appreciated that “assessment” sometimes refers to professional assessments made by professional clinicians. As used herein, the assessment provided by assessment test administrator 2202 may be any type of assessment in the general sense, including screening or monitoring.
- the models described herein may produce scores, at various stages of an assessment.
- the scores produced may be scaled scores or binary scores. Scaled scores may range over a large number of values, while binary scores may be one of two discrete values.
- the system disclosed may interchange binary and scaled scores at various stages of the assessment, to monitor different mental states, or update particular binary scores and particular scaled scores for particular mental states over the course of an assessment.
- the scores produced by the system may be produced after each response to each query in the assessment, or may be formulated in part based on previous queries. In the latter case, each marginal score acts to fine-tune a prediction of depression, or of another mental state, as well as to make the prediction more robust. Marginal predictions may increase confidence measures for predictions of mental states in this way, after a particular number of queries and responses (correlated with a particular intermediate mental state).
- the refinement of the score may allow clinicians to determine, with greater precision, severities of one or more mental states the patient is experiencing.
- the refinement of the scaled score, when observing multiple intermediate depression states, may allow a clinician to determine whether the patient has mild, moderate, or severe depression.
- Performing multiple scoring iterations may also assist clinicians and administrators in removing false negatives, by adding redundancy and robustness.
- initial mental state predictions may be noisier, because relatively fewer speech segments are available to analyze, and NLP algorithms may not have enough information to determine semantic context for the patient's recorded speech. Even though a single marginal prediction may itself be a noisy estimate, refining the prediction by adding more measurements may reduce the overall variance in the system, yielding a more precise prediction.
- the predictions described herein may be more actionable than those which may be obtained by simply administering a survey, as people may have incentive to lie about their conditions. Administering a survey may yield high numbers of false positive and false negative results, allowing patients who need treatment to slip through the cracks. In addition, although trained clinicians may notice voice and face-based biomarkers, they may not be able to analyze the large amount of data the system disclosed is able to analyze.
- the scaled score may be used to describe a severity of a mental state.
- the scaled score may be, for example, a number between 1 and 5, or between 0 and 100, with larger numbers indicating a more severe or acute form of the patient's experienced mental state.
- the scaled score may include integers, percentages, or decimals.
- Conditions for which the scaled score may express severity may include, but are not limited to, depression, anxiety, stress, PTSD, phobic disorder, and panic disorder.
- a score of 0 on a depression-related aspect of an assessment may indicate no depression, a score of 50 may indicate moderate depression, and a score of 100 may indicate severe depression.
- the scaled score may be a composition of multiple scores.
- a mental state may be expressed as a composition of mental sub-states, and a patient's composite mental state may be a weighted average of individual scores from the mental sub-states.
- a composition score of depression may be a weighted average of individual scores for anger, sadness, self-image, self-worth, stress, loneliness, isolation, and anxiety.
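- A minimal sketch of such a composition, assuming hypothetical sub-state weights (in practice the weights would be learned or configured):

```python
# Illustrative weights over the mental sub-states named above; they sum to 1.0.
SUB_STATE_WEIGHTS = {
    "anger": 0.10, "sadness": 0.20, "self_image": 0.10, "self_worth": 0.10,
    "stress": 0.15, "loneliness": 0.10, "isolation": 0.10, "anxiety": 0.15,
}

def composite_score(sub_scores, weights=SUB_STATE_WEIGHTS):
    """Weighted average of mental sub-state scores (all on the same scale)."""
    total_weight = sum(weights[k] for k in sub_scores)
    return sum(weights[k] * v for k, v in sub_scores.items()) / total_weight

print(composite_score({"sadness": 80, "stress": 60, "anxiety": 70,
                       "anger": 30, "self_image": 50, "self_worth": 55,
                       "loneliness": 40, "isolation": 45}))
```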
- a scaled score may be produced using a model that uses a multilabel classifier.
- This classifier may be, for example, a decision tree classifier, a k-nearest neighbors classifier, or a neural network-based classifier.
- the classifier may produce multiple labels for a particular patient at an intermediate or final stage of assessment, with the labels indicating severities or extents of a particular mental state.
- a multilabel classifier may output multiple numbers, which may be normalized into probabilities using a softmax layer. The label with the largest probability may indicate the severity of the mental state experienced by the patient.
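- The softmax step might look like the following sketch; the four severity labels and the logit values are assumptions for illustration.

```python
import numpy as np

SEVERITY_LABELS = ["none", "mild", "moderate", "severe"]

def severity_from_logits(logits):
    """Normalize classifier outputs into probabilities via softmax and report
    the label with the largest probability as the severity."""
    exp = np.exp(logits - np.max(logits))   # subtract max for numerical stability
    probs = exp / exp.sum()
    return SEVERITY_LABELS[int(np.argmax(probs))], probs

label, probs = severity_from_logits(np.array([0.2, 1.1, 2.7, 0.4]))
print(label, np.round(probs, 3))   # "moderate" carries the largest probability
```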
- the scaled score may also be determined using a regression model.
- the regression model may determine a fit from training examples that are expressed as sums of weighted variables. The fit may be used to extrapolate a score from a patient with known weights.
- the weights may be based in part on features, which may be in part derived from the audiovisual signal (e.g., voice-based biomarkers) and in part derived from patient information, such as patient demographics. Weights used to predict a final score or an intermediate score may be taken from previous intermediate scores.
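- A least-squares sketch of the regression approach, with made-up features (two biomarker measurements plus an age column) and clinician-assigned training scores:

```python
import numpy as np

# Toy training set: rows are patients; columns are features derived from the
# audiovisual signal (e.g., biomarker measurements) plus demographics.
X_train = np.array([[0.8, 0.1, 34], [0.3, 0.7, 52], [0.5, 0.4, 41], [0.9, 0.2, 29]])
y_train = np.array([72.0, 35.0, 50.0, 78.0])      # clinician-assigned scores

# Least-squares fit of the weighted-variable model (with an intercept term).
A = np.hstack([X_train, np.ones((len(X_train), 1))])
weights, *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Extrapolate a scaled score for a new patient from the fitted weights.
x_new = np.array([0.6, 0.3, 45, 1.0])
print(float(x_new @ weights))
```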
- the scaled score may be scaled based on a confidence measure.
- the confidence measure may be determined based on recording quality, type of model used to analyze the patient's speech from a recording (e.g., audio, visual, semantic), temporal analysis related to which model was used most heavily during a particular period of time, and the point in time of a specific voice-based biomarker within an audiovisual sample. Multiple confidence measures may be taken to determine intermediate scores. Confidence measures during an assessment may be averaged in order to determine a weighting for a particular scaled score.
- the binary score may reflect a binary outcome from the system.
- the system may classify a user as being either depressed or not depressed.
- the system may use a classification algorithm to do this, such as a neural network or an ensemble method.
- the binary classifier may output a number between 0 and 1. If a patient's score is above a threshold (e.g., 0.5), the patient may be classified as “depressed.” If the patient's score is below the threshold, the patient may be classified as “not depressed.”
- the system may produce multiple binary scores for multiple intermediate states of the assessment.
- the system may weight and sum the binary scores from intermediate states of the assessment in order to produce an overall binary score for the assessment.
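- A minimal sketch of both binary-score operations, assuming a 0.5 threshold and hypothetical stage weights:

```python
def binary_score(probability, threshold=0.5):
    """Map a classifier output in [0, 1] to a depressed/not-depressed label."""
    return "depressed" if probability > threshold else "not depressed"

def overall_binary(intermediate_probs, stage_weights, threshold=0.5):
    """Weight and sum binary outcomes from intermediate assessment stages."""
    total = sum(w * (p > threshold) for p, w in zip(intermediate_probs, stage_weights))
    return total / sum(stage_weights) > threshold

print(binary_score(0.73))                                  # -> depressed
print(overall_binary([0.7, 0.4, 0.8], [1.0, 1.0, 2.0]))    # weighted majority
```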
- the outputs of the models described herein can be converted to a calibrated score, e.g., a score with a unit range.
- the outputs of the models described herein can additionally or alternatively be converted to a score with a clinical value.
- a score with a clinical value can be a qualitative diagnosis (e.g., high risk of severe depression).
- a score with a clinical value can alternatively be a qualitative score that is normalized with respect to the general population or a specific sub-population of patients. The normalized score may indicate a risk percentage relative to the general population or to the sub-population.
- the systems described herein may be able to identify a mental state of a subject (e.g., a mental disorder or a behavioral disorder) with less error (e.g., 10% less) or a higher accuracy (e.g., 10% more) than a standardized mental health questionnaire or testing tool.
- the error rate or accuracy may be established relative to a benchmark standard usable by an entity for identifying or assessing one or more medical conditions comprising said mental state.
- the entity may be a clinician, a healthcare provider, an insurance company, or a government-regulated body.
- the benchmark standard may be a clinical diagnosis that has been independently verified.
- a confidence measure may indicate how effective the score produced by the machine learning algorithm is at accurately predicting a mental state, such as depression.
- a confidence measure may depend on conditions under which the score was taken.
- a confidence measure may be expressed as a whole number, a decimal, or a percentage.
- Conditions may include a type of recording device, an ambient space in which signals were taken, background noise, patient speech idiosyncrasies, language fluency of a speaker, the length of responses of the patient, an evaluated truthfulness of the responses of the patient, and frequency of unintelligible words and phrases. Under conditions where the quality of the signal or speech makes it more difficult for the speech to be analyzed, the confidence measure may have a smaller value.
- the confidence measure may be added to the score calculation, by weighting a calculated binary or scaled score with the confidence measure.
- the confidence measure may be provided separately. For example, the system may tell a clinician that the patient has a 0.93 depression score with 75% confidence.
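- The separate-reporting option might be sketched as follows; the averaging of per-stage confidences mirrors the description above, and all names are illustrative.

```python
def report(score, stage_confidences):
    """Average per-stage confidences, then report them either alongside the
    score or folded into a confidence-weighted score."""
    confidence = sum(stage_confidences) / len(stage_confidences)
    return {"score": score,
            "confidence": confidence,
            "confidence_weighted_score": score * confidence}

# e.g., a 0.93 depression score reported with 75% confidence
print(report(0.93, [0.80, 0.70, 0.75]))
```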
- the confidence level may also be based on the quality of the labels of the training data used to train the models that analyze the patient's speech. For example, if the labels are based on surveys or questionnaires completed by patients rather than official clinical diagnoses, the quality of the labels may be determined to be lower, and the confidence level of the score may thus be lower. In some cases, it may be determined that the surveys or questionnaires have a certain level of untruthfulness. In such cases, the quality of the labels may be determined to be lower, and the confidence level of the score may thus be lower.
- the system may employ one or more signal processing algorithms to filter out background noise, or use impulse response measurements to determine how to remove effects of reverberations caused by objects and features of the environment in which the speech sample was recorded.
- the system may also use semantic analysis to find context clues to determine the identities of missing or unintelligible words.
- the system may use user profiles to group people based on demeanor, ethnic background, gender, age, or other categories. Because people from similar groups may have similar voice-based biomarkers, the system may be able to predict depression with higher confidence, as people who exhibit similar voice-based biomarkers may indicate depression in similar manners.
- depressed people from different backgrounds may be variously categorized by slower speech, monotone pitch or low pitch variability, excessive pausing, vocal timbre (gravelly or hoarse voices), incoherent speech, rambling or loss of focus, terse responses, and stream-of-consciousness narratives.
- voice-based biomarkers may belong to one or more segments of patients analyzed.
- Screening system data store 410 (shown in greater detail in FIG. 44 ) stores and maintains all user and patient data needed for, and collected by, screening or monitoring in the manner described herein.
- Screening system data store 410 includes data store logic 4402 , label estimation logic 4404 , and user and patient databases 4406 .
- Data store logic 4402 controls access to user and patient databases 4406 .
- data store logic 4402 stores audiovisual signals of patients' responses and provides patient clinical history data upon request. If the requested patient clinical history data is not available in user and patient databases 4406 , data store logic 4402 retrieves the patient clinical history data from clinical data server 106 . If the requested patient social history data is not available in user and patient databases 4406 , data store logic 4402 retrieves the patient social history data from social data server 108 .
- Users who are not patients include health care service providers and payers.
- Social data server 108 may include a wide variety of patient/subject data including but not limited to retail purchasing records, legal records (including criminal records), and income history, as these may provide valuable insights into a person's health. In many instances, these social determinants of disease contribute more to a person's morbidity than medical care. Appendix B depicts a “Health Policy Brief: The Relative Contributions of Multiple Determinants to Health Outcomes”.
- Label estimation logic 4404 includes logic that specifies labels for which the various learning machines of health screening or monitoring server 102 screen.
- Label estimation logic 4404 includes a user interface through which human operators of health screening or monitoring server 102 may configure and tune such labels.
- Label estimation logic 4404 also controls quality of model training by, inter alia, determining whether data stored in user and patient databases 4406 is of adequate quality for model training.
- Label estimation logic 4404 includes logic for automatically identifying or modifying labels. In particular, if model training reveals a significant data point that is not already identified as a label, label estimation logic 4404 looks for correlations between the data point and patient records, system predictions, and clinical insights to automatically assign a label to the data point.
- While interactive screening or monitoring server logic 502 is described as conducting an interactive, spoken conversation with the patient to assess the health state of the patient, interactive screening or monitoring server logic 502 may also act in a passive listening mode. In this passive listening mode, interactive screening or monitoring server logic 502 passively listens to the patient speaking without directing questions to be asked of the patient.
- Passive listening mode, in this illustrative embodiment, has two (2) variants.
- In the “conversational” variant, the patient is engaged in a conversation with another person whose part of the conversation is not controlled by interactive screening or monitoring server logic 502 .
- Examples of conversational passive listening include a patient speaking with a clinician and a patient speaking during a telephone call reminding the patient of an appointment with a clinician or discussing medication with a pharmacist.
- In the “fly-on-the-wall” (FOTW) or “ambient” variant, the patient is speaking alone or in a public, or semi-public, place.
- Examples of ambient passive listening include people speaking in a public space or a hospital emergency room and a person speaking alone, e.g., in an audio diary or leaving a telephone message.
- One potentially useful scenario for screening or monitoring a person speaking alone involves interactive screening or monitoring server logic 502 screening or monitoring calls to police emergency services (i.e., “9-1-1”). Analysis of emergency service callers may distinguish truly urgent callers from less urgent callers.
- Patient screening or monitoring system 100 B ( FIG. 45 ) illustrates a passive listening variation of patient screening or monitoring system 100 ( FIG. 1 ).
- Patient screening or monitoring system 100 B ( FIG. 45 ) includes health screening or monitoring server 102 , a clinical data server 106 , and a social data server 108 , which are as described above and, also as described above, connected to one another through WAN 110 .
- listening devices 4512 and 4514 are smart speakers, such as the HomePod™ smart speaker available from Apple Computer of Cupertino, Calif., the Google Home™ smart speaker available from Google LLC of Mountain View, Calif., and the Amazon Echo™ available from Amazon.com, Inc. of Seattle, Wash.
- listening devices 4512 and 4514 may be other types of listening devices such as microphones coupled to clinician device 114 B, for example.
- a single listening device 4514 is used and screening or monitoring server 102 distinguishes between the patient and the clinician using conventional voice recognition techniques. Accuracy of such voice recognition may be improved by training screening or monitoring server 102 to recognize the clinician's voice prior to any session with a patient. While the following description refers to a clinician as speaking to the patient, it should be appreciated that the clinician may be replaced with another. For example, in a telephone call made to the patient by a health care office administrator, e.g., support staff for a clinician, the administrator takes on the clinician's role as described in the context of conversational passive listening.
- Appendix C depicts an exemplary Question Bank for some of the embodiments in accordance with the present invention.
- FIG. 46 shows an instantiation of a dynamic mode, in which query content is analyzed in real-time.
- Loop step 4602 and next step 4616 define a loop in which generalized dialogue flow logic 602 processes audiovisual signals of the conversation between the patient and the clinician according to steps 4604 - 4614 . While steps 4604 - 4614 are shown as discrete, sequential steps, they are performed concurrently with one another on an ongoing basis by generalized dialogue flow logic 602 .
- the loop of steps 4602 - 4616 is initiated and terminated by the clinician using conventional user interface techniques, e.g., using clinician device 114 B ( FIG. 45 ) or listening device 4514 .
- In step 4604 , generalized dialogue flow logic 602 recognizes a question to the patient posed by the clinician and sends the question to runtime model server logic 504 for processing and analysis.
- Generalized dialogue flow logic 602 receives results 1820 for the audiovisual signal of the clinician's utterance, and results 1820 ( FIG. 18 ) include a textual representation of the clinician's utterance from ASR logic 1804 along with additional information from descriptive model and analytics 1812 . This additional information includes identification of the various parts of speech of the words in the clinician's utterance.
- In step 4606 , generalized dialogue flow logic 602 identifies the most similar question in question and dialogue action bank 710 ( FIG. 7 ). If the question recognized in step 4604 is not identical to any questions stored in question and dialogue action bank 710 , generalized dialogue flow logic 602 may identify the nearest question in the manner described above with respect to question equivalence logic 1104 ( FIG. 11 ) or may identify the question in question and dialogue action bank 710 ( FIG. 7 ) that is most similar linguistically.
- In step 4608 , generalized dialogue flow logic 602 retrieves the quality of the nearest question from question and dialogue action bank 710 , i.e., quality 908 ( FIG. 9 ).
- In step 4610 ( FIG. 46 ), generalized dialogue flow logic 602 recognizes an audiovisual signal representing the patient's response to the question recognized in step 4604 .
- the patient's response is recognized as an utterance of the patient immediately following the recognized question.
- the utterance may be recognized as the patient's by (i) determining that the voice is captured more loudly by listening device 4512 than by listening device 4514 or (ii) determining that the voice is distinct from a voice previously established and recognized as the clinician's.
- In step 4612 , generalized dialogue flow logic 602 sends the patient's response, along with the context of the clinician's corresponding question, to runtime model server logic 504 for analysis and evaluation.
- the context of the clinician's question is important, particularly if the semantics of the patient's response is unclear in isolation. For example, consider that the patient's answer is simply “Yes.” That response is analyzed and evaluated very differently in response to the question “Were you able to find parking?” versus in response to the question “Do you have thoughts of hurting yourself?”
- In step 4614 , generalized dialogue flow logic 602 reports intermediate analysis received from results 1820 to the clinician.
- the report may be in the form of animated gauges indicating intermediate scores related to a number of health states. Examples of animated gauges include steam gauges, i.e., round dial gauges with a moving needle, and dynamic histograms such as those seen on audio equalizers in sound systems.
- In step 4618 , interactive screening or monitoring server logic 502 sends final analysis of the conversation to the clinician.
- it should be appreciated that the “clinician” is not always a medical health professional or one with access to health records of the patient.
- health screening or monitoring server 102 may screen patients for any of a number of health states passively, during a conversation the patient may engage in regardless, without requiring a separate, explicit screening or monitoring interview of the patient.
- health screening or monitoring server 102 listens to and processes ambient speech according to logic flow diagram 4700 ( FIG. 47 ), which illustrates processing by interactive health screening or monitoring logic 402 , particularly generalized dialogue flow logic 602 ( FIG. 7 ), in ambient passive listening.
- Loop step 4702 and next step 4714 define a loop in which generalized dialogue flow logic 602 processes audiovisual signals of ambient speech according to steps 4704 - 4712 . While steps 4704 - 4714 are shown as discrete, sequential steps, they are performed concurrently with one another on an ongoing basis by generalized dialogue flow logic 602 .
- the loop of steps 4702 - 4714 is initiated and terminated by a human operator of the listening device(s) involved, e.g., listening device 4514 .
- In step 4704 , generalized dialogue flow logic 602 captures ambient speech.
- In test step 4706 , interactive screening or monitoring server logic 502 determines whether the speech captured in step 4704 is spoken by a voice that is to be analyzed. In ambient passive listening in areas that are at least partially controlled, many people likely to speak in such areas may be registered with health screening or monitoring server 102 such that their voices may be recognized. In schools, students may have their voices registered with health screening or monitoring server 102 at admission.
- the people whose voices are to be analyzed are admitted students that are recognized by generalized dialogue flow logic 602 .
- hospital personnel may have their voices registered with health screening or monitoring server 102 at hiring.
- patients in hospitals may register their voices at first contact, e.g., at an information desk or by hospital personnel in an emergency room.
- hospital personnel are excluded from analysis when recognized as the speaker by generalized dialogue flow logic 602 .
- generalized dialogue flow logic 602 may still track speaking by unknown speakers. Multiple utterances may be recognized by generalized dialogue flow logic 602 as emanating from the same individual person. Health screening or monitoring server 102 may also determine approximate positions of unknown speakers in environments with multiple listening devices, e.g., by triangulation using different relative amplitudes and/or relative timing of arrival of the captured speech at multiple listening devices.
- the speaker may be asked to identify herself.
- the identity of the speaker may be inferred or is not especially important.
- the speaker may be authenticated by the device or may be assumed to be used by the device's owner.
- In police emergency telephone call triage, the identity of the caller is not as important as the location of the speaker and qualities of the speaker's voice such as emotion, energy, and the substantive content of the speaker's speech.
- in that context, generalized dialogue flow logic 602 always determines that the speaker is to be analyzed.
- if the voice is not one to be analyzed, in step 4708 , generalized dialogue flow logic 602 sends the captured ambient speech to runtime model server logic 504 for processing and analysis for context.
- Generalized dialogue flow logic 602 receives results 1820 for the audiovisual signal of the captured speech, and results 1820 ( FIG. 18 ) include a textual representation of the captured speech from ASR logic 1804 along with additional information from descriptive model and analytics 1812 . This additional information includes identification of the various parts of speech of the words in the captured speech.
- Generalized dialogue flow logic 602 processes results 1820 for the captured speech to establish a context.
- From step 4708 ( FIG. 47 ), processing transfers through next step 4714 to loop step 4702 , and passive listening according to the loop of steps 4702 - 4714 continues.
- If, in test step 4706 , interactive screening or monitoring server logic 502 determines that the speech captured in step 4704 is spoken by a voice that is to be analyzed, processing transfers to step 4710 .
- In step 4710 , generalized dialogue flow logic 602 sends the captured speech, along with any context determined in prior yet contemporary performances of step 4708 or step 4710 , to runtime model server logic 504 for analysis and evaluation.
- In step 4712 , generalized dialogue flow logic 602 processes any alerts triggered by the resulting analysis from runtime model server logic 504 according to predetermined alert rules.
- predetermined alert rules are analogous to work-flows 4810 described below. In essence, these predetermined alert rules are in the form of if-then-else logic elements that specify logical states and corresponding actions to take in such states.
- alert rules may be implemented by interactive screening or monitoring server logic 502 .
- in a police emergency system call in which the caller, speaking initially to an automated triage system, is determined to be highly emotional and anxious and to semantically describe a highly urgent situation, e.g., a car accident with severe injuries, a very high priority may be assigned to the call so that it is taken ahead of less urgent callers.
- in the school example, interactive screening or monitoring server logic 502 may trigger immediate notification of law enforcement and school personnel.
- in the audio diary example, interactive screening or monitoring server logic 502 may record the analysis in the patient's clinical records such that the patient's behavioral health care provider may discuss the diary entry when the patient is next seen. In situations in which the triggering condition of the captured speech is particularly serious and urgent, interactive screening or monitoring server logic 502 may report the location of the speaker if it may be determined.
- health screening or monitoring server 102 may screen patients for any of a number of health states passively outside the confines of a one-to-one conversation with a health care professional.
- health care management logic 408 makes expert recommendations in response to health state analysis of interactive health screening or monitoring logic 402 .
- Health care management logic 408 is shown in greater detail in FIG. 48 .
- Health care management logic 408 includes manual work-flow management logic 4802 , automatic work-flow generation logic 4804 , work-flow execution logic 4806 , and work-flow configuration 4808 .
- Manual work-flow management logic 4802 implements a user interface through which a human administrator may create, modify, and delete work-flows 4810 of work-flow configuration 4808 by physical manipulation of one or more user input devices of a computer system used by the administrator.
- Automatic work-flow generation logic 4804 performs statistical analysis of patient data stored within screening or monitoring system data store 410 to identify work-flows to achieve predetermined goals. Examples of such goals include minimizing predicted costs for the next two (2) years of a patient's care and minimizing the cost of an initial referral while also maximizing a reduction in Hemoglobin A1C in one year.
- Work-flow execution logic 4806 processes work-flows 4810 of work-flow configuration 4808 , evaluating conditions and performing actions of work-flow elements 4820 .
- work-flow execution logic 4806 processes work-flows 4810 in response to receipt of final results of any screening or monitoring according to logic flow diagram 800 ( FIG. 8 ) using those results in processing conditions of the work-flows.
- Work-flow configuration 4808 ( FIG. 48 ) includes data representing a number of work-flows 4810 .
- Each work-flow 4810 includes work-flow metadata 4812 and data representing a number of work-flow elements 4820 .
- Work-flow metadata 4812 is metadata of work-flow 4810 and includes data representing a description 4814 , an author 4816 , and a schedule 4818 .
- Description 4814 is information intended to inform any human operator of the nature of work-flow 4810 .
- Author 4816 identifies the entity that created work-flow 4810 , whether a human administrator or automatic work-flow generation logic 4804 .
- Schedule 4818 specifies dates and times and/or conditions in which work-flow execution logic 4806 is to process work-flow 4810 .
- Work-flow elements 4820 collectively define the behavior of work-flow execution logic 4806 in processing the work-flow.
- work-flow elements are each one of two types: conditions, such as condition 4900 ( FIG. 49 ), and actions such as action 5000 ( FIG. 50 ).
- condition 4900 specifies a Boolean test that includes an operand 4902 , an operator 4904 , and another operand 4906 .
- Operands 4902 and 4906 may each be results 1820 ( FIG. 18 ) or any portion thereof, a constant, or null.
- any results of a given screening or monitoring, e.g., results 1820 , any information about a given patient stored in screening or monitoring system data store 410 , and any combination thereof may be either of operands 4902 and 4906 .
- Next work-flow element(s) 4908 specify one or more work-flow elements to process if the test of operands 4902 and 4906 and operator 4904 evaluates to a Boolean value of true.
- next work-flow element(s) 4910 specify one or more work-flow elements to process if the test of operands 4902 and 4906 and operator 4904 evaluates to a Boolean value of false.
- next work-flow element(s) 4908 and 4910 may be any of a condition, an action, or null.
- By accepting conditions such as condition 4900 in next work-flow element(s) 4908 and 4910 , complex tests with AND and OR operations may be represented in work-flow elements 4820 .
- condition 4900 may include more operands and operators combined with AND, OR, and NOT operations.
- condition 4900 may test for the mere presence or absence of an occurrence in the patient's data. For example, to determine whether a patient has ever had a Hemoglobin A1C blood test, condition 4900 may determine whether the most recent Hemoglobin A1C test result equals null. If it does, the patient has not had any Hemoglobin A1C blood test at all.
- Action 5000 ( FIG. 50 ) includes action logic 5002 and one or more next work-flow element(s) 5004 .
- Action logic 5002 represents the substantive action to be taken by work-flow execution logic 4806 and typically makes or recommends a particular course of action in the care of the patient that may range from specific treatment protocols to more holistic paradigms. Examples include referring the patient to a care provider, enrolling the patient in a particular program of care, and recording recommendations to the patient's file such that the patient's clinician sees the recommendation at the next visit. Examples of referring a patient to a care provider include referring the patient to a psychiatrist, a medication management coach, physical therapist, nutritionist, fitness coach, dietitian, social worker, etc. Examples of enrolling the patient in a program include telepsychiatry programs, group therapy programs, etc.
- Examples of recommendations recorded to the patient's file include recommended changes to medication, whether a change in the particular drug prescribed or merely in dosage of the drug already prescribed to the patient, and other treatments.
- referrals and enrollment may be effected by recommendations for referrals and enrollment in the patient's file, allowing a clinician to make the final decision regarding the patient's care.
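- The following sketch shows one plausible in-memory representation of condition and action work-flow elements and a small interpreter over them, using the Hemoglobin A1C null test above as the example; the class and field names are assumptions, not the patent's data layout.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union

@dataclass
class Action:
    run: Callable[[dict], None]          # substantive action, e.g., a referral
    next_element: Optional["Element"] = None

@dataclass
class Condition:
    operand_a: Callable[[dict], Any]     # pulled from results or patient data
    operator: Callable[[Any, Any], bool]
    operand_b: Any                       # results field, constant, or None
    if_true: Optional["Element"] = None
    if_false: Optional["Element"] = None

Element = Union[Condition, Action]

def execute(element, patient):
    """Walk work-flow elements, branching on conditions and running actions."""
    while element is not None:
        if isinstance(element, Condition):
            branch = element.operator(element.operand_a(patient), element.operand_b)
            element = element.if_true if branch else element.if_false
        else:
            element.run(patient)
            element = element.next_element

# The A1C presence test: a null (None) most-recent result means the patient
# has never had the test, so an action recommends one.
order_a1c = Action(run=lambda p: print("recommend Hemoglobin A1C test"))
a1c_check = Condition(operand_a=lambda p: p.get("latest_a1c"),
                      operator=lambda a, b: a == b,
                      operand_b=None,
                      if_true=order_a1c,
                      if_false=None)
execute(a1c_check, {"latest_a1c": None})
```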
- automatic work-flow generation logic 4804 ( FIG. 48 ) performs statistical analysis of patient data stored within screening or monitoring system data store 410 to identify work-flows to achieve predetermined goals. Examples of such goals given above include minimizing predicted costs for the next two (2) years of a patient's care and minimizing the cost of an initial referral while also maximizing a reduction in Hemoglobin A1C in one year. Automatic work-flow generation logic 4804 is described in the illustrative context of the first, namely, minimizing predicted costs for the next two (2) years of a patient's care.
- Automatic work-flow generation logic 4804 includes deep learning machine logic.
- human computer engineers configure this deep learning machine logic of automatic work-flow generation logic 4804 to analyze patient data from screening or monitoring system data store 410 in the context of labels specified by users, e.g., labels related to costs of the care of each patient over a 2-year period in this illustrative example.
- Users of health screening or monitoring server 102 who are not merely patients are typically either health care providers or health care payers. In either case, information regarding events in a given patient's health care history is available and is included in automatic work-flow generation logic 4804 by the human engineers such that automatic work-flow generation logic 4804 may track costs of a patient's care from the patient's medical records.
- the human engineers use all relevant data of screening or monitoring system data store 410 to train the deep learning machine logic of automatic work-flow generation logic 4804 .
- the deep learning machine logic of automatic work-flow generation logic 4804 includes an extremely complex decision tree that predicts the costs of each patient over a 2-year period.
- automatic work-flow generation logic 4804 determines which events in a patient's medical history have the most influence over the cost of the patient's care in a 2-year period for statistically significant portions of the patient population.
- automatic work-flow generation logic 4804 identifies deep learning machine (DLM) nodes of the decision tree that have the most influence over the predetermined goals, e.g., costs of the care of a patient over a 2-year period.
- In machine learning parlance, examples of deep learning machines (DLMs) include random decision forests (supervised or unsupervised), multinomial logistic regression, and naïve Bayes classifiers. These techniques are known and are not described herein.
- Loop step 5106 and next step 5112 define a loop in which automatic work-flow generation logic 4804 processes each of the influential nodes identified in step 5104 .
- the particular node processed by automatic work-flow generation logic 4804 is sometimes referred to as the subject node.
- automatic work-flow generation logic 4804 forms a condition, e.g., condition 4900 ( FIG. 49 ), from the internal logic of the subject node.
- the internal logic of the subject node receives data representing one or more events in a patient's history and/or one or more phenotypes of the patient and makes a decision that represents one or more branches to other nodes.
- automatic work-flow generation logic 4804 generalizes the data received by the subject node and the internal logic of the subject node that maps the received data to a decision.
- automatic work-flow generation logic 4804 forms an action, e.g., action 5000 ( FIG. 50 ), according to the branch from the subject node that ultimately leads to the best outcome related to the predetermined goal, e.g., to the lowest cost over a 2-year period.
- the condition formed in step 5108 ( FIG. 51 ) and the action formed in step 5110 collectively form a work-flow generated by automatic work-flow generation logic 4804 .
- processing by automatic work-flow generation logic 4804 completes, having formed a number of work-flows.
- the automatically generated work-flows are subject to human ratification prior to actual deployment within health care management logic 408 .
- health care management logic 408 automatically deploys work-flows generated automatically by automatic work-flow generation logic 4804 but limits actions to only recommendations to health care professionals. It is technically feasible to fully automate work-flow generation and changes to a patient's care without any human supervision; however, such automation may be counter to health care public policy in place today.
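- As an illustration of steps 5104 - 5110 , the sketch below fits a small decision tree to toy cost data, treats its root split as the influential node, and reads off a condition/action pair; using the root as the influential node, the toy cost formula, and the scikit-learn API are all simplifying assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy patient histories: two made-up event-count features per patient, with
# a fabricated 2-year cost that depends on them plus noise.
rng = np.random.default_rng(1)
X = rng.integers(0, 5, size=(200, 2)).astype(float)
cost_2yr = 1000 * X[:, 0] - 400 * X[:, 1] + rng.normal(0, 50, 200)

tree = DecisionTreeRegressor(max_depth=3).fit(X, cost_2yr)

# Treat the root as the "influential node": read off its split to form a
# condition ("feature <= threshold?") and an action (follow the branch with
# the lower predicted 2-year cost).
t = tree.tree_
feature, threshold = t.feature[0], t.threshold[0]
left, right = t.children_left[0], t.children_right[0]
cheaper = "left" if t.value[left][0][0] < t.value[right][0][0] else "right"
print(f"condition: feature[{feature}] <= {threshold:.2f}; "
      f"action: steer care toward the {cheaper} branch (lower predicted cost)")
```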
- the disclosed system may also be used to evaluate mental health from primary care health interactions.
- the system may be used to augment inferences about a patient's mental health taken by a trained health provider individual.
- the system may also be used to evaluate mental health from a preliminary screening or monitoring call (e.g., a call made to a health care provider organization by a prospective patient for the purpose of setting up a medical appointment with a trained mental health professional).
- the health care professional may ask specific questions to the patient in a particular order to ascertain mental health treatment needs of the patient.
- a recording device may record prospective patient responses to one or more of these questions. The prospective patient's consent may be obtained before this occurs.
- the system may perform an audio analysis or a semantic analysis on audio snippets it collects from the prospective patient. For example, the system may determine relative frequencies of words or phrases associated with depression. For example, the system may predict that a user has depression if the user speaks with terms associated with negative thoughts, such as phrases indicating suicidal thoughts, self-harm instincts, phrases indicating a poor body image or self-image, and feelings of anxiety, isolation, or loneliness.
- the system may also pick up non-lexical or non-linguistic cues for depression, such as pauses, gasps, sighs, and slurred or mumbled speech. These terms and non-lexical cues may be similar to those picked up from training examples, such as patients administered a survey (e.g., the PHQ-9).
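- A minimal sketch of the relative-frequency idea; the phrase list and the angle-bracket event tags for non-lexical cues are purely hypothetical stand-ins for whatever the ASR front end emits.

```python
NEGATIVE_PHRASES = {"hopeless", "worthless", "alone", "hate myself",
                    "can't sleep", "hurt myself"}
NON_LEXICAL_CUES = {"<sigh>", "<pause>", "<gasp>"}   # placeholder ASR event tags

def cue_frequencies(transcript):
    """Relative frequency of depression-associated phrases and non-lexical
    cues in an ASR transcript (word tokens plus event tags)."""
    tokens = transcript.lower().split()
    text = " ".join(tokens)
    n = max(len(tokens), 1)
    lexical = sum(text.count(p) for p in NEGATIVE_PHRASES) / n
    non_lexical = sum(tok in NON_LEXICAL_CUES for tok in tokens) / n
    return {"lexical_rate": lexical, "non_lexical_rate": non_lexical}

print(cue_frequencies("i feel hopeless <sigh> and alone most days"))
```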
- the system may determine information about mental health by probing a user's physical health. For example, a user may feel insecure or sad about his or her physical features or physical fitness. Questions used to elicit information may have to do with vitals, such as blood pressure, resting heart rate, family history of disease, blood sugar, body mass index, body fat percentage, injuries, deformities, weight, height, eyesight, eating disorders, cardiovascular endurance, diet, or physical strength. Patients may provide speech which indicates despondence, exasperation, sadness, or defensiveness. For example, a patient may provide excuses as to why he or she has not gotten a medical procedure performed, why his or her diet is not going well, why he or she has not started an exercise program, or speak negatively about his or her height, weight, or physical features. Expression of such negativity about one's physical health may be correlated to anxiety.
- the models may learn continually, in either an active or a passive manner.
- a passive learning model may not change the method by which it learns in response to new information. For example, a passive learner may continually use a specific condition to converge on a prediction, even as new types of feature information are added to the system. But such a model may be limited in effectiveness without a large amount of training data available.
- An active learning model may employ a human to converge more quickly. The active learner may ask targeted questions to the human in order to do this.
- a machine learning algorithm may be employed on a large amount of unlabeled audio samples. The algorithm may be able to easily classify some as being indicative of depression, but others may be ambiguous. The algorithm may ask the patient if he or she were feeling depressed when uttering a specific speech segment. Or the algorithm may ask a clinician to classify the samples.
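- An uncertainty-sampling sketch of that active-learning step: the samples whose predicted probabilities sit closest to 0.5 are the ambiguous ones routed to a patient or clinician for labeling. The names and values are illustrative.

```python
import numpy as np

def select_queries(probabilities, k=3):
    """Pick the k most ambiguous samples (probability nearest 0.5) so a
    clinician, or the patient, can be asked to label them."""
    ambiguity = np.abs(np.asarray(probabilities) - 0.5)
    return np.argsort(ambiguity)[:k]

# Classifier outputs for unlabeled audio samples: some easy, some ambiguous.
probs = [0.97, 0.48, 0.03, 0.55, 0.91, 0.52]
for idx in select_queries(probs):
    print(f"ask for a label on sample {idx} (p={probs[idx]:.2f})")
```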
- the system may be able to perform quality assurance of health providers using voice biomarkers.
- Data from the system may be provided to health care providers in order to assist the health care providers with detecting lexical and non-lexical cues that correspond to depression in patients.
- the health care providers may be able to use changes in pitch, vocal cadence, and vocal tics to determine how to proceed with care.
- the system may also allow health care providers to assess which questions elicit reactions from patients that are most predictive of depression.
- Health care providers may use data from the system to train one another to search for lexical and non-lexical cues, and monitor care delivery to determine whether it is effective in screening or monitoring patients.
- a health care provider may be able to observe a second health care provider question a subject to determine whether the second health care provider is asking questions that elicit useful information from the patient.
- the health care provider may be asking the questions in person or may be doing so remotely, such as from a call center.
- Health care providers may, using the semantic and audio information produced by the system, produce standardized methods of eliciting information from patients, based on which methods produce the most cues from patients.
- the system may be used to provide a dashboard tabulating voice-based biomarkers observed in patients.
- health care providers may be able to track the frequencies of specific biomarkers, in order to keep track of patients' conditions. They may be able to track these frequencies in real time to assess how their treatment methods are performing. They may also be able to track these frequencies over time, in order to monitor patients' progress under treatment or their recovery.
- Mental health providers may be able to assess each other's performances using this collected data.
- Dashboards may show real-time biomarker data as a snippet is being analyzed. They may show line graphs showing trends in measured biomarkers over time. The dashboards may show predictions taken at various time points, charting a patient's progress with respect to treatment. The dashboard may show patients' responses to treatment by different providers.
- the system may be able to translate one or more of its models across different patient settings. This may be done to account for background audio information in different settings.
- the system may employ one or more signal processing algorithms to normalize audio input across settings. This may be done by taking impulse response measurements of multiple locations and determining transfer functions of signals collected at those locations in order to normalize audio recordings.
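- One way to realize this, sketched under the assumption that a room impulse response has already been measured, is regularized frequency-domain deconvolution:

```python
import numpy as np

def normalize_recording(recording, impulse_response, eps=1e-6):
    """Divide out the room's transfer function (measured via its impulse
    response) to approximate the dry signal, making recordings from
    different settings comparable. The Wiener-style regularization term
    (eps) avoids dividing by near-zero frequency bins."""
    n = len(recording) + len(impulse_response) - 1
    R = np.fft.rfft(recording, n)
    H = np.fft.rfft(impulse_response, n)
    dry = np.fft.irfft(R * np.conj(H) / (np.abs(H) ** 2 + eps), n)
    return dry[: len(recording)]
```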
- the system may also account for training in different locations. For example, a patient may feel more comfortable discussing sensitive issues at home or in a therapist's office than over the phone. Thus, voice-based biomarkers obtained in these settings may differ.
- the system may be trained in multiple locations, or training data may be labeled by location before it is processed by the system's machine learning algorithms.
- the models may be transferred from location to location, for example, by using signal processing algorithms. They may also be transferred by modifying the questions asked of patients based on their locations. For example, it may be determined which particular questions, or sequences of questions, correspond to particular reactions within a particular location context. The questions may then be administered by the health care providers in such fashion as to provide the same reactions from the patients.
- the system may be able to use standard clinical encounters to train voice biomarker models.
- the system may collect recordings of clinical encounters for physical complaints.
- the complaints may be regarding injuries, sicknesses, or chronic conditions.
- the system may record, with patient permission, conversations patients have with health care providers during appointments.
- the physical complaints may indicate patients' feelings about their health conditions. In some cases, the physical complaints may be causing patients significant distress, affecting their overall dispositions and possibly causing depression.
- the data may be encrypted as it is collected or while in transit to one or more servers within the system.
- the data may be encrypted using a symmetric-key encryption scheme, a public-key encryption scheme, or a blockchain encryption method.
- Calculations performed by the one or more machine learning algorithms may be encrypted using a homomorphic encryption scheme, such as a partially homomorphic encryption scheme or a fully homomorphic encryption scheme.
- the data may be analyzed locally, to protect privacy.
- the system may analyze data in real-time by implementing a trained machine learning algorithm to operate on speech sample data recorded at the location where the appointment is taking place.
- the data may be stored locally.
- features may be extracted before being stored in the cloud for later analysis.
- the features may be anonymized to protect privacy. For example, patients may be given identifiers or pseudonyms to hide their true identities.
- the data may undergo differential privacy to ensure that patient identities are not compromised. Differential privacy may be accomplished by adding noise to a data set. For example, a data set may include 100 records corresponding to 100 usernames and added noise. If an observer has information about 99 records corresponding to 99 users and knows the remaining username, the observer will not be able to match the remaining record to the remaining username, because of the noise present in the system.
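- A sketch of the standard Laplace mechanism for differential privacy; the sensitivity and epsilon values are illustrative and would need to be chosen for the actual data release.

```python
import numpy as np

def privatize(values, sensitivity=1.0, epsilon=0.5, seed=None):
    """Add Laplace noise scaled to sensitivity/epsilon, the standard
    differential-privacy mechanism, before records are shared."""
    rng = np.random.default_rng(seed)
    return np.asarray(values, dtype=float) + rng.laplace(
        0.0, sensitivity / epsilon, size=len(values))

scores = [0.71, 0.42, 0.88]            # per-record depression scores
print(privatize(scores, seed=7))       # noisy values released in the data set
```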
- a local model may be embedded on a user device.
- the local model may be able to perform limited machine learning or statistical analysis, subject to constraints of device computing power and storage.
- the model may also be able to perform digital signal processing on audio recordings from patients.
- the mobile device used may be a smartphone or tablet computer.
- the mobile device may be able to download algorithms over a network for analysis of local data.
- the local device may be used to ensure privacy, as data collected and analyzed may not travel over a network.
- Voice-based biomarkers may be associated with lab values or physiological measurements. Voice-based biomarkers may be associated with mental health-related measurements. For example, they may be compared to the effects of psychiatric treatment, or logs taken by healthcare professionals such as therapists. They may be compared to answers to survey questions, to see if the voice-based analysis matches assessments commonly made in the field.
- Voice-based biomarkers may be associated with physical health-related measurements. For example, physical issues, such as illness, may cause a patient to produce vocal sounds that need to be accounted for in order to produce actionable predictions. In addition, depression predictions over a time scale in which a patient is recovering from an illness or injury may be compared to the patient's health outcomes over that time scale, to see if treatment is improving the patient's depression or depression-related symptoms. Voice-based biomarkers may be compared with data relating to brain activity collected at multiple time points, in order to determine the clinical efficacy of the system.
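- By way of non-limiting illustration, one simple way to test such an association is to correlate per-visit biomarker scores with a clinical measurement taken at the same visits (a PHQ-9 total is used here only as an example comparator); the numbers below are fabricated purely for illustration.

```python
from scipy.stats import pearsonr

# Fabricated per-visit values: biomarker_scores[i] and phq9_totals[i]
# come from the same visit i.
biomarker_scores = [0.31, 0.55, 0.62, 0.48, 0.71, 0.29]
phq9_totals = [5, 12, 14, 9, 17, 4]

r, p_value = pearsonr(biomarker_scores, phq9_totals)
print(f"correlation r={r:.2f}, p={p_value:.3f}")
```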
- Training of the models may be continuous, so that the model is continuously running while audio data is collected.
- Voice-based biomarkers may be continually added to the system and used for training during multiple epochs.
- Models may be updated using the data as it is collected.
- the system may use a reinforcement learning mechanism, where survey questions may be altered dynamically in order to elicit voice-based biomarkers that yield high-confidence depression predictions.
- the reinforcement learning mechanism may be able to select questions from a group. Based on a previous question or a sequence of previous questions, the reinforcement mechanism may choose a question that may yield a high-confidence prediction of depression.
- the system may be able to determine which questions or sequences of questions may be able to yield particular elicitations from patients.
- the system may use machine learning to predict a particular elicitation, by producing, for example, a probability.
- the system may also use a softmax layer to produce probabilities for multiple elicitations.
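- By way of non-limiting illustration, a minimal sketch of such a softmax-based selection step follows, assuming some scoring function (e.g., a trained model's predicted chance that a question elicits a high-confidence biomarker); all names here are assumptions of this sketch.

```python
import numpy as np

def softmax(x):
    z = np.asarray(x, dtype=float)
    z -= z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def choose_question(candidates, score_fn, context):
    """Sample the next survey question in proportion to its predicted
    chance of eliciting a high-confidence biomarker response."""
    scores = [score_fn(q, context) for q in candidates]
    probs = softmax(scores)
    return int(np.random.choice(len(candidates), p=probs))
```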
- the system may use as features the particular questions asked, the times at which they are asked, how far into a survey they are asked, the time of day at which they are asked, and the point within a treatment course at which they are asked.
- a specific question asked at a specific time about a subject sensitive to a patient may elicit crying from the patient. This crying may be strongly associated with depression.
- the system, upon receiving context that it is the specific time, may recommend presenting the question to the patient.
- the system may include a method of using a voice-based biomarker to dynamically affect a course of treatment.
- the system may log elicitations of users over a period of time and determine, from the logged elicitations, whether or not treatment has been effective. For example, if voice-based biomarkers become less indicative of depression over a long time period, this might be evidence that the prescribed treatment is working. On the other hand, if the voice-based biomarkers become more indicative of depression over a long time period, the system may prompt health care providers to pursue a change in treatment, or to pursue the current course of treatment more aggressively.
- the system may spontaneously recommend a change in treatment.
- the system may detect a sudden increase in voice-based biomarkers indicating depression. This may occur over a relatively short time window in a course of treatment.
- the system may also be able to spontaneously recommend a change if a course of treatment has been ineffective for a particular time period (e.g., six months, a year).
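- By way of non-limiting illustration, the snippet below sketches how both signals in the preceding paragraphs might be computed from a patient's score history, flagging a long-run worsening trend and a sudden short-window increase; the window size and z-score threshold are assumptions of this sketch.

```python
import numpy as np

def review_biomarker_series(times, scores, jump_window=3, jump_z=2.0):
    """Flag long-run worsening and sudden short-window increases.

    times:  visit times (e.g., days since treatment start)
    scores: depression-indicative biomarker score at each visit
    """
    if len(scores) <= jump_window + 1:
        raise ValueError("need more visits than jump_window")
    slope = np.polyfit(times, scores, 1)[0]          # long-term trend
    history = np.asarray(scores[:-jump_window], dtype=float)
    recent = float(np.mean(scores[-jump_window:]))   # short recent window
    mu, sd = history.mean(), history.std()
    sudden = sd > 0 and (recent - mu) / sd > jump_z
    return {"worsening_trend": slope > 0, "sudden_increase": bool(sudden)}
```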
- the system may be able to track a probability of a particular response to a medication.
- the system may be able to track voice-based biomarkers taken before, during, and after a course of treatment, and analyze changes in scores indicative of depression.
- the system may be able to track a particular patient's probability of response to medication by having been trained on similar patients.
- the system may use this data to predict a patient's response based on responses of patients from similar demographics. These demographics may include age, gender, weight, height, medical history, or a combination thereof.
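- By way of non-limiting illustration, one way to sketch such a demographic-similarity prediction is with nearest neighbors over demographic features (scikit-learn); the feature encoding and training rows are fabricated for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Fabricated rows: [age, gender_code, weight_kg, height_cm]
X = [[34, 0, 70, 175], [52, 1, 62, 160], [47, 1, 80, 168], [29, 0, 90, 182]]
y = [1, 0, 1, 0]   # 1 = responded to the medication, 0 = did not

model = KNeighborsClassifier(n_neighbors=3).fit(X, y)
# Estimated response probability for a new, demographically similar patient.
p_response = model.predict_proba([[45, 1, 75, 165]])[0][1]
print(f"estimated probability of response: {p_response:.2f}")
```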
- the system may also be able to track a patient's likely adherence to a course of medicine or treatment. For example, the system may be able to predict, based on analysis of time series voice-based biomarkers, whether a treatment is having an effect on a patient. The health care provider may then ask the patient whether he or she is following the treatment.
- the system may be able to tell, based on responses to survey questions, whether the patient is following the treatment by analyzing his or her biomarkers. For example, a patient may become defensive, take long pauses, stammer, or otherwise act in a manner suggesting that the patient is lying about having adhered to a treatment plan. The patient may also express sadness, shame, or regret regarding not having followed the treatment plan.
- the system may be able to predict whether a patient will adhere to a course of treatment or medication.
- the system may be able to use training data from voice-based biomarkers from many patients in order to make a prediction as to whether a patient will follow a course of treatment.
- the system may identify particular voice-based biomarkers as predicting adherence. For example, patients with voice-based biomarkers indicating dishonesty may be designated as less likely to adhere to a treatment plan.
- the system may be able to establish a baseline profile for each individual patient.
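- By way of non-limiting illustration, a minimal sketch of a per-patient baseline for a single vocal feature follows, scoring each new measurement against the patient's own history rather than a population norm; the class and feature are assumptions of this sketch.

```python
import numpy as np

class PatientBaseline:
    """Running baseline for one vocal feature (e.g., mean pitch)."""

    def __init__(self):
        self.samples = []

    def update(self, value):
        self.samples.append(float(value))

    def deviation(self, value):
        """Z-score of a new measurement against this patient's history."""
        if len(self.samples) < 2:
            return 0.0                # not enough history yet
        mu, sd = np.mean(self.samples), np.std(self.samples)
        return 0.0 if sd == 0 else (value - mu) / sd
```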
- An individual patient may have a particular style of speaking, with particular voice-based biomarkers indicating emotions, such as happiness, sadness, anger, and grief. For example, some people may laugh when frustrated or cry when happy. Some people may speak loudly or softly, speak clearly or mumble, have large or small vocabularies, and speak freely or more hesitantly. Some people may have extroverted personalities, while others may be more introverted.
- Some people may be more hesitant to speak than others. Some people may be more guarded about expressing their feelings. Some people may have experienced trauma and abuse. Some people may be in denial about their feelings.
- a person's baseline mood or mental state, and thus the person's voice-based biomarkers, may change over time.
- the model may be continually trained to account for this.
- as a patient's baseline shifts, the continually trained model may also come to predict depression less often.
- the model's predictions over time may be recorded by mental health professionals. These results may be used to show a patient's progress out of a depressive state.
- the system may be able to make a particular number of profiles to account for different types of individuals. These profiles may be related to individuals' genders, ages, ethnicities, languages spoken, and occupations, for example.
- Particular profiles may have similar voice-based biomarkers. For example, older people may have thinner, breathier voices than younger people. Their weaker voices may make it more difficult for microphones to pick up specific biomarkers, and they may speak more slowly than younger people. In addition, older people may attach stigma to behavioral therapy and thus not share as much information as younger people might.
- Men and women may express themselves differently, which may lead to different biomarkers. For example, men may express negative emotions more aggressively or violently, while women may be better able to articulate their emotions.
- people from different cultures may have different methods of dealing with or expressing emotions, or may feel guilt and shame when expressing negative emotions. It may be necessary to segment people based on their cultural backgrounds, in order to make the system more effective with respect to picking up idiosyncratic voice-based biomarkers.
- the system may account for people with different personality types by segmenting and clustering by personality type. This may be done manually, as clinicians may be familiar with personality types and how people of those types may express feelings of depression. The clinicians may develop specific survey questions to elicit specific voice-based biomarkers from people from these segmented groups.
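- Alongside such manual segmentation by clinicians, and by way of non-limiting illustration, a simple unsupervised sketch is shown below, clustering patients on a few acoustic features with k-means (scikit-learn); the features and values are fabricated for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Fabricated per-patient rows: [speech_rate, loudness, pause_ratio]
features = np.array([
    [3.1, 0.62, 0.10], [2.2, 0.40, 0.25], [3.0, 0.58, 0.12],
    [1.8, 0.35, 0.30], [2.9, 0.60, 0.11], [2.0, 0.38, 0.28],
])

segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
# Each cluster could then receive its own survey script or model variant.
print(segments)
```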
- the voice-based biomarkers may be used to determine whether somebody is depressed, even if the person is holding back information or attempting to outsmart testing methods. This is because many of the voice-based biomarkers may be involuntary utterances. For example, the patient may equivocate or the patient's voice may quaver.
- Particular voice-based biomarkers may correlate with particular causes of depression. For example, semantic analysis may be performed on many patients in order to find specific words, phrases, or sequences thereof that indicate depression. The system may also track effects of treatment options on users, in order to determine their efficacy. Finally, the system may use reinforcement learning to identify better available treatment methods.
- Real-time system 302 is shown in greater detail in FIG. 52 .
- Real-time system 302 includes one or more microprocessors 5202 (collectively referred to as CPU 5202 ) that retrieve data and/or instructions from memory 5204 and execute retrieved instructions in a conventional manner.
- Memory 5204 may generally include any computer-readable medium including, for example, persistent memory such as magnetic and/or optical disks, ROM, and PROM, and volatile memory such as RAM.
- CPU 5202 and memory 5204 are connected to one another through a conventional interconnect 5206 , which is a bus in this illustrative embodiment and which connects CPU 5202 and memory 5204 to one or more input devices 5208 , output devices 5210 , and network access circuitry 5212 .
- Input devices 5208 may include, for example, a keyboard, a keypad, a touch-sensitive screen, a mouse, a microphone, and one or more cameras.
- Output devices 5210 may include, for example, a display—such as a liquid crystal display (LCD)—and one or more loudspeakers.
- Network access circuitry 5212 sends and receives data through computer networks such as network 308 ( FIG. 3 ).
- Server computer systems often exclude input and output devices, relying instead on human user interaction through network access circuitry exclusively. Accordingly, in some embodiments, real-time system 302 does not include input devices 5208 and output devices 5210 .
- assessment test administrator 2202 and composite model 2204 are each all or part of one or more computer processes executing within CPU 5202 from memory 5204 in this illustrative embodiment but may also be implemented using digital logic circuitry.
- Assessment test administrator 2202 and composite model 2204 are both logic.
- logic refers to (i) logic implemented as computer instructions and/or data within one or more computer processes and/or (ii) logic implemented in electronic circuitry.
- Assessment test configuration 5220 is data stored persistently in memory 5204 and may be implemented as all or part of one or more databases.
- Modeling system 304 ( FIG. 3 ) is shown in greater detail in FIG. 53 .
- Modeling system 304 includes one or more microprocessors 5302 (collectively referred to as CPU 5302 ), memory 5304 , an interconnect 5306 , input devices 5308 , output devices 5310 , and network access circuitry 5312 that are directly analogous to CPU 5202 ( FIG. 52 ), memory 5204 , interconnect 5206 , input devices 5208 , output devices 5210 , and network access circuitry 5212 , respectively.
- a number of components of modeling system 304 ( FIG. 53 ) are stored in memory 5304 .
- modeling system logic 5320 is all or part of one or more computer processes executing within CPU 5302 from memory 5304 in this illustrative embodiment but may also be implemented using digital logic circuitry.
- Collected patient data 2206 , clinical data 2220 , and modeling system configuration 5322 are each data stored persistently in memory 5304 and may be implemented as all or part of one or more databases.
- Real-time system 302 , modeling system 304 , and clinical data server 306 are shown, at least in the Figures, as separate, single server computers. It should be appreciated that logic and data of separate server computers described herein may be combined and implemented in a single server computer and that logic and data of a single server computer described herein may be distributed across multiple server computers. Moreover, it should be appreciated that the distinction between servers and clients is largely an arbitrary one to facilitate human understanding of the purpose of a given computer. As used herein, “server” and “client” are primarily labels to assist human categorization and understanding.
- Health screening or monitoring server 102 is shown in greater detail in FIG. 54 . As noted above, it should be appreciated that the behavior of health screening or monitoring server 102 described herein may be distributed across multiple computer systems using conventional distributed processing techniques. Health screening or monitoring server 102 includes one or more microprocessors 5402 (collectively referred to as CPU 5402 ) that retrieve data and/or instructions from memory 5404 and execute retrieved instructions in a conventional manner. Memory 5404 may generally include any computer-readable medium including, for example, persistent memory such as magnetic, solid state, and/or optical disks, ROM, and PROM, and volatile memory such as RAM.
- CPU 5402 and memory 5404 are connected to one another through a conventional interconnect 5406 , which is a bus in this illustrative embodiment and which connects CPU 5402 and memory 5404 to one or more input devices 5408 , output devices 5410 , and network access circuitry 5412 .
- Input devices 5408 may include, for example, a keyboard, a keypad, a touch-sensitive screen, a mouse, a microphone, and one or more cameras.
- Output devices 5410 may include, for example, a display—such as a liquid crystal display (LCD)—and one or more loudspeakers.
- Network access circuitry 5412 sends and receives data through computer networks such as WAN 110 ( FIG. 1 ). Server computer systems often exclude input and output devices, relying instead on human user interaction through network access circuitry exclusively.
- health screening or monitoring server 102 does not include input devices 5408 and output devices 5410 .
- a number of components of health screening or monitoring server 102 are stored in memory 5404 .
- interactive health screening or monitoring logic 402 and health care management logic 408 are each all or part of one or more computer processes executing within CPU 5402 from memory 5404 .
- logic refers to (i) logic implemented as computer instructions and/or data within one or more computer processes and/or (ii) logic implemented in electronic circuitry.
- Screening system data store 410 and model repository 416 are each data stored persistently in memory 5404 and may be implemented as all or part of one or more databases. Screening system data store 410 also includes logic as described above.
- Here too, the distinction between servers and clients is largely an arbitrary one to facilitate human understanding of the purpose of a given computer; as used herein, “server” and “client” are primarily labels to assist human categorization and understanding.
- Mental and behavioral health states that may be screened or monitored for include post-traumatic stress disorder (PTSD), stress generally, drug and alcohol addiction, and bipolar disorder.
- health screening or monitoring server 102 may screen for health states unrelated to mental or behavior health. Examples include Parkinson's disease, Alzheimer's disease, chronic obstructive pulmonary disease, liver failure, Crohn's disease, myasthenia gravis, amyotrophic lateral sclerosis (ALS) and decompensated heart failure.
- corroborative patient data for mental illness diagnostics may be extracted from one or more of the patient's biometrics, including heart rate, blood pressure, respiration, perspiration, and body temperature. It may also be possible to use audio without words, for privacy or for cross-language analysis. It is also possible to use acoustic modeling without visual cues.
- FIGS. 57 and 58 illustrate a Computer System 5700 , which is suitable for implementing embodiments of the present invention.
- FIG. 57 shows one possible physical form of the Computer System 5700 .
- the Computer System 5700 may have many physical forms, ranging from a printed circuit board, an integrated circuit, or a small handheld device up to a huge supercomputer or a collection of networked computers (or computing components operating in a distributed network).
- Computer system 5700 may include a Monitor 5702 , a Display 5704 , a Housing 5706 , a Disk Drive 5708 , a Keyboard 5710 , and a Mouse 5712 .
- Storage medium 5714 is a computer-readable medium used to transfer data to and from Computer System 5700 .
- FIG. 58 is an example of a block diagram 5800 for Computer System 5700 . Attached to System Bus 5720 are a wide variety of subsystems.
- Processor(s) 5722 (also referred to as central processing units, or CPUs) are coupled to storage devices, including Memory 5724 .
- Memory 5724 includes random access memory (RAM) and read-only memory (ROM).
- Both of these types of memories may include any suitable computer-readable media described below.
- a Fixed medium 5726 may also be coupled bi-directionally to the Processor 5722 ; it provides additional data storage capacity and may also include any of the computer-readable media described below.
- Fixed medium 5726 may be used to store programs, data, and the like and is typically a secondary storage medium (such as a hard disk) that is slower than primary storage. It will be appreciated that the information retained within Fixed medium 5726 may, in appropriate cases, be incorporated in standard fashion as virtual memory in Memory 5724 .
- Removable medium 5714 may take the form of any of the computer-readable media described below.
- Processor 5722 is also coupled to a variety of input/output devices, such as Display 5704 , Keyboard 5710 , Mouse 5712 and Speakers 5730 .
- an input/output device may be any of: video displays, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, biometrics readers, motion sensors, motion trackers, brain wave readers, or other computers.
- Processor 5722 optionally may be coupled to another computer or telecommunications network using Network Interface 5740 .
- the Processor 5722 might receive information from the network or might output information to the network in the course of performing the above-described health screening or monitoring. Furthermore, method embodiments of the present invention may execute solely upon Processor 5722 or may execute over a network such as the Internet in conjunction with a remote CPU that shares a portion of the processing.
- Software is typically stored in the non-volatile memory and/or the drive unit. Indeed, for large programs, it may not even be possible to store the entire program in the memory. Nevertheless, it should be understood that for software to run, if necessary, it is moved to a computer readable location appropriate for processing, and for illustrative purposes, that location is referred to as the memory in this disclosure. Even when software is moved to the memory for execution, the processor will typically make use of hardware registers to store values associated with the software, and local cache that, ideally, serves to speed up execution.
- a software program is assumed to be stored at any known or convenient location (from non-volatile storage to hardware registers) when the software program is referred to as “implemented in a computer-readable medium.”
- a processor is considered to be “configured to execute a program” when at least one value associated with the program is stored in a register readable by the processor.
- the computer system 5700 may be controlled by operating system software that includes a file management system, such as a disk operating system.
- One example of operating system software with associated file management system software is the family of operating systems known as Windows® from Microsoft Corporation of Redmond, Wash., and their associated file management systems.
- Another example is the Linux operating system and its associated file management system.
- the file management system is typically stored in the non-volatile memory and/or drive unit and causes the processor to execute the various acts required by the operating system to input and output data and to store data in the memory, including storing files on the non-volatile memory and/or drive unit.
- the machine operates as a standalone device or may be connected (e.g., networked) to other machines.
- the machine may operate in the capacity of a server or a client machine in a client-server network environment or as a peer machine in a peer-to-peer (or distributed) network environment.
- the machine may be a server computer, a client computer, a virtual machine, a personal computer (PC), a tablet PC, a laptop computer, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, an iPhone, a Blackberry, a processor, a telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- While the machine-readable medium or machine-readable storage medium is shown in an exemplary embodiment to be a single medium, the terms “machine-readable medium” and “machine-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions.
- the terms “machine-readable medium” and “machine-readable storage medium” shall also be taken to include any medium that is capable of storing, encoding, or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the presently disclosed technique and innovation.
- routines executed to implement the embodiments of the disclosure may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as “computer programs.”
- the computer programs typically comprise one or more instructions set at various times in various memory and storage devices in a computer, and when read and executed by one or more processing units or processors in a computer, cause the computer to perform operations to execute elements involving the various aspects of the disclosure.
- the systems disclosed herein may be used to augment care provided by healthcare providers.
- one or more of the systems disclosed may be used to facilitate handoffs of patients to patient care providers.
- if the system, following an assessment, produces a score above a threshold for a particular mental state, the system may refer the patient to a specialist for further investigation and analysis.
- the patient may be referred before the assessment has been completed, for example, if the patient is receiving treatment in a telemedicine system or if the specialist is co-located with the patient.
- the patient may be receiving treatment in a clinic with one or more specialists.
- the system disclosed may be able to direct clinical processes for patients, following scoring. For example, if the patient were taking the assessment using a client device, the patient may, following completion of the assessment, be referred to cognitive behavioral therapy (CBT) services. They may also be referred to health care providers, or have appointments with health care providers made by the system. The system disclosed may suggest one or more medications.
- FIG. 59 shows an instantiation of a precision case management use case for the system.
- the patient has a conversation with a case manager.
- one or more entities passively record the conversation, with consent of the patient.
- the conversation may be a face-to-face conversation.
- the case manager may perform the conversation remotely.
- the conversation may be a conversation using a telemedicine platform.
- real time results are passed to a payer.
- the real time results may include a score corresponding to a mental state.
- the case manager may update a care plan based on the real time results.
- a particular score that exceeds a particular threshold may influence a future interaction between a care provider and a patient and may cause the provider to ask different questions of the patient.
- the score may even trigger the system to suggest particular questions associated with the score.
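- By way of non-limiting illustration, a minimal sketch of such a trigger follows, mapping score bands to suggested follow-up questions; the thresholds and question text are purely illustrative assumptions of this sketch, not clinical guidance from the disclosure.

```python
# Illustrative mapping from a mental-state score band to suggested
# follow-up questions for the case manager.
FOLLOW_UPS = {
    "high": ["Have you had thoughts of hurting yourself?",
             "Is there someone who can stay with you today?"],
    "elevated": ["How has your sleep been this week?"],
}

def suggest_questions(score, high=0.8, elevated=0.5):
    if score >= high:
        return FOLLOW_UPS["high"]
    if score >= elevated:
        return FOLLOW_UPS["elevated"]
    return []
```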
- the conversation may be repeated with the updated care plan.
- FIG. 60 shows an instantiation of a primary care screening or monitoring use case for the system.
- the patient visits with a primary care provider.
- speech may be captured by the primary care provider's organization for e-transcription and the system may provide a copy for analysis.
- the primary care provider may, from the analysis, receive a real-time vital sign informing the care pathway. This may facilitate a warm handoff to a behavioral health specialist or may be used to direct a primary care provider on a specific care pathway.
- FIG. 61 shows an example system for enhanced employee assistance plan (EAP) navigation and triage.
- the patient may call the EAP line.
- the system may record audiovisual data and screen the patient.
- the screening or monitoring results may be delivered to the provider in real time.
- the provider may be able to adaptively screen the patient about high risk topics, based on the collected real-time results.
- the real-time screening or monitoring data may also be provided to other entities.
- the real-time screening or monitoring data may be provided to a clinician-on-call, used to schedule referrals, used for education purposes, or for other purposes.
- the interaction between the patient and EAP may be in-person or may be remote.
- a person staffing an EAP line may be alerted in real-time that a patient has a positive screen and may be able to help direct the patient to a proper level of therapy.
- An EAP may also be directed to ask questions based on a result of an assessment administered to a patient, for example, a score corresponding to a patient's mental state.
- Speech data as described herein may be collected and analyzed in real-time, or it may be data that is recorded and then analyzed later.
- the system disclosed herein may be used to monitor interactions between unlicensed coaches and patients.
- the system may request consent from the patients before monitoring.
- the coaches may be used to administer questions.
- the coaches, in tandem with the assessment, may be able to provide an interaction with the patient that provides actionable predictions to clinicians and health care professionals, without being as costly as using the services of a clinician or health care professional.
- the assessment may be able to add rigor and robustness to judgments made by the unlicensed coaches.
- the assessment may also allow more people to take jobs as coaches, as it provides a method for validating coaches' methods.
- the term “about” refers to an amount that is near the stated amount by 10%, 5%, or 1%, including increments therein.
- the term “about” in reference to a percentage refers to an amount that is greater or less than the stated percentage by 10%, 5%, or 1%, including increments therein.
- each of the expressions “at least one of A, B and C”, “at least one of A, B, or C”, “one or more of A, B, and C”, “one or more of A, B, or C” and “A, B, and/or C” means A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.
- FIG. 62 shows a computer system 6201 that is programmed or otherwise configured to assess a mental state of a subject in a single session or over multiple different sessions.
- the computer system 6201 can regulate various aspects of assessing a mental state of a subject in a single session or over multiple different sessions of the present disclosure, such as, for example, presenting queries, retrieving data, and processing data.
- the computer system 6201 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device.
- the electronic device can be a mobile electronic device.
- the computer system 6201 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 6205 , which can be a single core or multi core processor, or a plurality of processors for parallel processing.
- the computer system 6201 also includes memory or memory location 6210 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 6215 (e.g., hard disk), communication interface 6220 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 6225 , such as cache, other memory, data storage and/or electronic display adapters.
- the memory 6210 , storage unit 6215 , interface 6220 and peripheral devices 6225 are in communication with the CPU 6205 through a communication bus (solid lines), such as a motherboard.
- the storage unit 6215 can be a data storage unit (or data repository) for storing data.
- the computer system 6201 can be operatively coupled to a computer network (“network”) 6230 with the aid of the communication interface 6220 .
- the network 6230 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
- the network 6230 in some cases is a telecommunication and/or data network.
- the network 6230 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
- the network 6230 in some cases with the aid of the computer system 6201 , can implement a peer-to-peer network, which may enable devices coupled to the computer system 6201 to behave as a client or a server.
- the CPU 6205 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
- the instructions may be stored in a memory location, such as the memory 6210 .
- the instructions can be directed to the CPU 6205 , which can subsequently program or otherwise configure the CPU 6205 to implement methods of the present disclosure. Examples of operations performed by the CPU 6205 can include fetch, decode, execute, and writeback.
- the CPU 6205 can be part of a circuit, such as an integrated circuit.
- One or more other components of the system 6201 can be included in the circuit.
- the circuit is an application specific integrated circuit (ASIC).
- the storage unit 6215 can store files, such as drivers, libraries and saved programs.
- the storage unit 6215 can store user data, e.g., user preferences and user programs.
- the computer system 6201 in some cases can include one or more additional data storage units that are external to the computer system 6201 , such as located on a remote server that is in communication with the computer system 6201 through an intranet or the Internet.
- the computer system 6201 can communicate with one or more remote computer systems through the network 6230 .
- the computer system 6201 can communicate with a remote computer system of a user (e.g., the clinician).
- remote computer systems include personal computers (e.g., portable PC), slate or tablet PC's (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants.
- the user can access the computer system 6201 via the network 6230 .
- Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 6201 , such as, for example, on the memory 6210 or electronic storage unit 6215 .
- the machine executable or machine readable code can be provided in the form of software.
- the code can be executed by the processor 6205 .
- the code can be retrieved from the storage unit 6215 and stored on the memory 6210 for ready access by the processor 6205 .
- the electronic storage unit 6215 can be precluded, and machine-executable instructions are stored on memory 6210 .
- the code can be pre-compiled and configured for use with a machine having a processer adapted to execute the code, or can be compiled during runtime.
- the code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
- aspects of the systems and methods provided herein can be embodied in programming.
- Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium.
- Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk.
- “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
- another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
- Hence, a machine-readable medium, such as one bearing computer-executable code, may take many forms, including but not limited to a tangible storage medium, a carrier wave medium, or a physical transmission medium.
- Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings.
- Volatile storage media include dynamic memory, such as main memory of such a computer platform.
- Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
- Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data.
- Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
- the computer system 6201 can include or be in communication with an electronic display 6235 that comprises a user interface (UI) 6240 for providing, for example, an assessment to a patient.
- UI user interface
- Examples of UI's include, without limitation, a graphical user interface (GUI) and web-based user interface.
- Methods and systems of the present disclosure can be implemented by way of one or more algorithms.
- An algorithm can be implemented by way of software upon execution by the central processing unit 6205 .
- the algorithm can, for example, analyze speech using natural language processing.
Abstract
The present disclosure provides systems and methods for assessing a mental state of a subject in a single session or over multiple different sessions, using for example an automated module to present and/or formulate at least one query based in part on one or more target mental states to be assessed. The query may be configured to elicit at least one response from the subject. The query may be transmitted in an audio, visual, and/or textual format to the subject to elicit the response. Data comprising the response from the subject can be received. The data can be processed using one or more individual, joint, or fused models. One or more assessments of the mental state associated with the subject can be generated for the single session, for each of the multiple different sessions, or upon completion of one or more sessions of the multiple different sessions.
Description
- This application is a continuation application of Ser. No. 16/918,624, filed Jul. 1, 2020, which is a continuation of U.S. patent application Ser. No. 16/560,720, filed Sep. 4, 2019, now U.S. Pat. No. 10,748,644, which is a continuation of U.S. application Ser. No. 16/523,298, filed on Jul. 26, 2019, which is a continuation of U.S. International Application No. PCT/US2019/037953, filed on Jun. 19, 2019, which claims priority to U.S. Provisional Application No. 62/687,176, filed Jun. 19, 2018, U.S. Provisional Application No. 62/749,113, filed Oct. 22, 2018, U.S. Provisional Application No. 62/749,654, filed Oct. 23, 2018, U.S. Provisional Application No. 62/749,663, filed Oct. 23, 2018, U.S. Provisional Application No. 62/749,669, filed Oct. 23, 2018, U.S. Provisional Application No. 62/749,672, filed Oct. 24, 2018, U.S. Provisional Application No. 62/754,534, filed Nov. 1, 2018, U.S. Provisional Application No. 62/754,541, filed Nov. 1, 2018, U.S. Provisional Application No. 62/754,547, filed Nov. 1, 2018, U.S. Provisional Application No. 62/755,356, filed Nov. 2, 2018, U.S. Provisional Application No. 62/755,361, filed Nov. 2, 2018, U.S. Provisional Application No. 62/733,568, filed Sep. 19, 2018, and U.S. Provisional Application No. 62/733,552, filed Sep. 19, 2018, each of which is incorporated herein by reference in its entirety for all purposes.
- Behavioral health is a serious problem. In the United States, suicide ranks in the top 10 causes of death as reported by the Center for Disease Control (CDC). Depression is the leading cause of disability worldwide, according to the World Health Organization (WHO). Screening for depression and other mental health disorders by doctors and health service providers is widely recommended. The current “gold standard” for screening or monitoring for depression in patients is the PHQ-9 (Patient Health Questionnaire 9), a written depression screening or monitoring test with nine (9) multiple-choice questions. Other similar assessment tests include the PHQ-2 and the Generalized Anxiety Disorder 7 (GAD-7).
- Many believe the PHQ-9 and other, similar screening or monitoring tools for detecting behavioral health diagnoses such as depression are inadequate. While the PHQ-9 is purported to successfully detect depression in 85-95% of patients, it is also purported that 54% of all suicides are committed by people with no diagnosis of depression. These two assertions appear entirely inconsistent with each other, but the reality is that not enough people are being screened.
- Part of the problem is that traditional screening or monitoring surveys are not engaging, due to their repetitive nature and lack of personalization. Another problem is that patients can be dishonest in their responses to the assessment tool, and the PHQ-9 and similar tools provide no mechanism by which dishonesty in the patient's responses can be assessed. Finally, these surveys take effort on the part of the clinician and the patient, as some patients need assistance to complete them, and this disrupts both the clinician and patient workflows.
- The present disclosure provides systems and methods that can more accurately and effectively assess, screen, estimate, and/or monitor the mental state of human subjects, when compared to conventional mental health assessment tools. In one aspect, a method for assessing a mental state of a subject in a single session or over multiple different sessions is provided. The method can comprise using an automated module to present and/or formulate at least one query based in part on one or more target mental states to be assessed. The at least one query can be configured to elicit at least one response from the subject. The method may also comprise transmitting the at least one query in an audio, visual, and/or textual format to the subject to elicit the at least one response. The method may also comprise receiving data comprising the at least one response from the subject in response to transmitting the at least one query. The data can comprise speech data. The method may further comprise processing the data using one or more individual, joint, or fused models comprising a natural language processing (NLP) model, an acoustic model, and/or a visual model. The method may further comprise generating, for the single session, for each of the multiple different sessions, or upon completion of one or more sessions of the multiple different sessions, one or more assessments of the mental state associated with the subject.
- In some embodiments, the one or more individual, joint, or fused models may comprise a metadata model. The metadata model can be configured to use demographic information and/or a medical history of the subject to generate the one or more assessments of the mental state associated with the subject.
- In some embodiments, the at least one query can comprise a plurality of queries and the at least one response can comprise a plurality of responses. The plurality of queries can be transmitted in a sequential manner to the subject and configured to systematically elicit the plurality of responses from the subject. In some embodiments, the plurality of queries can be structured in a hierarchical manner such that each subsequent query of the plurality of queries is structured as a logical follow on to the subject's response to a preceding query, and can be designed to assess or draw inferences on a plurality of aspects of the mental state of the subject.
- In some embodiments, the automated module can be further configured to present and/or formulate the at least one query based in part on a profile of the subject.
- In some embodiments, the one or more target mental states can be selected from the group consisting of depression, anxiety, post-traumatic stress disorder (PTSD), schizophrenia, suicidality, and bipolar disorder.
- In some embodiments, the one or more target mental states can comprise one or more conditions or disorders associated or comorbid with a list of predefined mental disorders. The list of predefined mental disorders may include mental disorders as defined or provided in the Diagnostic and Statistical Manual of Mental Disorders. In some embodiments, the one or more associated or comorbid conditions or disorders can comprise fatigue, loneliness, low motivation, or stress.
- In some embodiments, the assessment can comprise a score that indicates whether the subject is (i) more likely than others to experience at least one of the target mental states or (ii) more likely than others to experience at least one of the target mental states at a future point in time. In some embodiments, the future point in time can be within a clinically actionable future.
- In some embodiments, the method can further comprise: transmitting the assessment to a healthcare provider to be used in evaluating the mental state of the subject. The transmitting can be performed in real-time during the assessment, just-in-time, or after the assessment has been completed.
- In some embodiments, the plurality of queries can be designed to test for or detect a plurality of aspects of the mental state of the subject.
- In some embodiments, the assessment can comprise a score that indicates whether the subject is (i) more likely than others to experience at least one of the target mental states or (ii) more likely than others to experience at least one of the target mental states at a future point in time. The score can be calculated based on processed data obtained from the subject's plurality of responses to the plurality of queries. In some embodiments, the score can be continuously updated with processed data obtained from each of the subject's follow-on response to a preceding query.
- In some embodiments, the method can further comprise based on the at least one response, identifying additional information to be elicited from the subject. The method can further comprise transmitting a subsequent query to the subject. The subsequent query relates to the additional information and can be configured to elicit a subsequent response from the subject. The method can further comprise receiving data comprising the subsequent response from the subject in response to transmitting the subsequent query. The method can further comprise processing the subsequent response to update the assessment of the mental state of the subject. In some embodiments, identifying additional information to be elicited from the subject can comprise: identifying (i) one or more elements of substantive content or (ii) one or more patterns in the data that are material to the mental state of the subject. The method can further comprise: for each of the one or more elements of substantive content or the one or more patterns: identifying one or more items of follow-up information that are related to the one or more elements or the one or more patterns to be asked of the subject, and generating a subsequent query. The subsequent query can relate to the one or more items of follow-up information.
- In some embodiments, the NLP model can be selected from the group consisting of a sentiment model, a statistical language model, a topic model, a syntactic model, an embedding model, a dialog or discourse model, an emotion or affect model, and a speaker personality model.
- In some embodiments, the data can further comprise images or video of the subject. The data can be further processed using the visual model to generate the assessment of the mental state of the subject. In some embodiments, the visual model can be selected from the group consisting of a facial cue model, a body movement/motion model, and an eye activity model.
- In some embodiments, the at least one query can be transmitted in a conversational context in a form of a question, statement, or comment that is configured to elicit the at least one response from the subject. In some embodiments, the conversational context can be designed to promote elicitation of truthful, reflective, thoughtful, or candid responses from the subject. In some embodiments, the conversational context can be designed to affect an amount of time that the subject takes to compose the at least one response. In some embodiments, the method can further comprise: transmitting one or more prompts in the audio and/or visual format to the subject when a time latency threshold is exceeded. In some embodiments, the conversational context can be designed to enhance one or more performance metrics of the assessment of the mental state of the subject. In some embodiments, the one or more performance metrics can be selected from the group consisting of an F1 score, an area under the curve (AUC), a sensitivity, a specificity, a positive predictive value (PPV), and an equal error rate.
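- By way of non-limiting illustration, the sketch below shows how these performance metrics might be computed from held-out labels and model probabilities using scikit-learn; all values are fabricated for illustration.

```python
from sklearn.metrics import confusion_matrix, f1_score, roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                  # fabricated labels
y_prob = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3]  # fabricated model outputs
y_pred = [int(p >= 0.5) for p in y_prob]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("F1:", f1_score(y_true, y_pred))
print("AUC:", roc_auc_score(y_true, y_prob))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("PPV:", tp / (tp + fp))
```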
- In some embodiments, the at least one query is not or need not be transmitted or provided in a format of a standardized test or questionnaire. In some embodiments, the at least one query can comprise subject matter that has been adapted or modified from a standardized test or questionnaire. In some embodiments, the standardized test or questionnaire can be selected from the group consisting of PHQ-9, GAD-7, HAM-D, and BDI. The standardized test or questionnaire can be another similar test or questionnaire for assessing a patient's mental health state.
- In some embodiments, the one or more individual, joint, or fused models can comprise a regression model.
- In some embodiments, the at least one query can be designed to be open-ended without limiting the at least one response from the subject to be a binary yes-or-no response.
- In some embodiments, the score can be used to calculate one or more scores with a clinical value.
- In some embodiments, the assessment can comprise a quantized score estimate of the mental state of the subject. In some embodiments, the quantized score estimate can comprise a calibrated score estimate. In some embodiments, the quantized score estimate can comprise a binary score estimate.
- In some embodiments, the plurality of queries can be represented as a series of edges and the plurality of responses can be represented as a series of nodes in a nodal network.
- In some embodiments, the mental state can comprise one or more medical, psychological, or psychiatric conditions or symptoms.
- In some embodiments, the method can be configured to further assess a physical state of the subject as manifested based on the speech data of the subject. The method can further comprise: processing the data using the one or more individual, joint, or fused models to generate an assessment of the physical state of the subject. The assessment of the physical state can comprise a score that indicates whether the subject is (i) more likely than others to experience at least one of a plurality of physiological conditions or (ii) more likely than others to experience at least one of the physiological conditions at a future point in time.
- In some embodiments, the physical state of the subject is manifested due to one or more physical conditions that affect a characteristic or a quality of voice of the subject.
- In some embodiments, the automated module can be a mental health screening module that can be configured to dynamically formulate the at least one query based in part on the one or more target mental states to be assessed.
- In some embodiments, the one or more individual, joint, or fused models can comprise a composite model that can be an aggregate of two or more different models.
- Another aspect of the present disclosure provides a non-transitory computer-readable medium comprising machine-executable instructions that, upon execution by one or more computer processors, implements any of the foregoing methods described above or elsewhere herein.
- Another aspect of the present disclosure provides a system comprising one or more computer processors and memory comprising machine-executable instructions that, upon execution by the one or more computer processors, implements any of the foregoing methods described above or elsewhere herein.
- Another aspect of the present disclosure provides a method for screening or monitoring a subject for, or diagnosing the subject with a mental health disorder. The method can comprise: transmitting at least one query to the subject. The at least one query can be configured to elicit at least one response from the subject. The method can further comprise receiving data comprising the at least one response from the subject in response to transmitting the at least one query. The data can comprise speech data. The method can further comprise processing the data using one or more individual, joint, or fused models comprising a natural language processing (NLP) model, an acoustic model, and/or a visual model to generate an output. The method can further comprise using at least the output to generate a score and a confidence level of the score. The score can comprise an estimate that the subject has the mental health disorder. The confidence level can be based at least in part on a quality of the speech data and represents a degree to which the estimate can be trusted.
- In some embodiments, the one or more individual, joint, or fused models can comprise a metadata model. The metadata model can be configured to use demographic information and/or a medical history of the subject to generate the one or more assessments of the mental state associated with the subject.
- In some embodiments, the output can comprise an NLP output, an acoustic output, and a visual output. In some embodiments, the NLP output, the acoustic output, and the visual output can each comprise a plurality of outputs corresponding to different time ranges of the data. In some embodiments, generating the score can comprise: (i) segmenting the NLP output, the acoustic output, and the visual output into discrete time segments, (ii) assigning a weight to each discrete time segment, and (iii) computing a weighted average of the NLP output, the acoustic output, and the visual output using the assigned weights. In some embodiments, the weights can be based at least on (i) base weights of the one or more individual, joint, or fused models (ii) a confidence level of each discrete time segment of the NLP output, the acoustic output, and the visual output.
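- By way of non-limiting illustration, a minimal sketch of steps (i)-(iii) follows, assuming per-segment model outputs and confidences are already available; the base weights below are illustrative assumptions of this sketch, not values from the disclosure.

```python
BASE_WEIGHTS = {"nlp": 0.4, "acoustic": 0.35, "visual": 0.25}  # illustrative

def fuse_scores(segments):
    """Confidence-weighted average over per-segment model outputs.

    segments: one dict per discrete time segment, e.g.
      {"nlp": 0.7, "acoustic": 0.6, "visual": 0.8,
       "confidence": {"nlp": 0.9, "acoustic": 0.5, "visual": 0.7}}
    """
    num = den = 0.0
    for seg in segments:
        for m in ("nlp", "acoustic", "visual"):
            w = BASE_WEIGHTS[m] * seg["confidence"][m]  # base weight x confidence
            num += w * seg[m]
            den += w
    return num / den if den else float("nan")
```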
- In some embodiments, the one or more individual, joint, or fused models can be interdependent such that each of the one or more individual, joint, or fused models is conditioned on an output of at least one other of the one or more individual, joint, or fused models.
- In some embodiments, generating the score can comprise fusing the NLP output, the acoustic output, and the visual output.
- In some embodiments, generating the confidence level of the score can comprise fusing (i) a confidence level of the NLP output with (ii) a confidence level of the acoustic output.
- In some embodiments, the method can further comprise converting the score into one or more scores with a clinical value.
- In some embodiments, the method can further comprise transmitting the one or more scores with a clinical value to the subject and/or a contact for the subject. In some embodiments, the method can further comprise transmitting the one or more scores with a clinical value to a healthcare provider for use in evaluating and/or providing care for the mental health of the subject. In some embodiments, the transmitting can comprise transmitting the one or more scores with a clinical value to the healthcare provider during the screening, monitoring, or diagnosing. In some embodiments, the transmitting can comprise transmitting the one or more scores with a clinical value to the healthcare provider or a payer after the screening, monitoring, or diagnosing has been completed.
- In some embodiments, the at least one query can comprise a plurality of queries and the at least one response can comprise a plurality of responses. Generating the score can comprise updating the score after receiving each of the plurality of responses, and the method can further comprise: converting the score to one or more scores with a clinical value after each of the updates. The method can further comprise transmitting the one or more scores with a clinical value to a healthcare provider after the converting.
- In some embodiments, the method can further comprise: determining that the confidence level does not satisfy a predetermined criterion; generating, in real time and based at least in part on the at least one response, at least one additional query; and repeating steps (a)-(d) using the at least one additional query until the confidence level satisfies the predetermined criterion.
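- A minimal sketch of this adaptive loop follows. The helper functions are hypothetical stand-ins (stubbed here so the sketch runs); a real system would call its query module, capture pipeline, and fused models instead.

```python
# Sketch of repeating query/response/scoring until confidence is met.
import random

def initial_query() -> str:
    return "How have you been feeling this week?"  # stub

def transmit_and_capture(subject: str, query: str) -> str:
    return f"{subject}'s response to: {query}"     # stub for steps (a)-(b)

def process_with_models(response: str) -> tuple[float, float]:
    return 0.5, random.random()                    # stub for steps (c)-(d)

def follow_up_query(response: str) -> str:
    return "Can you tell me more about that?"      # stub

def assess_until_confident(subject, min_confidence=0.8, max_rounds=5):
    """Repeat the query/response/scoring cycle until the confidence
    level satisfies the criterion, or until max_rounds is reached."""
    query = initial_query()
    score, confidence = 0.0, 0.0
    for _ in range(max_rounds):
        response = transmit_and_capture(subject, query)
        score, confidence = process_with_models(response)
        if confidence >= min_confidence:
            break
        # Formulate an additional query in real time from the response.
        query = follow_up_query(response)
    return score, confidence
```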
- In some embodiments, the confidence level can be based on a length of the at least one response. In some embodiments, the confidence level can be based on an evaluated truthfulness of the one or more responses of the subject.
- In some embodiments, the one or more individual, joint, or fused models can be trained on speech data from a plurality of test subjects, wherein each of the plurality of test subjects has completed a survey or questionnaire that indicates whether the test subject has the mental health disorder. The confidence level can be based on an evaluated truthfulness of responses in the survey or questionnaire.
- In some embodiments, the method can further comprise extracting from the speech data one or more topics of concern of the subject using a topic model.
- In some embodiments, the method can further comprise generating a word cloud from the one or more topics of concern. The word cloud reflects changes in the one or more topics of concern of the subject over time. In some embodiments, the method can further comprise transmitting the one or more topics of concern to a healthcare provider, the subject, or both.
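- As one hedged illustration, the topic extraction described above could be prototyped with an off-the-shelf topic model such as latent Dirichlet allocation; the disclosure does not mandate any particular topic-modeling technique, and the function below is a toy sketch.

```python
# Illustrative topic extraction with scikit-learn's LDA.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def topics_of_concern(responses, n_topics=3, n_words=5):
    """Return the top words per topic found in a subject's responses."""
    vec = CountVectorizer(stop_words="english")
    counts = vec.fit_transform(responses)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(counts)
    vocab = vec.get_feature_names_out()
    return [
        [vocab[i] for i in comp.argsort()[-n_words:][::-1]]
        for comp in lda.components_
    ]

topics = topics_of_concern([
    "I have not been sleeping and work has been overwhelming",
    "My sleep is bad and I worry about work deadlines",
    "Mostly I feel fine but money worries keep me up",
])
```

Running such an extraction per session and comparing the returned word lists across sessions is one way a word cloud could reflect changing topics of concern over time.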
- In some embodiments, the visual output can be assigned a higher weight than the NLP output and the acoustic output in generating the score when the subject is not speaking. In some embodiments, a weight of the visual output in generating the score can be increased when the NLP output and the acoustic output indicate that a truthfulness level of the subject is below a threshold.
- In some embodiments, the visual model can comprise one or more of a facial cue model, a body movement/motion model, and a gaze model.
- In some embodiments, the at least one query can comprise a plurality of queries and the at least one response can comprise a plurality of responses. The plurality of queries can be configured to sequentially and systematically elicit the plurality of responses from the subject. The plurality of queries can be structured in a hierarchical manner such that each subsequent query of the plurality of queries can be a logical follow-on to the subject's response to a preceding query and can be designed to assess or draw inferences about different aspects of the mental state of the subject.
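- One simple way to realize such a hierarchical query structure is a tree keyed on the preceding query, with routing based on the subject's response. The tree content and keyword matching below are invented for illustration; a deployed system could instead use a learned dialogue policy.

```python
# Toy hierarchical query bank: each follow-up is chosen from the
# subject's previous response.
QUERY_TREE = {
    "How have you been sleeping lately?": {
        "poorly": "What do you find yourself thinking about at night?",
        "fine": "How is your energy during the day?",
    },
    "What do you find yourself thinking about at night?": {},
    "How is your energy during the day?": {},
}

def next_query(current_query: str, response_text: str) -> str | None:
    """Pick a logical follow-on query based on the prior response."""
    for keyword, follow_up in QUERY_TREE.get(current_query, {}).items():
        if keyword in response_text.lower():
            return follow_up
    return None  # no follow-up; move to the next branch of the assessment
```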
- In some embodiments, the at least one query can include subject matter that has been adapted or modified from a clinically-validated survey, test or questionnaire.
- In some embodiments, the acoustic model can comprise one or more of an acoustic embedding model, a spectral-temporal model, a supervector model, an acoustic affect model, a speaker personality model, an intonation model, a speaking rate model, a pronunciation model, a non-verbal model, or a fluency model.
- In some embodiments, the NLP model can comprise one or more of a sentiment model, a statistical language model, a topic model, a syntactic model, an embedding model, a dialog or discourse model, an emotion or affect model, or a speaker personality model.
- In some embodiments, the mental health disorder can comprise depression, anxiety, post-traumatic stress disorder, bipolar disorder, suicidality or schizophrenia.
- In some embodiments, the mental health disorder can comprise one or more medical, psychological, or psychiatric conditions or symptoms.
- In some embodiments, the score can comprise a score selected from a range. The range can be normalized with respect to a general population or to a specific population of interest.
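- For example, a raw score could be normalized by mapping it to a percentile within the reference population. The sketch below assumes roughly normal population scores with known mean and standard deviation; both assumptions are illustrative only.

```python
# Normalizing a raw score against a reference population.
from statistics import NormalDist

def normalized_score(raw: float, pop_mean: float, pop_std: float) -> float:
    """Percentile (0-100) of `raw` within the reference population,
    assuming the population scores are approximately normal."""
    z = (raw - pop_mean) / pop_std
    return 100.0 * NormalDist().cdf(z)

# Same raw score, normalized against the general population versus
# a specific population of interest (statistics are made up).
print(normalized_score(0.62, pop_mean=0.50, pop_std=0.10))  # general
print(normalized_score(0.62, pop_mean=0.65, pop_std=0.08))  # specific
```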
- In some embodiments, the one or more scores with a clinical value can comprise one or more descriptors associated with the mental health disorder.
- In some embodiments, steps (a)-(d) as described above can be repeated at a plurality of different times to generate a plurality of scores. The method can further comprise: transmitting the plurality of scores and confidences to a computing device and graphically displaying, on the computing device, the plurality of scores and confidences as a function of time on a dashboard or other representation for one or more end users.
- In some embodiments, the quality of the speech data can comprise a quality of an audio signal of the speech data.
- In some embodiments, the quality of the speech data can comprise a measure of confidence of a speech recognition process performed on an audio signal of the speech data.
- In some embodiments, the method can be implemented for a single session. The score and the confidence level of the score can be generated for the single session.
- In some embodiments, the method can be implemented for and over multiple different sessions, and the score and the confidence level of the score can be generated for each of the multiple different sessions, or upon completion of one or more sessions of the multiple different sessions.
- Another aspect of the present disclosure provides a non-transitory computer-readable medium comprising machine-executable instructions that, upon execution by one or more computer processors, implement any of the methods described above or elsewhere herein.
- Another aspect of the present disclosure provides a system comprising one or more computer processors and memory comprising machine-executable instructions that, upon execution by the one or more computer processors, implements any of the methods described above or elsewhere herein.
- Another aspect of the present disclosure provides a method for processing speech and/or video data of a subject to identify a mental state of the subject. The method can comprise: receiving the speech and/or video data of the subject and using at least one processing technique to process the speech and/or video data to identify the mental state at (i) an error rate at least 10% lower, or (ii) an accuracy at least 10% higher, than a standardized mental health questionnaire or testing tool usable for identifying the mental state. The reduced error rate or the accuracy can be established relative to at least one or more benchmark standards usable by an entity for identifying or assessing one or more medical conditions comprising the mental state.
- In some embodiments, the entity can comprise one or more of the following: clinicians, healthcare providers, insurance companies, and government-regulated bodies. In some embodiments, the at least one or more benchmark standards can comprise at least one clinical diagnosis that has been independently verified to be accurate in identifying the mental state. In some embodiments, the speech data can be received substantially in real-time as the subject is speaking. In some embodiments, the speech data can be produced in an offline mode from a stored recording of the subject's speech.
- Another aspect of the present disclosure provides a method for processing speech data of a subject to identify a mental state of the subject. The method can comprise: receiving the speech data of the subject and using at least one processing technique to process the speech data to identify the mental state. The identification of the mental state is better according to one or more performance metrics as compared to a standardized mental health questionnaire or testing tool usable for identifying the mental state.
- In some embodiments, the one or more performance metrics can comprise a sensitivity or specificity, and the speech data can be processed according to a desired level of sensitivity or a desired level of specificity. In some embodiments, the desired level of sensitivity or the desired level of specificity can be defined based on criteria established by an entity. In some embodiments, the entity can comprise one or more of the following: clinicians, healthcare providers, personal caregivers, insurance companies, and government-regulated bodies.
- Another aspect of the present disclosure provides a method for processing speech data of a subject to identify or assess a mental state of the subject. The method can comprise: receiving the speech data of the subject, using one or more processing techniques to process the speech data to generate one or more descriptors indicative of the mental state, and generating a plurality of visual elements of the one or more descriptors. The plurality of visual elements can be configured to be displayed on a graphical user interface of an electronic device of a user and usable by the user to identify or assess the mental state.
- In some embodiments, the user can be the subject. In some embodiments, the user can be a clinician or healthcare provider. In some embodiments, the one or more descriptors can comprise a calibrated or normalized score indicative of the mental state. In some embodiments, the one or more descriptors can further comprise a confidence associated with the calibrated or normalized score.
- Another aspect of the present disclosure provides a method for identifying, assessing, or monitoring a mental state of a subject. The method can comprise using a natural language processing algorithm, an acoustic processing algorithm, or a video processing algorithm to process data of the subject to identify or assess the mental state of the subject, the data comprising speech or video data of the subject, and outputting a report indicative of the mental state of the subject. The report can be transmitted to a user to be used for identifying, assessing, or monitoring the mental state.
- In some embodiments, the user can be the subject. In some embodiments, the user can be a clinician or healthcare provider. In some embodiments, the report can comprise a plurality of graphical visual elements. In some embodiments, the report can be configured to be displayed on a graphical user interface of an electronic device of the user. In some embodiments, the method can further comprise: updating the report in response to one or more detected changes in the mental state of the subject. In some embodiments, the report can be updated substantially in real time as the one or more detected changes in the mental state are occurring in the subject.
- Another aspect of the present disclosure provides a method for identifying whether a subject is at risk of a mental or physiological condition. The method can comprise: obtaining speech data from the subject and storing the speech data in computer memory, processing the speech data using in part natural language processing to identify one or more features indicative of the mental or physiological condition, and outputting an electronic report identifying whether the subject is at risk of the mental or physiological condition. The risk can be quantified in the form of a normalized score with a confidence level. The normalized score with the confidence level can be usable by a user to identify whether the subject is at risk of the mental or physiological condition.
- In some embodiments, the user can be the subject. In some embodiments, the user can be a clinician or healthcare provider. In some embodiments, the report can comprise a plurality of graphical visual elements. In some embodiments, the report can be configured to be displayed on a graphical user interface of an electronic device of the user.
- Another aspect of the present disclosure provides a method for identifying, assessing, or monitoring a mental state or disorder of a subject. The method can comprise: receiving audio or audio-visual data comprising speech of the subject in computer memory and processing the audio or audio-visual data to identify, assess, monitor, or diagnose the mental state or disorder of the subject, which processing can comprise performing natural language processing on the speech of the subject.
- In some embodiments, the audio or audio-visual data can be received in response to a query directed to the subject. In some embodiments, the audio or audio-visual data can be from a prerecording of a conversation to which the subject can be a party. In some embodiments, the audio or audio-visual data can be from a prerecording of a clinical session involving the subject and a healthcare provider. In some embodiments, the mental state or disorder can be identified at a higher performance level compared to a standardized mental health questionnaire or testing tool. In some embodiments, the processing further can comprise using a trained algorithm to perform acoustic analysis on the speech of the subject.
- Another aspect of the present disclosure provides a method for estimating whether a subject has a mental condition and providing the estimate to a stakeholder. The method can comprise: obtaining speech data from the subject and storing the speech data in computer memory. The speech data can comprise responses to a plurality of queries transmitted in an audio and/or visual format to the subject. The method can further comprise selecting (1) a first model optimized for sensitivity in estimating whether the subject has the mental condition or (2) a second model optimized for specificity in estimating whether the subject has the mental condition. The method can further comprise processing the speech data using the selected first model or the second model to generate the estimate. The method can further comprise transmitting the estimate to the stakeholder.
- In some embodiments, the first model can be selected and the stakeholder can be a healthcare payer. In some embodiments, the second model can be selected and the stakeholder can be a healthcare provider.
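- A minimal sketch of stakeholder-based model selection follows. The registry contents and thresholds are hypothetical; in practice the two "models" could even be one classifier operated at different decision thresholds chosen from a validation ROC curve.

```python
# Selecting a sensitivity- or specificity-optimized configuration
# based on the stakeholder receiving the estimate.
def select_model(stakeholder: str) -> dict:
    # Hypothetical registry: per the embodiment above, a payer receives
    # the sensitivity-optimized model and a provider the
    # specificity-optimized one.
    registry = {
        "payer": {"model": "model_high_sensitivity", "threshold": 0.30},
        "provider": {"model": "model_high_specificity", "threshold": 0.70},
    }
    return registry[stakeholder]

config = select_model("provider")
```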
- Another aspect of the present disclosure provides a system for determining whether a subject is at risk of having a mental condition. The system can be configured to (i) receive speech data of the subject from memory and (ii) process the speech data using at least one model to determine that the subject is at risk of having the mental condition. The at least one model can be trained on speech data from a plurality of other test subjects who have a clinical determination of the mental condition. The clinical determinations may serve as labels for the speech data. The system can be configured to generate an estimate of the mental condition that is better according to one or more performance metrics as compared to a clinically-validated survey, test, or questionnaire.
- In some embodiments, the system can be configured to generate the estimate of the mental condition with a higher specificity compared to the clinically-validated survey, test or questionnaire. In some embodiments, the system can be configured to generate the estimate of the mental condition with a higher sensitivity compared to the clinically-validated survey, test, or questionnaire. In some embodiments, the identification can be output while the subject is speaking. In some embodiments, the identification can be output via streaming or a periodically updated signal.
- Another aspect of the present disclosure provides a method for assessing a mental state of a subject. The method can comprise using an automated screening module to dynamically formulate at least one query based in part on one or more target mental states to be assessed. The at least one query can be configured to elicit at least one response from the subject. The method can further comprise transmitting the at least one query in an audio and/or visual format to the subject to elicit the at least one response. The method can further comprise receiving data comprising the at least one response from the subject in response to transmitting the at least one query. The data can comprise speech data. The method can further comprise processing the data using a composite model comprising at least one or more semantic models to generate an assessment of the mental state of the subject.
- Another aspect of the present disclosure provides a non-transitory computer readable medium comprising machine executable code that, upon execution by one or more computer processors, implements any of the methods above or elsewhere herein.
- Another aspect of the present disclosure provides a system comprising one or more computer processors and computer memory coupled thereto. The computer memory comprises machine executable code that, upon execution by the one or more computer processors, implements any of the methods above or elsewhere herein.
- Additional aspects and advantages of the present disclosure will become readily apparent to those skilled in this art from the following detailed description, wherein only illustrative embodiments of the present disclosure are shown and described. As will be realized, the present disclosure is capable of other and different embodiments, and its several details are capable of modifications in various obvious respects, all without departing from the disclosure. Accordingly, the drawings and description are to be regarded as illustrative in nature, and not as restrictive.
- All publications, patents, and patent applications mentioned in this specification are herein incorporated by reference to the same extent as if each individual publication, patent, or patent application was specifically and individually indicated to be incorporated by reference. To the extent publications and patents or patent applications incorporated by reference contradict the disclosure contained in the specification, the specification is intended to supersede and/or take precedence over any such contradictory material.
- The novel features of the present disclosure are set forth with particularity in the appended claims. A better understanding of the features and advantages of the present disclosure will be obtained by reference to the following detailed description that sets forth illustrative embodiments, in which the principles of the disclosure are utilized, and the accompanying drawings (also “Figure” and “FIG.” herein), of which:
- FIG. 1A shows a health screening or monitoring system in which a health screening or monitoring server, a clinical data server computer system, and a social data server cooperate to estimate a health state of a patient in accordance with the present disclosure;
- FIG. 1B shows an additional embodiment of the health screening or monitoring system from FIG. 1A;
- FIG. 2 shows a patient screening or monitoring system in which a web server and modeling server(s) cooperate to assess a state of a patient through a wide area network, in accordance with some embodiments;
- FIG. 3 shows a patient assessment system in which a real-time computer system, a modeling computer system, and a clinical and demographic data server computer system cooperate to assess a state of a patient and report the assessed state to a clinician using a clinician device through a wide area network in accordance with the present disclosure;
- FIG. 4 is a block diagram of the health screening or monitoring server of FIG. 1A in greater detail;
- FIG. 5 is a block diagram of interactive health screening or monitoring logic of the health screening or monitoring server of FIG. 4 in greater detail;
- FIG. 6 is a block diagram of interactive screening or monitoring server logic of the interactive health screening or monitoring logic of FIG. 5 in greater detail;
- FIG. 7 is a block diagram of generalized dialogue flow logic of the interactive screening or monitoring server logic of FIG. 6 in greater detail;
- FIG. 8 is a logic flow diagram illustrating the control of an interactive spoken conversation with the patient by the generalized dialogue flow logic in accordance with the present disclosure;
- FIG. 9 is a block diagram of a question and adaptive action bank of the generalized dialogue flow logic of FIG. 7 in greater detail;
- FIG. 10 is a logic flow diagram of a step of FIG. 8 in greater detail;
- FIG. 11 is a block diagram of question management logic of the question and adaptive action bank of FIG. 9 in greater detail;
- FIG. 12 is a logic flow diagram of determination of the quality of a question in accordance with the present disclosure;
- FIG. 13 is a logic flow diagram of determination of the equivalence of two questions in accordance with the present disclosure;
- FIG. 14 is a logic flow diagram illustrating the control of an interactive spoken conversation with the patient by the real-time system in accordance with the present disclosure;
- FIGS. 15 and 16 are each a logic flow diagram of a respective step of FIG. 14 in greater detail;
- FIG. 17 is a transaction flow diagram showing an illustrative example of a spoken conversation with, and controlled by, the real-time system of FIG. 3;
- FIG. 18 is a block diagram of runtime model server logic of the interactive health screening or monitoring logic of FIG. 3 in greater detail;
- FIG. 19 is a block diagram of model training logic of the interactive health screening or monitoring logic of FIG. 1A in greater detail;
- FIG. 20A shows a more detailed block diagram of the patient screening or monitoring system, in accordance with some embodiments;
- FIG. 20B provides a block diagram of the runtime model server(s), in accordance with some embodiments;
- FIG. 21 provides a block diagram of the model training server(s), in accordance with some embodiments;
- FIG. 22 shows the real-time computer system and the modeling computer system of FIG. 3 in greater detail, including a general flow of data;
- FIG. 23A provides a block diagram of the acoustic model, in accordance with some embodiments;
- FIG. 23B shows an embodiment of FIG. 23A including an acoustic modeling block;
- FIG. 23C shows a score calibration and confidence module;
- FIG. 24 provides a simplified example of the high-level feature representor of the acoustic model, for illustrative purposes;
- FIG. 25 provides a block diagram of the Natural Language Processing (NLP) model, in accordance with some embodiments;
- FIG. 26 provides a block diagram of the visual model, in accordance with some embodiments;
- FIG. 27 provides a block diagram of the descriptive features, in accordance with some embodiments;
- FIG. 28 provides a block diagram of the interaction engine, in accordance with some embodiments;
- FIG. 29 is a logic flow diagram of the example process of testing a patient for a mental health condition, in accordance with some embodiments;
- FIG. 30 is a logic flow diagram of the example process of model training, in accordance with some embodiments;
- FIG. 31 is a logic flow diagram of the example process of model personalization, in accordance with some embodiments;
- FIG. 32 is a logic flow diagram of the example process of client interaction, in accordance with some embodiments;
- FIG. 33 is a logic flow diagram of the example process of classifying the mental state of the client, in accordance with some embodiments;
- FIG. 34 is a logic flow diagram of the example process of model conditioning, in accordance with some embodiments;
- FIG. 35 is a logic flow diagram of the example process of model weighting and fusion, in accordance with some embodiments;
- FIG. 36 is a logic flow diagram of the example simplified process of acoustic analysis, provided for illustrative purposes only;
- FIG. 37 is a block diagram showing speech recognition logic of the modeling computer system in greater detail;
- FIG. 38 is a block diagram showing language model training logic of the modeling computer system in greater detail;
- FIG. 39 is a block diagram showing language model logic of the modeling computer system in greater detail;
- FIG. 40 is a block diagram showing acoustic model training logic of the modeling computer system in greater detail;
- FIG. 41 is a block diagram showing acoustic model logic of the modeling computer system in greater detail;
- FIG. 42 is a block diagram showing visual model training logic of the modeling computer system in greater detail;
- FIG. 43 is a block diagram showing visual model logic of the modeling computer system in greater detail;
- FIG. 44 is a block diagram of a screening or monitoring system data store of the interactive health screening or monitoring logic of FIG. 1A in greater detail;
- FIG. 45 shows a health screening or monitoring system in which a health screening or monitoring server estimates a health state of a patient by passively listening to ambient speech in accordance with the present disclosure;
- FIG. 46 is a logic flow diagram illustrating the estimation of a health state of a patient by passively listening to ambient speech in accordance with the present disclosure;
- FIG. 47 is a logic flow diagram illustrating the estimation of a health state of a patient by passively listening to ambient speech in accordance with the present disclosure;
- FIG. 48 is a block diagram of health care management logic of the health screening or monitoring server of FIG. 4 in greater detail;
- FIGS. 49 and 50 are respective block diagrams of component conditions and actions of work-flows of the health care management logic of FIG. 48;
- FIG. 51 is a logic flow diagram of the automatic formulation of a work-flow of the health care management logic of FIG. 48 in accordance with the present disclosure;
- FIG. 52 is a block diagram of the real-time computer system of FIG. 3 in greater detail;
- FIG. 53 is a block diagram of the modeling computer system of FIG. 3 in greater detail;
- FIG. 54 is a block diagram of the health screening or monitoring server of FIG. 1A in greater detail;
- FIGS. 55 and 56 provide example illustrations of spectrograms of an acoustic signal used for analysis, in accordance with some embodiments;
- FIGS. 57 and 58 are example illustrations of a computer system capable of embodying the current disclosure;
- FIG. 59 shows a precision case management use case for the system;
- FIG. 60 shows a primary care screening or monitoring use case for the system;
- FIG. 61 shows a system for enhanced employee assistance plan (EAP) navigation and triage; and
- FIG. 62 shows a computer system that is programmed or otherwise configured to assess a mental state of a subject in a single session or over multiple different sessions.
- While various embodiments of the invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous variations, changes, and substitutions may occur to those skilled in the art without departing from the invention. It should be understood that various alternatives to the embodiments of the invention described herein may be employed.
- Aspects, features and advantages of exemplary embodiments of the present invention will become better understood with regard to the following description in connection with the accompanying drawing(s). It should be apparent to those skilled in the art that the described embodiments of the present invention provided herein are illustrative only and not limiting, having been presented by way of example only. All features disclosed in this description may be replaced by alternative features serving the same or similar purpose, unless expressly stated otherwise. Therefore, numerous other embodiments and modifications thereof are contemplated as falling within the scope of the present invention as defined herein and equivalents thereto.
- Henceforth, use of absolute and/or sequential terms, such as, for example, “will,” “will not,” “shall,” “shall not,” “must,” “must not,” “first,” “initially,” “next,” “subsequently,” “before,” “after,” “lastly,” and “finally,” is not meant to limit the scope of the present invention, as the embodiments disclosed herein are merely exemplary. The present invention relates to health screening or monitoring systems, and, more particularly, to a computer-implemented mental health screening or monitoring tool with significantly improved accuracy and efficacy achieved by leveraging language analysis, visual cues, and acoustic analysis. In this application, the specifics of improved acoustic, visual, and speech analysis techniques are described as they pertain to the classification of a respondent as being depressed or as exhibiting another mental state of interest. While much of the following disclosure will focus largely on assessing depression in a patient, the systems and methods described herein may be equally adept at screening or monitoring a user for a myriad of mental and physical ailments. For example, bipolar disorder, anxiety, and schizophrenia are mental ailments that such a system may be adept at screening or monitoring for. It is also possible that physical ailments may be assessed utilizing such systems. It should be understood that while this disclosure may focus heavily upon depression screening or monitoring, this is not limiting. Any suitable mental or physical ailment may be screened for using the disclosed systems and methods.
- The systems and methods disclosed herein may use natural language processing (NLP) to perform semantic analysis on patient speech utterances. Semantic analysis, as disclosed herein, may refer to analysis of spoken language from patient responses to assessment questions or captured conversations, in order to determine the meaning of the spoken language for the purpose of conducting a mental health screening or monitoring of the patient. The analysis may be of words or phrases, and may be configured to account for primary queries or follow-up queries. In the case of captured human-human conversations, the analysis may also apply to the speech of the other party. As used herein, the terms “semantic analysis” and “natural language processing (NLP)” may be used interchangeably. Semantic analysis may be used to determine the meanings of utterances by patients, in context. It may also be used to determine topics patients are speaking about.
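- As a toy illustration of attaching a semantic signal to a patient utterance, the following sketch scores crude affective valence from fixed word lists. The word lists and scoring rule are invented for illustration only; the NLP models contemplated herein would be statistical or neural rather than lexicon lookups.

```python
# Illustrative valence scoring of an utterance from fixed word lists.
NEGATIVE = {"hopeless", "worthless", "tired", "alone", "never"}
POSITIVE = {"hopeful", "rested", "happy", "better", "enjoy"}

def utterance_valence(utterance: str) -> float:
    """Score in [-1, 1]; negative values suggest negative affect."""
    tokens = [t.strip(".,!?") for t in utterance.lower().split()]
    neg = sum(t in NEGATIVE for t in tokens)
    pos = sum(t in POSITIVE for t in tokens)
    total = neg + pos
    return 0.0 if total == 0 else (pos - neg) / total

print(utterance_valence("I feel tired and alone, like it will never end."))
```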
- A mental state, as described herein, may be distinguished from an emotion or feeling, such as happiness, sadness, or anger. A mental state may include one or more feelings in combination with a philosophy of mind, including how a person perceives objects of his or her environment and the actions of other people toward him or her. While feelings may be transient, a mental state may describe a person's overarching disposition or mood, even in situations where the person's feelings may change. For example, a depressed person may variously feel, at different times, happy, sad, or angry.
- In accordance with one or more embodiments of the present invention, a server computer system (health screening or monitoring server 102—FIG. 1A) may apply a health state screening or monitoring test to a human patient using a client device (patient device 112), by engaging the patient in an interactive spoken conversation and applying a composite model, which may combine language, acoustic, metadata, and visual models, to a captured audiovisual signal of the patient engaged in the dialogue. While the general subject matter of the screening or monitoring test may be similar to the subject matter of standardized depression screening or monitoring tests such as the PHQ-9, the composite model may analyze, in real time, the audiovisual signal of the patient (i) to make the conversation more engaging for the patient and (ii) to estimate the patient's health. Appendix A illustrates an exemplary implementation that includes Calendaring, SMS, Dialog, Calling and User Management Services. While the latter goal is primary, the former goal is a significant factor in achieving the latter. Truthfulness of the patient in answering questions posed by the screening or monitoring test is critical in assessing the patient's mood. Health screening or monitoring server 102 encourages patient honesty.
- First, the spoken conversation may provide the patient with less time to compose a disingenuous response to a question than it would take to simply respond honestly to the question. Second, the conversation may feel, to the patient, more spontaneous and personal and may be less annoying to the patient than a generic questionnaire, as would be provided by, for example, simply administering the PHQ-9. Accordingly, the spoken conversation may not induce or exacerbate resentment in the patient for having to answer a questionnaire for the benefit of a doctor or other clinician. Third, the spoken conversation may be adapted in progress to be responsive to the patient, reducing the patient's annoyance with the screening or monitoring test and, in some situations, shortening the screening or monitoring test. Fourth, the screening or monitoring test as administered by health screening or monitoring server 102 additionally may rely on non-verbal aspects of the conversation, in addition to the verbal content of the conversation, to assess depression in the patient.
- As shown in FIG. 1A, health screening or monitoring system 100 may include health screening or monitoring server 102, a call center system 104, a clinical data server 106, a social data server 108, a patient device 112, and a clinician device 114 that are connected to one another through a wide area network (WAN) 110, which is the Internet in this illustrative embodiment. In this illustrative embodiment, patient device 112 may also be reachable by call center system 104 through a public-switched telephone network (PSTN) 120 or directly. Health screening or monitoring server 102 may be a server computer system that administers the health screening or monitoring test with the patient through patient device 112 and combines a number of language, acoustic, and visual models to produce results 1820 (FIG. 18), using clinical data retrieved from clinical data server 106, social data retrieved from social data server 108, and patient data collected from past screenings or monitoring to train the models of runtime model server 304 (FIG. 18). Clinical data server 106 (FIG. 1A) may be a server computer system that makes clinical or demographic data of the patient, including diagnoses, medication information, etc., available, e.g., to health screening or monitoring server 102, in a manner that is compliant with HIPAA (Health Insurance Portability and Accountability Act of 1996) and/or any other privacy and security policies and regulations such as GDPR and SOC 2. Social data server 108 may be a server computer system that makes social data of the patient, including social media posts, online purchases, searches, etc., available, e.g., to health screening or monitoring server 102. Clinician device 114 may be a client device that receives data representing results of the screening or monitoring regarding the patient's health from health screening or monitoring server 102.
- The system may be used to assess the mental state of the subject in a single session or over multiple sessions. Subsequent sessions may be informed by assessment results from prior assessments. This may be done by providing assessment data as inputs to machine learning algorithms or other analysis methods for the subsequent assessments. Each session may generate one or more assessments. Individual assessments may also compile data from multiple sessions.
- FIG. 1B shows an additional embodiment of the health screening or monitoring system from FIG. 1A. FIG. 1B illustrates a conversation between patient 120 and clinician 130. The clinician 130 may record one or more speech samples from the patient 120 and upload them to the wide area network 110, with the consent of the patient 120. The speech samples may be analyzed by one or more machine learning algorithms, described elsewhere herein.
- FIG. 2 provides an additional embodiment of a health screening or monitoring system. Health screening or monitoring system 200 may apply a health state screening or monitoring test to a human patient using a client device (clients 260 a-n), by engaging the patient in an interaction and applying a composite model that combines language, acoustic, and visual models, to a captured audiovisual signal of the patient engaged in the dialogue. While the general subject matter of the screening or monitoring test may be similar to the subject matter of standardized depression screening or monitoring tests such as the PHQ-9, the composite model can be configured to analyze, in real time, the audiovisual signal of the patient (i) to make the conversation more engaging for the patient, (ii) to estimate the patient's mental health, and (iii) to provide a judgment-free and less embarrassing experience for the patient, who may already be suffering from anxiety and other mental barriers to receiving proper screening or monitoring from a clinician.
- It should be noted that throughout this disclosure a series of terms may be used interchangeably, and this usage is not intended to limit the scope of the disclosure in any manner. For example, the terms “patient”, “client”, “subject”, “respondent”, and “user” may all be employed interchangeably to refer to the individual being screened for mental health conditions and/or the device being utilized by this individual to collect and transmit the audio and visual data that is used to screen them. Likewise, “semantic analysis” and “NLP” may be used interchangeably to reference natural language processing models and elements. In a similar manner, “stakeholders” is employed to refer to a wide variety of interested third parties who are not the patient being screened. These stakeholders may include physicians, health care providers, care team members, insurance companies, research organizations, family/relatives of the patient, hospitals, crisis centers, and the like. It should thus be understood that when another label is employed, such as “physician”, the intention in this disclosure is to reference any number of stakeholders.
- The health screening or monitoring system 200 includes a backend infrastructure designed to administer the screening or monitoring interaction and analyze the results. This includes one or more model servers 230 coupled to a web server 240. The web server 240 and model server(s) 230 leverage user data 220, which is additionally populated by clinical and social data 210. The clinical data portion may be compiled from the healthcare providers, and may include diagnoses, vital information (age, weight, height, blood chemistry, etc.), diseases, medications, lists of clinical encounters (hospitalizations, clinic visits, Emergency Department visits), clinician records, and the like. This clinical data may be compiled from one or more electronic health record (EHR) systems or Health Information Exchanges (HIE) by way of a secure application protocol, extension, or socket. Social data may include information collected from a patient's social networks, including social media postings, from databases detailing a patient's purchases, and from databases containing a patient's economic, educational, residential, legal, and other social determinants. This information may be compiled together with additional preference data, metadata, annotations, and voluntarily supplied information, to populate the user database 220. The model server 230 and web server 240 are additionally capable of populating and/or augmenting the user data 220 with preferences, extracted features, and the like.
- The backend infrastructure communicates with the clients 260 a-n via a network infrastructure 250. Commonly this network may include the internet, a corporate local area network, a private intranet, a cellular network, or some combination of these. The clients 260 a-n include a client device of a person being screened, which accesses the backend screening or monitoring system and includes a microphone and camera for audio and video capture, respectively. The client device may be a cellular phone, tablet, laptop, or desktop equipped with a microphone and optional camera, a smart speaker in the home or other location, a smart watch with a microphone and optional camera, or a similar device.
- All of the collected data for each client 260 a-n is provided back to the
web server 240 via thenetwork infrastructure 250. After processing, results are provided back to the client 260 a-n for consumption, and when desired for sharing with one or more stakeholders 270 a-n associated with the given client 260 a-n, respectively. In this example figure, the stakeholders 270 a-n are illustrated as being in direct communication with their respective clients 260 a-n. While in practice this may indeed be possible, often the stakeholder 270 a-n will be capable of direct access to the backend screening or monitoring system via thenetwork infrastructure 250 andweb server 240, without the need to use the client 260 a-n as an intermediary.FIG. 2 provides the present arrangement, however, to more clearly illustrate that each client 260 a-n may be associated with one or more stakeholders 270 a-n, which may differ from any other client's 260 a-n stakeholders 270 a-n. - In another embodiment of the screening or monitoring system, a server computer system (real-
time system 302—FIGS. 3 and 22 ) applies a depression assessment test to a human patient using a client device (portable device 312), by engaging the patient in an interactive spoken conversation and applying a composite model, that combines language, acoustic, and visual models, to a captured audiovisual signal of the patient engaged in the dialogue. While the general subject matter of the assessment test may incorporate queries including subject matter similar to questions asked in standardized depression assessment tests such as the PHQ-9, the assessment does not merely include analysis of answers to survey questions. In fact, the screening or monitoring system's composite model analyzes, in real time, the audiovisual signal of the patient (i) to make the conversation more engaging for the patient and (ii) assess the patient's mental health. - While the latter goal is the primary goal, the former goal is a significant factor in achieving the latter. Truthfulness of the patient in answering questions posed by the assessment test is critical in assessing the patient's mood. Real-
time system 302 encourages honesty of the patient in a number of ways. First, the spoken conversation provides the patient with less time to compose a response to a question rather than it would take to simply respond honestly to the question. - Second, the conversation feels, to the patient, more spontaneous and personal and is less annoying to the patient than an obviously generic questionnaire. Accordingly, the spoken conversation does not induce or exacerbate resentment in the patient for having to answer a questionnaire before seeing a doctor or other clinician. Third, the spoken conversation is adapted in progress to be responsive to the patient, reducing the patient's annoyance with the assessment test and, in some situations, shortening the assessment test. Fourth, the assessment test as administered by real-
time system 302 may rely more on non-verbal aspects of the conversation and the patient than on the verbal content of the conversation to assess depression in the patient. - As shown in (
FIG. 3 ),patient assessment system 300 includes real-time system 302, a modeling system 304, aclinical data server 306, apatient device 312, and aclinician device 314 that are connected to one another though a wide area network (WAN) 310, that is the Internet in this illustrative embodiment. Real-time system 302 is a server computer system that administers the depression assessment test with the patient throughpatient device 312. Modeling system 304 is a server computer system that combines a number of language, acoustic, and visual models to produce a composite model 2204 (FIG. 22 ), using clinical data retrieved fromclinical data server 306 and patient data collected from past assessments to train composite model 2204. Clinical data server 306 (FIG. 3 ) is a server computer system that makes clinical data of the patient, including diagnoses, medication information, etc., available, e.g., to modeling system 304, in a manner that is compliant with HIPAA (Health Insurance Portability and Accountability Act of 1996) and/or any other privacy and security policies and regulations such as GDPR andSOC 2.Clinician device 314 is a client device that receives data representing a resulting assessment regarding depression from real-time system 302. - The systems disclosed herein may provide medical care professionals with a prediction of a mental state of a patient. The mental state may be depression, anxiety, or another mental condition. The systems may provide the medical care professionals with additional information, outside of the mental state prediction. The system may provide demographic information, such as age, weight, occupation, height, ethnicity, medical history, psychological history, and gender to medical care professionals via client devices, such as the client devices 260 a-n of
FIG. 2 . The system may provide information from online systems or social networks to which the patient may be registered. The patient may opt in, by setting permissions on his or her client device to provide this information before the screening or monitoring process begins. The patient may also be prompted to enter demographic information during the screening or monitoring process. Patients may also choose to provide information from their electronic health records to medical care professionals. In addition, medical care professionals may interview patients during or after a screening or monitoring event to obtain the demographic information. During registration for screening or monitoring, patients may also enter information that specifies or constraints their interests. For example, they may enter topics that they do and/or do not wish to speak about. In this disclosure, the terms “medical care provider” and “clinician” are used interchangeably. Medical care providers may be doctors, nurses, physician assistants, nursing assistants, clinical psychologists, social workers, technicians, or other health care providers. - A clinician may set up the mental health assessment with the patient. This may include choosing a list of questions for the system to ask the patient, including follow-up questions. The clinician may add or remove specific questions from the assessment, or change an order in which the questions are administered to the patient. The clinician may be available during the assessment as a proctor, in order to answer any clarifying questions the patient may have.
- The system may provide the clinician with the dialogue between itself and the patient. This dialogue may be a recording of the screening or monitoring process, or a text transcript of the dialogue. The system may provide a summary of the dialogue between itself and the patient, using semantic analysis to choose segments of speech that were most important to predicting the mental state of the patient. These segments may be selected because they might be highly weighted in a calculation of a binary or scaled score indicating a mental state prediction, by example. The system may incorporate such a produced score into a summary report for the patient, along with semantic context taken from a transcript of the interview with the patient.
- The system may additionally provide the clinician with a “word cloud” or “topic cloud” extracted from a text transcript of the patient's speech. A word cloud may be a visual representation of individual words or phrases, with words and phrases used most frequently designated using larger font sizes, different colors, different fonts, different typefaces, or any combination thereof. Depicting word or phrase frequency in such a way may be helpful as depressed patients commonly say particular words or phrases with larger frequencies than non-depressed patients. For example, depressed patients may use words or phrases that indicate dark, black, or morbid humor. They may talk about feeling worthless or feeling like failures, or use absolutist language, such as “always”, “never”, or “completely.” Depressed patients may also use a higher frequency of first-person singular pronouns (e.g., “I”, “me”) and a lower frequency of second- or third-person pronouns when compared to the general population. The system may be able to train a machine learning algorithm to perform semantic analysis of word clouds of depressed and non-depressed people, in order to be able to classify people as depressed or not depressed based on their word clouds. Word cloud analysis may also be performed using unsupervised learning. For example, the system may analyze unlabeled word clouds and search for patterns, in order to separate people into groups based on their mental states.
- The systems described herein can output an electronic report identifying whether a patient is at risk of a mental or physiological condition. The electronic report can be configured to be displayed on a graphical user interface of a user's electronic device. The electronic report can include a quantification of the risk of the mental or physiological condition, e.g., a normalized score. The score can be normalized with respect the entire population or with respect to a sub-population of interest. The electronic report can also include a confidence level of the normalized score. The confidence level can indicate the reliability of the normalized score (i.e., the degree to which the normalized score can be trusted).
- The electronic report can include visual graphical elements. For example, if the patient has multiple scores from multiple screening or monitoring sessions that occurred at several different times, the visual graphical element may be a graph that shows the progression of the patient's scores over time.
- The electronic report may be output to the patient or a contact person associated with the patient, a healthcare provider, a healthcare payer, or another third-party. The electronic report can be output substantially in real-time, even while the screening, monitoring, or diagnosis is ongoing. In response to a change in the normalized score or confidence during the course of the screening, monitoring, or diagnosis, the electronic report can be updated substantially in real-time and be re-transmitted to the user.
- In some cases, the electronic report may include one or more descriptors about the patient's mental state. The descriptors can be a qualitative measure of the patient's mental state (e.g., “mild depression”). Alternatively or additionally, the descriptors can be topics that the patient mentioned during the screening. The descriptors can be displayed in a graphic, e.g., a word cloud.
- The models described herein may be optimized for a particular purpose or based on the entity that may receive the output of the system. For example, the models may be optimized for sensitivity in estimating whether a patient has a mental condition. Healthcare payers such as insurance companies may prefer such models so that they can minimize the number of insurance payments made to patients with false positive diagnoses. In other cases, the models may be optimized for specificity in estimating whether a patient has a mental condition. Healthcare providers may prefer such models. The system may select the appropriate model based on the stakeholder to which the output will be transmitted. After processing, the system can transmit the output to the stakeholder.
- The models described herein can alternatively be tuned or configured to process speech and other data according to a desired level of sensitivity or a desired level of specificity determined by a clinician, healthcare provider, insurance company, or government regulated body.
- The system may be used to monitor teenagers for depression. The system may perform machine learning analysis on groups of teenagers in order to determine voice-based biomarkers that may uniquely classify teenagers as being at risk for depression. Depression in teenagers may have different causes than in adults. Hormonal changes may also introduce behaviors in teenagers that would be atypical for adults. A system for screening or monitoring teenagers would need to employ a model tuned to recognize these unique behaviors. For example, depressed or upset teenagers may be more prone to anger and irritability than adults, who may withdraw when upset. Thus, questions from assessments may elicit different voice-based biomarkers from teenagers than adults. Different screening or monitoring methods may be employed when testing teenagers for depression, or studying teenagers' mental states, than are employed for screening or monitoring adults. Clinicians may modify assessments to particularly elicit voice-based biomarkers specific to depression in teenagers. The system may be trained using these assessments, and determine a teenager-specific model for predicting mental states. Teenagers may further be segmented by household (foster care, adoptive parent(s), two biological parents, one biological parent, care by guardian/relative, etc.), medical history, gender, age (old vs. young teenager), and socioeconomic status, and these segments may be incorporated into the model's predictions.
- The system may also be used to monitor the elderly for depression and dementia. The elderly may also have particular voice-based biomarkers that younger adults may not have. For example, the elderly may have strained or thin voices, owing to aging. Elderly people may exhibit aphasia or dysarthria, have trouble understanding survey questions, follow-ups, or conversational speech, and may use repetitive language. Clinicians may develop, or algorithms may be used to develop, surveys for eliciting particular voice-based biomarkers from elderly patients. Machine learning algorithms may be developed to predict mental states in elderly patients, specifically, by segmenting patients by age. Differences may be present in elderly patients from different generations (e.g., Greatest, Silent, Boomer), who may have different views on gender roles, morality, and cultural norms. Models may be trained to incorporate elder age brackets, gender, race, socioeconomic status, physical medical conditions, and family involvement.
- The system may be used to test airline pilots for mental fitness. Airline pilots have taxing jobs and may experience large amounts of stress and fatigue on long flights. Clinicians or algorithms may be used to develop screening or monitoring methods for these conditions. For example, the system may base an assessment on queries similar to those used in the Minnesota Multiphasic Personality Inventory (MMPI) and MMPI-2.
- The system may also be used to screen military personnel for mental fitness. For example, the system may implement an assessment that uses queries with similar subject matter to those asked on the Primary Care Post-Traumatic Stress Disorder screen for Diagnostic and Statistical Manual of Mental Disorders (DSM)-5 (PC-PTSD-5) to test for PTSD. In addition to PTSD, the system may screen military personnel for depression, panic disorder, phobic disorder, anxiety, and hostility. The system may employ different surveys to screen military personnel pre- and post-deployment. The system may segment military personnel by occupation, branch, officer or enlisted status, gender, age, ethnicity, number of tours/deployments, marital status, medical history, and other factors.
- The system may be used to evaluate prospective gun buyers, e.g., by implementing background checks. Assessments may be designed, by clinicians or algorithmically, to evaluate prospective buyers for mental fitness for owning a firearm. The survey may be required to determine, using questions and follow-up questions, whether a prospective gun buyer could be certified as a danger to himself or herself or to others by a court or other authority.
- Health screening or monitoring server 102 (FIG. 1A) is shown in greater detail in FIG. 4 and in even greater detail in FIG. 22. As shown in FIG. 4, health screening or monitoring server 102 includes interactive health screening or monitoring logic 402 and healthcare management logic 408. In addition, health screening or monitoring server 102 includes screening or monitoring system data store 410 and model repository 416.
- Each of the components of health screening or monitoring server 102 is herein described more completely. Briefly, interactive health screening or monitoring logic 402 conducts an interactive conversation with the subject patient and estimates one or more health states of the patient by application of the models of runtime model server 504 (FIG. 18) to audiovisual signals representing responses by the patient. In this illustrative embodiment, interactive health screening or monitoring logic 402 (FIG. 4) may also operate in a passive listening mode, observing the patient outside the context of an interactive conversation with health screening or monitoring server 102, e.g., during a session with a health care clinician, and estimating a health state of the patient from such observation. Healthcare management logic 408 makes expert recommendations in response to health state estimations of interactive health screening or monitoring logic 402. Screening system data store 410 stores and maintains all user and patient data needed for, and collected by, screening or monitoring in the manner described herein.
- The conversational context of the health screening or monitoring system may improve one or more performance metrics associated with one or more machine learning algorithms used by the system. These metrics may include an F1 score, an area under the curve (AUC), a sensitivity, a specificity, a positive predictive value (PPV), and an equal error rate.
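- For concreteness, a brief sketch of how several of these metrics derive from confusion-matrix counts follows; the counts shown are hypothetical examples, not results produced by the system.

```python
# Illustrative sketch only: computing common screening metrics from
# hypothetical confusion-matrix counts (not data from the system).

def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "f1": f1}

print(screening_metrics(tp=80, fp=10, tn=90, fn=20))
```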
- It should be appreciated that the behavior of health screening or monitoring server 102 described herein may be distributed across multiple computer systems. For example, in some illustrative embodiments, real-time, interactive behavior of health screening or monitoring server 102 (e.g., interactive screening or monitoring server logic 502 and runtime model server logic 504 described below) is implemented in one or more servers configured to handle large amounts of traffic through WAN 110 (FIG. 1A), and computationally intensive behavior of health screening or monitoring server 102 (e.g., healthcare management logic 408 and model training logic 506) is implemented in one or more other servers configured to efficiently perform highly complex computation. The various loads carried by health screening or monitoring server 102 may thus be distributed among multiple computer systems.
- Interactive health screening or monitoring logic 402 is shown in greater detail in FIG. 5. Interactive health screening or monitoring logic 402 includes interactive screening or monitoring server logic 502, runtime model server logic 504, and model training logic 506. Interactive screening or monitoring server logic 502 conducts an interactive screening or monitoring conversation with the human patient; runtime model server logic 504 uses and adjusts a number of machine learning models to concurrently evaluate responsive audiovisual signals of the patient; and model training logic 506 trains models of runtime model server logic 504.
- Interactive screening or monitoring server logic 502 is shown in greater detail in FIG. 6 and includes generalized dialogue flow logic 602 and input/output (I/O) logic 604. I/O logic 604 effects the interactive screening or monitoring conversation by sending audiovisual signals to, and receiving audiovisual signals from, patient device 112. I/O logic 604 receives data from generalized dialogue flow logic 602 that specifies questions to be asked of the patient and sends audiovisual data representing those questions to patient device 112. In embodiments in which the interactive screening or monitoring conversation is effected through PSTN 120 (FIG. 1), I/O logic 604 (i) sends an audiovisual signal to patient device 112 by sending data to a human, or automated, operator of call center 104 prompting the operator to ask a question in a telephone call with patient device 112 (or alternatively by sending data to a backend automated dialog system destined for patients) and (ii) receives an audiovisual signal from patient device 112 by receiving an audiovisual signal of the interactive screening or monitoring conversation forwarded by call center 104. I/O logic 604 also sends at least portions of the received audiovisual signal of the interactive screening or monitoring conversation to runtime model server logic 504 (FIG. 18) and model training logic 506 (FIG. 19).
- The queries asked of patients, or questions, may be stored as nodes, while patient responses, collected as audiovisual signals, may be stored as edges. A screening or monitoring event, or a set of screening or monitoring events, for a particular patient may therefore be represented as a graph. For example, different answers to different follow-up questions may be represented as multiple spokes connecting a particular node to a plurality of other nodes. Different graph structures for different patients may be used as training examples for a machine learning algorithm as another method of determining a mental state classification for a patient. Classification may be performed by determining similarities between the graphs of, for example, depressed patients. Equivalent questions, as discussed herein, may be labeled as such within the graph. Thus, the graphs may also be studied and analyzed to determine idiosyncrasies in patients' interpretations of different versions of questions.
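- A minimal sketch of this graph representation follows, assuming a simple adjacency structure in which questions are nodes and recorded responses label the edges; the class and field names are illustrative, not the system's actual storage schema.

```python
# Illustrative sketch only: a conversation graph with questions as nodes
# and patient responses as edges. Names and structure are assumptions.

from dataclasses import dataclass, field

@dataclass
class QuestionNode:
    text: str
    equivalent_ids: set[str] = field(default_factory=set)  # equivalent questions

@dataclass
class ResponseEdge:
    source_id: str       # question that was asked
    target_id: str       # follow-up question chosen next
    response_audio: str  # reference to the stored audiovisual response

class ConversationGraph:
    def __init__(self) -> None:
        self.nodes: dict[str, QuestionNode] = {}
        self.edges: list[ResponseEdge] = []

    def add_response(self, asked: str, follow_up: str, audio_ref: str) -> None:
        """Record that a response to `asked` led the dialogue to `follow_up`."""
        self.edges.append(ResponseEdge(asked, follow_up, audio_ref))

graph = ConversationGraph()
graph.nodes["q_sleep"] = QuestionNode("How have you been sleeping recently?")
graph.nodes["q_meds"] = QuestionNode("What medication do you take for that?")
graph.add_response("q_sleep", "q_meds", "response_0001.wav")
```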
- I/O logic 604 also receives results 1820 (FIG. 18) from runtime model server logic 504 that represent evaluation of the audiovisual signal. Generalized dialogue flow logic 602 conducts the interactive screening or monitoring conversation with the human patient. Generalized dialogue flow logic 602 determines what questions I/O logic 604 should ask of the patient and monitors the reaction of the patient as represented in results 1820. In addition, generalized dialogue flow logic 602 determines when to politely conclude the interactive screening or monitoring conversation.
- Generalized dialogue flow logic 602 is shown in greater detail in FIG. 7. Generalized dialogue flow logic 602 includes interaction control logic generator 702. Interaction control logic generator 702 manages the interactive screening or monitoring conversation with the patient by sending data representing dialogue actions to I/O logic 604 (FIG. 6) that direct the behavior of I/O logic 604 in carrying out the interactive screening or monitoring conversation. Examples of dialogue actions include asking a question of the patient, repeating the question, instructing the patient, politely concluding the conversation, changing aspects of a display of patient device 112, and modifying characteristics of the speech presented to the patient by I/O logic 604, i.e., pace, volume, apparent gender of the voice, etc.
- Interaction control logic generator 702 customizes the dialogue actions for the patient. Interaction control logic generator 702 receives data from screening or monitoring data store 210 that represents subjective preferences of the patient and a clinical and social history of the patient. In this illustrative embodiment, the subjective preferences are explicitly specified by the patient, generally prior to any interactive screening or monitoring conversation, and include such things as the particular voice to be presented to the patient through I/O logic 604, default volume and pace of the speech generated by I/O logic 604, and display schemes to be used within patient device 112.
control logic generator 702 uses the patient's preferences and medical history to set attributes of the questions to ask the patient. - Interaction
control logic generator 702 receives data from runtimemodel server logic 504 that represents analytical results of responses of the patient in the current screening or monitoring conversation. In particular, interactioncontrol logic generator 702 receives data representing analytical results of responses, i.e., results 1820 (FIG. 18 ) of runtimemodel server logic 504 and patient and results metadata from descriptive model andanalytics 1812 that facilitates proper interpretation of the analytical results. Interactioncontrol logic generator 702 interprets the analytical results in the context of the results metadata to determine the patient's current status. - History and
state machine 720 tracks the progress of the screening or monitoring conversation, i.e., which questions have been asked and which questions are yet to be asked. Question anddialogue action bank 710 is a data store that stores all dialogue actions that may be taken by interactioncontrol logic generator 702, including all questions that may be asked of the patient. In addition, history andstate machine 720 informs question anddialogue action bank 710 as to which question is to be asked next in the screening or monitoring conversation. - Interaction
control logic generator 702 receives data representing the current state of the conversation and what questions are queued to be asked from history andstate machine 720. Interactioncontrol logic generator 702 processes the received data to determine the next action to be taken by interactive screening ormonitoring server logic 302 in furtherance of the screening or monitoring conversation. Once the next action is determined, interactioncontrol logic generator 702 retrieves data representing the action from question anddialogue action bank 710 and sends a request to I/O logic 604 to perform the next action. - The overall conducting of the screening or monitoring conversation by generalized
- The overall conduct of the screening or monitoring conversation by generalized dialogue flow logic 602 is illustrated in logic flow diagram 800 (FIG. 8). The logic flow diagram of FIG. 8 describes actions taken by components of the interaction engine in the block diagram of FIG. 28. In addition, the logic flow diagram of FIG. 8 is an instantiation of the process described in FIG. 14. In step 802, generalized dialogue flow logic 602 selects a question or other dialogue action to initiate the conversation with the patient. Interaction control logic generator 702 receives data from history and state machine 720 that indicates that the current screening or monitoring conversation is in its initial state. Interaction control logic generator 702 also receives data that indicates (i) subjective preferences of the patient and (ii) topics of relatively high pertinence to the patient. Given that information, interaction control logic generator 702 selects an initial dialogue action with which to initiate the screening or monitoring conversation. Examples of the initial dialogue action may include (i) asking a common conversation-starting question such as "can you hear me?" or "are you ready to begin?"; (ii) asking a question from a predetermined script used for all patients; (iii) reminding the patient of a topic discussed in a previous screening or monitoring conversation with the patient and asking the patient a follow-up question on that topic; or (iv) presenting the patient with a number of topics from which to select using a user-interface technique on patient device 112. In step 802, interaction control logic generator 702 causes I/O logic 604 to carry out the initial dialogue action.
- Loop step 804 and next step 816 define a loop in which generalized dialogue flow logic 602 conducts the screening or monitoring conversation according to steps 806-814 until generalized dialogue flow logic 602 determines that the screening or monitoring conversation is completed.
- In step 806, interaction control logic generator 702 causes I/O logic 604 to carry out the selected dialogue action. In the initial performance of step 806, the dialogue action is the one selected in step 802. In subsequent performances of step 806, the dialogue action is the one selected in step 814 as described below. In step 808, generalized dialogue flow logic 602 receives an audiovisual signal of the patient's response to the question. While processing according to logic flow diagram 800 is shown in a manner that suggests synchronous processing, generalized dialogue flow logic 602 performs step 808 effectively continuously during performance of steps 802-816 and processes the conversation asynchronously. The same is true for steps 810-814. In step 810, I/O logic 604 sends the audiovisual signal received in step 808 to runtime model server logic 504, which processes the audiovisual signal in a manner described below. In step 812, I/O logic 604 of generalized dialogue flow logic 602 receives multiplex data from runtime model server logic 504 and produces therefrom an intermediate score for the screening or monitoring conversation so far.
- As described above, the results data include analytical results data and results metadata. I/O logic 604 (i) determines to what degree the screening or monitoring conversation has completed screening or monitoring for the target health state(s) of the patient, (ii) identifies any topics in the patient's response that warrant follow-up questions, and (iii) identifies any explicit instructions from the patient for modifying the screening or monitoring conversation. Examples of the last include patient statements such as "can you speak louder?", "can you repeat that?" or "what?", and "please speak more slowly." In step 814, generalized dialogue flow logic 602 selects the next question to ask the subject patient, along with other dialogue actions to be performed by I/O logic 604, in the next performance of step 806. In particular, interaction control logic generator 702 (i) receives dialogue state data from history and state machine 720 regarding the question to be asked next, (ii) receives intermediate results data from I/O logic 604 representing evaluation of the patient's health state so far, and (iii) receives patient preferences and pertinent topics.
- Processing transfers through next step 816 to loop step 804. Generalized dialogue flow logic 602 repeats the loop of steps 804-816 until interaction control logic generator 702 determines that the screening or monitoring conversation is complete, at which point generalized dialogue flow logic 602 politely terminates the screening or monitoring conversation. The screening or monitoring conversation is complete when (i) all mandatory questions have been asked and answered by the patient and (ii) the measure of confidence in the score resulting from screening or monitoring determined in step 812 is at least a predetermined threshold. It should be noted that confidence in the screening or monitoring is not symmetrical.
- The screening or monitoring conversation seeks to detect specific health states in the patient, e.g., depression and anxiety. If such states are detected early in the conversation, the detection may be relied upon. However, the absence of such states is not assured by a failure to detect them immediately. More generally, absence of proof is not proof of absence. Thus, generalized dialogue flow logic 602 places confidence in early detection but not in early failure to detect. Thus, health screening or monitoring server 102 (FIG. 4) estimates the current health state, e.g., mood, of the patient using a spoken conversation with the patient through patient device 112. Interactive screening or monitoring server logic 502 sends data representing the resulting screening or monitoring of the patient to the patient's doctor or other clinicians by sending the data to clinician device 114. In addition, interactive screening or monitoring server logic 502 records the resulting screening or monitoring in screening or monitoring system data store 410. A top priority of generalized dialogue flow logic 602 is to elicit speech from the patient that is highly informative with respect to the health state attributes for which health screening or monitoring server 102 screens the patient. For example, in this illustrative embodiment, health screening or monitoring server 102 screens most patients for depression and anxiety. The analysis performed by runtime model server logic 504 is most accurate when presented with patient speech of a particular quality. In this context, speech quality refers to the sincerity with which the patient is speaking. Generally speaking, high quality speech is genuine and sincere, while poor quality speech is from a patient who is not engaged in the conversation or is being intentionally dishonest.
- For example, if the patient does not care about the accuracy of the screening or monitoring, but instead wants to answer all questions as quickly as possible in order to end the screening or monitoring as quickly as possible, the conversation is unlikely to reveal much about the patient's true health. Similarly, if the patient intends to control the outcome of the screening or monitoring by giving false responses, not only are the responses linguistically false, but the emotional components of the speech may be distorted or missing due to the disingenuous participation by the patient. There are a number of ways in which generalized dialogue flow logic 602 increases the likelihood that the patient's responses are relatively highly informative. For example, generalized dialogue flow logic 602 may invite the patient to engage interactive screening or monitoring server logic 502 as an audio diary whenever the patient is so inclined. Voluntary speech, offered whenever the patient is motivated, tends to be genuine and sincere and therefore highly informative.
- Generalized dialogue flow logic 602 may also select topics that are pertinent to the patient. These topics may include topics specific to clinical and social records of the patient and topics specific to interests of the patient. Using topics of interest to the patient may have the negative effect of influencing the patient's mood. For example, asking the patient about her favorite sports team may cause the patient's mood to rise or fall with the most recent news of the team. Accordingly, generalized dialogue flow logic 602 distinguishes health-relevant topics of interest to the patient from health-irrelevant topics of interest to the patient. For example, questions related to an estranged relative of the patient may be health-relevant, while questions related to the patient's favorite television series typically are not. Adapting any synthetic voice to match the preferences of the patient makes the screening or monitoring conversation more engaging for the patient and therefore elicits more informative speech. In embodiments in which patient device 112 displays a video representation of a speaker, i.e., an avatar, to the patient, patient preferences include, in addition to the preferred voice, physical attributes of the appearance of the avatar.
- When a patient has not specified preferences for a synthetic voice or avatar, generalized dialogue flow logic 602 may use a synthetic voice and avatar chosen for the first screening or monitoring conversation and, in subsequent screening or monitoring conversations, change the synthetic voice and avatar and compare the degree of informativeness of the patient's responses to determine which voice and avatar elicit the most informative responses. The voice and avatar chosen for the initial screening or monitoring conversation may be chosen according to which voice and avatar tend to elicit the most informative speech among the general population or among portions of the general population sharing one or more phenotypes with the patient. The manner in which the informativeness of responses elicited by a question is determined is described below.
- To make the screening or monitoring conversation more interactive and engaging, generalized dialogue flow logic 602 inserts a synthetic backchannel into the conversation. For example, generalized dialogue flow logic 602 may utter "uh-huh" during short pauses in the patient's speech to indicate that generalized dialogue flow logic 602 is listening and interested in what the patient has to say. Similarly, generalized dialogue flow logic 602 may cause the video avatar to exhibit non-verbal behavior (sometimes referred to as "body language") to indicate attentiveness and interest in the patient.
- Generalized dialogue flow logic 602 also selects questions that are of high quality. Question quality is measured by the informativeness of the responses elicited by the question. In addition, generalized dialogue flow logic 602 avoids repetition of identical questions in subsequent screening or monitoring conversations, substituting equivalent questions when possible. The manner in which questions are determined to be equivalent to one another is described more completely below. As described above, question and adaptive action bank 710 (FIG. 7) is a data store that stores all dialogue actions that may be taken by interaction control logic generator 702, including all questions that may be asked of the patient.
- Question and adaptive action bank 710 is shown in greater detail in FIG. 9 and includes a number of question records 902 and a dialogue 912. Each of question records 902 includes data representing a single question that may be asked of a patient. Dialogue 912 is a series of questions to ask a patient in a spoken conversation with the patient. Each of question records 902 includes a question body 904, a topic 906, a quality 908, and an equivalence 910. Question body 904 includes data specifying the substantive content of the question, i.e., the sequence of words to be spoken to the patient to effect asking of the question. Topic 906 includes data specifying a hierarchical topic category to which the question belongs. Categories may correlate to (i) specific health diagnoses such as depression, anxiety, etc.; (ii) specific symptoms such as insomnia, lethargy, general disinterest, etc.; and/or (iii) aspects of a patient's treatment such as medication, exercise, etc. Quality 908 includes data representing the quality of the question. The quality of the question is a measure of the informativeness of the responses elicited by the question. Equivalence 910 is data identifying one or more other questions in question records 902 that are equivalent to the question represented by this particular one of question records 902. In this illustrative embodiment, only questions of the same topic 906 may be considered equivalent. In an alternative embodiment, any questions may be considered equivalent regardless of topic. Dialogue 912 includes an ordered sequence of questions 914A-N, each of which identifies a respective one of question records 902 to ask in a spoken conversation with the patient. In this illustrative embodiment, the spoken conversation begins with twenty (20) preselected questions and may include additional questions as necessary to produce a threshold degree of confidence to conclude the conversation of logic flow diagram 800 (FIG. 8). The preselected questions include, in order, five (5) open-ended questions of high quality, the eight (8) questions of the standard and known PHQ-8 screening or monitoring tool for depression, and the seven (7) questions of the standard and known GAD-7 screening or monitoring tool for anxiety. In other examples, the questions may be generated algorithmically. Dialogue 912 specifies these twenty (20) questions in this illustrative embodiment. As described above with respect to step 814 (FIG. 8), interaction control logic generator 702 determines the next question to ask the patient in step 814. One embodiment of step 814 is shown as logic flow diagram 1014 (FIG. 10). In step 1002, interaction control logic generator 702 dequeues a question from dialogue 912, treating the ordered sequence of questions 914A-N as a queue. History and state machine 720 keeps track of which of questions 914A-N is next. If the screening or monitoring conversation is not complete according to the intermediate score and all of questions 914A-N have been processed in previous performances of step 1002 in the same spoken conversation, i.e., if the question queue is empty, interaction control logic generator 702 selects questions from those of question records 902 with the highest quality 908 and pertaining to topics selected for the patient.
- If interaction control logic generator 702 selects multiple questions, interaction control logic generator 702 may select one as the dequeued question randomly, with each question weighted by its quality 908 and its closeness to suggested topics.
- In step 1004 (FIG. 10), interaction control logic generator 702 collects all equivalent questions identified by equivalence 910 (FIG. 9) for the question dequeued in step 1002. In step 1006, interaction control logic generator 702 selects a question from the collection of equivalent questions collected in step 1004, including the question dequeued in step 1002 itself. Interaction control logic generator 702 may select one of the equivalent questions randomly or using information about prior interactions with the patient, e.g., to select the one of the equivalent questions least recently asked of the patient. Interaction control logic generator 702 processes the selected question as the next question in the next iteration of the loop of steps 804-816 (FIG. 8). The use of equivalent questions is important. The quality of a question, i.e., the degree to which the responses the question elicits are informative in runtime model server logic 504, decreases for a given patient over time. In other words, if a given question is asked of a given patient repeatedly, each successive response by the patient becomes less informative than the responses to all prior askings of the question. In a sense, questions become stale over time. To keep questions fresh, i.e., soliciting consistently informative responses over time, a given question is replaced with an equivalent, but different, question in a subsequent conversation. However, the measurement of equivalence must be accurate for comparison of responses to equivalent questions over time to be consistent.
- Thus, two important concepts of questions in generalized dialogue flow logic 602 (FIG. 7) are question quality and question equivalence. Question quality and question equivalence are managed by question management logic 916, which is shown in greater detail in FIG. 11. Question management logic 916 includes question quality logic 1102, which measures a question's quality, and question equivalence logic 1104, which determines whether two (2) questions are equivalent in the context of health screening or monitoring server 102. Question quality logic 1102 includes a number of metric records 1106 and metric aggregation logic 1112. To measure the quality of a question, i.e., to measure how informative are the responses elicited by the question, question quality logic 1102 uses a number of metrics to be applied to a question, each of which results in a numeric quality score for the question and each of which is represented by one of metric records 1106. Each of metric records 1106 represents a single metric for measuring question quality and includes metric metadata 1108 and quantification logic 1110. Metric metadata 1108 represents information about the metric of metric record 1106. Quantification logic 1110 defines the behavior of question quality logic 1102 in evaluating a question's quality according to the metric of metric record 1106.
- The following are examples of metrics that may be applied by question quality logic 1102 to measure the quality of various questions: (i) the length of elicited responses in terms of a number of words; (ii) the length of elicited responses in terms of the duration of the responsive utterance; (iii) a weighted word score; (iv) an amount of acoustic energy in elicited responses; and (v) "voice activation" in the responses elicited by the question. Each is described in turn.
- In a metric record 1106 representing a metric of the length of elicited responses in terms of a number of words, quantification logic 1110 retrieves all responses to a given question from screening or monitoring system data store 410 (FIG. 4) and uses associated results data from screening or monitoring system data store 410 to determine the number of words in each of the responses. Quantification logic 1110 quantifies the quality of the question as a statistical measure of the number of words in the responses, e.g., a statistical mean thereof.
- With respect to the length of elicited responses in terms of the duration of the responsive utterance, the duration of elicited responses may be measured in a number of ways. In one, the duration of the elicited response is simply the elapsed duration, i.e., the entire duration of the response as recorded in screening or monitoring system data store 410. In another, the duration of the elicited response is the elapsed duration less pauses in speech. In yet another, the duration of the elicited response is the elapsed duration less any pause in speech at the end of the response.
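- A hedged sketch of these two length metrics follows; the response records and the choice of the mean as the aggregating statistic are assumptions consistent with the examples above.

```python
# Illustrative sketch only: word-count and duration quality metrics for a
# question, aggregated as statistical means. Data structures are assumptions.

from statistics import mean

# Hypothetical stored responses to one question: (transcript, duration_seconds)
responses = [
    ("Okay. I've been having trouble sleeping lately.", 4.2),
    ("Not great, I keep waking up.", 2.9),
    ("Fine.", 0.8),
]

def word_count_quality(responses) -> float:
    """Mean number of words across all responses elicited by the question."""
    return mean(len(transcript.split()) for transcript, _ in responses)

def duration_quality(responses) -> float:
    """Mean elapsed duration of the responses, in seconds."""
    return mean(duration for _, duration in responses)

print(word_count_quality(responses))  # about 4.7 words per response
print(duration_quality(responses))    # about 2.6 seconds per response
```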
FIG. 11 ) representing a metric of the duration of elicited responses,quantification logic 1110 retrieves all responses to a given question from screening or monitoring system data store 410 (FIG. 4 ) and determines the duration of those responses. Quantification logic 1110 (FIG. 11 ) quantifies the quality of the question as a statistical measure of the duration of the responses, e.g., a statistical mean thereof. - With respect to a weighted word score, semantic models of NLP model 1806 (
FIG. 18 ) estimate a patient's health state from positive and/or negative content of the patient's speech. The semantic models correlate individual words and phrases to specific health states the semantic models are designed to detect. In a metric record 1106 (FIG. 11 ) representing a metric of a weighted word score,quantification logic 1110 retrieves all responses to a given question from collected patient data 410 (FIG. 5 ) and uses the semantic models to determine correlation of each word of each response to one or more health states. An individual response's weighted word score is the statistical mean of the correlations of the weighted word scores.Quantification logic 1110 quantifies the quality of the question as a statistical measure of the weighted word scores of the responses, e.g., a statistical mean thereof. - With respect to an amount of acoustic energy in elicited responses, runtime model server logic 504 (
FIG. 18 ) estimates a patient's health state from pitch and energy of the patient's speech as described below. How informative speech is to the various models of runtimemodel server logic 504 is directly related to how emotional the speech is. In a metric record 1106 (FIG. 11 ) representing a metric of an amount of acoustic energy,quantification logic 1110 retrieves all responses to a given question from screening or monitoring system data store 410 (FIG. 4 ) and uses response data from runtimemodel server logic 504 to determine an amount of energy present in each response.Quantification logic 1110 quantifies the quality of the question as a statistical measure of the measured acoustic energy of the responses, e.g., a statistical mean thereof. - With respect to “voice activation” in the responses elicited by the question, the quality of a question is a measure of how similar responses to the question are to utterances recognized by runtime models 1802 (
FIG. 18 ) as highly indicative of a health state that runtimemodels 1802 are trained to recognize. In a metric record 1106 (FIG. 11 ) representing a metric of voice activation,quantification logic 1110 determines how similar deep learning machine features for all responses to a given question are to deep learning machine features for health screening ormonitoring server 102 as a whole. - Deep learning machine features are known but are described herein briefly to facilitate understanding and appreciation of the present invention. Deep learning is a sub-science of machine learning in that a deep learning machine is a machine learning machine, i.e., learning machine, that learns for itself how to distinguish one thing represented in data from another thing represented in data. The following is a simple example to illustrate the distinction.
- Consider an ordinary (not deep) learning machine that is configured to recognize the representation of a cat in image data. Such a learning machine is typically a computer process with multiple layers of logic. One layer is manually configured to recognize contiguous portions of an image with transitions from one color to another (e.g., light to dark, red to green, etc.). This is commonly referred to as edge detection. A subsequent layer receives data representing the recognized edges and is manually configured to recognize edges that join together to define shapes. A final layer receives data representing shapes and is manually configured to recognize a symmetrical grouping of triangles (cat's ears) and dark regions (eyes and nose). Other layers may be used between those mentioned here.
- In machine learning, the data received as input to any step in the computation, including intermediate results from other steps in the computation, are called features. The results of the learning machine are called labels. In this illustrative example, the labels are “cat” and “no cat”.
- This manually configured learning machine may work reasonably well but may have significant shortcomings. For example, recognizing the symmetrical grouping of shapes might not recognize an image in which a cat is represented in profile. In a deep learning machine, the machine is trained to recognize cats without manually specifying what groups of shapes represent a cat. The deep learning machine may utilize manually configured features to recognize edges, shapes, and groups of shapes, however these are not a required component of a deep learning system. Features in a deep learning system may be learned entirely automatically by the algorithm based on the labeled training data alone.
- Training a deep learning machine to recognize cats in image data can, for example, involve presenting the deep learning machine with numerous, preferably many millions of, images and associated knowledge as to whether each image includes a cat, i.e., associated labels of “cat” or “no cat”. For each image received in training, the last, automatically configured layer of the deep learning machine receives data representing numerous groupings of shapes and the associated label of “cat” or “no cat”. Using statistical analysis and conventional techniques, the deep learning machine determines statistical weights to be given each type of shape grouping, i.e., each feature, in determining whether a previously unseen image includes a cat.
- These trained, i.e., automatically generated, features of the deep learning machine will likely include the symmetrical grouping of shapes manually configured into the learning machine as described above. However, these features will also likely include shape groupings and combinations of shape groupings not thought of by human programmers.
- In measuring the quality of a question, the features of the constituent models of runtime model server logic 504 (
FIG. 18 ) specify precisely the type of responses that indicate a health state that the constituent models of runtimemodel server logic 504 are configured to recognize. Thus, in evaluating the quality of a question, these features represent an exemplary feature set. To measure the quality of a question using this metric, quantification logic 1110 (FIG. 11 ) retrieves all responses to the question from screening or monitoringsystem data store 410 and data representing the diagnoses associated with those responses and trainsruntime models 1802 andmodel repository 416 using those responses and associated data. - In
training runtime models 1802 andmodel repository 416, the deep learning machine develops a set of features specific to the question being measured and the determinations to be made by the trained models.Quantification logic 1110 measures similarity between the feature set specific to the question and the exemplary feature set in a manner described below with respect toquestion equivalence logic 1104. - As described above, interaction control logic generator 702 (
FIG. 7 ) uses quality 908 (FIG. 9 ) of various questions in determining which question(s) to ask a particular patient. To provide a comprehensive measure of quality of a question to store in quality 908 (FIG. 9 ), metric aggregation logic 1112 (FIG. 11 ) aggregates the various measures of quality according tometric records 1106. The manner in whichaggregation logic 1112 aggregates the measures of quality for a given question is illustrated by logic flow diagram 1200 (FIG. 12 ). -
Loop step 1202 andnext step 1210 define a loop in whichmetric aggregation logic 1112 processes each ofmetric records 1106 according to steps 1204-1208. The particular one ofmetric records 1106 processed in an iteration of the loop of steps 1202-1210 is sometimes referred to as “the subject metric record”, and the metric represented by the subject metric record is sometimes referred to as “the subject metric.” Instep 1204,metric aggregation logic 1112 evaluates the subject metric, usingquantification logic 1110 of the subject metric record and all responses in screening or monitoring system data store 410 (FIG. 4 ) to the subject question. In test step 1206 (FIG. 12 ),metric aggregation logic 1110 determines whether screening or monitoringsystem data store 410 includes a statistically significant sample of responses to the subject question by the subject patient. If so,metric aggregation logic 1110 evaluates the subject metric usingquantification logic 1110 and only data corresponding to the subject patient in screening or monitoringsystem data store 410 instep 1208. Conversely, if collectedpatient data 410 does not include a statistically significant sample of responses to the subject question by the subject patient,metric aggregation logic 1112 skips step 1208. Thus,metric aggregation logic 1112 evaluates the quality of a question in the context of the subject patient to the extent screening or monitoringsystem data store 410 contains sufficient data corresponding to the subject patient. - After steps 1206-1208, processing transfers through
next step 1210 toloop step 1202 andmetric aggregation logic 1110 processes the next metric according to the loop of steps 1202-1210. Once all metrics have been processed according to the loop of steps 1202-1210, processing transfers to step 1212 in whichmetric aggregation logic 1110 aggregates the evaluated metrics from all performances ofsteps quality 908. In this illustrative embodiment,metric metadata 1108 stores data specifying howmetric aggregation logic 1112 is to include the associated metric in the aggregate measure instep 1212. For example,metric metadata 1108 may specify a weight to be attributed to the associated metric relative to other metrics. - After step 1212 (
FIG. 12 ), processing according to logic flow diagram 1200 completes. - As described above,
equivalence 910 for a given question identifies one or more other questions inquestion records 902 that are equivalent to the given question. Whether two questions are equivalent is determined by question equivalence logic 1104 (FIG. 11 ) by comparing similarity between the two questions to a predetermined threshold. The similarity here is not how similar the words and phrasing of the sentences are but instead how similarly models ofruntime model server 504 andmodel repository 416 sees them. The predetermined threshold is determined empirically.Question equivalence logic 1104 measures the similarity between two questions in a manner illustrated by logic flow diagram 1300 (FIG. 13 ). -
- Loop step 1302 and next step 1306 define a loop in which question equivalence logic 1104 processes each of metric records 1106 according to step 1304. The particular one of metric records 1106 processed in an iteration of the loop of steps 1302-1306 is sometimes referred to as "the subject metric record", and the metric represented by the subject metric record is sometimes referred to as "the subject metric." In step 1304, question equivalence logic 1104 evaluates the subject metric for each of the two questions. Once all metrics have been processed according to the loop of steps 1302-1306, processing by question equivalence logic 1104 transfers to step 1308.
- In step 1308, question equivalence logic 1104 combines the evaluated metrics for each question into a respective multi-dimensional vector for each question.
- In step 1310, question equivalence logic 1104 normalizes both vectors to have a length of 1.0. In step 1312, question equivalence logic 1104 determines the angle between the two normalized vectors.
- In step 1314, the cosine of the angle determined in step 1312 is taken by question equivalence logic 1104 to be the measured similarity between the two questions.
- Since the vectors are normalized to a length of 1.0, the similarity between two questions ranges from −1.0 to 1.0, with 1.0 being perfectly equivalent. In this illustrative embodiment, the predetermined threshold is 0.98, such that two questions that have a measured similarity of at least 0.98 are considered equivalent and are so represented in equivalence 910 (FIG. 9) for both questions.
- In another embodiment (
FIG. 3 ), assessment test administrator 2202 (FIG. 22 ) administers a depression assessment test to the subject patient by conducting an interactive spoken conversation with the subject patient through patient device 312.The manner in whichassessment test administrator 2202 does so is illustrated in logic flow diagram 1400 (FIG. 14 ). Thetest administrator 2202 may be a computer program configured to questions to the patient. The questions may be algorithmically generated questions. The questions may be generated by, for example, a natural language processing (NLP) algorithm. Examples of NLP algorithms are semantic parsing, sentiment analysis, vector-space semantics, and relation extraction. In some embodiments, the methods described herein may be able to generate an assessment without requiring the presence or intervention of a human clinician. In other embodiments, the methods described herein may be able to be used to augment or enhance clinician-provided assessments, or aid a clinician in providing an assessment. The assessment may include queries containing subject matter that has been adapted or modified from screening or monitoring methods, such as the PHQ-9 and GAD-7 assessments. The assessment herein may not merely use the questions from such surveys verbatim, but may adaptively modify the queries based at least in part on responses from subject patients. - In
step 1402,assessment test administrator 2202 optimizes the testing environment.Step 1402 is shown in greater detail in logic flow diagram 1402 (FIG. 15 ). - In
step 1502,assessment test administrator 2202 initiates the spoken conversation with the subject patient. In this illustrative embodiment,assessment test administrator 2202 initiates a conversation by asking the patient the initial question of the assessment test. The initial question is selected in a manner described more completely below. The exact question asked isn't particularly important. What is important is that the patient responds with enough speech thatassessment test administrator 2202 may evaluate the quality of the video and audio signal received frompatient device 312. -
Assessment test administrator 2202 receives and processes audiovisual data frompatient device 312 throughout the conversation.Loop step 1504 andnext step 1510 define a loop in whichassessment test administrator 2202 processes the audiovisual signal according to steps 1506-1508 untilassessment test administrator 2202 determines that the audiovisual signal is of high quality or at least of adequate quality to provide accurate assessment. - In
step 1506,assessment test administrator 2202 evaluates the quality of the audiovisual signal received frompatient device 312. In particular,assessment test administrator 2202 measures the volume of speech, the clarity of the speech, and to what degree the patient's face and, when available, body is visible. - In
step 1508,assessment test administrator 2202 reports the evaluation to the patient. In particular,assessment test administrator 2202 generates an audiovisual signal that represents a message to be played to the patient throughpatient device 312. If the audiovisual signal received frompatient device 312 is determined byassessment test administrator 2202 to be of inadequate quality, the message asks the patient to adjust her environment to improve the signal quality. For example, if the audio portion of the signal is poor, the message may be “I'm having trouble hearing you. may you move the microphone closer to you or find a quieter place?” If the patient's face and, when available, body isn't clearly visible, the message may be “I can't see your face (and body). may you reposition your phone so I may see you?” Afterstep 1508, processing byassessment test administrator 2202 transfers throughnext step 1510 toloop step 1504 andassessment test administrator 2202 continues processing according to the loop of steps 1504-1510 until the received audiovisual is adequate or is determined to be as good as it will get for the current assessment test. It is preferred that subsequent performances ofstep 1508 are responsive to any speech by the patient. For example, the patient may attempt to comply with a message to improve the environment with the question, “Is this better?” The next message sent in reporting ofstep 1508 should include an answer to the patient's question. As described herein, composite model 2204 includes a language model component, soassessment test administrator 2202 necessarily performs speech recognition. - When the received audiovisual is adequate or is determined to be as good as it will get for the current assessment test, processing by
assessment test administrator 2202 according to the loop of steps 1504-1510 completes. In addition, processing according to logic flow diagram 1402, and therefore step 1402 (FIG. 14 ), completes. -
- Loop step 1404 and next step 1416 define a loop in which assessment test administrator 2202 conducts the spoken conversation of the assessment test according to steps 1406-1414 until assessment test administrator 2202 determines that the assessment test is completed.
- In step 1406, assessment test administrator 2202 asks a question of the patient in furtherance of the spoken conversation. In this illustrative embodiment, assessment test administrator 2202 uses a queue of questions to ask the patient, and that queue is sometimes referred to herein as the conversation queue. In the first performance of step 1406, the queue may be prepopulated with questions to be covered during the assessment test. In general, these questions cover the same general subject matter covered by currently used written assessment tests such as the PHQ-9 and GAD-7. However, while the questions in those tests are intentionally designed to elicit extremely short and direct answers, assessment test administrator 2202 may require more audio and video than is provided by one-word answers. Accordingly, it is preferred that the initially queued questions be more open-ended.
FIG. 17 ): “How have you been sleeping recently?” This question is intended to elicit a sentence or two from the patient to thereby provide more audio and video of the patent than would ordinarily be elicited by a highly directed question. - In step 1408 (
FIG. 14 ),assessment test administrator 2202 receives an audiovisual signal of the patient's response to the question. While processing according to logic flow diagram 1400 is shown in a manner that suggests synchronous processing,assessment test administrator 2202 performsstep 1408 effectively continuously during performance of steps 1402-1416 and processes the conversation asynchronously. The same is true for steps 1410-1414. - 12481 In
step 1410,assessment test administrator 2202 processes the audiovisual signal received instep 1408 using composite model 2204. Instep 1412,assessment test administrator 2202 produces an intermediate score for the assessment test according to the audiovisual signal received so far. - In
step 1414,assessment test administrator 2202 selects the next question to ask the subject patient in the next performance ofstep 1406, and processing transfers throughnext step 1416 toloop step 1404.Step 1414 is shown in greater detail as logic flow diagram 1414 (FIG. 16 ). In addition,FIG. 16 may be construed to follow fromstep 814 fromFIG. 8 . - In
step 1602,assessment test administrator 2202 identifies significant elements in the patient's speech. In particular,assessment test administrator 2202 uses language portions of composite model 2204 to identify distinct assertions in the portion of the audiovisual signal received after the last question asked in step 1406 (FIG. 14 ). That portion of the audiovisual signal is sometimes referred to herein as “the patient's response” in the context of a particular iteration of the loop of steps 1604-1610. - An example of a conversation conducted by
- An example of a conversation conducted by assessment test administrator 2202 of real-time system 302 and patient device 312 is shown in FIG. 17. It should be appreciated that conversation 1700 is illustrative only. The particular questions to ask, which parts of the patient's response are significant, and the depth to which any topic is followed are determined by the type of information to be gathered by assessment test administrator 2202 and are configured therein. In step 1702, assessment test administrator 2202 asks the question, "How have you been sleeping recently?" The patient's response is "Okay . . . I've been having trouble sleeping lately. I have meds for that. They seem to help." In step 1602, assessment test administrator 2202 identifies three (3) significant elements in the patient's response: (i) "trouble sleeping" suggests that the patient has some form of insomnia, or at least that sleep is poor; (ii) "I have meds" suggests that the patient is taking medication; and (iii) "They seem to help" suggests that the medication taken by the patient is effective. In the illustrative example of conversation 1700, each of these significant elements is processed by assessment test administrator 2202 in the loop of steps 1604-1610.
- Loop step 1604 and next step 1610 define a loop in which assessment test administrator 2202 processes each significant element of the patient's answer identified in step 1602 according to steps 1606-1608. In the context of a given iteration of the loop of steps 1604-1610, the particular significant element processed is sometimes referred to as "the subject element." In step 1606, assessment test administrator 2202 processes the subject element, recording details included in the element and identifying follow-up questions. For example, in conversation 1700 (FIG. 17), assessment test administrator 2202 identifies three (3) topics for follow-up questions for the element of insomnia: (i) the type of insomnia (initial, middle, or late), (ii) the frequency of insomnia experienced by the patient, and (iii) what medication, if any, the patient is taking for the insomnia.
- In step 1608, assessment test administrator 2202 enqueues any follow-up questions identified in step 1606.
- After step 1608, processing by assessment test administrator 2202 transfers through next step 1610 to loop step 1604 until assessment test administrator 2202 has processed all significant elements of the patient's response according to the loop of steps 1604-1610. Once assessment test administrator 2202 has processed all significant elements of the patient's response according to the loop of steps 1604-1610, processing transfers from loop step 1604 to step 1612.
FIG. 17 ), the state of the conversation queue is as follows.FIG. 17 shows a particular instantiation of a conversation proceeding between the system and a patient. The queries and replies disclosed herein are exemplary and should not be construed as being required to follow the sequence disclosed inFIG. 17 . In the processing the response element of insomnia,assessment test administrator 2202 identifies and enqueues follow-up topics regarding the type insomnia and any medication taken for the insomnia. - In processing the response element of medication in the patient's response,
assessment test administrator 2202 observes that the patient is taking medication. Instep 1606,assessment test administrator 2202 records that fact and, identifying a queued follow-up question regarding medication for insomnia, processes the medication element as responsive to the queued question. - In
step 1608 for the medication element,assessment test administrator 2202 enqueues follow-up questions regarding the particular medicine and dosage used by the patient and its efficacy as shown instep 1708. - In this illustrative embodiment, questions in the conversation queue are hierarchical. In the hierarchy, each follow-up question is a child of the question for which the follow-up question follows up. The latter question is the parent of the follow-up question. In dequeuing questions from the conversation queue,
assessment test administrator 2202 implements a pre-order depth-first walk of the conversation queue hierarchy. In other words, all child questions of a given question are processed before processing the next sibling question. In conversational terms, all follow-up questions of a given question are processed before processing the next question at the same level, recursively. In the context ofconversation 1700,assessment test administrator 2202 processes all follow-up questions of the type of insomnia before processing the questions of frequency and medication and any of their follow-up questions. This is the way conversations happen naturally—staying with the most recently discussed topic until complete before returning to a previously discussed topic. - In addition, the order in which sibling questions are processed by
- In addition, the order in which sibling questions are processed by assessment test administrator 2202 may be influenced by the responses of the patient. In this illustrative example, a follow-up question regarding the frequency of insomnia precedes the follow-up question regarding medication. However, when processing the element regarding medication in step 1606, assessment test administrator 2202 changes the sequence of follow-up questions such that the follow-up question regarding medication is processed prior to the follow-up question regarding insomnia frequency. Since medication was mentioned by the patient, that topic is discussed before new subtopics are added to the conversation. This is another way in which assessment test administrator 2202 is responsive to the patient.
- In processing the response element of medication efficacy (i.e., "They seem to help."), assessment test administrator 2202 records that the medication is moderately effective. Seeing that the conversation queue includes a question regarding the efficacy of medication, assessment test administrator 2202 applies this portion of the patient's response as responsive to the queued follow-up question in step 1710.
- In step 1612, assessment test administrator 2202 dequeues the next question from the conversation queue, and processing according to logic flow diagram 1414, and therefore step 1414, completes and the conversation continues. Prior to returning to the discussion of FIG. 14, it is helpful to consider additional performances of step 1414, and therefore logic flow diagram 1414, in the context of illustrative conversation 1700. The question dequeued as the next question in this illustrative embodiment asks about the patient's insomnia, trying to discern the type of insomnia. It is appreciated that conventional thinking, as reflected in the PHQ-9 and GAD-7, is that the particular type of sleep difficulty experienced by a test subject is not as strong an indicator of depression as the mere fact that sleep is difficult. However, delving more deeply into a topic of conversation has a number of beneficial consequences. Most significant is that the user is encouraged to provide more speech for more accurate assessment of the patient's state. In addition, asking questions about something the patient has just said suggests that assessment test administrator 2202 is interested in the patient personally and, by earning good will from the patient, makes the patient more likely to be honest, both in speech and behavior.
- In the illustrative example of conversation 1700, the next question is related to the type of insomnia. The question is intentionally as open-ended as possible while still targeted at specific information: "Have you been waking up in the middle of the night?" See question 1712. While this question may elicit a "Yes" or "No" answer, it may also elicit a longer response, such as response 1714: "No. I just have trouble falling asleep." After step 1612, processing according to logic flow diagram 1414, and therefore step 1414 (FIG. 14), completes. In successive iterations of the loop of steps 1404-1416, assessment test administrator 2202 continues the illustrative example of conversation 1700. In the next performance of step 1406, assessment test administrator 2202 asks question 1712 (FIG. 17). In the next performance of step 1408, assessment test administrator 2202 receives response 1714. Assessment test administrator 2202 processes response 1714 in the next performance of step 1414.
- In this illustrative performance of step 1602 (FIG. 16), assessment test administrator 2202 identifies a single significant element, namely, that the patient has trouble falling asleep and does not wake in the middle of the night. In step 1606, assessment test administrator 2202 records the type of insomnia (see step 1716) and, in this illustrative embodiment, there are no follow-up questions related to that.
- In this illustrative performance of step 1608, assessment test administrator 2202 dequeues the next question from the conversation queue. Since there are no follow-up questions for the type of insomnia, and the question of whether the patient is treating the insomnia with medication has already been answered, the next question is the first child question related to medication, namely, the particular medication taken by the patient.
- In the next iterative performance of step 1406 (FIG. 14), assessment test administrator 2202 forms the question, namely, which particular medication the patient is taking for insomnia. In some embodiments, assessment test administrator 2202 asks that question in the most straightforward way, e.g., "You said you're taking medication for your insomnia. Which drug are you taking?" This has the advantage of being open-ended and eliciting more speech than would a simple yes/no question.
- In other embodiments, assessment test administrator 2202 accesses clinical data related to the patient to help identify the particular drug used by the patient. The clinical data may be received from modeling system 304 (FIG. 22), using clinical data 2220, or from clinical data server 306 (FIG. 3). Accordingly, assessment test administrator 2202 may ask a more directed question using the assumed drug's most common name and generic name. For example, if the patient's data indicates that the patient has been prescribed zolpidem (the generic name of the drug sold under the brand name Ambien), question 1720 (FIG. 17) may be, "You said you're taking medication for insomnia. Is that Ambien or Zolpidem?" This highly directed question risks eliciting no more than a simple yes/no response (e.g., response 1722). However, this question also shows a knowledge of, and interest in, the patient, further garnering goodwill and increasing the likelihood of honest responses by the patient and a willingness to continue the assessment test longer.
- In this illustrative embodiment, assessment test administrator 2202 determines whether to ask a highly directed question rather than a more open-ended question based on whether the requisite clinical data for the patient is available and on the degree to which additional speech is needed to achieve an adequate degree of accuracy in assessing the state of the patient.
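- That decision logic can be summarized in a brief sketch, provided here for illustration only; the function and parameter names are hypothetical, and the threshold is arbitrary rather than a value from the disclosure.

    def choose_question_style(has_clinical_data: bool,
                              additional_speech_needed_s: float) -> str:
        # Directed questions require clinical context; open-ended questions
        # are preferred while more speech is still needed for accuracy.
        if has_clinical_data and additional_speech_needed_s < 10.0:
            return "directed"    # e.g., "Is that Ambien or Zolpidem?"
        return "open_ended"      # e.g., "Which drug are you taking?"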
- The illustrative example of conversation 1700 continues with assessment test administrator 2202 recording the substance of response 1722 in step 1724.
- Assessment test administrator 2202 in this illustrative embodiment is also responsive to the patient in the manner in which assessment test administrator 2202 determines whether the patient has completed her response to the most recently asked question, e.g., in determining when an answer received in step 1408 is complete and selection of the next question in step 1414 may begin.
- To further develop good will in the patient, assessment test administrator 2202 avoids interrupting the patient as much as possible. It is helpful to consider response 1704: "Okay . . . I've been having trouble sleeping lately. I have meds for that. They seem to help." The ellipsis after "Okay" indicates a pause in replying by the patient. To this end, assessment test administrator 2202 waits long enough to permit the patient to pause briefly without interruption, but not so long as to cause the patient to believe that assessment test administrator 2202 has become unresponsive, e.g., due to a failure of assessment test administrator 2202 or the communications links therewith. Moreover, pauses in speech are used in assessment as described more completely below, and assessment test administrator 2202 should avoid interfering with the patient's speech fluency.
- In this illustrative embodiment, assessment test administrator 2202 uses two pause durations, a short one and a long one. After a pause of the short duration, assessment test administrator 2202 indicates that it continues to listen by playing a very brief sound that acknowledges understanding and continued listening, e.g., "uh-huh" or "mmm-hmmm". After playing the message, assessment test administrator 2202 waits through any continued pause for the long duration. If the pause continues that long, assessment test administrator 2202 determines that the patient has completed her response.
- The particular respective lengths of the short and long durations may be determined empirically. In addition, the optimum lengths may vary from patient to patient. Accordingly, assessment test administrator 2202 continues to adjust these durations for the patient whenever interacting with the patient. Assessment test administrator 2202 recognizes durations that are too short when observing cross-talk, i.e., when speech is being received from the patient while assessment test administrator 2202 concurrently plays any sound. Assessment test administrator 2202 recognizes durations that are too long when (i) the patient explicitly indicates so (e.g., saying "Hello?" or "Are you still there?") and/or (ii) the patient's response indicates increased frustration or agitation relative to the patient's speech earlier in the same conversation.
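- The two-stage pause handling and its per-patient adjustment may be sketched as follows. This is an illustrative Python sketch only; the starting durations and adjustment factors are hypothetical, not values taken from the disclosure.

    SHORT_PAUSE_S = 1.5   # illustrative starting values, adjusted per patient
    LONG_PAUSE_S = 4.0

    def on_pause(pause_s: float, backchannel_played: bool) -> str:
        # Two-stage endpointing: acknowledge after a short silence; treat a
        # long silence following the acknowledgment as end of the response.
        if backchannel_played and pause_s >= LONG_PAUSE_S:
            return "END_OF_RESPONSE"
        if not backchannel_played and pause_s >= SHORT_PAUSE_S:
            return "PLAY_BACKCHANNEL"   # e.g., "uh-huh" or "mmm-hmmm"
        return "KEEP_LISTENING"

    def adapt_durations(short_s, long_s, crosstalk, patient_probed):
        if crosstalk:        # we spoke over the patient: durations too short
            short_s, long_s = short_s * 1.25, long_s * 1.25
        if patient_probed:   # "Are you still there?": durations too long
            short_s, long_s = short_s * 0.8, long_s * 0.8
        return short_s, long_s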
- The conversation is terminated politely by assessment test administrator 2202 when the assessment test is complete. The assessment test is complete when (i) the initial questions in the conversation queue and all of their descendant questions have been answered by the patient or (ii) the measure of confidence in the score resulting from the assessment determined in step 1412 is at least a predetermined threshold. It should be noted that confidence in the assessment is not symmetrical. The assessment test seeks depression, or other behavioral health conditions, in the patient. If the condition is found quickly, it is found; however, its absence is not assured by a failure to find it immediately. Thus, assessment test administrator 2202 finds confidence in early detection but not in an early failure to detect.
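- This asymmetric completion test may be expressed compactly, purely as an illustration; the names and the threshold below are hypothetical.

    def assessment_complete(queue_empty: bool,
                            condition_detected: bool,
                            confidence: float,
                            threshold: float = 0.9) -> bool:
        # Confidence is asymmetric: a confident positive finding ends the
        # test early, but an early failure to detect never does.
        if queue_empty:
            return True
        return condition_detected and confidence >= threshold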
- Thus, real-time system 302 (FIG. 22) assesses the current mental state of the patient using an interactive spoken conversation with the patient through patient device 312. Assessment test administrator 2202 sends data representing the resulting assessment of the patient to the patient's doctor or other clinician by sending the data to clinician device 314. In addition, assessment test administrator 2202 records the resulting assessment in clinical data 2220.
- While assessment test administrator 2202 is described as conducting an interactive spoken conversation with the patient to assess the mental state of the patient, in other embodiments, assessment test administrator 2202 passively listens to the patient speaking with the clinician and assesses the patient's speech in the manner described herein. The clinician may be a mental health professional, a general practitioner, or a specialist such as a dentist, a cardiac surgeon, or an ophthalmologist. In one embodiment, assessment test administrator 2202 passively listens to the conversation between the patient and clinician through patient device 312 upon determining that the patient is in conversation with the clinician, e.g., by a "START" control on the clinician's iPad. Upon determining that the conversation between the patient and clinician is completed, e.g., by a "STOP" control on the clinician's iPad, assessment test administrator 2202 ceases passively listening and assessing speech in the manner described above. In addition, since patient device 312 is listening passively and not prompting the patient, assessment test administrator 2202 makes no attempt to optimize the audiovisual signal received through patient device 312 and makes no assumption that faces in any received video signal are those of the patient.
- In some embodiments, at the start of the conversation between the patient and the clinician, the clinician asks the patient to initiate listening by assessment test administrator 2202, and the patient does so by issuing a command through patient device 312 that directs assessment test administrator 2202 to begin listening. Similarly, at the end of the conversation, the clinician asks the patient to terminate listening by assessment test administrator 2202, and the patient does so by issuing a command through patient device 312 that directs assessment test administrator 2202 to cease listening.
- In alternative embodiments, assessment test administrator 2202 listens to the conversation between the patient and the clinician through clinician device 314. The clinician may manually start and stop listening by assessment test administrator 2202 through clinician device 314 using conventional user-interface techniques.
- During the conversation passively heard by assessment test administrator 2202, assessment test administrator 2202 assesses the patient's speech and not the clinician's speech. Assessment test administrator 2202 may distinguish the voices in any of a number of ways, e.g., by a "MUTE" control on the clinician's iPad. In embodiments in which assessment test administrator 2202 listens through patient device 312, assessment test administrator 2202 uses acoustic models (e.g., acoustic models 2218) to distinguish the two voices. Assessment test administrator 2202 identifies the louder voice as that of the patient, assuming patient device 312 is closer to the patient than to the clinician. This may also be the case in embodiments in which clinician device 314 is set up to hear the patient more loudly. For example, clinician device 314 may be configured to listen through a highly directional microphone that the clinician directs toward the patient such that any captured audio signal represents the patient's voice much more loudly than other, ambient sounds such as the clinician's voice. Assessment test administrator 2202 may further distinguish the patient's voice from the clinician's voice using language models 2214, particularly semantic pattern models such as semantic pattern modules 4004, to identify which of the two distinguished voices more frequently asks questions. Assessment test administrator 2202 may further distinguish the patient's voice from the clinician's voice using acoustic models 2218, which may identify and segment out the clinician's voice using an acoustic analysis of the clinician's voice performed prior to the clinical encounter.
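- As one illustration of the loudness heuristic described above (an editorial sketch, not code from the disclosure; the segment boundaries are assumed to come from an upstream diarizer):

    import numpy as np

    def label_segments(audio: np.ndarray, segments: list) -> list:
        # Attribute each diarized segment (start, end sample indices) to the
        # patient or the clinician by RMS loudness, assuming the capturing
        # device sits closer to the patient.
        rms = [float(np.sqrt(np.mean(audio[a:b] ** 2))) for a, b in segments]
        midpoint = (min(rms) + max(rms)) / 2.0
        return ["patient" if r >= midpoint else "clinician" for r in rms]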
- Throughout the conversation between the patient and the clinician, assessment test administrator 2202 assesses the mental state of the patient from the patient's speech in the manner described herein and finalizes the assessment upon detecting the conclusion of the conversation.
- Runtime model server logic 504, shown in greater detail in FIG. 18, processes audiovisual signals representing the patient's responses in the interactive screening or monitoring conversation and, while the conversation is ongoing, estimates the current health of the patient from the audiovisual signals.
- Automatic speech recognition (ASR) logic 1804 is logic that processes speech represented in the audiovisual data from I/O logic 604 (FIG. 6) to identify words spoken in the audiovisual signal. The results of ASR logic 1804 (FIG. 18) are sent to runtime models 1802.
- Runtime models 1802 also receive the audiovisual signals directly from I/O logic 604. In a manner described more completely below, runtime models 1802 combine language, acoustic, and visual models to produce results 1820 from the received audiovisual signal. In turn, interactive screening or monitoring server logic 502 uses results 1820 in real time, as described above, to estimate the current state of the patient and to accordingly make the spoken conversation responsive to the patient.
- In addition to identifying words in the audiovisual signal, ASR logic 1804 also identifies where in the audiovisual signal each word appears and a degree of confidence in the accuracy of each identified word in this illustrative embodiment. ASR logic 1804 may also identify non-verbal content of the audiovisual signals, such as laughter and fillers, along with location and confidence information. ASR logic 1804 makes such information available to runtime models 1802.
- Runtime models 1802 include descriptive model and analytics 1812, natural language processing (NLP) model 1806, acoustic model 1808, and visual model 1810.
- NLP model 1806 includes a number of text-based machine learning models to (i) predict depression, anxiety, and perhaps other health states directly from the words spoken by the patient and (ii) model factors that correlate with such health states. Examples of machine learning that models health states directly include sentiment analysis, semantic analysis, language modeling, word/document embeddings and clustering, topic modeling, discourse analysis, syntactic analysis, and dialogue analysis. Models need not be constrained to one type of information; a model may contain information from both sentiment- and topic-based features, for example. NLP information includes the score output of specific modules, for example the score from a sentiment detector trained for sentiment rather than for mental health state. NLP information also includes that obtained via transfer-learning-based systems.
- NLP model 1806 stores text metadata and modeling dynamics and shares that data with acoustic model 1808, visual model 1810, and descriptive model and analytics 1812. Text data may be received directly from ASR logic 1804 as described above or may be received as text data from NLP model 1806. Text metadata may include, for example, data identifying, for each word or phrase, parts of speech (syntactic analysis), sentiment analysis, semantic analysis, topic analysis, etc. Modeling dynamics includes data representing components of constituent models of NLP model 1806. Such components include machine learning features of NLP model 1806 and other components such as long short-term memory (LSTM) units, gated recurrent units (GRUs), hidden Markov models (HMMs), and sequence-to-sequence (seq2seq) translation information. NLP metadata allows acoustic model 1808, visual model 1810, and descriptive model and analytics 1812 to correlate syntactic, sentiment, semantic, and topic information to corresponding portions of the audiovisual signal. Accordingly, acoustic model 1808, visual model 1810, and descriptive model and analytics 1812 may more accurately model the audiovisual signal.
- Runtime models 1802 include acoustic model 1808, which analyzes the audio portion of the audiovisual signal to find patterns associated with various health states, e.g., depression. Associations between acoustic patterns in speech and health are in some cases applicable to different languages without retraining, regardless of the particular language spoken; the models may also be retrained on data from a given language. Accordingly, acoustic model 1808 analyzes the audiovisual signal in a language-agnostic fashion. In this illustrative embodiment, acoustic model 1808 uses machine learning approaches such as convolutional neural networks (CNNs), long short-term memory (LSTM) units, hidden Markov models (HMMs), etc., for learning high-level representations and for modeling the temporal dynamics of the audiovisual signals.
- Acoustic model 1808 stores data representing attributes of the audiovisual signal and machine learning features of acoustic model 1808 as acoustic model metadata and shares that data with NLP model 1806, visual model 1810, and descriptive model and analytics 1812. The acoustic model metadata may include, for example, data representing a spectrogram of the audiovisual signal of the patient's response. In addition, the acoustic model metadata may include both basic features and high-level feature representations of machine learning features. More basic features may include Mel-frequency cepstral coefficients (MFCCs) and various log filter banks of acoustic model 1808, for example. High-level feature representations may include, for example, convolutional neural networks (CNNs), autoencoders, variational autoencoders, deep neural networks, and support vector machines of acoustic model 1808. The acoustic model metadata allows NLP model 1806 to, for example, use acoustic analysis of the audiovisual signal to improve sentiment analysis of words and phrases. The acoustic model metadata allows visual model 1810 and descriptive model and analytics 1812 to, for example, use acoustic analysis of the audiovisual signal to more accurately model the audiovisual signal.
- Runtime model server logic 504 (FIG. 18) includes visual model 1810, which infers various health states of the patient from face, gaze, and pose behaviors. Visual model 1810 may include facial cue modeling, eye/gaze modeling, pose tracking and modeling, etc. These are merely examples.
- Visual model 1810 stores data representing attributes of the audiovisual signal and machine learning features of visual model 1810 as visual model metadata and shares that data with NLP model 1806, acoustic model 1808, and descriptive model and analytics 1812. For example, the visual model metadata may include data representing face locations, pose tracking information, and gaze tracking information of the audiovisual signal of the patient's response. In addition, the visual model metadata may include both basic features and high-level feature representations of machine learning features. More basic features may include image processing features of visual model 1810. High-level feature representations may include, for example, CNNs, autoencoders, variational autoencoders, deep neural networks, and support vector machines of visual model 1810. The visual model metadata allows descriptive model and analytics 1812 to, for example, use video analysis of the audiovisual signal to improve sentiment analysis of words and phrases. Descriptive model and analytics 1812 may even use the visual model metadata in combination with the acoustic model metadata to estimate the veracity of the patient in speaking words and phrases for more accurate sentiment analysis. The visual model metadata allows acoustic model 1808 to, for example, use video analysis of the audiovisual signal to better interpret acoustic signals associated with various gazes, poses, and gestures represented in the video portion of the audiovisual signal.
- Descriptive features, or descriptive analytics, are interpretable descriptions that may be computed based on features in the speech, language, video, and metadata and that convey information about a speaker's speech patterns in a way a stakeholder may understand. For example, descriptive features may include a speaker sounding nervous or anxious, having a shrill or deep voice, or speaking quickly or slowly. Humans can interpret "features" of voices, such as pitch, rate of speaking, and semantics, in order to mentally determine emotions. A descriptive analytics module, by applying interpretable labels to speech utterances based on their features, differs from a machine learning module. Machine learning models also make predictions by analyzing features, but the methods by which machine learning algorithms process the features, and determine representations of those features, differ from how humans interpret them. Thus, labels that machine learning algorithms may "apply" to data in the context of analyzing features may not be labels that humans are able to interpret.
- Descriptive model and analytics 1812 (FIG. 18) may generate analytics and labels for numerous health states, not just depression. Examples of such labels include emotion, anxiety, how engaged the patient is, patient energy, sentiment, speech rate, and dialogue topics. In addition, descriptive model and analytics 1812 applies these labels to each word of the patient's response and determines how significant each word is in the patient's response. While the significance of any given word in a spoken response may be inferred from its part of speech (e.g., articles and filler words are relatively insignificant), descriptive model and analytics 1812 infers a word's significance from additional qualities of the word, such as the emotion in the manner in which the word is spoken, as indicated by acoustic model 1808.
- Descriptive model and analytics 1812 also analyzes trends over time and uses such trends, at least in part, to normalize analysis of the patient's responses. For example, a given patient might typically speak with less energy than others; normalizing analysis for this patient might set a lower level of energy as "normal" than would be used for the general population. In addition, a given patient may use certain words more frequently than the general population does, and use of such words by this patient might not be as notable as such use would be by a different patient. Descriptive model and analytics 1812 may analyze trends in real time, i.e., while a screening or monitoring conversation is ongoing, and in non-real-time contexts.
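- The per-patient normalization described above can be illustrated with a short sketch (illustrative only; the feature name and the use of a z-score against the patient's own history are editorial assumptions):

    import statistics

    def normalized_energy(current: float, history: list) -> float:
        # Score today's vocal energy against this patient's own baseline
        # rather than against a population norm.
        if len(history) < 2:
            return 0.0                        # no baseline established yet
        mu = statistics.mean(history)
        sigma = statistics.stdev(history) or 1.0
        return (current - mu) / sigma         # z-score vs. personal trend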
- Descriptive model and analytics 1812 stores data representing the speech analysis and trend analysis described above, as well as metadata of constituent models of descriptive model and analytics 1812, as descriptive model metadata and shares that data with NLP model 1806, acoustic model 1808, and visual model 1810. The descriptive model metadata allows NLP model 1806, acoustic model 1808, and visual model 1810 to more accurately model the audiovisual signal.
- Through runtime models 1802, runtime model server logic 504 estimates a health state of a patient using what the patient says, how the patient says it, and contemporaneous facial expressions, eye expressions, and poses in combination, and stores resulting data representing such estimation as results 1820. This provides a particularly accurate and effective tool for estimating the patient's health state.
- Runtime model server logic 504 sends results 1820 to I/O logic 604 (FIG. 6) to enable interactive screening or monitoring server logic 502 to respond to the patient's responses, thereby making the screening or monitoring dialogue interactive in the manner described above. Runtime model server logic 504 (FIG. 18) also sends results 1820 to screening or monitoring system data store 410 to be included in the history of the subject.
- Model training logic 506, shown in greater detail in FIG. 19, trains the models used by runtime model server logic 504 (FIG. 18).
- Model training logic 506 (FIG. 19) includes runtime models 1802 and ASR logic 1804 and trains runtime models 1802. Model training logic 506 sends the trained models to model repository 416 to make runtime models 1802, as trained, available to runtime model server logic 504.
- FIG. 20A provides a more detailed example illustration of the backend screening or monitoring system of the embodiment of FIG. 2. In this example block diagram, the web server 240 is expanded to illustrate that it includes a collection of functional modules. The primary component of the web server 240 is an input/output (IO) module 2041 for accessing the system via the network infrastructure 250. This IO 2041 enables the collection of response data (in the form of at least speech and video data) and labels from the clients 260a-n, and the presentation of prompting information (such as a question or topic) and feedback to the clients 260a-n. The prompting material is driven by the interaction engine 2043, which is responsive to the needs of the system and to user commands and preferences to fashion an interaction that maintains the clients' 260a-n engagement and generates meaningful response data. The interaction engine will be discussed in greater detail below.
- Truthfulness of the patient in answering questions (or other forms of interaction) posed by the screening or monitoring test is critical in assessing the patient's mental state, as is having a system that is approachable and that will be sought out and used by a prospective patient. The health screening or monitoring system 200 encourages honesty of the patient in a number of ways. First, a spoken conversation provides the patient with less time to compose a response to a question, or to discuss a topic, than a written response may allow. This truncated time generally results in a more honest and "raw" answer. Second, the conversation feels, to the patient, more spontaneous and personal and is less annoying than an obviously generic questionnaire, especially when user preferences are factored into the interaction, as will be discussed below. Accordingly, the spoken interaction does not induce or exacerbate resentment in the patient for having to answer a questionnaire before seeing a doctor or other clinician. Third, the spoken interaction is adapted in progress to be responsive to the patient, reducing the patient's annoyance with the screening or monitoring test and, in some situations, shortening the screening or monitoring test. Fourth, the screening or monitoring test as administered by health screening or monitoring system 200 relies on more than the merely verbal components of the interaction; non-verbal aspects of the interaction are leveraged synergistically with the verbal content to assess depression in the patient. In effect, 'what is said' is not nearly as reliably accurate in assessing depression as is 'how it is said'.
- The final component of the web server 240 is a results and presentation module 2045, which collates the results from the model server(s) 230 and provides them to the clients 260a-n via the IO 2041, as well as providing feedback information to the interaction engine 2043 for dynamically adapting the course of the interaction to achieve the system's goals. Additionally, the results and presentation module 2045 supplies filtered results to stakeholders 270a-n via a stakeholder communication module 2003. The communication module 2003 encompasses a process engine, a routing engine, and a rules engine. The rules engine embodies conditional logic that determines what to communicate, when, and to whom; the process engine embodies clinical and operational protocol logic to pass messages through a communications chain that may be based on serial completion of tasks; and the routing engine gives the ability to send any message to the user's platform of choice (e.g., cellphone, computer, landline, tablet, etc.).
- The filtering and/or alteration of the results by the results and presentation module 2045 is performed when necessary to maintain HIPAA (Health Insurance Portability and Accountability Act of 1996) and other privacy and security regulations and policies, such as GDPR and SOC 2 compliance, as needed, and to present the relevant stakeholder 270a-n with information of the greatest use. For example, a clinician may desire to receive not only the screening or monitoring classification (e.g., depressed or neurotypical) but also additional descriptive features, such as suicidal thoughts, anxiety around another topic, etc. In contrast, an insurance provider may not need or desire many of these additional features and may only be concerned with a diagnosis or screening or monitoring result. Likewise, a researcher may be provided only aggregated data that is not personally identifiable, in order to avoid transgression of privacy laws and regulations.
- The IO 2041, in addition to connecting to the clients 260a-n, provides connectivity to the user data 220 and the model server(s) 230. The collected speech and video data (raw audio and video files in some embodiments) are provided by the IO 2041 to the user data 220, the runtime model server(s) 2010, and a training data filter 2001. Label data from the clients 260a-n is provided to a label data set 2021 in the user data 220. This may be stored in various databases 2023. Label data includes not only verified diagnosed patients but also inferred labels collected from particular user attributes or human annotation. Client ID information and logs may likewise be supplied from the IO 2041 to the user data 220. The user data 220 may be further enriched with clinical and social records 210 sourced from any number of third-party feeds. These may include social media information obtained from web crawlers, EHR databases from healthcare providers, public health data sources, and the like.
- The training data filter 2001 may consume speech and video data and append label data 2021 to it to generate a training dataset. This training dataset is provided to model training server(s) 2030 for the generation of a set of machine-learned models. The models are stored in a model repository 2050 and are utilized by the runtime model server(s) 2010 to make a determination of the screening or monitoring results, in addition to generating other descriptors for the clients 260a-n. The model repository 2050, together with the model training server(s) 2030 and runtime model server(s) 2010, makes up the model server(s) 230. The runtime model server(s) 2010 and model training server(s) 2030 are described in greater detail below in relation to FIGS. 20B and 21, respectively.
- In FIG. 20B, the runtime model server(s) 2010 is provided in greater detail. The server receives speech and video inputs that originated from the clients 260a-n. A signal preprocessor and multiplexer 2011 performs conditioning on the inputted data, such as removal of noise or other artifacts in the signal that may cause modeling errors. These signal processing and data preparation tasks include diarization, segmentation, and noise reduction for both the speech and video signals. Additionally, metadata may be layered into the speech and video data. This data may be supplied in this preprocessed form to a bus 2014 for consumption by the modelers 2020 and may also be subjected to any number of third-party, off-the-shelf Automatic Speech Recognition (ASR) systems 2012. The ASR 2012 output includes a machine-readable transcription of the speech portion of the audio data. This ASR 2012 output is likewise supplied to the bus 2014 for consumption by later components. The signal preprocessor and multiplexer 2011 may be provided with confidence values, such as audio quality (signal quality, length of sample) and transcription confidence (how accurate the transcription is) values 2090 and 2091.
- FIG. 20B also includes a metadata model 2018. The metadata model may analyze patient data, such as demographic data, medical history data, and patient-provided data.
- Additionally, clinical data, demographic data, and social data may be presented to the bus 2014 for subsequent usage by the modelers 2020. Lastly, a model reader 2013 may access protected models from a model repository 2050, which are likewise provided to the bus 2014. The modelers 2020 consume the models, the preprocessed audio and visual data, and the ASR 2012 output to analyze the clients' 260a-n responses for the health state in question. Unlike prior systems for modeling a health condition, the present system includes a natural language processing (NLP) model 2015, an acoustic model 2016, and a video model 2017 that all operate in concert to generate classifications for the clients' 260a-n health state. These modelers not only operate in tandem but also consume outputs from one another to refine the model outputs. Each of the modelers, and the manner in which they coordinate to enhance their classification accuracy, will be explored in greater detail in conjunction with subsequent figures.
- The output of each of these modelers 2020 is provided, individually, to a calibration, confidence, and desired descriptors module 2092. This module calibrates the outputs in order to produce scaled scores, as well as provides confidence measures for the scores. The desired descriptors module may assign human-readable labels to scores. The output of the desired descriptors module 2092 is provided to a model weight and fusion engine 2019. This model weight and fusion engine 2019 combines the model outputs into a single consolidated classification for the health state of each client 260a-n. Model weighting may be done using static weights, such as weighting the output of the NLP model 2015 more than either the acoustic model 2016 or video model 2017 outputs. However, more robust and dynamic weighting methodologies may likewise be applied. For example, weights for a given model output may, in some embodiments, be modified based upon the confidence level of the classification by the model. For example, if the NLP model 2015 classifies an individual as not depressed with a confidence of 0.56 (on a scale of 0.00 to 1.00), but the acoustic model 2016 renders a depressed classification with a confidence of 0.97, in some cases the models' outputs may be weighted such that the acoustic model 2016 is given a greater weight. In some embodiments, the weight of a given model may be linearly scaled by the confidence level, multiplied by a base weight for the model. In yet other embodiments, model output weights are temporally based. For example, the NLP model 2015 may generally be afforded a greater weight than the other models; however, when the user is not speaking, the video model 2017 may be afforded a greater weight for that time domain. Likewise, if the video model 2017 and acoustic model 2016 independently suggest that the person is nervous and untruthful (frequent gaze shifting, increased perspiration, upward pitch modulation, increased speech rate, etc.), then the weight of the NLP model 2015 may be minimized, since it is likely the individual is not answering the question truthfully.
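- One simple reading of the confidence-scaled weighting described above is sketched below for illustration; the base weights and scores are arbitrary examples, and linear scaling is only one of the schemes mentioned.

    def fuse(outputs: dict, base_weights: dict) -> float:
        # outputs: model name -> (depression score, confidence in [0, 1]);
        # each weight is the model's base weight scaled by its confidence.
        weighted = {m: base_weights[m] * conf
                    for m, (_score, conf) in outputs.items()}
        total = sum(weighted.values()) or 1.0
        return sum(outputs[m][0] * w for m, w in weighted.items()) / total

    # The acoustic model's confident positive outweighs a tepid NLP negative.
    score = fuse({"nlp": (0.44, 0.56), "acoustic": (0.91, 0.97),
                  "video": (0.62, 0.80)},
                 {"nlp": 0.5, "acoustic": 0.3, "video": 0.2})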
- After model output fusion and weighting, the resulting classification may be combined with features and other user information in a multiplex output module 2051 in order to generate the final results. As discussed before, these results are provided back to the user data 220 for storage, and potentially as future training material, and also to the results and presentation module 2045 of the web server 240 for display, at least in part, to the clients 260a-n and the stakeholders 270a-n. These results are likewise used by the interaction engine 2043 to adapt the interaction with the client 260a-n moving forward.
- Turning now to FIG. 21, the model training server(s) 2030 is provided in greater detail. Like the runtime model server(s) 2010, the model training server(s) 2030 consumes a collection of data sources. However, these data sources have been filtered by the training data filter 2001 to provide only data for which label information is known or imputable. The model training server additionally takes as inputs audio quality confidence values 2095 (which may include bit rate, noise, and length of the audio signal) and transcription confidence values 2096; these confidence values may include the same types of data as those of FIG. 20B. The filtered social, demographic, and clinical data, speech and video data, and label data are all provided to a preprocessor 2031 for cleaning and normalization of the filtered data sources. The processed data is then provided to a bus 2040 for consumption by the various trainers 2039, and also to one or more third-party ASR systems 2032 for the generation of ASR outputs, which are likewise supplied to the bus 2040.
- The model trainers 2039 consume the processed audio, visual, metadata, and ASR output data in an NLP trainer 2033, an acoustic trainer 2034, a video trainer 2035, and a metadata trainer 2036. The trained models are provided, individually, to a calibration, confidence, and desired descriptors module 2097. This module calibrates the outputs in order to produce scaled scores, as well as provides confidence measures for the scores. The desired descriptors module may assign human-readable labels to scores. The trained and calibrated models are provided to a fused model trainer 2037 for combining the trained models into a trained combinational model. Each individual model and the combined model may be stored in the model repository 2050. Additionally and optionally, the trained models may be provided to a personalizer 2038, which leverages metadata (such as demographic information and data collated from social media streams) to tailor the models specifically for a given client 260a-n.
- For example, a particular model xo may be generated for classifying acoustic signals as either representing someone who is depressed or not. The tenor, pitch, and cadence of an audio input may vary significantly between a younger individual and an elderly individual. As such, specific models are developed based upon whether the patient being screened is younger or elderly (models xy and xe, respectively). Likewise, women generally have variances in their acoustic signals as compared to men, suggesting that yet another set of acoustic models is needed (models xf and xm, respectively). It is also apparent that combinational models are desired for a young woman versus an elderly woman, and a young man versus an elderly man (models xyf, xef, xym, and xem, respectively). Clearly, as further personalization groupings are generated, the possible number of applicable models will increase exponentially.
- In some embodiments, if the metadata for an individual provides insight into that person's age, gender, ethnicity, educational background, accent or region they grew up in, etc., this information may be utilized to select the most appropriate model to use in future interactions with this given patient, and may likewise be used to train models that apply to individuals that share similar attributes.
- In addition to personalizing models based upon population segments and attributes, the personalizer 2038 may personalize a model, or set of models, for a particular individual based upon the individual's past history and known label data. This activity is more computationally expensive than relying upon population-wide, or segment-wide, modeling, but produces more accurate and granular results. All personalized models are provided from the personalizer 2038 to the model repository 2050 for retention until needed for patient assessment.
- During analysis, then, a client 260a-n is initially identified and, when possible, a personalized model may be employed for their screening or monitoring. If one is not available but metadata is known for the individual, the most specific model for the most specific segment is employed in their screening or monitoring. If no metadata is available, then the model selected is the generic, population-wide model. Utilizing such a tiered modeling structure, the more information that is known regarding the client 260a-n, the more specific and accurate the models that may be employed. Thus, for each client 260a-n, the 'best' model is leveraged given the data available for them.
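- The tiered selection may be illustrated as follows (an editorial sketch; the repository layout and the length-based specificity ordering are assumptions made for brevity):

    def select_model(repo: dict, patient_id: str, segments: list):
        # Tiered lookup: personalized model first, then the most specific
        # known segment model, then the generic population-wide model.
        if patient_id in repo:
            return repo[patient_id]
        for seg in sorted(segments, key=len, reverse=True):
            if seg in repo:
                return repo[seg]
        return repo["population"]

    repo = {"population": "xo", "f": "xf", "yf": "xyf"}
    print(select_model(repo, "patient-123", ["yf", "f"]))  # -> "xyf"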
- The general overall flow of information is shown in (
FIG. 22). Assessment test administrator 2202 of real-time system 302 conducts an interactive conversation with the patient through patient device 312. The responsive audiovisual signal of the patient is received by real-time system 302 from patient device 312. The exchange of information between real-time system 302 and patient device 312 may be through a purpose-built app executing in patient device 312 or through a conventional video call between patient device 312 and video call logic of assessment test administrator 2202. While this illustrative embodiment uses an audiovisual signal to assess the state of the patient, it should be appreciated that, in alternative embodiments, an audio-only signal may be used with good results. In such alternative embodiments, an ordinary, audio-only telephone conversation may serve as the vehicle for assessment by assessment test administrator 2202.
- In a manner described more completely below, assessment test administrator 2202 uses composite model 2204 to assess the state of the patient in real time, i.e., as the spoken conversation transpires. Such intermediate assessment is used, in a manner described more completely below, to control the conversation, making the conversation more responsive, and therefore more engaging, to the patient, and to help make the conversation as brief as possible while maintaining the accuracy of the final assessment.
- Modeling system 304 receives collected patient data 2206 that includes the audiovisual signal of the patient during the assessment test. In embodiments in which the assessment test involves a purpose-built app executing in patient device 312, modeling system 304 may receive collected patient data 2206 from patient device 312. Alternatively, and in embodiments in which the assessment test involves a video or voice call with patient device 312, modeling system 304 receives collected patient data 2206 from real-time system 302.
- Modeling system 304 retrieves clinical data 2220 from clinical data server 306. Clinical data 2220 includes generally any available clinical data related to the patient, to other patients assessed by assessment test administrator 2202, and to the general public that may be helpful in training any of the various models described herein.
- Preprocessing 2208 conditions any audiovisual data for optimum analysis; having a high-quality signal to start with is very helpful in providing accurate analysis. Preprocessing 2208 is shown within modeling system 304. In alternative embodiments, preprocessing is included in real-time system 302 to improve accuracy in the application of composite model 2204.
- Speech recognition 2210 processes speech represented in the audiovisual data after preprocessing 2208, including automatic speech recognition (ASR). The ASR may be conventional. Language model training 2212 uses the results of speech recognition 2210 to train language models 2214.
- Acoustic model training 2216 uses the audiovisual data after preprocessing 2208 to train acoustic models 2218. Visual model training 2224 uses the audiovisual data after preprocessing 2208 to train visual models 2226. To the extent sufficient data (both collected patient data 2206 and clinical data 2220) is available for the subject patient, language model training 2212, acoustic model training 2216, and visual model training 2224 train language models 2214, acoustic models 2218, and visual models 2226, respectively, specifically for the subject patient. Training may also use clinical data 2220 for patients that share one or more phenotypes with the subject patient.
- In a manner described more completely below, composite model builder 2222 uses language models 2214, acoustic models 2218, and visual models 2226, in combination with clinical data 2220, to combine language, acoustic, and visual models into composite model 2204. In turn, assessment test administrator 2202 uses composite model 2204 in real time to assess the current state of the subject patient and to accordingly make the spoken conversation responsive to the subject patient, as described more completely below.
- As mentioned above, assessment test administrator 2202 administers a depression assessment test to the subject patient by conducting an interactive spoken conversation with the subject patient through patient device 312.
- Attention will now be focused upon the specific models used by the runtime model server(s) 2010. Moving on to FIG. 23A, a general block diagram for one example instantiation of the acoustic model 2016 is provided. The speech and video data is provided to a high level feature representor 2320 that operates in concert with a temporal dynamics modeler 2330. Influencing the operation of these components is a model conditioner 2340 that consumes features from the descriptive features 2018 and results generated from the speech and video models.
- Returning to the acoustic model 2016, the high level feature representor 2320 and temporal dynamics modeler 2330 also receive the outputs of the raw and higher level feature extractor 2310, which identifies features within the incoming acoustic signals and feeds them to the models. The high level feature representor 2320 and temporal dynamics modeler 2330 generate the acoustic model results, which may be fused into a final result that classifies the health state of the individual and may also be consumed by the other models for conditioning purposes.
- The high level feature representor 2320 includes leveraging existing models for frequency, pitch, amplitude, and other acoustic features that provide valuable insights into feature classification. A number of off-the-shelf "black box" algorithms accept acoustic signal inputs and provide a classification of an emotional state with an accompanying degree of accuracy. For example, emotions such as sadness, happiness, anger, and surprise are already able to be identified in acoustic samples using existing solutions. Additional emotions such as envy, nervousness, excitedness, mirth, fear, disgust, trust, and anticipation will also be leveraged as they become available. However, the present systems and methods go further by matching these emotions, the strength of the emotion, and the confidence in the emotion to patterns of emotional profiles that signify a particular mental health state. For example, pattern recognition may be trained, based upon patients that are known to be suffering from depression, to identify the emotional state of a respondent that is indicative of depression.
- FIG. 23B shows an embodiment of FIG. 23A including an acoustic modeling block 2341. The acoustic modeling block 2341 includes a number of acoustic models. The acoustic models may be separate models that use machine learning algorithms. The illustrated listing of models shown in FIG. 23B is not necessarily an exhaustive listing of possible models. These models may include a combination of existing third-party models and internally derived models. FIG. 23B includes acoustic embedding model 2342, spectral temporal model 2343, acoustic effect model 2345, speaker personality model 2346, intonation model 2347, temporal/speaking rate model 2348, pronunciation models 2349, and fluency models 2361. The machine learning algorithms used by these models may include neural networks, deep neural networks, support vector machines, decision trees, hidden Markov models, and Gaussian mixture models.
- FIG. 23C shows a score calibration and confidence module 2370. The score calibration and confidence module 2370 includes a score calibration module 2371 and a performance estimation module 2374. The score calibration module 2371 includes a classification module 2372 and a mapping module 2373.
- The score calibration and confidence module 2370 may accept as an input a raw score produced by a machine learning algorithm, such as a neural network or deep learning network, that may be analyzing audiovisual data. The score calibration and confidence module 2370 may also accept a set of labels with which to classify data. The labels may be provided by clinicians. The classification module 2372 may apply one or more labels to the raw score, based on the value of the score. For example, if the score is a probability near 1, the classification module 2372 may apply a "severe" label to the score. The classification module 2372 may apply labels based on criteria set by clinicians or may algorithmically determine labels for scores, e.g., using a machine learning algorithm. The mapping module 2373 may scale the raw score to fit within a range of numbers, such as 120-180 or 0-700. The classification module 2372 may operate before or after the mapping module 2373.
- After calibrating the data, the score calibration and confidence module 2370 may determine a confidence measure 2376 by estimating a performance for the labeled, scaled score. The performance may be estimated by analyzing features of the collected data, such as duration, sound quality, accent, and other features. The estimated performance may be a weighted parameter that is applied to the score. This weighted parameter may comprise the score confidence.
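- A minimal sketch of the calibration and confidence flow follows, for illustration only; the label bands, scale, and confidence heuristic are hypothetical, not values from the disclosure.

    def calibrate(raw: float, lo: float = 0.0, hi: float = 700.0):
        # Map a raw model probability onto a clinician-facing scale and
        # attach a severity label.
        scaled = lo + raw * (hi - lo)
        if raw >= 0.85:
            label = "severe"
        elif raw >= 0.5:
            label = "moderate"
        else:
            label = "mild"
        return scaled, label

    def score_confidence(duration_s: float, snr_db: float) -> float:
        # Longer, cleaner samples earn more trust.
        return (min(duration_s / 60.0, 1.0)
                * min(max(snr_db, 0.0) / 30.0, 1.0))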
- To provide greater context and clarification around the operation of the acoustic model 2016, a highly simplified, single instantiation of one possible version of the high level feature representor 2320 is provided in relation to FIG. 24. It should be noted that this example is provided for illustrative purposes only and is not intended to limit the embodiments of the high level feature representor 2320 in any way.
- In this example embodiment, the raw and high level feature extractor 2310 takes the acoustic data signal and converts it into a spectrogram image 2321. FIG. 55 provides an example image of such a spectrogram 5500 of a human speaking. A spectrogram of this sort provides information regarding the audio signal frequency, the amplitude of the signal (here presented in terms of intensity, i.e., how dark the frequency is rendered), and time. Such a spectrogram 5500 is considered a raw feature of the acoustic signal, as would be pitch, cadence, energy level, etc.
- A spectrogram sampler 2323 then selects a portion of the image at a constant timeframe; for example, between time zero and 10 seconds is one standard sample size, but other sample time lengths are possible. FIG. 56 provides an example of a sampled portion 5502 of the spectrogram 5500. This image data is then represented as an MxN matrix (x), in this particular non-limiting example. An equation that includes x as a variable, and for which the solution is known, is then processed to determine estimates of the unknown variables (matrices and vectors) within the equation. For example, a linear equation such as ŷ = wᵀx + b may be utilized. As noted, the solution y is known.
- This includes determining a set of randomized guesses for the unknown variables (wᵀ and b in this example equation). The equation is solved using these guessed variables, and the error of this solved solution is computed using the known solution value. The error may be computed, for example, as the squared difference between the known solution and the estimate:
- Ê = (y − ŷ)²
- By repeating this process iteratively, thousands if not millions of times, values for the variables that approximate the actual variable values may be determined. This is a brute-force regression, in which the error value (Ê) is minimized.
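- For illustration, such a randomized search may be sketched in one dimension as follows (an editorial sketch; the search ranges and iteration count are arbitrary):

    import random

    def brute_force_regression(xs, ys, iters=100_000):
        # Randomized guesses for w and b in y_hat = w * x + b, keeping the
        # pair that minimizes the squared error against the known labels.
        best, best_err = (0.0, 0.0), float("inf")
        for _ in range(iters):
            w, b = random.uniform(-5, 5), random.uniform(-5, 5)
            err = sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys))
            if err < best_err:
                best, best_err = (w, b), err
        return best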
- This approximate value is an abstraction of the mental state being tested, dependent upon the input equation. The system may have previously determined threshold, or cutoff, values 2322 for the variables, which indicate whether the response is indicative of the mental state or not. These cutoff values are trained by analyzing responses from individuals for whom the mental state is already known.
- Equation determination may leverage deep learning techniques, as previously discussed. This may include recurrent neural networks 2324 and/or convolutional neural networks 2325. In some cases, long short-term memory (LSTM) units or gated recurrent units (GRUs) may be employed, for example. In this manner, depression, or alternate mental states, may be directly analyzed for in the acoustic portion of the response. This, in combination with the use of off-the-shelf emotion detection 'black box' systems with pattern recognition, may provide a robust classification of the mental state by a classifier 2326 based upon the acoustic signal, which, in this example, is provided as acoustic analysis output 2327.
- As noted above, this example of using a spectrogram as a feature for analysis is but one of many possible instantiations of the high level feature representor's 2320 activity. Other features, and mechanisms for processing those features, may likewise be analyzed. For example, pitch levels, isolated breathing patterns, total energy of the acoustic signal, or the like may all be subject to similar temporally based analysis to classify the feature as indicative of a health condition.
FIG. 25 , theNLP model 2015 is provided in greater detail. This system consumes the output from theASR system 2012 and performs post-processing on it via an ASRoutput post processor 2510. This post processing includes reconciling the ASR outputs (when multiple outputs are present). Post processing may likewise include n-gram generation, parsing activities and the like. - Likewise, the results from the video and
acoustic models 2016 and 2017 respectively, as well as clinical and social data are consumed by a model conditioner 2540 for altering the functioning of thelanguage models 2550. Thelanguage models 2550 operate in concert with a temporal dynamics modeler 2520 to generate the NLP model results. - The
language models 2550 include a number of separate models. The illustrated listing of models shown inFIG. 25 is not necessarily an exhaustive listing of possible models. These models may include a combination of existing third party models and internally derived models. Language models may use standard machine learning or deep learning algorithms, as well as language modeling algorithms such as n-grams. For example,sentiment model 2551 is a readily available third party model that uses either original text samples or spoken samples that have been transcribed by a human or machine speech recognizer, to output to determine if the sentiment of the discussion is generally positive or negative. In general, a positive sentiment is inversely correlated with depression, whereas a negative sentiment is correlated with a depression classification. -
Statistical language model 2552 utilizes n-grams and pattern recognition within the ASR output to statistically match patterns and n-gram frequency to known indicators of depression. For example, particular sequences of words may be statistically indicative of depression. Likewise, the particular vocabulary and word types used by a speaker may indicate the presence or absence of depression.
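As an illustration of the kind of n-gram feature extraction such a statistical model may rely upon, the following hedged Python sketch counts bigrams in an ASR transcript; the function names and the tiny indicator list are hypothetical placeholders, not the actual trained indicators.

```python
from collections import Counter

def ngram_counts(transcript, n=2):
    """Count n-grams in a whitespace-tokenized ASR transcript."""
    tokens = transcript.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical bigrams treated as depression indicators, for illustration only
INDICATOR_BIGRAMS = {("no", "point"), ("so", "tired"), ("by", "myself")}

def indicator_frequency(transcript):
    """Fraction of bigrams in the transcript that match known indicators."""
    counts = ngram_counts(transcript, n=2)
    total = sum(counts.values()) or 1
    hits = sum(c for gram, c in counts.items() if gram in INDICATOR_BIGRAMS)
    return hits / total

print(indicator_frequency("i am so tired of being by myself"))
```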
- A topic model 2553 identifies types of topics within the ASR output. Particular topics, such as death, suicide, hopelessness and worth (or lack thereof), may all be positively correlated with a classification of depression. Additionally, there is a latent negative correlation between activity (signified by verb usage) and depression. Thus, ASR outputs that are high in verb usage may indicate that the client 260 a-n is not depressed. Furthermore, topic modeling based on the known question or prompt given to the subject can produce better performance by using pre-trained topic-specific models for processing the answer for mental health state. - Syntactic model 2554 identifies situations where the focus of the ASR output is internal versus external. The usage of terms like ‘I’ and ‘me’ is indicative of internal focus, while terms such as ‘you’ and ‘they’ are indicative of a less internalized focus. More internal focus has been identified as generally correlated with an increased chance of depression. Syntactic model 2554 may additionally look at speech complexity. Depressed individuals tend to have a reduction in sentence complexity. Additionally, energy levels, indicated by language that is strong or polarized, are negatively correlated with depression. Thus, someone speaking in very simple sentences, focused internally, and with low-energy descriptive language, would indicate a depressed classification.
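A toy illustration of the internal-versus-external focus measure described above might look like the following Python sketch; the pronoun sets and the scoring rule are illustrative assumptions, not the disclosed syntactic model 2554.

```python
INTERNAL = {"i", "me", "my", "mine", "myself"}
EXTERNAL = {"you", "they", "he", "she", "them", "we"}

def internal_focus_ratio(transcript):
    """Share of focus-bearing pronouns that are first-person singular."""
    tokens = [t.strip(".,!?").lower() for t in transcript.split()]
    internal = sum(t in INTERNAL for t in tokens)
    external = sum(t in EXTERNAL for t in tokens)
    total = internal + external
    return internal / total if total else 0.0

# A ratio near 1.0 suggests a strongly internalized focus
print(internal_focus_ratio("I feel like they never listen to me"))
```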
- Embedding and clustering model 2556 maps words to prototypical words or word categories. For example, the terms “kitten”, “feline” and “kitty” may all be mapped to the term “cat”. Unlike the other models, the embedding and clustering model 2556 does not generate a direct indication of whether the patient is depressed; rather, this model's output is consumed by the
other language models 2550. - A dialogue and
discourse model 2557 identifies latency and usage of spacer words (“like”, “umm”, etc.). Additionally, the dialogue and discourse model 2557 identifies dialogue acts, such as questions versus statements. - An emotion or affect
model 2558 provides a score, typically a posterior probability over a set of predetermined emotions (for example, happy or sad), that describes how well the sample matches pre-trained models for each of the emotions. These probabilities can then be used in various forms as input to the mental health state models, and/or in a transfer learning set up. A speaker personality model 2559 provides a score, typically a posterior probability over a set of predetermined speaker personality traits (for example, agreeableness or openness), that describes how well the sample matches pre-trained models for each of the traits. These probabilities can then be used in various forms as input to the mental health state models, and/or in a transfer learning set up. - The non-verbal model 2561, using ASR events, may provide a score based on non-lexical speech utterances of patients, which may nonetheless be indicative of mental state. These utterances may be laughter, sighs, or deep breaths, which may be picked up and transcribed by an ASR.
- The text
quality confidence module 2560 determines a confidence measure for the output of the ASR output post processor 2510. The confidence measure may be determined based on text metadata (demographic information about the patient, environmental conditions, method of recording, etc.) as well as context (e.g., length of speech sample, question asked). - It should be noted that each of these models may impact one another and influence the results and/or how these results are classified. For example, a low energy language response typically is indicative of depression, whereas high energy verbiage would negatively correlate with depression.
- Turning now to the
video model 2017 of FIG. 26 , again we see a collection of feature extractors 2610 that consume the video data. Within the feature extractors 2610 there is a face bounder 2611, which recognizes the edges of a person's face and extracts this region of the image for processing. Obviously, facial features provide significant input on how an individual is feeling. Sadness, exhaustion, worry, and the like are all associated with a depressive state, whereas jubilation, excitation, and mirth are all negatively correlated with depression. - Additionally, more specific bounders are contemplated; for example, the region around the eyes may be analyzed separately from regions around the mouth. This allows greater emphasis to be placed upon differing image regions based upon context. In this set of examples, the region around the mouth generally provides a large amount of information regarding an individual's mood; however, when a person is speaking, this data is more likely to be inaccurate due to movements associated with the speech formation. The acoustic and language models may provide insight as to when the user is speaking in order to reduce reliance on the analysis of a mouth region extraction. In contrast, the region around the eyes is generally very expressive when someone is speaking, so this feature is relied upon more heavily during times when the individual is speaking.
- A
pose tracker 2612 is capable of looking at larger body movements or positions. A slouched position indicates unease, sadness, and other features that indicate depression. The presence of excessive fidgeting, or conversely unusual stillness, is likewise indicative of depression. Moderate movement and fidgeting, however, are not associated with depression. Upright posture and relaxed movement are likewise inversely related to a depressive classification. Lastly, even the direction that the individual sits or stands is an indicator of depression. A user who directly faces the camera is less likely to be depressed. In contrast, an individual who positions their body oblique to the camera, or otherwise covers themselves (by crossing their arms, for example), is more likely to be depressed. - A
gaze tracker 2613 is particularly useful in determining where the user is looking, and when (in response to what stimulus) the person's gaze shifts. Looking at the screen or camera of the client device 260 a-n indicates engagement, confidence and honesty—all hallmarks of a non-depressed state. Looking down constantly, on the other hand, is suggestive of depression. Constantly shifting gaze indicates nervousness and dishonesty. Such feedback may be used by the NLP model 2015 to reduce the value of analysis based on semantics during this time period, as the individual is more likely to be hedging their answers and/or outright lying. This is particularly true if the gaze pattern alters dramatically in response to a stimulus. For example, if the system asks if the individual has had thoughts of self-harm, and suddenly the user looks away from the camera and has a shifting gaze, a denial of such thoughts (which traditionally would be counted strongly as an indication of a non-depressed state) is discounted. Rather, emphasis is placed on outputs of the acoustic model 2016, and the video model 2017 from analysis of the other extracted features. - The image processing features
extractor 2614 may take the form of any number of specific feature extractions, such as emotion identifiers, speaking identifiers (from the video as opposed to the auditory data), and the above disclosed specific bounder extractors (region around the eyes, for example). All of the extracted features are provided to a high-level feature representor 2620 and classifier and/or regressor 2630 that operate in tandem to generate the video model results. As with the other models, the video model 2017 is influenced by the outputs of the NLP model 2015 and the acoustic model 2016, as well as clinical and social data. The model conditioner 2640 utilizes this information to modify what analysis is performed, or the weight afforded to any specific findings. - The
descriptive features module 2018 of FIG. 27 includes direct measurements 2710 and model outputs 2720 that result from the analysis of the speech and video data. The descriptive features module may not be included in either the runtime model servers 2010 or model training servers 2030. Instead, descriptive features may be incorporated in the acoustic and NLP models. Disclosed in the description of FIG. 27 are examples of descriptive features. Many different measurements 2710 and model outputs 2720 are collected by the descriptive features 2018 module. For example, measurements include at least speech rate analyzer 2711, which tracks a speaker's words per minute. Faster speech generally indicates excitement, energy and/or nervousness. - Slow speech rates, on the other hand, are indicative of hesitancy, lethargy, or the presence of a difficult topic. Alone, this measurement has little value, but when used as an input for other models, the speech rate provides context that allows for more accurate classification by these other models. Likewise,
energy analyzer 2713 measures the total acoustic energy in an audio component. Increased energy may indicate emphasis on particular portions of the interaction, general excitement or lethargy levels, and the like. Again, such information alone provides very little in determining if a person has depression, but when combined with the other models it is useful for ensuring that the appropriate classification is being made. For example, if the energy level increases when a person is speaking about their pet dog, the system determines that this topic is of interest to the individual, and if a longer interaction is needed to collect additional user data for analysis, the interaction may be guided to this topic. A temporal analyzer 2715 determines the time of the day, week and year, in order to provide context around the interaction. For example, people are generally more depressed in the winter months, around particular holidays, and at certain days of the week and times of the day. All this timing information is usable to alter the interaction (by providing topicality) or by enabling classification thresholds to be marginally altered to reflect these trends. - The model outputs 2720 may include a
topic analyzer 2721, various emotion analyzers 2723 (anxiety, joy, sadness, etc.), sentiment analyzer 2725, engagement analyzer 2727, and arousal analyzer 2729. Some of these analyzers may function similarly in the other models; for example, the NLP model 2015 already includes a sentiment model 2551; however, the sentiment analyzer 2725 in the descriptive features 2018 module operates independently from the other models, and includes different input variables, even if the output is similar. - The
engagement analyzer 2727 operates to determine how engaged a client 260 a-n is in the interaction. High levels of engagement tend to indicate honesty and eagerness. Arousal analyzer 2729 provides insights into how energetic or lethargic the user is. A key feature of the descriptive features 2018 module is that each of these features, whether measured or the result of model outputs, is normalized for the individual by a normalizer 2730. For example, some people simply speak faster than others, and a higher words-per-minute measurement for one individual versus another person may not indicate anything unusual. The degree of any of these features is adjusted for the baseline level of the particular individual by the normalizer 2730. Obviously, the normalizer 2730 operates more accurately the more data is collected for any given individual. A first-time interaction with a client 260 a-n cannot be effectively normalized immediately; however, as the interaction progresses, a baseline for this person's speech rate, energy levels, engagement, general sentiment/demeanor, etc. may be more readily ascertained using standard statistical analysis of the variation of these features over time. This becomes especially true after more than one interaction with any given individual.
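The following minimal Python sketch illustrates one way such per-speaker normalization could work, using a running mean and standard deviation as the individual's baseline; the class and feature names are hypothetical and are not taken from the disclosure.

```python
import math

class FeatureNormalizer:
    """Z-score a feature against a running per-individual baseline."""

    def __init__(self):
        self.n, self.total, self.sq_total = 0, 0.0, 0.0

    def update(self, value):
        self.n += 1
        self.total += value
        self.sq_total += value * value

    def normalize(self, value):
        if self.n < 2:
            return 0.0  # no meaningful baseline yet (e.g., first interaction)
        mean = self.total / self.n
        var = max(self.sq_total / self.n - mean * mean, 1e-9)
        return (value - mean) / math.sqrt(var)

wpm = FeatureNormalizer()
for rate in (165, 170, 168, 172):   # this speaker's usual words per minute
    wpm.update(rate)
print(wpm.normalize(120))           # strongly negative: unusually slow for them
```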
- After normalization, the system may identify trends in these features for the individual via a trend tracker 2740. The trend tracker splits the interaction into time domains and looks for changes in values between the various time periods. Statistically significant changes, and especially changes that continue over multiple time periods, are identified as trends for the feature for this individual. The features, both in raw and normalized form, and any trends, are all output as the descriptive results. - Although not addressed in any of the Figures, it is entirely within the scope of embodiments of this disclosure that additional models are employed to provide classification regarding the client's 260 a-n health state using alternate data sources. For example, it has been discussed that the client devices may be capable of collecting biometric data (temperature, skin chemistry data, pulse rate, movement data, etc.) from the individual during the interaction. Models focused upon these inputs may be leveraged by the runtime model server(s) 2010 to arrive at determinations based upon this data. The disclosed systems may identify chemical markers in the skin (cortisol, for example), perspiration, temperature shifts (e.g. flushing), and changes in heart rate, etc., for diagnostic purposes.
- Now that the specifics of the runtime model server(s) 2010 have been discussed in considerable depth, attention will be turned to the
interaction engine 2043, as seen in greater detail in relation to FIG. 28 . A process flow diagram featuring the components of the interaction engine 2043 is presented in FIG. 8 . The interaction engine 2043 dictates the interactions between the web server(s) 240 and the clients 260 a-n. These interactions, as noted previously, may consist of a question and answer session, with a set number and order of questions. In such embodiments, this type of assessment is virtually an automated version of what has previously been leveraged for depression diagnosis, except with audio and video capture for improved screening or monitoring accuracy. Such question and answer may be done with text questions displayed on the client device, or through a verbal recording of a question. However, such systems are generally not particularly engaging to a client 260 a-n, and may cause the interaction to not be completed honestly, or to be terminated early. As such, it is desirable to have a dynamic interaction, which necessitates a more advanced interaction engine 2043, such as the one seen in the present Figure. - This
interaction engine 2043 includes the ability to take a number of actions, including different prompts, questions, and other interactions. These are stored in a question and action bank 2810. The interaction engine 2043 also includes a history and state machine 2820, which tracks what has already occurred in the interaction, and the current state of the interaction. - The state and history information, database of possible questions and actions, and additional data are consumed by an
interaction modeler 2830 for determining next steps in the interaction. The other information consumed consists of user data, clinical data and social data for the client being interacted with, as well as model results, NLP outputs and descriptive feature results. The user data, clinical data and social media data are all consumed by a user preference analyzer 2832 for uncovering the preferences of a user. As noted before, appealing to the user is one of the large hurdles to successful screening or monitoring. If a user doesn't want to use the system, they will not engage with it in the first place, or may terminate the interaction prematurely. Alternatively, an unpleasant interaction may cause the user to be less honest and open with the system. Not being able to properly screen individuals for depression, or health states generally, is a serious problem, as these individuals are likely to continue struggling with their disease without assistance, or even worse, die prematurely. Thus, having a high degree of engagement with a user may literally save lives. - By determining preference information, the interactions are tailored in a manner that appeals to the user's interests and desires. Topics identified within social media feeds are incorporated into the interaction to pique the interest of the user. Collected preference data from the user modulates the interaction to be more user friendly, and particular needs or limitations of the user revealed in clinical data are likewise leveraged to make the interaction experience user-friendly. For example, if the clinical data includes information that the user experiences hearing loss, the volume of the interaction may be proportionally increased to make the interaction easier. Likewise, if the user indicates their preferred language is Spanish, the system may automatically administer the interaction in this language.
- The descriptive features and model results, in contrast, are used by a
user response analyzer 2831 to determine if the user has answered the question (when the interaction is in a question-answer format), or when sufficient data has been collected to generate an appropriate classification if the interaction is more of a ‘free-form’ conversation, or even a monologue by the client about a topic of interest. - Additionally, a
navigation module 2834 receives NLP outputs and semantically analyzes the NLP results for command language in near real time. Such commands may include statements such as “Can you repeat that?”, “Please speak up”, “I don't want to talk about that”, etc. These types of ‘command’ phrases indicate to the system that an immediate action is being requested by the user. - Output from each of the
navigation module 2834, user response analyzer 2831 and user preference analyzer 2832 is provided to an action generator 2833, in addition to access to the question and adaptive action bank 2810 and history and state machine 2820. The action generator 2833 applies a rule based model to determine which action within the question and adaptive action bank 2810 is appropriate. Alternatively, a machine learned model is applied in lieu of a rule based decision model. This results in the output of a customized action that is supplied to the IO 2041 for communication to the client 260 a-n. The customized action is likewise passed back to the history and state machine 2820 so that the current state and past actions may be properly logged. Customized actions may include, for example, asking a specific question, prompting a topic, switching to another voice or language, ending the interaction, altering the loudness of the interaction, altering speech rates, font sizes and colors, and the like. - Now that the structures and systems of the health screening or monitoring system 2000 have been described in considerable detail, attention will now be turned to one
example process 2900 of health screening or monitoring of a client. In this example process, the clinical and social data for the clients are collated and stored within the data store (at step 2910). This information may be gathered from social media platforms utilizing crawlers or similar vehicles. Clinical data may be collected from health networks, physicians, insurance companies or the like. In some embodiments, the health screening or monitoring system 2000 may be deployed as an extension of the care provider, which allows the sharing of such clinical data with reduced concerns about violating privacy laws (such as HIPAA). However, when the health screening or monitoring system 2000 is operated as a separate entity, outside a healthcare network, additional consents, encryption protocols, and removal of personally identifiable information may be required to enable open sharing of the clinical data while staying in compliance with applicable regulations. Clinical data may include electronic health records, physician notes, medications, diagnoses and the like. - Next, the process may require that models are available to analyze a client's interaction. Initial datasets that include labeling data (confirmed or imputed diagnoses of depression) are fed to a series of trainers that train individual models, and subsequently fuse them into a combined model (at 2920). Such training may also include personalization of models when additional metadata is available.
-
FIG. 30 provides a more detailed illustration of an example process for such model training. As mentioned, label data is received (at 3010). Labels include a confirmed diagnosis of depression (or other health condition being screened for). Likewise, situations where the label may be imputed or otherwise estimated are used to augment the training data sets. - Imputed label data is received via a manual review of a medical record and/or interaction record with a given client. For example, in prediction mode, when the label is unknown, it is possible to decide whether a label can be estimated for a data point given other information, such as patient records, system predictions, clinically-validated surveys and questionnaires, and other clinical data. Due to the relative rarity of label data sets, and the need for large numbers of training samples to generate accurate models, it is often important that the label data includes not just confirmed cases of depression, but also these estimated labels.
- Additionally, the process includes receiving filtered data (at 3020). This data is filtered so that only data for which labels are known (or estimated) is used. Next, each of the models is trained. Such training includes training of the NLP model (at 3030), the acoustic model (at 3040), the video model (at 3050) and the descriptive features (at 3060). It should be noted that these training processes may occur in any order, or in parallel. In some embodiments, the parallel training includes generating cross dependencies between the various models. These cross dependencies are one of the critical features that render the presently disclosed systems and methods uniquely capable of rendering improved and highly accurate classifications for a health condition.
- The resulting trained models are fused, or aggregated, and the final fused trained model may be stored (at 3070). The models (both individual and fused models) are stored in a model repository. However, it is also desirable to generate model variants that are customized to different population groups or even specific individuals (at 3080).
- The process for model customization and personalization is explored in further depth in relation to
FIG. 31 . Personalization relies upon metadata stored with the filtered training data. This metadata is received (at 3081). Particular population segment features are identified in the metadata and extracted (at 3082). These segment features are used to train models that are specific to that segment. This is accomplished by clustering the filtered training data by these segmentation features (at 3083). A given training piece may be included in a number of possible segments, each non-overlapping, or of continually increasing granularity. - For example, assume labeled training data is received from a known individual. This individual is identified as a black woman in her seventies, in this example. This training data is then used to train models specific to African American individuals, African American women, women, elderly people, elderly women, elderly African American people, and elderly African American women. Thus, this single piece of training data is used to generate seven different models, each with slightly different scope and level of granularity. In situations where age is further divided out, the number of models being trained off of this data is increased even further (e.g., adult women, women over 50, women over 70, individuals over 70, etc.). The models are then trained on this segment-by-segment basis (at 3084). The customized models are annotated by which segment(s) they are applicable to (at 3085), allowing for easy retrieval when a new response is received for classification where information about the individual is known, and may be utilized to select the most appropriate/tailored model for this person.
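Purely as an illustration of the segment enumeration described above, the following Python sketch derives every training segment implied by a piece of metadata; the attribute names and segment scheme are assumptions made for the example.

```python
from itertools import combinations

def segment_keys(metadata):
    """Enumerate all non-empty attribute combinations as segment labels."""
    attrs = sorted(metadata.items())  # e.g., [("age_band", "70+"), ...]
    keys = []
    for r in range(1, len(attrs) + 1):
        for combo in combinations(attrs, r):
            keys.append("|".join(f"{k}={v}" for k, v in combo))
    return keys

# One labeled sample contributes to seven segment-specific models
print(segment_keys({"race": "african_american", "gender": "female",
                    "age_band": "70+"}))
```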
- This is important because, often, the model for identifying a health condition in one individual may be wholly inadequate for classifying another individual. For example, a Caucasian person may require different video models compared to an individual of African descent. Likewise, men and women often have divergent acoustic characteristics that necessitate the leveraging of different acoustic models to accurately classify them. Even a woman in her early twenties sounds different than a woman in her fifties, which again differs from a woman in her eighties. NLP models for a native speaker, versus a second language speaker, may likewise be significantly different. Even between generations, NLP models differ significantly to address differences in slang and other speech nuances. By making models available for individuals at different levels of granularity, the most appropriate model may be applied, thereby greatly increasing classification accuracy by these models.
- Returning to
FIG. 30 , after this personalization is completed, the customized models are also stored in the model repository, along with the original models and fused models (at 3090). It should be noted that while model customization generally increases classification accuracy, any such accuracy gains are jeopardized if a low number of training datasets is available for the models. The system tracks the number of training data sets used to train any given customized model, and only models with sufficiently large training sets are labeled as ‘active’ within the model repository. Active models are capable of being used by the runtime model server(s) 2010 for processing newly received response data. Inactive models are merely stored until sufficient data has been collected to properly train these models, at which time they are updated as being active. - Returning to
FIG. 29 , after model training, the process may engage with an interaction with a client (at 2930). This interaction may consist of a question and answer style format, a free-flowing conversation, or even a topic prompt and the client providing a monologue style input. -
FIG. 32 provides an example of this interaction process. Initially, the system needs to be aware of the current state of the interaction (at 3210), as well as the historical actions that have been taken in the interaction. A state machine and log of prior actions provide this context. The process also receives user, clinical and social data (at 3220). This data is used to extract user preference information (at 3230). For example, preferences may be explicitly directed in the user data, such as language preferences, topics of interest, or the like. Alternatively, these preferences are distilled from the clinical and social data. For example, the social data provides a wealth of information regarding the topics of interest for the user, and clinical data provides insight into any accessibility issues, or the like. - Additionally, the model results are received (at 3240), which are used to analyze the user's responses (at 3250) and make decisions regarding the adequacy of the data that has already been collected. For example, if it is determined via the model results that there is not yet a clear classification, the interaction will be focused on collecting more data moving forward. Alternatively, if sufficient data has been collected to render a confident classification, the interaction may instead be focused on a resolution. Additionally, the interaction management will sometimes receive direct command statements/navigational commands (at 3260) from the user. These include actions such as repeating the last dialogue exchange, increasing or decreasing the volume, rephrasing a question, a request for more time, a request to skip a topic, and the like.
- All this information is consumed by the action generator to determine the best course of subsequent action (at 3270). The action is selected from the question and adaptive action bank responsive to the current state (and prior history of the interaction) as well as any commands, preferences, and results already received. This may be completed using a rule based engine, in some embodiments. For example, direct navigational commands may take precedence over alternative actions, but barring a command statement by the user, the model responses may be checked against the current state to determine if the state objective has been met. If so, an action is selected from the repository that meets another objective that has not occurred in the history of the interaction. This action is also modified based on preferences, when possible. Alternatively, the action selection is based on a machine learned model (as opposed to a rule based system).
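By way of illustration, a rule-based action selection of the kind described might be sketched as follows; the precedence rules mirror the paragraph above, but every name and structure here is hypothetical.

```python
def select_action(command, history, action_bank, preferences):
    """Rule-based pick: direct commands first, then the next unmet objective."""
    if command is not None:
        return command                            # navigational commands win
    for action in action_bank:
        if action["objective"] not in history:    # objective not yet met
            return apply_preferences(action, preferences)
    return {"objective": "conclude", "prompt": "Thank you for your time."}

def apply_preferences(action, preferences):
    """Modify the chosen action based on user preferences, when possible."""
    action = dict(action)
    action["language"] = preferences.get("language", "en")
    return action

bank = [{"objective": "mood_checkin", "prompt": "How have you been feeling?"},
        {"objective": "sleep", "prompt": "How have you been sleeping?"}]
print(select_action(None, {"mood_checkin"}, bank, {"language": "es"}))
```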
- The customized action is used to manage the interaction with the client, and also is used to update the current state and historical state activity (at 3280). The process checks if the goals are met, and if the interaction should be concluded (at 3290). If not, then the entire process may be repeated for the new state and historical information, as well as any newly received response data, navigational commands, etc.
- Returning to
FIG. 29 , during interaction (and after interaction completion when required based upon processing demands) the client response data is collected (at 2940). This data includes video/visual information as well as speech/audio information captured by the client device's camera(s) and microphone(s), respectively. Although not discussed in great depth, the collected data may likewise include biometric results via haptic interfaces or the like. The health state is then classified using this collected response data (at 2950). -
FIG. 33 provides greater detail of the example process for classification. The models are initially retrieved (at 3310) from the model repository. The user data, social data, clinical data and speech and visual data are all provided to the runtime model server(s) for processing (at 3320). The inclusion of the clinical and/or social data sets the present screening or monitoring methodologies apart from prior screening or monitoring methods. - This data is preprocessed to remove artifacts, noise and the like. The preprocessed data is also multiplexed (at 3330). The preprocessed and multiplexed data is supplied to the models for analysis, as well as to third party ASR systems (at 3340). The ASR output may be consolidated (when multiple ASR systems are employed in concert), and the resulting machine readable speech data is also provided to the models. The data is then processed by the NLP model (at 3350a), the acoustic model (at 3350b), the video model (at 3350c) and for descriptive features (at 3350d). Each of the models operates in parallel, with results from any given model being fed to the others to condition their operations. A determination is made if the modeling is complete (at 3360). Due to the fact that the model results are interdependent upon results of the alternative models, the process of modeling is cyclical, in some cases, whereby the models are conditioned (at 3370) with the results of the other models, and the modeling process repeats until a finalized result is determined.
-
FIG. 34 describes the process of model conditioning in greater detail. Model conditioning essentially includes three sub-processes operating in parallel, or otherwise interleaved. These include the configuration of the NLP model using the results of the acoustic model and video model, in addition to the descriptive features (at 3371); the configuration of the acoustic model using the results of the NLP model and video model, in addition to the descriptive features (at 3372); and the configuration of the video model using the results of the acoustic model and NLP model, in addition to the descriptive features (at 3373). As previously discussed, this conditioning is not a clearly ordered process, as intermediate results from the acoustic model, for example, may be used to condition the NLP model, the output of which may influence the video model, which then in turn conditions the acoustic model, requiring the NLP model to be conditioned based upon updated acoustic model results. This may lead to looped computing processes, wherein each iteration the results are refined to be a little more accurate than the previous iteration. Artificial cutoffs are imposed in such computational loops to avoid infinite cycling and breakdown of the system due to resource drain. These cutoffs are based upon the number of loop cycles, or upon the degree of change in a value between one loop cycle and the next. Over time, the results from one loop cycle to the next become increasingly closer to one another. At some point, additional looping cycles are not desired due to the diminishing returns to model accuracy for the processing resources spent. - One example of this kind of conditioning is when the NLP model determines that the user is not speaking. This result is used by the video model to process the individual's facial features based upon mouth bounding and eye bounding. However, when the user is speaking, the video model uses this result to alter the model for emotional recognition to rely less upon the mouth regions of the user and rather rely upon the eye regions of the user's face. This is but a single simplified example of one type of model conditioning, and is not limiting.
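A minimal sketch of the looped conditioning with its convergence cutoffs might look like the following Python; the function names, the epsilon value, and the loop cap are illustrative assumptions, not parameters disclosed herein.

```python
def run_conditioned_models(data, models, max_cycles=10, epsilon=1e-3):
    """Iteratively re-run models, feeding each the others' latest results,
    until scores stabilize or the loop-count cutoff is hit."""
    results = {name: model(data, context={}) for name, model in models.items()}
    for _ in range(max_cycles):
        context = dict(results)   # condition each model on the others' outputs
        new = {name: model(data, context=context) for name, model in models.items()}
        delta = max(abs(new[n] - results[n]) for n in results)
        results = new
        if delta < epsilon:       # change-based cutoff: diminishing returns
            break
    return results

# Toy stand-ins: each "model" nudges its score toward the mean of the others
def make_model(start):
    def model(data, context):
        others = list(context.values()) or [start]
        return 0.5 * start + 0.5 * (sum(others) / len(others))
    return model

print(run_conditioned_models(None, {"nlp": make_model(0.9),
                                    "acoustic": make_model(0.5),
                                    "video": make_model(0.7)}))
```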
- Returning to
FIG. 33 , after modeling is completed, the models are then combined (fused) by weighting the classification results by the time domains (at 3380). This sub-process is described in greater detail in relation to FIG. 35 . As noted before, sometimes one model is relied upon more heavily than another model due to the classification confidence, or based upon events in the response. The clearest example of this is that if there is a period of time in which the user is not speaking, then the NLP model classification for this time period should be minimized, whereas the weights for video modeling and acoustic modeling should be afforded a much larger weight. Likewise, if two models are suggesting that the third model is incorrect or false, due to dishonesty or some other dissonance, then the odd model's classification may also be weighted lower than the other models accordingly. - In
FIG. 35 , this weighting process involves starting with a base weight for each model (at 3381). The response is then divided up into discrete time segments (at 3382). The length of these time segments is configurable, and in one embodiment they are set to a three second value, as most spoken concepts are formed in this length of time. The base weights for each of the models are then modified, based upon model confidence levels, for each time period (at 3383). For example, if the NLP model is 96% confident during the first six seconds, but only 80% confident in the following twelve seconds, a higher weight will be applied to the first two time periods, and a lower weight to the following four time periods. - The system also determines when the user is not speaking, generally by relying upon the ASR outputs (at 3384). During these periods, the NLP model is not going to be useful in determining the user's classification, and as such the NLP model weights are reduced for these time periods (at 3385). The degree of reduction may differ based upon configuration, but in some embodiments, the NLP is afforded no weight for periods when the user is not speaking.
- Likewise, periods where the patient exhibits voice-based biomarkers associated with being dishonest may also be identified, based upon features and conclusions from the video and acoustic models (at 3386). Excessive fidgeting, shifting gaze, higher pitch and mumbling may all be correlated with dishonesty, and when multiple features are simultaneously present, the system flags these periods of the interaction as being suspect. During such time periods the NLP model weights are again reduced (at 3387), but only marginally. Even when a user is not being entirely honest, there is still beneficial information contained in the words they speak, especially for depression diagnosis. For example, even if a user is being dishonest about having suicidal thoughts (determined by semantic analysis), syntactical features may still be valid in determining the user's classification. As such, during periods of dishonesty, while the weight is tempered, the reduction is generally a quarter reduction in weight as opposed to a steeper weight reduction.
- After all the weight adjustments have been made, the system performs a weighted average, over the entire response time period, of the models' classification results (at 3388). This condensation of the classifications over time and across the different component models yields the fused model output.
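The following Python sketch illustrates this time-segmented weighted fusion under simplifying assumptions (three-second segments, a single scalar score and weight per model per segment); all names and numbers are illustrative.

```python
def fuse(scores, weights):
    """Weighted average of per-segment model scores.

    scores[model][t]  : classification score for time segment t
    weights[model][t] : adjusted weight for segment t (e.g., 0 for the NLP
                        model during segments where the user is not speaking)
    """
    segments = range(len(next(iter(scores.values()))))
    total, norm = 0.0, 0.0
    for model, per_seg in scores.items():
        for t in segments:
            total += weights[model][t] * per_seg[t]
            norm += weights[model][t]
    return total / norm if norm else 0.0

scores = {"nlp": [0.8, 0.7], "acoustic": [0.6, 0.6], "video": [0.5, 0.9]}
weights = {"nlp": [1.0, 0.0],        # user is silent in the second segment
           "acoustic": [1.0, 1.0],
           "video": [1.0, 1.0]}
print(fuse(scores, weights))
```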
- Returning to
FIG. 33 , this fused model output generates a final classification (at 3390) for the interaction. This classification, model results, and features are then output in aggregate or in part (at 3399). Returning to FIG. 29 , these results are then presented to the client and other interested stakeholders (at 2960). This may include selecting which results any given entity should receive. For example, a client may be provided only the classification results, whereas a physician for the client will receive features relating to mood, topics of concern, indications of self-harm or suicidal thoughts, and the like. In contrast, an insurance company will receive the classification results, and potentially a sampling of the clinical data as it pertains to the individual's risk factors. - Even after reporting out classification results, the process continues by collecting new information as it becomes available, re-training models to ensure the highest levels of accuracy, and conducting subsequent interactions and analyses of interaction results.
- Turning now to
FIG. 36 , one example substantiation of an acoustic modeling process 3350b is presented in greater detail. It should be noted that, despite the enhanced detail in this example process, this is still a significant simplification of but one of the analysis methodologies, and is intended purely as an illustrative process for the sake of clarity; it does not limit the analyses that are performed on the response data.
- The image is converted into a matrix. This matrix is used in an equation to represent a higher order feature. The equation is developed from the training data utilizing machine learning techniques. The equation includes unknown variables, in addition to the input matrix of the high order feature (here the spectrogram image sample). These unknown variables are multiplied, divided, added or subtracted from the feature matrix (or any combination thereof). The solution to the equation is also known, resulting in the need to randomly select values for the unknown variables (at 3620) in an attempt to solve the equation (at 3630) and get a solution that is similar to the known solution.
- The difference between the solved equation values is compared to the known solution value in order to calculate the error (at 3630). This process is repeated thousands or even millions of times until a close approximation of the correct variable values are found, as determined by a sufficiently low error calculation (at 3635). Once these sufficiently accurate values are found, they are compared against the cutoff values that were originally determined from the training data (at 3640). If the values are above or below the cutoffs, this indicates the existence or absence of the classification, based on the equation utilized. In this manner the classification for the spectrogram analysis may be determined (at 3645), which may be subsequently output (at 3650) for incorporation with the other model results.
-
Modeling system logic 5320 includes speech recognition 2210 (FIG. 22 ), which is shown in greater detail in FIG. 37 . Speech recognition is specific to the particular language of the speech. Accordingly, speech recognition 2210 includes language-specific speech recognition 3702, which in turn includes a number of language-specific speech recognition engines 3706A-Z. The particular languages of language-specific speech recognition engines 3706A-Z shown in FIG. 37 are merely illustrative examples. -
Speech recognition 2210 also includes a translation engine 3704. Suppose, for example, that the patient speaks a language that is recognized by one of language-specific speech recognition engines 3706A-Z but is not processed by language models 2214 (FIG. 22 ). Language-specific speech recognition 3702 (FIG. 37 ) produces text in the language spoken by the patient, i.e., the patient's language, from the audio signal received from the patient. To enable application of language models 2214, which cannot process text in the patient's language in this illustrative example, translation engine 3704 translates the text from the patient's language to a language that may be processed by language models 2214, e.g., English. While language models 2214 may not be as accurate when relying on translation by translation engine 3704, accuracy of language models 2214 is quite good with currently available translation techniques. In addition, the importance of language models 2214 is diluted significantly by the incorporation of acoustic models 2218, visual models 2226, and clinical data 2220 in the creation of composite model 2204. As a result, composite model 2204 is extremely accurate notwithstanding reliance on translation engine 3704.
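As an illustration of this recognize-then-translate flow, consider the hedged Python sketch below; recognize, translate, and the supported-language set are stand-in callables and values, not the actual engines 3702-3706 or translation engine 3704.

```python
def transcribe_for_models(audio, patient_lang, recognize, translate,
                          supported_langs=("en",)):
    """Produce text that the language models can consume.

    recognize(audio, lang)    -> text in the patient's language
    translate(text, src, dst) -> text in a supported language
    """
    text = recognize(audio, patient_lang)
    if patient_lang not in supported_langs:
        # The language models cannot process the patient's language directly,
        # so route the text through the translation step (e.g., into English).
        text = translate(text, src=patient_lang, dst="en")
    return text

# Toy stand-ins for the recognition and translation engines
text = transcribe_for_models(
    audio=b"...",
    patient_lang="es",
    recognize=lambda a, lang: "me siento muy cansado",
    translate=lambda t, src, dst: "i feel very tired",
)
print(text)
```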
- Modeling system logic 5320 includes language model training 2212 (FIG. 22 ) and language models 2214, which are shown in greater detail in FIGS. 38 and 39 , respectively. Language model training 2212 (FIG. 38 ) includes logic for training respective models of language models 2214. For example, language model training 2212 (FIG. 38 ) includes syntactic language model training 3802, semantic pattern model training 3804, speech fluency model training 3806, and non-verbal model training 3808, which include logic for training syntactic language model 3902, semantic pattern model 3904, speech fluency model 3906, and non-verbal model 3908, respectively, of language models 2214. - Each of models 3902-3908 includes deep learning (also known as deep structured learning or hierarchical learning) logic that assesses the patient's depression from text received from
speech recognition 2210. -
Syntactic language model 3902 assesses a patient's depression from syntactic characteristics of the patient's speech. Examples of such syntactic characteristics include sentence length, sentence completion, sentence complexity, and negation. When a patient speaks in shorter sentences, fails to complete sentences, speaks in simple sentences, and/or uses relatively frequent negation (e.g., “no”, “not”, “couldn't”, “won't”, etc.),syntactic language model 3902 determines that the patient is more likely to be depressed. -
Semantic pattern model 3904 assesses a patient's depression from positive and/or negative content of the patient's speech—i.e., from sentiments expressed by the patient. Some research suggests that expression of negative thoughts may indicate depression and expression of positive thoughts may counter-indicate depression. For example, “the commute here was awful” may be interpreted as an indicator for depression while “the commute here was awesome” may be interpreted as a counter-indicator for depression. - Speech fluency model 3906 assesses a patient's depression from fluency characteristics of, i.e., the flow of, the patient's speech. Fluency characteristics may include, for example, word rates, the frequency and duration of pauses in the speech, the prevalence of filler expressions such as “uh” or “umm”, and packet speech patterns. Some research suggests that lower word rates, frequent and/or long pauses in speech, and high occurrence rates of filler expressions may indicate depression. Perhaps more so than others of
language models 2214, speech fluency model 3906 may be specific to the individual patient. For example, rates of speech (word rates) vary widely across geographic regions. The normal rate of speech for a patient from New York City may be significantly greater than the normal rate of speech for a patient from Minnesota. -
Non-verbal model 3908 assesses a patient's depression from non-verbal characteristics of the patient's speech, such as laughter, chuckles, and sighs. Some research suggests that sighs may indicate depression while laughter and chuckling (and other forms of partially repressed laughter such as giggling) may counter-indicate depression. -
Modeling system logic 5320 includes acoustic model training 2216 (FIG. 22 ) and acoustic models 2218, which are shown in greater detail in FIGS. 40 and 41 , respectively. Acoustic model training 2216 (FIG. 40 ) includes logic for training respective models of acoustic models 2218 (FIG. 41 ). For example, acoustic model training 2216 (FIG. 40 ) includes pitch/energy model training 4002, quality/phonation model training 4004, speaking flow model training 4006, and articulatory coordination model training 4008, which include logic for training pitch/energy model 4102, quality/phonation pattern model 4104, speaking flow model 4106, and articulatory coordination model 4108, respectively, of acoustic models 2218. - Each of models 4102-4108 includes deep learning (also known as deep structured learning or hierarchical learning) logic that assesses the patient's depression from audio signals representing the patient's speech as received from collected patient data 2206 (
FIG. 22 ) and preprocessing 2208. - Pitch/
energy model 4102 assesses a patient's depression from the pitch and energy of the patient's speech. Examples of energy include loudness and syllable rate. When a patient speaks with a lower pitch, more softly, and/or more slowly, pitch/energy model 4102 determines that the patient is more likely to be depressed. - Quality/
phonation model 4104 assesses a patient's depression from voice quality and phonation aspects of the patient's speech. Different voice source modifications may occur in depression and affect the voicing related aspects of speech, both generally and for specific speech sounds. - Speaking
flow model 4106 assesses a patient's depression from the flow of the patient's speech. Speaking flow characteristics may include, for example, word rates, the frequency and duration of pauses in the speech, the prevalence of filler expressions such as “uh” or “umm”, and packet speech patterns. -
Articulatory coordination model 4108 assesses a patient's depression from articulatory coordination in the patient's speech. Articulatory coordination refers to micro-coordination in timing, among articulators and source characteristics. This coordination becomes worse when the patient is depressed. - Modeling system logic 5320 (
FIG. 53 ) includes visual model training 2224 (FIG. 22 ) and visual models 2226, which are shown in greater detail in FIGS. 42 and 43 , respectively. Visual model training 2224 (FIG. 42 ) includes logic for training respective models of visual models 2226 (FIG. 43 ). For example, visual model training 2224 (FIG. 42 ) includes facial cue model training 4202 and eye/gaze model training 4204, which include logic for training facial cue model 4302 and eye/gaze model 4304, respectively, of visual models 2226. - Each of models 4302-4304 includes deep learning (also known as deep structured learning or hierarchical learning) logic that assesses the patient's depression from video signals representing the patient's speech as received from collected patient data 2206 (
FIG. 22 ) and preprocessing 2208. -
Facial cue model 4302 assesses a patient's depression from facial cues recognized in the video of the patient's speech. Eye/gaze model 4304 assesses a patient's depression from observed and recognized eye movements in the video of the patient's speech. - As described above, composite model builder 2222 (
FIG. 22 ) builds composite model 2204 by combining language models 2214, acoustic models 2218, and visual models 2226, and training the combined model using both clinical data 2220 and collected patient data 2206. As a result, composite model 2204 assesses depression in a patient using what the patient says, how the patient says it, and contemporaneous facial and eye expressions in combination. Such provides a particularly accurate and effective tool for assessing the patient's depression. - The above description is illustrative only and is not limiting. For example, while depression is the particular mental health condition addressed by the systems and methods as described herein, it should be appreciated that the techniques described herein may effectively assess and/or screen for a number of other mental health conditions, such as anxiety, post-traumatic stress disorder (PTSD) and stress generally, drug and alcohol addiction, and bipolar disorder, among others. In addition, while
assessment test administrator 2202 is described as assessing the mental health of the human subject, who may be a patient, it is appreciated that “assessment” sometimes refers to professional assessments made by professional clinicians. As used herein, the assessment provided by assessment test administrator 2202 may be any type of assessment in the general sense, including screening or monitoring. - The models described herein may produce scores, at various stages of an assessment. The scores produced may be scaled scores or binary scores. Scaled scores may range over a large number of values, while binary scores may be one of two discrete values. The system disclosed may interchange binary and scaled scores at various stages of the assessment, to monitor different mental states, or update particular binary scores and particular scaled scores for particular mental states over the course of an assessment.
- The scores produced by the system, either binary or scaled, may be produced after each response to each query in the assessment, or may be formulated in part based on previous queries. In the latter case, each marginal score acts to fine-tune a prediction of depression, or of another mental state, as well as to make the prediction more robust. Marginal predictions may increase confidence measures for predictions of mental states in this way, after a particular number of queries and responses (correlated with a particular intermediate mental state).
- For scaled scores, the refinement of the score may allow clinicians to determine, with greater precision, severities of one or more mental states the patient is experiencing. For example, the refinement of the scaled score, when observing multiple intermediate depression states, may allow a clinician to determine whether the patient has mild, moderate, or severe depression. Performing multiple scoring iterations may also assist clinicians and administrators in removing false negatives, by adding redundancy and robustness. For example, initial mental state predictions may be noisier, because relatively fewer speech segments are available to analyze, and NLP algorithms may not have enough information to determine semantic context for the patient's recorded speech. Even though a single marginal prediction may itself be a noisy estimate, refining the prediction by adding more measurements may reduce the overall variance in the system, yielding a more precise prediction. The predictions described herein may be more actionable than those which may be obtained by simply administering a survey, as people may have an incentive to lie about their conditions. Administering a survey may yield high numbers of false positive and false negative results, enabling patients who need treatment to slip through the cracks. In addition, although trained clinicians may notice voice and face-based biomarkers, they may not be able to analyze the large amount of data the system disclosed is able to analyze.
- The scaled score may be used to describe a severity of a mental state. The scaled score may be, for example, a number between 1 and 5, or between 0 and 100, with larger numbers indicating a more severe or acute form of the patient's experienced mental state. The scaled score may include integers, percentages, or decimals. Conditions for which the scaled score may express severity may include, but are not limited to, depression, anxiety, stress, PTSD, phobic disorder, and panic disorder. In one example, a score of 0 on a depression-related aspect of an assessment may indicate no depression, a score of 50 may indicate moderate depression, and a score of 100 may indicate severe depression. The scaled score may be a composition of multiple scores. A mental state may be expressed as a composition of mental sub-states, and a patient's composite mental state may be a weighted average of individual scores from the mental sub-states. For example, a composition score of depression may be a weighted average of individual scores for anger, sadness, self-image, self-worth, stress, loneliness, isolation, and anxiety.
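For illustration, such a composite score could be computed as below; the sub-state weights are hypothetical values chosen only for the example.

```python
def composite_score(sub_scores, weights):
    """Weighted average of mental sub-state scores (each on a 0-100 scale)."""
    total_weight = sum(weights[k] for k in sub_scores)
    return sum(sub_scores[k] * weights[k] for k in sub_scores) / total_weight

sub_scores = {"sadness": 70, "anxiety": 55, "loneliness": 80, "self_worth": 60}
weights = {"sadness": 0.4, "anxiety": 0.2, "loneliness": 0.25, "self_worth": 0.15}
print(composite_score(sub_scores, weights))  # -> a single 0-100 composite score
```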
- A scaled score may be produced using a model that uses a multilabel classifier. This classifier may be, for example, a decision tree classifier, a k-nearest neighbors classifier, or a neural network-based classifier. The classifier may produce multiple labels for a particular patient at an intermediate or final stage of assessment, with the labels indicating severities or extents of a particular mental state. For example, a multilabel classifier may output multiple numbers, which may be normalized into probabilities using a softmax layer. The label with the largest probability may indicate the severity of the mental state experienced by the patient.
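A hedged sketch of that final normalization step follows; the severity labels and raw classifier outputs are invented for the example.

```python
import numpy as np

def severity_from_logits(logits, labels):
    """Softmax-normalize raw classifier outputs and pick the top label."""
    exp = np.exp(logits - np.max(logits))   # subtract max for numeric stability
    probs = exp / exp.sum()
    return labels[int(np.argmax(probs))], probs

labels = ["none", "mild", "moderate", "severe"]
label, probs = severity_from_logits(np.array([0.2, 1.1, 2.4, 0.7]), labels)
print(label, probs.round(3))                # -> "moderate" and its probability
```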
- The scaled score may also be determined using a regression model. The regression model may determine a fit from training examples that are expressed as sums of weighted variables. The fit may be used to extrapolate a score from a patient with known weights. The weights may be based in part on features, which may be in part derived from the audiovisual signal (e.g., voice-based biomarkers) and in part derived from patient information, such as patient demographics. Weights used to predict a final score or an intermediate score may be taken from previous intermediate scores.
- The scaled score may be scaled based on a confidence measure. The confidence measure may be determined based on recording quality, type of model used to analyze the patient's speech from a recording (e.g., audio, visual, semantic), temporal analysis related to which model was used most heavily during a particular period of time, and the point in time of a specific voice-based biomarker within an audiovisual sample. Multiple confidence measures may be taken to determine intermediate scores. Confidence measures during an assessment may be averaged in order to determine a weighting for a particular scaled score.
- The binary score may reflect a binary outcome from the system. For example, the system may classify a user as being either depressed or not depressed. The system may use a classification algorithm to do this, such as a neural network or an ensemble method. The binary classifier may output a number between 0 and 1. If a patient's score is above a threshold (e.g., 0.5), the patient may be classified as “depressed.” If the patient's score is below the threshold, the patient may be classified as “not depressed.” The system may produce multiple binary scores for multiple intermediate states of the assessment. The system may weight and sum the binary scores from intermediate states of the assessment in order to produce an overall binary score for the assessment.
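The following short sketch shows this thresholding and the weighted combination of intermediate scores; the 0.5 threshold comes from the example above, while the weights are assumptions made for illustration.

```python
def overall_binary_score(intermediate_scores, weights, threshold=0.5):
    """Weighted sum of intermediate classifier outputs, then thresholded."""
    combined = sum(s * w for s, w in zip(intermediate_scores, weights))
    combined /= sum(weights)
    return ("depressed" if combined > threshold else "not depressed"), combined

# Intermediate outputs after each query/response; later queries weighted higher
scores = [0.42, 0.58, 0.66, 0.71]
weights = [0.1, 0.2, 0.3, 0.4]
print(overall_binary_score(scores, weights))
```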
- The outputs of the models described herein can be converted to a calibrated score, e.g., a score with a unit range. The outputs of the models described herein can additionally or alternatively be converted to a score with a clinical value. A score with a clinical value can be a qualitative diagnosis (e.g., high risk of severe depression). A score with a clinical value can alternatively be a normalized, qualitative score that is normalized with respect to the general population or a specific sub-population of patients. The normalized, qualitative score may indicate a risk percentage relative to the general population or to the sub-population.
- The systems described herein may be able to identify a mental state of a subject (e.g., a mental disorder or a behavioral disorder) with less error (e.g., 10% less) or a higher accuracy (e.g., 10% more) than a standardized mental health questionnaire or testing tool. The error rate or accuracy may be established relative to a benchmark standard usable by an entity for identifying or assessing one or more medical conditions comprising said mental state. The entity may be a clinician, a healthcare provider, an insurance company, or a government-regulated body. The benchmark standard may be a clinical diagnosis that has been independently verified.
- The models described herein may use confidence measures. A confidence measure may be a measure of how effective the score produced by the machine learning algorithm may be in accurately predicting a mental state, such as depression. A confidence measure may depend on conditions under which the score was taken. A confidence measure may be expressed as a whole number, a decimal, or a percentage. Conditions may include a type of recording device, an ambient space in which signals were taken, background noise, patient speech idiosyncrasies, language fluency of a speaker, the length of responses of the patient, an evaluated truthfulness of the responses of the patient, and frequency of unintelligible words and phrases. Under conditions where the quality of the signal or speech makes it more difficult for the speech to be analyzed, the confidence measure may have a smaller value. In some embodiments, the confidence measure may be added to the score calculation, by weighting a calculated binary or scaled score with the confidence measure. In other embodiments, the confidence measure may be provided separately. For example, the system may tell a clinician that the patient has a 0.93 depression score with 75% confidence.
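- A toy sketch of combining a score with a confidence measure follows; the per-condition penalties are invented placeholders, not calibrated quantities:

    # Each recording condition reduces confidence by an assumed penalty.
    CONDITION_PENALTIES = {"background_noise": 0.10, "low_fluency": 0.05,
                           "short_responses": 0.08, "unintelligible": 0.07}

    def confidence(conditions):
        return max(0.0, 1.0 - sum(CONDITION_PENALTIES.get(c, 0.0)
                                  for c in conditions))

    score = 0.93
    conf = confidence(["background_noise", "short_responses"])
    print(f"depression score {score} with {conf:.0%} confidence")
    print("confidence-weighted score:", round(score * conf, 3))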
- The confidence level may also be based on the quality of the labels of the training data used to train the models that analyze the patient's speech. For example, if the labels are based on surveys or questionnaires completed by patients rather than official clinical diagnoses, the quality of the labels may be determined to be lower, and the confidence level of the score may thus be lower. In some cases, it may be determined that the surveys or questionnaires have a certain level of untruthfulness. In such cases, the quality of the labels may be determined to be lower, and the confidence level of the score may thus be lower.
- Various measures may be taken by the system in order to improve a confidence measure, especially where the confidence measure is affected by the environment in which the assessment takes place. For example, the system may employ one or more signal processing algorithms to filter out background noise, or use impulse response measurements to determine how to remove effects of reverberations caused by objects and features of the environment in which the speech sample was recorded. The system may also use semantic analysis to find context clues to determine the identities of missing or unintelligible words.
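- As one stand-in for the noise filtering described above (the filter type, cutoff, and sample rate are assumptions; a deployed system might use more sophisticated spectral methods), a simple high-pass filter can attenuate low-frequency room rumble:

    import numpy as np
    from scipy.signal import butter, filtfilt

    def highpass(signal, sample_rate=16000, cutoff_hz=80, order=4):
        """Attenuate low-frequency background rumble (e.g., HVAC noise)."""
        b, a = butter(order, cutoff_hz / (sample_rate / 2), btype="highpass")
        return filtfilt(b, a, signal)

    noisy = np.random.randn(16000)  # one second of placeholder audio
    print(highpass(noisy).shape)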
- In addition, the system may use user profiles to group people based on demeanor, ethnic background, gender, age, or other categories. Because people from similar groups may have similar voice-based biomarkers, the system may be able to predict depression with higher confidence, as people who exhibit similar voice-based biomarkers may indicate depression in similar manners.
- For example, depressed people from different backgrounds may be variously characterized by slower speech, monotone pitch or low pitch variability, excessive pausing, vocal timbre (gravelly or hoarse voices), incoherent speech, rambling or loss of focus, terse responses, and stream-of-consciousness narratives. These voice-based biomarkers may belong to one or more segments of patients analyzed.
- Screening system data store 410 (shown in greater detail in
FIG. 44 ) stores and maintains all user and patient data needed for, and collected by, screening or monitoring in the manner described herein. Screening system data store 410 includes data store logic 4402, label estimation logic 4404, and user and patient databases 4406. Data store logic 4402 controls access to user and patient databases 4406. For example, data store logic 4402 stores audiovisual signals of patients' responses and provides patient clinical history data upon request. If the requested patient clinical history data is not available in user and patient databases 4406, data store logic 4402 retrieves the patient clinical history data from clinical data server 106. If the requested patient social history data is not available in user and patient databases 4406, data store logic 4402 retrieves the patient social history data from social data server 108. Users who are not patients include health care service providers and payers. -
Social data server 108 may include a wide variety of patient/subject data including, but not limited to, retail purchasing records, legal records (including criminal records), and income history, as these may provide valuable insights into a person's health. In many instances, these social determinants of disease contribute more to a person's morbidity than medical care. Appendix B depicts a “Health Policy Brief: The Relative Contributions of Multiple Determinants to Health Outcomes”. -
Label estimation logic 4404 includes logic that specifies labels for which the various learning machines of health screening or monitoring server 102 screen. Label estimation logic 4404 includes a user interface through which human operators of health screening or monitoring server 102 may configure and tune such labels. -
Label estimation logic 4404 also controls quality of model training by, inter alia, determining whether data stored in user and patient databases 4406 is of adequate quality for model training. Label estimation logic 4404 includes logic for automatically identifying or modifying labels. In particular, if model training reveals a significant data point that is not already identified as a label, label estimation logic 4404 looks for correlations between the data point and patient records, system predictions, and clinical insights to automatically assign a label to the data point. - While interactive screening or
monitoring server logic 502 is described as conducting an interactive, spoken conversation with the patient to assess the health state of the patient, interactive screening or monitoring server logic 502 may also act in a passive listening mode. In this passive listening mode, interactive screening or monitoring server logic 502 passively listens to the patient speaking without directing questions to be asked of the patient. - Passive listening mode, in this illustrative embodiment, has two (2) variants. In the first, “conversational” variant, the patient is engaged in a conversation with another whose part of the conversation is not controlled by interactive screening or
monitoring server logic 502. Examples of conversational passive listening include a patient speaking with a clinician and a patient speaking during a telephone call reminding the patient of an appointment with a clinician or discussing medication with a pharmacist. In the second, “fly-on-the-wall” (FOTW) or “ambient” variant, the patient is speaking alone or in a public, or semi-public, place. Examples of ambient passive listening include people speaking in a public space or a hospital emergency room and a person speaking alone, e.g., in an audio diary or leaving a telephone message. One potentially useful scenario for screening or monitoring a person speaking alone involves interactive screening or monitoring server logic 502 screening or monitoring calls to police emergency services (i.e., “9-1-1”). Analysis of emergency service callers may distinguish truly urgent callers from less urgent callers. - It should be noted that this detailed description is intended to describe what is technologically possible. Practicing the techniques described herein should comply with legal requirements and limitations that may vary from jurisdiction to jurisdiction, including federal statutes, state laws, and/or local ordinances. For example, some jurisdictions may require explicit notice and/or consent of involved person(s) prior to capturing their speech. In addition, acquisition, storage, and retrieval of clinical records should be practiced in a manner that is in compliance with applicable jurisdictional requirement(s).
- Patient screening or
monitoring system 100B (FIG. 45 ) illustrates a passive listening variation of patient screening or monitoring system 100 (FIG. 1 ). Patient screening or monitoring system 100B (FIG. 45 ) includes health screening or monitoring server 102, a clinical data server 106, and a social data server 108, which are as described above and, also as described above, connected to one another through WAN 110. - Since the patient and the clinician are in close physical proximity to one another in conversational passive listening, the remainder of the components of patient screening or monitoring system 100B are connected to one another and
WAN 110 through a local area network (LAN) 4510. - There are a number of ways to distinguish the patient's voice from the clinician's.
- A particularly convenient one is to have two (2) separate listening devices: listening device 4512 positioned near the patient and listening device 4514 positioned near the clinician, the latter of which may be incorporated into clinician device 114B, for example.
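- A minimal sketch of how such a two-device arrangement might attribute an utterance (the root-mean-square loudness comparison and the device roles are assumptions for illustration, not a prescribed method):

    import numpy as np

    def attribute_utterance(audio_4512, audio_4514):
        """Attribute to the patient if louder at listening device 4512
        (near the patient) than at listening device 4514 (near the clinician)."""
        rms_4512 = np.sqrt(np.mean(np.square(audio_4512)))
        rms_4514 = np.sqrt(np.mean(np.square(audio_4514)))
        return "patient" if rms_4512 > rms_4514 else "clinician"

    print(attribute_utterance(np.array([0.4, -0.5, 0.6]),
                              np.array([0.1, -0.1, 0.2])))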
- In some embodiments, a single listening device 4514 is used and screening or monitoring server 102 distinguishes between the patient and the clinician using conventional voice recognition techniques. Accuracy of such voice recognition may be improved by training screening or monitoring server 102 to recognize the clinician's voice prior to any session with a patient. While the following description refers to a clinician as speaking to the patient, it should be appreciated that the clinician may be replaced with another. For example, in a telephone call made to the patient by a health care office administrator, e.g., support staff for a clinician, the administrator takes on the clinician's role as described in the context of conversational passive listening. Similarly, in a telephone call made by a pharmacy to a patient regarding prescriptions, the person or automated machine caller calling on behalf of the pharmacy takes on this clinician role as described herein. Appendix C depicts an exemplary Question Bank for some of the embodiments in accordance with the present invention. - Processing by interactive health screening or
monitoring logic 402, particularly generalized dialogue flow logic 602 (FIG. 7 ), in conversational passive listening is illustrated by logic flow diagram 4600 (FIG. 46 ). FIG. 46 shows an instantiation of a dynamic mode, in which query content is analyzed in real-time. Loop step 4602 and next step 4616 define a loop in which generalized dialogue flow logic 602 processes audiovisual signals of the conversation between the patient and the clinician according to steps 4604-4614. While steps 4604-4614 are shown as discrete, sequential steps, they are performed concurrently with one another on an ongoing basis by generalized dialogue flow logic 602. The loop of steps 4602-4616 is initiated and terminated by the clinician using conventional user interface techniques, e.g., using clinician device 114B (FIG. 45 ) or listening device 4514. - In step 4604 (
FIG. 46 ), generalized dialogue flow logic 602 recognizes a question to the patient posed by the clinician and sends the question to runtime model server logic 504 for processing and analysis. Generalized dialogue flow logic 602 receives results 1820 for the audiovisual signal of the clinician's utterance, and results 1820 (FIG. 18 ) include a textual representation of the clinician's utterance from ASR logic 1804 along with additional information from descriptive model and analytics 1812. This additional information includes identification of the various parts of speech of the words in the clinician's utterance. - In step 4606 (
FIG. 46 ), generalized dialogue flow logic 602 identifies the most similar question in question and dialogue action bank 710 (FIG. 7 ). If the question recognized in step 4604 is not identical to any questions stored in question and dialogue action bank 710, generalized dialogue flow logic 602 may identify the nearest question in the manner described above with respect to question equivalence logic 1104 (FIG. 11 ) or may identify the question in question and dialogue action bank 710 (FIG. 7 ) that is most similar linguistically. - In step 4608 (
FIG. 46 ), generalized dialogue flow logic 602 retrieves the quality of the nearest question from question and dialogue action bank 710, i.e., quality 908 (FIG. 9 ). - In step 4610 (
FIG. 46 ), generalized dialogue flow logic 602 recognizes an audiovisual signal representing the patient's response to the question recognized in step 4604. - The patient's response is recognized as an utterance of the patient immediately following the recognized question. The utterance may be recognized as the patient's by (i) determining that the voice is captured more loudly by listening
device 4512 than by listening device 4514 or (ii) determining that the voice is distinct from a voice previously established and recognized as the clinician's. - In
step 4612, generalized dialogue flow logic 602 sends the patient's response, along with the context of the clinician's corresponding question, to runtime model server logic 504 for analysis and evaluation. The context of the clinician's question is important, particularly if the semantics of the patient's response are unclear in isolation. For example, consider that the patient's answer is simply “Yes.” That response is analyzed and evaluated very differently in response to the question “Were you able to find parking?” versus in response to the question “Do you have thoughts of hurting yourself?” - In
step 4614, generalized dialogue flow logic 602 reports intermediate analysis received from results 1820 to the clinician. In instances in which the clinician is using clinician device 114B during the conversation, e.g., to review electronic health records of the patient, the report may be in the form of animated gauges indicating intermediate scores related to a number of health states. Examples of animated gauges include steam gauges, i.e., round dial gauges with a moving needle, and dynamic histograms such as those seen on audio equalizers in sound systems. - Upon termination of the conversational passive listening by the clinician, processing according to the loop of steps 4602-4616 completes. In
step 4618, interactive screening or monitoring server logic 502 sends final analysis of the conversation to the clinician. Generally, in the context of step 4618, the “clinician” is a medical health professional or the health records of the patient. - Thus, health screening or
monitoring server 102 may screen patients passively for any of a number of health states during a conversation the patient would engage in anyway, without requiring a separate, explicit screening or monitoring interview of the patient. - In ambient passive listening, health screening or
monitoring server 102 listens to and processes ambient speech according to logic flow diagram 4700 (FIG. 47 ). Processing by interactive health screening or monitoring logic 402, particularly generalized dialogue flow logic 602 (FIG. 7 ), in ambient passive listening is illustrated by logic flow diagram 4700 (FIG. 47 ). Loop step 4702 and next step 4714 define a loop in which generalized dialogue flow logic 602 processes audiovisual signals of ambient speech according to steps 4704-4712. While steps 4704-4714 are shown as discrete, sequential steps, they are performed concurrently with one another on an ongoing basis by generalized dialogue flow logic 602. The loop of steps 4702-4714 is initiated and terminated by a human operator of the listening device(s) involved, e.g., listening device 4514. - In step 4704 (
FIG. 47 ), generalized dialogue flow logic 602 captures ambient speech. In test step 4706, interactive screening or monitoring server logic 502 determines whether the speech captured in step 4704 is spoken by a voice that is to be analyzed. In ambient passive listening in areas that are at least partially controlled, many people likely to speak in such areas may be registered with health screening or monitoring server 102 such that their voices may be recognized. In schools, students may have their voices registered with health screening or monitoring server 102 at admission. - In some embodiments, the people whose voices are to be analyzed are admitted students that are recognized by generalized
dialogue flow logic 602. In hospitals, hospital personnel may have their voices registered with health screening or monitoring server 102 at hiring. In addition, patients in hospitals may register their voices at first contact, e.g., at an information desk or by hospital personnel in an emergency room. In some embodiments, hospital personnel are excluded from analysis when recognized as the speaker by generalized dialogue flow logic 602. - In an emergency room environment in which analysis of voices unknown to generalized
dialogue flow logic 602 is important, generalized dialogue flow logic 602 may still track speaking by unknown speakers. Multiple utterances may be recognized by generalized dialogue flow logic 602 as emanating from the same individual person. Health screening or monitoring server 102 may also determine approximate positions of unknown speakers in environments with multiple listening devices, e.g., by triangulation using different relative amplitudes and/or relative timing of arrival of the captured speech at multiple listening devices. - In other embodiments of ambient passive listening in which only one person speaks, the speaker may be asked to identify herself. Alternatively, in some embodiments, the identity of the speaker may be inferred or is not especially important. In an audio diary, the speaker may be authenticated by the device or may be assumed to be the device's owner. In police emergency telephone call triage, the identity of the caller is not as important as the location of the speaker and qualities of the speaker's voice such as emotion, energy, and the substantive content of the speaker's speech.
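- A rough sketch of the relative-timing cue mentioned above: cross-correlating the same utterance as captured at two listening devices yields an arrival-time difference that could feed a triangulation estimate (the sample rate and signals are placeholders, and a real system would need calibrated device positions):

    import numpy as np

    def arrival_delay_seconds(sig_a, sig_b, sample_rate=16000):
        """Estimate how much earlier sig_a arrived than sig_b (a negative
        result means device A heard the utterance first)."""
        corr = np.correlate(sig_a, sig_b, mode="full")
        lag = int(np.argmax(corr)) - (len(sig_b) - 1)
        return lag / sample_rate

    rng = np.random.default_rng(0)
    utterance = rng.standard_normal(1600)
    delayed = np.concatenate([np.zeros(40), utterance])[:1600]  # 2.5 ms later
    print(arrival_delay_seconds(utterance, delayed))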
- In these embodiments in which only one person speaks, generalized
dialogue flow logic 602 always determines that the speaker is to be analyzed. - If the speaker is not to be analyzed, generalized
dialogue flow logic 602 sends the captured ambient speech to runtime model server logic 504 for processing and analysis for context. Generalized dialogue flow logic 602 receives results 1820 for the audiovisual signal of the captured speech, and results 1820 (FIG. 18 ) include a textual representation of the captured speech from ASR logic 1804 along with additional information from descriptive model and analytics 1812. This additional information includes identification of the various parts of speech of the words in the captured speech. Generalized dialogue flow logic 602 processes results 1820 for the captured speech to establish a context. - After step 4708 (
FIG. 47 ), processing transfers through next step 4714 to loop step 4702 and passive listening according to the loop of steps 4702-4714 continues. - If in
test step 4706, interactive screening or monitoring server logic 502 determines that the speech captured in step 4704 is spoken by a voice that is to be analyzed, processing transfers to step 4710. In step 4710, generalized dialogue flow logic 602 sends the captured speech, along with any context determined in prior, contemporaneous performances of step 4708 or step 4710, to runtime model server logic 504 for analysis and evaluation. - In
step 4712, generalized dialogue flow logic 602 processes any alerts triggered by the resulting analysis from runtime model server logic 504 according to predetermined alert rules. These predetermined alert rules are analogous to work-flows 4810 described below. In essence, these predetermined alert rules are in the form of if-then-else logic elements that specify logical states and corresponding actions to take in such states, as illustrated by the sketch below.
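- A compact sketch of one such if-then-else alert rule (the analysis fields, thresholds, and actions are assumptions for illustration):

    # Hypothetical alert rule: escalate speech that is both highly anxious
    # and semantically urgent; otherwise queue the item normally.
    def process_alert(analysis):
        if analysis["anxiety"] >= 0.8 and analysis["semantic_urgency"] >= 0.9:
            return "assign very high priority and notify a human operator"
        elif analysis["anxiety"] >= 0.8:
            return "flag for expedited review"
        else:
            return "queue in normal order"

    print(process_alert({"anxiety": 0.85, "semantic_urgency": 0.95}))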
- The following are examples of alert rules that may be implemented by interactive screening or monitoring server logic 502. In a police emergency system call in which the caller, speaking initially to an automated triage system, is determined to have speech that is highly emotional and anxious and that semantically describes a highly urgent situation, e.g., a car accident with severe injuries, a very high priority may be assigned to the call so that it is taken ahead of less urgent callers. In a school hallway in which interactive screening or monitoring server logic 502 recognizes frantic speech and screaming and semantic content describing the presence of a weapon and/or blatant acts of violence, interactive screening or monitoring server logic 502 may trigger immediate notification of law enforcement and school personnel. In an audio diary in which a patient is detected to be at least moderately depressed, interactive screening or monitoring server logic 502 may record the analysis in the patient's clinical records such that the patient's behavioral health care provider may discuss the diary entry when the patient is next seen. In situations in which the triggering condition of the captured speech is particularly serious and urgent, interactive screening or monitoring server logic 502 may report the location of the speaker if it may be determined. - Processing according to the loop of steps 4702-4714 (
FIG. 47 ) continues until stopped by a human operator of interactive screening or monitoring server logic 502 or of the involved listening devices. - Thus, health screening or
monitoring server 102 may screen patients for any of a number of health states passively outside the confines of a one-to-one conversation with a health care professional. - As described above with respect to
FIG. 4 , health care management logic 408 makes expert recommendations in response to health state analysis of interactive health screening or monitoring logic 402. Health care management logic 408 is shown in greater detail in FIG. 48 . - Health
care management logic 408 includes manual work-flow management logic 4802, automatic work-flow generation logic 4804, work-flow execution logic 4806, and work-flow configuration 4808. Manual work-flow management logic 4802 implements a user interface through which a human administrator may create, modify, and delete work-flows 4810 of work-flow configuration 4808 by physical manipulation of one or more user input devices of a computer system used by the administrator. Automatic work-flow generation logic 4804 performs statistical analysis of patient data stored within screening or monitoring system data store 410 to identify work-flows to achieve predetermined goals. Examples of such goals include minimizing predicted costs for the next two (2) years of a patient's care and minimizing the cost of an initial referral while also maximizing a reduction in Hemoglobin A1C in one year. - Work-
flow execution logic 4806 processes work-flows 4810 of work-flow configuration 4808, evaluating conditions and performing actions of work-flow elements 4820. - In some embodiments, work-
flow execution logic 4806 processes work-flows 4810 in response to receipt of final results of any screening or monitoring according to logic flow diagram 800 (FIG. 8 ), using those results in processing conditions of the work-flows. - Work-flow configuration 4808 (
FIG. 48 ) includes data representing a number of work-flows 4810. Each work-flow 4810 includes work-flow metadata 4812 and data representing a number of work-flow elements 4820. - Work-
flow metadata 4812 is metadata of work-flow 4810 and includes data representing a description 4814, an author 4816, and a schedule 4818. Description 4814 is information intended to inform any human operator of the nature of work-flow 4810. Author 4816 identifies the entity that created work-flow 4810, whether a human administrator or automatic work-flow generation logic 4804. Schedule 4818 specifies dates and times and/or conditions in which work-flow execution logic 4806 is to process work-flow 4810. - Work-
flow elements 4820 collectively define the behavior of work-flow execution logic 4806 in processing the work-flow. In this illustrative embodiment, work-flow elements are each one of two types: conditions, such as condition 4900 (FIG. 49 ), and actions such as action 5000 (FIG. 50 ). - In this illustrative embodiment,
condition 4900 specifies a Boolean test that includes an operand 4902, an operator 4904, and another operand 4906. In this illustrative embodiment, operator 4904 may be any of a number of Boolean test operators, such as =, ≠, >, ≥, <, and ≤, for example. Operands 4902 and 4906 may each be results 1820 (FIG. 18 ) or any portion thereof, a constant, or null. As a result, any results of a given screening or monitoring, e.g., results 1820, any information about a given patient stored in screening or monitoring system data store 410, and any combination thereof may be either of operands 4902 and 4906. - Next work-flow element(s) 4908 specify one or more work-flow elements to process if the test of
operands 4902 and 4906 with operator 4904 evaluates to a Boolean value of true, and next work-flow element(s) 4910 specify one or more work-flow elements to process if the test of operands 4902 and 4906 with operator 4904 evaluates to a Boolean value of false. - Each of next work-flow element(s) 4908 and 4910 may be any of a condition, an action, or null. By accepting conditions such as
condition 4900 in next work-flow element(s) 4908 and 4910, complex tests with AND and OR operations may be represented in work-flow elements 4820. In alternative embodiments, condition 4900 may include more operands and operators combined with AND, OR, and NOT operations. - Since each of
operands 4902 and 4906 may be null, condition 4900 may test for the mere presence or absence of an occurrence in the patient's data. For example, to determine whether a patient has ever had a Hemoglobin A1C blood test, condition 4900 may determine whether the most recent Hemoglobin A1C test result is equal to null. If equal, the patient has not had any Hemoglobin A1C blood test at all.
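- A compact sketch of how such a condition element might be represented and evaluated, including the null test for absence (the field names and the example rule are assumptions for illustration):

    import operator

    OPERATORS = {"=": operator.eq, "≠": operator.ne, ">": operator.gt,
                 "≥": operator.ge, "<": operator.lt, "≤": operator.le}

    def evaluate_condition(cond, patient):
        lhs = patient.get(cond["operand_a"])  # patient datum or screening result
        rhs = cond["operand_b"]               # a constant, or None for null
        if OPERATORS[cond["operator"]](lhs, rhs):
            return cond["next_if_true"]   # next element: condition, action, or None
        return cond["next_if_false"]

    # Has the patient ever had a Hemoglobin A1C blood test?
    cond = {"operand_a": "last_hba1c", "operator": "=", "operand_b": None,
            "next_if_true": "recommend HbA1c test", "next_if_false": None}
    print(evaluate_condition(cond, {"last_hba1c": None}))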
- Action 5000 (FIG. 50 ) includes action logic 5002 and one or more next work-flow element(s) 5004. Action logic 5002 represents the substantive action to be taken by work-flow execution logic 4806 and typically makes or recommends a particular course of action in the care of the patient that may range from specific treatment protocols to more holistic paradigms. Examples include referring the patient to a care provider, enrolling the patient in a particular program of care, and recording recommendations to the patient's file such that the patient's clinician sees the recommendation at the next visit. Examples of referring a patient to a care provider include referring the patient to a psychiatrist, a medication management coach, physical therapist, nutritionist, fitness coach, dietitian, social worker, etc. Examples of enrolling the patient in a program include telepsychiatry programs, group therapy programs, etc. - Examples of recommendations recorded to the patient's file include recommended changes to medication, whether a change in the particular drug prescribed or merely in dosage of the drug already prescribed to the patient, and other treatments. In addition, referrals and enrollment may be effected by recommendations for referrals and enrollment in the patient's file, allowing a clinician to make the final decision regarding the patient's care.
- As described above, automatic work-flow generation logic 4804 (
FIG. 48 ) performs statistical analysis of patient data stored within screening or monitoring system data store 410 to identify work-flows to achieve predetermined goals. Examples of such goals given above include minimizing predicted costs for the next two (2) years of a patient's care and minimizing the cost of an initial referral while also maximizing a reduction in Hemoglobin A1C in one year. Automatic work-flow generation logic 4804 is described in the illustrative context of the first, namely, minimizing predicted costs for the next two (2) years of a patient's care. - The manner in which automatic work-
flow generation logic 4804 identifies work-flows to achieve predetermined goals is illustrated by logic flow diagram 5100 (FIG. 51 ). - Automatic work-
flow generation logic 4804 includes deep learning machine logic. In step 5102, human computer engineers configure this deep learning machine logic of automatic work-flow generation logic 4804 to analyze patient data from screening or monitoring system data store 410 in the context of labels specified by users, e.g., labels related to costs of the care of each patient over a 2-year period in this illustrative example. Users of health screening or monitoring server 102 who are not merely patients are typically either health care providers or health care payers. In either case, information regarding events in a given patient's health care history is available and is included in automatic work-flow generation logic 4804 by the human engineers such that automatic work-flow generation logic 4804 may track costs of a patient's care from the patient's medical records. - Further in
step 5102, the human engineers use all relevant data of screening or monitoring system data store 410 to train the deep learning machine logic of automatic work-flow generation logic 4804. After such training, the deep learning machine logic of automatic work-flow generation logic 4804 includes an extremely complex decision tree that predicts the costs of each patient over a 2-year period. - In
step 5104, automatic work-flow generation logic 4804 determines which events in a patient's medical history have the most influence over the cost of the patient's care in a 2-year period for statistically significant portions of the patient population. In particular, automatic work-flow generation logic 4804 identifies deep learning machine (DLM) nodes of the decision tree that have the most influence over the predetermined goals, e.g., costs of the care of a patient over a 2-year period. There are several known techniques for making such a determination automatically, and automatic work-flow generation logic 4804 implements one or more of them to identify these significant nodes. Examples of techniques for identifying significantly influential events/decisions (“nodes” in machine learning parlance) in a deep learning machine include random decision forests (supervised or unsupervised), multinomial logistic regression, and naïve Bayes classifiers, for example. These techniques are known and are not described herein. -
Loop step 5106 and next step 5112 define a loop in which automatic work-flow generation logic 4804 processes each of the influential nodes identified in step 5104. In a given iteration of the loop of steps 5106-5112, the particular node processed by automatic work-flow generation logic 4804 is sometimes referred to as the subject node. - In
step 5108, automatic work-flow generation logic 4804 forms a condition, e.g., condition 4900 (FIG. 49 ), from the internal logic of the subject node. The internal logic of the subject node receives data representing one or more events in a patient's history and/or one or more phenotypes of the patient and makes a decision that represents one or more branches to other nodes. In step 5108 (FIG. 51 ), automatic work-flow generation logic 4804 generalizes the data received by the subject node and the internal logic of the subject node that maps the received data to a decision. - In
step 5110, automatic work-flow generation logic 4804 forms an action, e.g., action 5000 (FIG. 50 ), according to the branch from the subject node that ultimately leads to the best outcome related to the predetermined goal, e.g., to the lowest cost over a 2-year period. The condition formed in step 5108 (FIG. 51 ) and the action formed in step 5110 collectively form a work-flow generated by automatic work-flow generation logic 4804. - Once all influential nodes have been processed according to the loop of steps 5106-5112, processing by automatic work-
flow generation logic 4804 completes, having formed a number of work-flows. - In this illustrative embodiment, the automatically generated work-flows are subject to human ratification prior to actual deployment within health
care management logic 408. In an alternative embodiment, health care management logic 408 automatically deploys work-flows generated automatically by automatic work-flow generation logic 4804 but limits actions to only recommendations to health care professionals. It is technically feasible to fully automate work-flow generation and changes to a patient's care without any human supervision. However, doing so may be counter to health care public policy in place today.
- The system may perform an audio analysis or a semantic analysis on audio snippets it collects from the prospective patient. For example, the system may determine relative frequencies of words or phrases associated with depression. For example, the system may predict that a user has depression if the user speaks with terms associated with negative thoughts, such as phrases indicating suicidal thoughts, self-harm instincts, phrases indicating a poor body image or self-image, and feelings of anxiety, isolation, or loneliness. The system may also pick up non-lexical or non-linguistic cues for depression, such as pauses, gasps, sighs, and slurred or mumbled speech. These terms and non-lexical cues may be similar to those picked up from training examples, such as patients administered a survey (e.g., the PHQ-9).
- The system may determine information about mental health by probing a user's physical health. For example, a user may feel insecure or sad about his or her physical features or physical fitness. Questions used to elicit information may have to do with vitals, such as blood pressure, resting heart rate, family history of disease, blood sugar, body mass index, body fat percentage, injuries, deformities, weight, height, eyesight, eating disorders, cardiovascular endurance, diet, or physical strength. Patients may provide speech which indicates despondence, exasperation, sadness, or defensiveness. For example, a patient may provide excuses as to why he or she has not gotten a medical procedure performed, why his or her diet is not going well, why he or she has not started an exercise program, or speak negatively about his or her height, weight, or physical features. Expression of such negativity about one's physical health may be correlated to anxiety.
- The models may be continually active or passive. A passive learning model may not change the method by which it learns in response to new information. For example, a passive learner may continually use a specific condition to converge on a prediction, even as new types of feature information are added to the system. But such a model may be limited in effectiveness without a large amount of training data available. An active learning model, by contrast, may employ a human to converge more quickly. The active learner may ask targeted questions to the human in order to do this. For example, a machine learning algorithm may be employed on a large amount of unlabeled audio samples. The algorithm may be able to easily classify some as being indicative of depression, but others may be ambiguous. The algorithm may ask the patient if he or she were feeling depressed when uttering a specific speech segment. Or the algorithm may ask a clinician to classify the samples.
- The system may be able to perform quality assurance of health providers using voice biomarkers. Data from the system may be provided to health care providers in order to assist the health care providers with detecting lexical and non-lexical cues that correspond to depression in patients. The health care providers may be able to use changes in pitch, vocal cadence, and vocal tics to determine how to proceed with care. The system may also allow health care providers to assess which questions elicit reactions from patients that are most predictive for depressions. Health care providers may use data from the system to train one another to search for lexical and non-lexical cues, and monitor care delivery to determine whether it is effective in screening or monitoring patients. For example, a health care provider may be able to observe a second health care provider question a subject to determine whether the second health care provider is asking questions that elicit useful information from the patient. The health care provider may be asking the questions in person or may be doing so remotely, such as from a call center. Health care providers may, using the semantic and audio information produced by the system, produce standardized methods of eliciting information from patients, based on which methods produce the most cues from patients.
- The system may be used to provide a dashboard tabulating voice-based biomarkers observed in patients. For example, health care providers may be able to track the frequencies of specific biomarkers, in order to keep track of patients' conditions. They may be able to track these frequencies in real time to assess how their treatment methods are performing. They may also be able to track these frequencies over time, in order to monitor patients' performances under treatment or recovery progress. Mental health providers may be able to assess each other's performances using this collected data.
- Dashboards may show real-time biomarker data as a snippet is being analyzed. They may show line graphs showing trends in measured biomarkers over time. The dashboards may show predictions taken at various time points, charting a patient's progress with respect to treatment. The dashboard may show patients' responses to treatment by different providers.
- The system may be able to translate one or more of its models across different patient settings. This may be done to account for background audio information in different settings. For example, the system may employ one or more signal processing algorithms to normalize audio input across settings. This may be done by taking impulse response measurements of multiple locations and determining transfer functions of signals collected at those locations in order to normalize audio recordings. The system may also account for training in different locations. For example, a patient may feel more comfortable discussing sensitive issues at home or in a therapist's office than over the phone. Thus, voice-based biomarkers obtained in these settings may differ. The system may be trained in multiple locations, or training data may be labeled by location before it is processed by the system's machine learning algorithms.
- The models may be transferred from location to location, for example, by using signal processing algorithms. They may also be transferred by modifying the questions asked of patients based on their locations. For example, it may be determined which particular questions, or sequences of questions, correspond to particular reactions within a particular location context. The questions may then be administered by the health care providers in such fashion as to provide the same reactions from the patients.
- The system may be able to use standard clinical encounters to train voice biomarker models. The system may collect recordings of clinical encounters for physical complaints. The complaints may be regarding injuries, sicknesses, or chronic conditions. The system may record, with patient permission, conversation patients have with health care providers during appointments. The physical complaints may indicate patients' feelings about their health conditions. In some cases, the physical complaints may be causing patients significant distress, affecting their overall dispositions and possibly causing depression.
- The data may be encrypted as it is collected or while in transit to one or more servers within the system. The data may be encrypted using a symmetric-key encryption scheme, a public-key encryption scheme, or a blockchain encryption method. Calculations performed by the one or more machine learning algorithms may be encrypted using a homomorphic encryption scheme, such as a partially homomorphic encryption scheme or a fully homomorphic encryption scheme.
- The data may be analyzed locally, to protect privacy. The system may analyze data in real-time by implementing a trained machine learning algorithm to operate on speech sample data recorded at the location where the appointment is taking place.
- Alternatively, the data may be stored locally. To preserve privacy, features may be extracted before being stored in the cloud for later analysis. The features may be anonymized to protect privacy. For example, patients may be given identifiers or pseudonyms to hide their true identities. The data may undergo differential privacy to ensure that patient identities are not compromised. Differential privacy may be accomplished by adding noise to a data set. For example, a data set may include 100 records corresponding to 100 usernames and added noise. If an observer has information about 99 records corresponding to 99 users and knows the remaining username, the observer will not be able to match the remaining record to the remaining username, because of the noise present in the system.
- In some embodiments, a local model may be embedded on a user device. The local model may be able to perform limited machine learning or statistical analysis, subject to constraints of device computing power and storage. The model may also be able to perform digital signal processing on audio recordings from patients. The mobile device used may be a smartphone or tablet computer. The mobile device may be able to download algorithms over a network for analysis of local data. The local device may be used to ensure privacy, as data collected and analyzed may not travel over a network.
- Voice-based biomarkers may be associated with lab values or physiological measurements. Voice-based biomarkers may be associated with mental health-related measurements. For example, they may be compared to the effects of psychiatric treatment, or logs taken by healthcare professionals such as therapists. They may be compared to answers to survey questions, to see if the voice-based analysis matches assessments commonly made in the field.
- Voice-based biomarkers may be associated with physical health-related measurements. For example, vocal issues, such as illness, may contribute to a patient producing vocal sounds that need to be accounted for in order to produce actionable predictions. In addition, depression predictions over a time scale in which a patient is recovering from an illness or injury may be compared to the patient's health outcomes over that time scale, to see if treatment is improving the patient's depression or depression-related symptoms. Voice-based biomarkers may be compared with data relating to brain activity collected during multiple time points, in order to determine the clinical efficacy of the system.
- Training of the models may be continuous, so that the model is continuously running while audio data is collected. Voice-based biomarkers may be continually added to the system and used for training during multiple epochs. Models may be updated using the data as it is collected.
- The system may use a reinforcement learning mechanism, where survey questions may be altered dynamically in order to elicit voice-based biomarkers that yield high-confidence depression predictions. For example, the reinforcement learning mechanism may be able to select questions from a group. Based on a previous question or a sequence of previous questions, the reinforcement mechanism may choose a question that may yield a high-confidence prediction of depression.
- The system may be able to determine which questions or sequences of questions may be able to yield particular elicitations from patients. The system may use machine learning to predict a particular elicitation, by producing, for example, a probability. The system may also use a softmax layer to produce probabilities for multiple elicitations. The system may use as features particular questions as well as at what times these questions are asked, how long into a survey they are asked, the time of day in which they are asked, and the point of time within a treatment course within which they are asked.
- For example, a specific question asked at a specific time about a sensitive subject for a patient may elicit crying from a patient. This crying may be associated strongly with depression. The system may, when receiving context that it is the specific time, may recommend presentation of the question to the patient.
- The system may include a method of using a voice-based biomarker to dynamically affect a course of treatment. The system may log elicitations of users over a period of time and determine, from the logged elicitations, whether or not treatment has been effective. For example, if voice-based biomarkers become less indicative of depression over a long time period, this might be evidence that the prescribed treatment is working. On the other hand, if the voice-based biomarkers become more indicative of depression over a long time period, the system may prompt health care providers to pursue a change in treatment, or to pursue the current course of treatment more aggressively.
- The system may spontaneously recommend a change in treatment. In an embodiment where the system is continually processing and analyzing data, the system may detect a sudden increase in voice-based biomarkers indicating depression. This may occur over a relatively short time window in a course of treatment. The system may also be able to spontaneously recommend a change if a course of treatment has been ineffective for a particular time period (e.g., six months, a year).
- The system may be able to track a probability of a particular response to a medication. For example, the system may be able to track voice based biomarkers taken before, during, and after a course of treatment, and analyze changes in scores indicative of depression.
- The system may be able to track a particular patient's probability of response to medication by having been trained on similar patients. The system may use this data to predict a patient's response based on responses of patients from similar demographics. These demographics may include age, gender, weight, height, medical history, or a combination thereof.
- The system may also be able to track a patient's likely adherence to a course of medicine or treatment. For example, the system may be able to predict, based on analysis of time series voice-based biomarkers, whether a treatment is having an effect on a patient. The health care provider may then ask the patient whether he or she is following the treatment.
- In addition, the system may be able to tell, based on surveying the questions, if the patient is following the treatment by analyzing his or her biomarkers. For example, a patient may become defensive, take long pauses, stammer, or act in a manner that the patient is clearly lying about having adhered to a treatment plan. The patient may also express sadness, shame, or regret regarding not having followed the treatment plan.
- The system may be able to predict whether a patient will adhere to a course of treatment or medication. The system may be able to use training data from voice-based biomarkers from many patients in order to make a prediction as to whether a patient will follow a course of treatment. The system may identify particular voice-based biomarkers as predicting adherence. For example, patients with voice-based biomarkers indicating dishonesty may be designated as less likely to adhere to a treatment plan.
- The system may be able to establish a baseline profile for each individual patient.
- An individual patient may have a particular style of speaking, with particular voice-based biomarkers indicating emotions, such as happiness, sadness, anger, and grief. For example, some people may laugh when frustrated or cry when happy. Some people may speak loudly or softly, speak clearly or mumble, have large or small vocabularies, and speak freely or more hesitantly. Some people may have extroverted personalities, while others may be more introverted.
- Some people may be more hesitant to speak than others. Some people may be more guarded about expressing their feelings. Some people may have experienced trauma and abuse. Some people may be in denial about their feelings.
- A person's baseline mood or mental state, and thus the person's voice-based biomarkers, may change over time. The model may be continually trained to account for this. The model may also predict depression less often. The model's predictions over time may be recorded by mental health professionals. These results may be used to show a patient's progress out of a depressive state.
- The system may be able to make a particular number of profiles to account for different types of individuals. These profiles may be related to individuals' genders, ages, ethnicities, languages spoken, and occupations, for example.
- Particular profiles may have similar voice-based biomarkers. For example, older people may have thinner, breathier voices than younger people. Their weaker voices may make it more difficult for microphones to pick up specific biomarkers, and they may speak more slowly than younger people. In addition, older people may stigmatize behavioral therapy, and thus, not share as much information as younger people might.
- Men and women may express themselves differently, which may lead to different biomarkers. For example, men may express negative emotions more aggressively or violently, while women may be better able to articulate their emotions.
- In addition, people from different cultures may have different methods of dealing with or expressing emotions, or may feel guilt and shame when expressing negative emotions. It may be necessary to segment people based on their cultural backgrounds, in order to make the system more effective with respect to picking up idiosyncratic voice-based biomarkers.
- The system may account for people with different personality types by segmenting and clustering by personality type. This may be done manually, as clinicians may be familiar with personality types and how people of those types may express feelings of depression. The clinicians may develop specific survey questions to elicit specific voice-based biomarkers from people from these segmented groups.
- The voice-based biomarkers may be able to be used to determine whether somebody is depressed, even if the person is holding back information or attempting to outsmart testing methods. This is because many of the voice-based biomarkers may be involuntary utterances. For example, the patient may equivocate or the patient's voice may quaver.
- Particular voice-based biomarkers may correlate with particular causes of depression. For example, semantic analysis performed on many patients, in order to find specific words, phrases, or sequences thereof that indicate depression. The system may also track effects of treatment options on users, in order to determine their efficacy. Finally, the system may use reinforcement learning to determine better methods of treatment available.
- Real-
time system 302 is shown in greater detail in FIG. 52 . Real-time system 302 includes one or more microprocessors 5202 (collectively referred to as CPU 5202) that retrieve data and/or instructions from memory 5204 and execute retrieved instructions in a conventional manner. Memory 5204 may include generally any computer-readable medium including, for example, persistent memory such as magnetic and/or optical disks, ROM, and PROM and volatile memory such as RAM. -
CPU 5202 and memory 5204 are connected to one another through a conventional interconnect 5206, which is a bus in this illustrative embodiment and which connects CPU 5202 and memory 5204 to one or more input devices 5208, output devices 5210, and network access circuitry 5212. Input devices 5208 may include, for example, a keyboard, a keypad, a touch-sensitive screen, a mouse, a microphone, and one or more cameras. Output devices 5210 may include, for example, a display—such as a liquid crystal display (LCD)—and one or more loudspeakers. Network access circuitry 5212 sends and receives data through computer networks such as network 308 (FIG. 3 ). Generally speaking, server computer systems often exclude input and output devices, relying instead on human user interaction through network access circuitry. Accordingly, in some embodiments, real-time system 302 does not include input devices 5208 and output devices 5210. - A number of components of real-
time system 302 are stored in memory 5204. In particular, assessment test administrator 2202 and composite model 2204 are each all or part of one or more computer processes executing within CPU 5202 from memory 5204 in this illustrative embodiment but may also be implemented using digital logic circuitry. Assessment test administrator 2202 and composite model 2204 are both logic. As used herein, “logic” refers to (i) logic implemented as computer instructions and/or data within one or more computer processes and/or (ii) logic implemented in electronic circuitry. - Assessment test configuration 5220 is data stored persistently in
memory 5204 and may be implemented as all or part of one or more databases. - Modeling system 304 (
FIG. 3 ) is shown in greater detail in FIG. 53 . Modeling system 304 includes one or more microprocessors 5302 (collectively referred to as CPU 5302), memory 5304, an interconnect 5306, input devices 5308, output devices 5310, and network access circuitry 5312 that are directly analogous to CPU 5202 (FIG. 52 ), memory 5204, interconnect 5206, input devices 5208, output devices 5210, and network access circuitry 5212, respectively. Being a server computer system, modeling system 304 may omit input devices 5308 and output devices 5310. - A number of components of modeling system 304 (
FIG. 53 ) are stored in memory 5304. - In particular,
modeling system logic 5320 is all or part of one or more computer processes executing within CPU 5302 from memory 5304 in this illustrative embodiment but may also be implemented using digital logic circuitry. Collected patient data 2206, clinical data 2220, and modeling system configuration 5322 are each data stored persistently in memory 5304 and may be implemented as all or part of one or more databases. - In this illustrative embodiment, real-
time system 302, modeling system 304, and clinical data server 306 are shown, at least in the Figures, as separate, single server computers. It should be appreciated that logic and data of separate server computers described herein may be combined and implemented in a single server computer and that logic and data of a single server computer described herein may be distributed across multiple server computers. Moreover, it should be appreciated that the distinction between servers and clients is largely an arbitrary one to facilitate human understanding of purpose of a given computer. As used herein, “server” and “client” are primarily labels to assist human categorization and understanding. - Health screening or
monitoring server 102 is shown in greater detail in FIG. 54. As noted above, it should be appreciated that the behavior of health screening or monitoring server 102 described herein may be distributed across multiple computer systems using conventional distributed processing techniques. Health screening or monitoring server 102 includes one or more microprocessors 5402 (collectively referred to as CPU 5402) that retrieve data and/or instructions from memory 5404 and execute the retrieved instructions in a conventional manner. Memory 5404 may generally include any computer-readable medium, including, for example, persistent memory such as magnetic, solid-state, and/or optical disks, ROM, and PROM, and volatile memory such as RAM. -
CPU 5402 and memory 5404 are connected to one another through a conventional interconnect 5406, which is a bus in this illustrative embodiment and which connects CPU 5402 and memory 5404 to one or more input devices 5408, output devices 5410, and network access circuitry 5412. Input devices 5408 may include, for example, a keyboard, a keypad, a touch-sensitive screen, a mouse, a microphone, and one or more cameras. Output devices 5410 may include, for example, a display—such as a liquid crystal display (LCD)—and one or more loudspeakers. Network access circuitry 5412 sends and receives data through computer networks such as WAN 110 (FIG. 1). Server computer systems often exclude input and output devices, relying exclusively on human user interaction through network access circuitry. - Accordingly, in some embodiments, health screening or
monitoring server 102 does not include input devices 5408 and output devices 5410. - A number of components of health screening or
monitoring server 102 are stored in memory 5404. In particular, interactive health screening or monitoring logic 402 and healthcare management logic 408 are each all or part of one or more computer processes executing within CPU 5402 from memory 5404. As used herein, “logic” refers to (i) logic implemented as computer instructions and/or data within one or more computer processes and/or (ii) logic implemented in electronic circuitry. - Screening
system data store 410 and model repository 416 are each data stored persistently in memory 5404 and may be implemented as all or part of one or more databases. Screening system data store 410 also includes logic as described above.
- The above description is illustrative only and is not limiting. For example, while much of the description above pertains to depression and anxiety, it should be appreciated that the techniques described herein may effectively estimate and/or screen for a number of other health conditions such as post-traumatic stress disorder (PTSD) and stress generally, drug and alcohol addiction, and bipolar disorder, among others. Moreover, while the majority of the health states for which health screening or
monitoring server 102 screens as described herein are mental health states or behavioral health ailments, health screening or monitoring server 102 may screen for health states unrelated to mental or behavioral health. Examples include Parkinson's disease, Alzheimer's disease, chronic obstructive pulmonary disease, liver failure, Crohn's disease, myasthenia gravis, amyotrophic lateral sclerosis (ALS), and decompensated heart failure. - Moreover, many modifications of and/or additions to the above-described embodiment(s) are possible. For example, with patient consent, corroborative patient data for mental illness diagnostics may be extracted from one or more of the patient's biometrics, including heart rate, blood pressure, respiration, perspiration, and body temperature. It may also be possible to use audio without words, for privacy or for cross-language analysis. It is also possible to use acoustic modeling without visual cues.
- The present invention is defined solely by the claims which follow and their full range of equivalents. It is intended that the following appended claims be interpreted as including all such alterations, modifications, permutations, and substitute equivalents as fall within the true spirit and scope of the present invention.
- Now that the systems and methods for screening or monitoring for a health condition, namely depression in a number of the embodiments, have been described, attention shall be focused upon examples of systems capable of executing the above functions. To facilitate this discussion,
FIGS. 57 and 58 illustrate a Computer System 5700, which is suitable for implementing embodiments of the present invention. FIG. 57 shows one possible physical form of the Computer System 5700. Of course, the Computer System 5700 may take many physical forms, ranging from a printed circuit board, an integrated circuit, or a small handheld device up to a huge supercomputer or a collection of networked computers (or computing components operating in a distributed network). Computer System 5700 may include a Monitor 5702, a Display 5704, a Housing 5706, a Disk Drive 5708, a Keyboard 5710, and a Mouse 5712. Storage medium 5714 is a computer-readable medium used to transfer data to and from Computer System 5700. -
FIG. 58 is an example of a block diagram 5800 for Computer System 5700. Attached to System Bus 5720 are a wide variety of subsystems. Processor(s) 5722 (also referred to as central processing units, or CPUs) are coupled to storage devices, including Memory 5724. Memory 5724 includes random access memory (RAM) and read-only memory (ROM). As is well known in the art, ROM acts to transfer data and instructions uni-directionally to the CPU, and RAM is typically used to transfer data and instructions in a bi-directional manner. Both of these types of memories may include any suitable computer-readable media described below. A Fixed medium 5726 may also be coupled bi-directionally to the Processor 5722; it provides additional data storage capacity and may also include any of the computer-readable media described below. Fixed medium 5726 may be used to store programs, data, and the like and is typically a secondary storage medium (such as a hard disk) that is slower than primary storage. It will be appreciated that the information retained within Fixed medium 5726 may, in appropriate cases, be incorporated in standard fashion as virtual memory in Memory 5724. Removable medium 5714 may take the form of any of the computer-readable media described below. -
Processor 5722 is also coupled to a variety of input/output devices, such as Display 5704, Keyboard 5710, Mouse 5712, and Speakers 5730. In general, an input/output device may be any of: video displays, track balls, mice, keyboards, microphones, touch-sensitive displays, transducer card readers, magnetic or paper tape readers, tablets, styluses, voice or handwriting recognizers, biometrics readers, motion sensors, motion trackers, brain wave readers, or other computers. Processor 5722 optionally may be coupled to another computer or telecommunications network using Network Interface 5740. With such a Network Interface 5740, it is contemplated that the Processor 5722 might receive information from the network or might output information to the network in the course of performing the above-described health screening or monitoring. Furthermore, method embodiments of the present invention may execute solely upon Processor 5722 or may execute over a network such as the Internet in conjunction with a remote CPU that shares a portion of the processing. - Software is typically stored in the non-volatile memory and/or the drive unit. Indeed, for large programs, it may not even be possible to store the entire program in the memory. Nevertheless, it should be understood that for software to run, if necessary, it is moved to a computer-readable location appropriate for processing, and, for illustrative purposes, that location is referred to as the memory in this disclosure. Even when software is moved to the memory for execution, the processor will typically make use of hardware registers to store values associated with the software, and a local cache that, ideally, serves to speed up execution. As used herein, a software program is assumed to be stored at any known or convenient location (from non-volatile storage to hardware registers) when the software program is referred to as “implemented in a computer-readable medium.” A processor is considered to be “configured to execute a program” when at least one value associated with the program is stored in a register readable by the processor.
- In operation, the
computer system 5700 may be controlled by operating system software that includes a file management system, such as a disk operating system. One example of operating system software with associated file management system software is the family of operating systems known as Windows® from Microsoft Corporation of Redmond, Wash., and their associated file management systems. Another example of operating system software with its associated file management system software is the Linux operating system and its associated file management system. The file management system is typically stored in the non-volatile memory and/or drive unit and causes the processor to execute the various acts required by the operating system to input and output data and to store data in the memory, including storing files on the non-volatile memory and/or drive unit. - Some portions of the detailed description may be presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the approaches used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.
- The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Various general-purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct more specialized apparatus to perform the methods of some embodiments. In addition, the techniques are not described with reference to any particular programming language, and various embodiments may, thus, be implemented using a variety of programming languages.
- In alternative embodiments, the machine operates as a standalone device or may be connected (e.g., networked) to other machines. In a networked deployment, the machine may operate in the capacity of a server or a client machine in a client-server network environment or as a peer machine in a peer-to-peer (or distributed) network environment.
- The machine may be a server computer, a client computer, a virtual machine, a personal computer (PC), a tablet PC, a laptop computer, a set-top box (STB), a personal digital assistant (PDA), a cellular telephone, an iPhone, a Blackberry, a processor, a telephone, a web appliance, a network router, switch or bridge, or any machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- While the machine-readable medium or machine-readable storage medium is shown in an exemplary embodiment to be a single medium, the terms “machine-readable medium” and “machine-readable storage medium” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions. The terms “machine-readable medium” and “machine-readable storage medium” shall also be taken to include any medium that is capable of storing, encoding or carrying a set of instructions for execution by the machine and that causes the machine to perform any one or more of the methodologies of the presently disclosed technique and innovation.
- In general, the routines executed to implement the embodiments of the disclosure may be implemented as part of an operating system or a specific application, component, program, object, module or sequence of instructions referred to as “computer programs.” The computer programs typically comprise one or more instructions set at various times in various memory and storage devices in a computer, and when read and executed by one or more processing units or processors in a computer, cause the computer to perform operations to execute elements involving the various aspects of the disclosure.
- Moreover, while embodiments have been described in the context of fully functioning computers and computer systems, those skilled in the art will appreciate that the various embodiments are capable of being distributed as a program product in a variety of forms, and that the disclosure applies equally regardless of the particular type of machine or computer-readable media used to actually effect the distribution.
- The systems disclosed herein may be used to augment care provided by healthcare providers. For example, one or more of the systems disclosed may be used to facilitate handoffs of patients to patient care providers. If the system, following an assessment, produces a score above a threshold for a particular mental state, the system may refer the patient to a specialist for further investigation and analysis. The patient may be referred before the assessment has been completed, for example, if the patient is receiving treatment in a telemedicine system or if the specialist is co-located with the patient. For example, the patient may be receiving treatment in a clinic with one or more specialists.
- The system disclosed may be able to direct clinical processes for patients, following scoring. For example, if the patient is taking the assessment using a client device, the patient may, following completion of the assessment, be referred to cognitive behavioral therapy (CBT) services. They may also be referred to health care providers, or have appointments with health care providers made by the system. The system disclosed may suggest one or more medications.
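- As a minimal sketch of the score-to-action routing described in the two paragraphs above (assuming hypothetical condition names, a 0-to-1 score scale, and illustrative thresholds that are not taken from this disclosure), such logic might look like the following:
```python
# Hypothetical sketch only: condition names, thresholds, and actions are
# illustrative assumptions, not the actual models or scales of this system.

REFERRAL_THRESHOLDS = {"depression": 0.7, "anxiety": 0.65}  # assumed 0-1 scale

def next_clinical_actions(scores):
    """Map per-condition screening scores to suggested next steps."""
    actions = []
    for condition, score in scores.items():
        threshold = REFERRAL_THRESHOLDS.get(condition)
        if threshold is not None and score >= threshold:
            # A positive screen could trigger a specialist referral, a CBT
            # suggestion, or scheduling an appointment with a provider.
            actions.append("refer to specialist: " + condition)
            actions.append("offer CBT services: " + condition)
    return actions or ["no referral; continue routine monitoring"]

print(next_clinical_actions({"depression": 0.82, "anxiety": 0.40}))
```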
-
FIG. 59 shows an instantiation of a precision case management use case for the system. In a first step, the patient has a conversation with a case manager. In a second step, one or more entities passively record the conversation, with the consent of the patient. The conversation may be a face-to-face conversation. In another embodiment, the case manager may conduct the conversation remotely, for example, using a telemedicine platform. In a third step, real-time results are passed to a payer. The real-time results may include a score corresponding to a mental state. In a fourth step, the case manager may update a care plan based on the real-time results. For example, a particular score that exceeds a particular threshold may influence a future interaction between a care provider and a patient and may cause the provider to ask different questions of the patient. The score may even trigger the system to suggest particular questions associated with the score. The conversation may be repeated with the updated care plan. -
FIG. 60 shows an instantiation of a primary care screening or monitoring use case for the system. In a first step, the patient visits with a primary care provider. In a second step, speech may be captured by the primary care provider's organization for e-transcription, and a copy may be provided to the system for analysis. In a third step, the primary care provider may receive, from the analysis, a real-time vital sign informing the care pathway. This may facilitate a warm handoff to a behavioral health specialist or may be used to direct a primary care provider along a specific care pathway.
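- As one hedged illustration of the analysis step above, the sketch below reduces a transcript copy to a single screening signal. It is a stand-in for the acoustic and NLP models described in this disclosure; the word lists, weights, and 0-to-1 scaling are purely illustrative assumptions.
```python
# Illustrative sketch only: a crude lexical signal standing in for the
# acoustic/NLP models described herein; word lists and weights are assumed.
import re

FIRST_PERSON = {"i", "me", "my", "myself"}
NEGATIVE_AFFECT = {"sad", "tired", "hopeless", "worthless", "alone"}

def screening_signal(transcript):
    """Return a rough 0-to-1 signal from lexical features of a transcript."""
    words = re.findall(r"[a-z']+", transcript.lower())
    if not words:
        return 0.0
    first_person_rate = sum(w in FIRST_PERSON for w in words) / len(words)
    negative_rate = sum(w in NEGATIVE_AFFECT for w in words) / len(words)
    # Arbitrary weighting, clipped to [0, 1]; a deployed system would use
    # trained acoustic and NLP models rather than fixed word lists.
    return min(1.0, 4.0 * first_person_rate + 10.0 * negative_rate)

print(screening_signal("I feel tired and alone; nothing I do matters."))
```
-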
FIG. 61 shows an example system for enhanced employee assistance plan (EAP) navigation and triage. In a first step, the patient may call the EAP line. In a second step, the system may record audiovisual data and screen the patient. The screening or monitoring results may be delivered to the provider in real time. The provider may be able to adaptively screen the patient about high-risk topics, based on the collected real-time results. The real-time screening or monitoring data may also be provided to other entities. For example, the real-time screening or monitoring data may be provided to a clinician-on-call, used to schedule referrals, used for education purposes, or used for other purposes. The interaction between the patient and the EAP may be in-person or remote. A person staffing an EAP line may be alerted in real time that a patient has a positive screen and may be able to help direct the patient to a proper level of therapy. An EAP staff member may also be directed to ask questions based on a result of an assessment administered to a patient, for example, a score corresponding to the patient's mental state. - Speech data as described herein may be collected and analyzed in real time, or it may be data that is recorded and then analyzed later.
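- The adaptive, real-time questioning described above might be sketched as follows; the topic names, cutoff, and question bank are hypothetical assumptions rather than the system's actual configuration:
```python
# Hypothetical sketch: suggest follow-up questions for high-scoring topics
# and alert the EAP staffer on a positive screen. All values are illustrative.

FOLLOW_UPS = {
    "insomnia": "How many nights per week do you have trouble sleeping?",
    "anxiety": "How often do you feel unable to control worrying?",
}
POSITIVE_SCREEN_CUTOFF = 0.6  # assumed 0-to-1 score scale

def adaptive_prompts(realtime_scores):
    """Return an alert flag and suggested questions for high-risk topics."""
    flagged = [topic for topic, score in realtime_scores.items()
               if score >= POSITIVE_SCREEN_CUTOFF]
    return {
        "alert_staffer": bool(flagged),
        "suggested_questions": [FOLLOW_UPS[t] for t in flagged if t in FOLLOW_UPS],
    }

print(adaptive_prompts({"insomnia": 0.75, "anxiety": 0.30}))
```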
- The system disclosed herein may be used to monitor interactions between unlicensed coaches and patients. The system may request consent from the patients before monitoring. The coaches may be used to administer questions. The coaches, in tandem with the assessment, may be able to provide an interaction with the patient that yields actionable predictions for clinicians and health care professionals, without being as costly as using the services of a clinician or health care professional. The assessment may be able to add rigor and robustness to judgments made by the unlicensed coaches. The assessment may also allow more people to take jobs as coaches, as it provides a method for validating coaches' methods.
- While this invention has been described in terms of several embodiments, there are alterations, modifications, permutations, and substitute equivalents, which fall within the scope of this invention. Although sub-section titles have been provided to aid in the description of the invention, these titles are merely illustrative and are not intended to limit the scope of the present invention. It should also be noted that there are many alternative ways of implementing the methods and apparatuses of the present invention. It is therefore intended that the following appended claims be interpreted as including all such alterations, modifications, permutations, and substitute equivalents as fall within the true spirit and scope of the present invention.
- Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
- As used herein, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Any reference to “or” herein is intended to encompass “and/or” unless otherwise stated.
- As used herein, the term “about” refers to an amount that is near the stated amount by 10%, 5%, or 1%, including increments therein.
- As used herein, the term “about” in reference to a percentage refers to an amount that is greater or less than the stated percentage by 10%, 5%, or 1%, including increments therein.
- As used herein, the phrases “at least one”, “one or more”, and “and/or” are open-ended expressions that are both conjunctive and disjunctive in operation. For example, each of the expressions “at least one of A, B and C”, “at least one of A, B, or C”, “one or more of A, B, and C”, “one or more of A, B, or C” and “A, B, and/or C” means A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.
- Whenever the term “at least,” “greater than,” or “greater than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “at least,” “greater than” or “greater than or equal to” applies to each of the numerical values in that series of numerical values. For example, greater than or equal to 1, 2, or 3 is equivalent to greater than or equal to 1, greater than or equal to 2, or greater than or equal to 3.
- Whenever the term “no more than,” “less than,” or “less than or equal to” precedes the first numerical value in a series of two or more numerical values, the term “no more than,” “less than,” or “less than or equal to” applies to each of the numerical values in that series of numerical values. For example, less than or equal to 3, 2, or 1 is equivalent to less than or equal to 3, less than or equal to 2, or less than or equal to 1.
- Computer systems
- The present disclosure provides computer systems that are programmed to implement methods of the disclosure.
FIG. 62 shows a computer system 6201 that is programmed or otherwise configured to assess a mental state of a subject in a single session or over multiple different sessions. The computer system 6201 can regulate various aspects of the present disclosure's methods for assessing a mental state of a subject in a single session or over multiple different sessions, such as, for example, presenting queries, retrieving data, and processing data. The computer system 6201 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device. The electronic device can be a mobile electronic device. - The
computer system 6201 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 6205, which can be a single core or multi core processor, or a plurality of processors for parallel processing. The computer system 6201 also includes memory or memory location 6210 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 6215 (e.g., hard disk), communication interface 6220 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 6225, such as cache, other memory, data storage and/or electronic display adapters. The memory 6210, storage unit 6215, interface 6220 and peripheral devices 6225 are in communication with the CPU 6205 through a communication bus (solid lines), such as a motherboard. The storage unit 6215 can be a data storage unit (or data repository) for storing data. The computer system 6201 can be operatively coupled to a computer network (“network”) 6230 with the aid of the communication interface 6220. The network 6230 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet. The network 6230 in some cases is a telecommunication and/or data network. The network 6230 can include one or more computer servers, which can enable distributed computing, such as cloud computing. The network 6230, in some cases with the aid of the computer system 6201, can implement a peer-to-peer network, which may enable devices coupled to the computer system 6201 to behave as a client or a server. - The
CPU 6205 can execute a sequence of machine-readable instructions, which can be embodied in a program or software. The instructions may be stored in a memory location, such as the memory 6210. The instructions can be directed to the CPU 6205, which can subsequently program or otherwise configure the CPU 6205 to implement methods of the present disclosure. Examples of operations performed by the CPU 6205 can include fetch, decode, execute, and writeback. - The
CPU 6205 can be part of a circuit, such as an integrated circuit. One or more other components of the system 6201 can be included in the circuit. In some cases, the circuit is an application specific integrated circuit (ASIC). - The
storage unit 6215 can store files, such as drivers, libraries and saved programs. The storage unit 6215 can store user data, e.g., user preferences and user programs. The computer system 6201 in some cases can include one or more additional data storage units that are external to the computer system 6201, such as located on a remote server that is in communication with the computer system 6201 through an intranet or the Internet. - The
computer system 6201 can communicate with one or more remote computer systems through the network 6230. For instance, the computer system 6201 can communicate with a remote computer system of a user (e.g., the clinician). Examples of remote computer systems include personal computers (e.g., portable PC), slate or tablet PC's (e.g., Apple® iPad, Samsung® Galaxy Tab), telephones, Smart phones (e.g., Apple® iPhone, Android-enabled device, Blackberry®), or personal digital assistants. The user can access the computer system 6201 via the network 6230. - Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the
computer system 6201, such as, for example, on the memory 6210 or electronic storage unit 6215. The machine executable or machine readable code can be provided in the form of software. During use, the code can be executed by the processor 6205. In some cases, the code can be retrieved from the storage unit 6215 and stored on the memory 6210 for ready access by the processor 6205. In some situations, the electronic storage unit 6215 can be precluded, and machine-executable instructions are stored on memory 6210. - The code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime. The code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
- Aspects of the systems and methods provided herein, such as the
computer system 6201, can be embodied in programming. Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium. Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk. “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server. Thus, another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links. The physical elements that carry such waves, such as wired or wireless links, optical links or the like, also may be considered as media bearing the software. As used herein, unless restricted to non-transitory, tangible “storage” media, terms such as computer or machine “readable medium” refer to any medium that participates in providing instructions to a processor for execution. - Hence, a machine readable medium, such as computer-executable code, may take many forms, including but not limited to, a tangible storage medium, a carrier wave medium or physical transmission medium. Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings. Volatile storage media include dynamic memory, such as main memory of such a computer platform. Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
- The
computer system 6201 can include or be in communication with an electronic display 6235 that comprises a user interface (UI) 6240 for providing, for example, an assessment to a patient. Examples of UIs include, without limitation, a graphical user interface (GUI) and a web-based user interface. - Methods and systems of the present disclosure can be implemented by way of one or more algorithms. An algorithm can be implemented by way of software upon execution by the
central processing unit 6205. The algorithm can, for example, analyze speech using natural language processing. - While preferred embodiments of the present invention have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. It is not intended that the invention be limited by the specific examples provided within the specification. While the invention has been described with reference to the aforementioned specification, the descriptions and illustrations of the embodiments herein are not meant to be construed in a limiting sense. Numerous variations, changes, and substitutions will now occur to those skilled in the art without departing from the invention. Furthermore, it shall be understood that all aspects of the invention are not limited to the specific depictions, configurations or relative proportions set forth herein which depend upon a variety of conditions and variables. It should be understood that various alternatives to the embodiments of the invention described herein may be employed in practicing the invention. It is therefore contemplated that the invention shall also cover any such alternatives, modifications, variations or equivalents. It is intended that the following claims define the scope of the invention and that methods and structures within the scope of these claims and their equivalents be covered thereby.
Claims (22)
1.-20. (canceled)
21. A method, comprising:
(a) obtaining first speech data from a subject at a first time point;
(b) using an acoustic model and a natural language processing (NLP) model, processing said first speech data from said subject to generate a first metric that is indicative of whether said subject has a behavioral or mental condition at said first time point;
(c) obtaining second speech data from said subject at a second time point, wherein said second time point is after said first time point;
(d) using said acoustic model and said NLP model, processing said second speech data from said subject to generate a second metric that is indicative of whether said subject has said behavioral or mental condition at said second time point; and
(e) displaying, in a graphical user interface of an electronic device, a comparison of said first metric and said second metric to enable said subject or another user to track a progression of said behavioral or mental condition over time.
22. The method of claim 21 , wherein said comparison comprises a plot comprising said first metric and said second metric.
23. The method of claim 21 , further comprising displaying on said graphical user interface a qualitative assessment associated with said first metric or said second metric.
24. The method of claim 21 , further comprising displaying one or more topics identified in said first speech data or said second speech data.
25. The method of claim 21 , further comprising displaying a personalized health recommendation on said graphical user interface of said electronic device.
26. The method of claim 25 , wherein said personalized health recommendation is a referral to a healthcare provider.
27. The method of claim 21 , further comprising transmitting an alert to a healthcare provider in response to said second metric satisfying a condition.
28. The method of claim 21 , further comprising establishing a baseline profile for said subject.
29. The method of claim 28 , further comprising calibrating said first metric or said second metric based on said baseline profile.
30. The method of claim 21 , wherein said acoustic model or said NLP model is personalized for a demographic group to which said subject belongs.
31. The method of claim 21 , wherein (a) or (c) comprises prompting said subject to answer a question related to a mental state or mood of said subject.
32. The method of claim 21 , wherein (a) or (c) is performed before, during, or after a clinical encounter with a healthcare provider.
33. The method of claim 21 , wherein said acoustic model and said NLP model are deep neural networks.
34. The method of claim 33 , wherein said deep neural networks are trained on a plurality of speech samples generated by a plurality of other subjects.
35. The method of claim 34 , wherein each of said plurality of speech samples comprises a label that indicates that the other subject that generated said speech sample (i) has, to some level, said behavioral or mental condition or (ii) does not have said behavioral or mental condition.
36. The method of claim 35 , wherein said label is based on a clinical diagnosis.
37. The method of claim 35 , wherein said label is based on a clinically validated survey or questionnaire.
38. The method of claim 21 , wherein said first metric and said second metric are scaled scores.
39. The method of claim 21 , wherein said another user is a healthcare provider.
40. The method of claim 39 , wherein said healthcare provider is a psychologist, a psychiatrist, or a therapist.
41. The method of claim 21 , further comprising connecting said subject to a healthcare provider through said graphical user interface of said electronic device in response to said first metric or said second metric satisfying a condition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/130,649 US20210110895A1 (en) | 2018-06-19 | 2020-12-22 | Systems and methods for mental health assessment |
Applications Claiming Priority (18)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862687176P | 2018-06-19 | 2018-06-19 | |
US201862733568P | 2018-09-19 | 2018-09-19 | |
US201862733552P | 2018-09-19 | 2018-09-19 | |
US201862749113P | 2018-10-22 | 2018-10-22 | |
US201862749669P | 2018-10-23 | 2018-10-23 | |
US201862749663P | 2018-10-23 | 2018-10-23 | |
US201862749654P | 2018-10-23 | 2018-10-23 | |
US201862749672P | 2018-10-24 | 2018-10-24 | |
US201862754547P | 2018-11-01 | 2018-11-01 | |
US201862754541P | 2018-11-01 | 2018-11-01 | |
US201862754534P | 2018-11-01 | 2018-11-01 | |
US201862755356P | 2018-11-02 | 2018-11-02 | |
US201862755361P | 2018-11-02 | 2018-11-02 | |
PCT/US2019/037953 WO2019246239A1 (en) | 2018-06-19 | 2019-06-19 | Systems and methods for mental health assessment |
US16/523,298 US20190385711A1 (en) | 2018-06-19 | 2019-07-26 | Systems and methods for mental health assessment |
US16/560,720 US10748644B2 (en) | 2018-06-19 | 2019-09-04 | Systems and methods for mental health assessment |
US202016918624A | 2020-07-01 | 2020-07-01 | |
US17/130,649 US20210110895A1 (en) | 2018-06-19 | 2020-12-22 | Systems and methods for mental health assessment |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US202016918624A Continuation | 2018-06-19 | 2020-07-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210110895A1 true US20210110895A1 (en) | 2021-04-15 |
Family
ID=68984344
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/129,859 Active US11120895B2 (en) | 2018-06-19 | 2020-12-21 | Systems and methods for mental health assessment |
US17/130,649 Abandoned US20210110895A1 (en) | 2018-06-19 | 2020-12-22 | Systems and methods for mental health assessment |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/129,859 Active US11120895B2 (en) | 2018-06-19 | 2020-12-21 | Systems and methods for mental health assessment |
Country Status (4)
Country | Link |
---|---|
US (2) | US11120895B2 (en) |
EP (1) | EP3811245A4 (en) |
JP (1) | JP2021529382A (en) |
WO (1) | WO2019246239A1 (en) |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20200320414A1 (en) * | 2019-04-02 | 2020-10-08 | Kpn Innovations, Llc. | Artificial intelligence advisory systems and methods for vibrant constitutional guidance |
US20200357515A1 (en) * | 2019-05-10 | 2020-11-12 | Tencent America LLC | System and method for clinical decision support system with inquiry based on reinforcement learning |
US20210104240A1 (en) * | 2018-09-27 | 2021-04-08 | Panasonic Intellectual Property Management Co., Ltd. | Description support device and description support method |
US20210295172A1 (en) * | 2020-03-20 | 2021-09-23 | International Business Machines Corporation | Automatically Generating Diverse Text |
US20210383921A1 (en) * | 2018-01-26 | 2021-12-09 | Hitachi High-Tech Solutions Corporation | Controlling devices to achieve medical outcomes |
US20220015687A1 (en) * | 2020-07-15 | 2022-01-20 | Seoul National University R&Db Foundation | Method for Screening Psychiatric Disorder Based On Conversation and Apparatus Therefor |
US20220039741A1 (en) * | 2018-12-18 | 2022-02-10 | Szegedi Tudományegyetem | Automatic Detection Of Neurocognitive Impairment Based On A Speech Sample |
US11298062B2 (en) * | 2017-02-01 | 2022-04-12 | Conflu3Nce Ltd | Multi-purpose interactive cognitive platform |
US20220114273A1 (en) * | 2020-10-14 | 2022-04-14 | Philip Chidi Njemanze | Method and System for Mental Performance Computing Using Artificial Intelligence and Blockchain |
US11335461B1 (en) * | 2017-03-06 | 2022-05-17 | Cerner Innovation, Inc. | Predicting glycogen storage diseases (Pompe disease) and decision support |
US20220165390A1 (en) * | 2020-11-20 | 2022-05-26 | Blue Note Therapeutics, Inc. | Digital therapeutic for treatment of psychological aspects of an oncological condition |
US20220181004A1 (en) * | 2020-12-08 | 2022-06-09 | Happify Inc. | Customizable therapy system and process |
US20220207392A1 (en) * | 2020-12-31 | 2022-06-30 | International Business Machines Corporation | Generating summary and next actions in real-time for multiple users from interaction records in natural language |
US20220223241A1 (en) * | 2021-01-11 | 2022-07-14 | juli, Inc. | Methods and systems for generating personalized recommendations and predictions of a level of effectiveness of the personalized recommendations for a user |
US11395124B2 (en) * | 2020-05-06 | 2022-07-19 | Kant AI Solutions LLC | Artificial intelligence for emergency assistance |
US20220246011A1 (en) * | 2021-02-03 | 2022-08-04 | NC Seven Mountains, LLC | Methods, devices, and systems for round-the-clock health and wellbeing monitoring of incarcerated individuals and/or individuals under twenty-four-hour-seven-day-a-week (24/7) supervision |
WO2022272147A1 (en) * | 2021-06-24 | 2022-12-29 | The Regents Of The University Of California | Artificial intelligence modeling for multi-linguistic diagnostic and screening of medical disorders |
US20230018077A1 (en) * | 2021-07-13 | 2023-01-19 | Canon Medical Systems Corporation | Medical information processing system, medical information processing method, and storage medium |
US11559232B1 (en) | 2022-02-27 | 2023-01-24 | King Abdulaziz University | GRU based real-time mental stress assessment |
US11562135B2 (en) * | 2018-10-16 | 2023-01-24 | Oracle International Corporation | Constructing conclusive answers for autonomous agents |
US20230162835A1 (en) * | 2021-11-24 | 2023-05-25 | Wendy B. Ward | System and Method for Collecting and Analyzing Mental Health Data Using Computer Assisted Qualitative Data Analysis Software |
WO2023096867A1 (en) * | 2021-11-23 | 2023-06-01 | Compass Pathfinder Limited | Intelligent transcription and biomarker analysis |
US20230317274A1 (en) * | 2022-03-31 | 2023-10-05 | Matrixcare, Inc. | Patient monitoring using artificial intelligence assistants |
WO2023235564A1 (en) * | 2022-06-03 | 2023-12-07 | aiberry, Inc. | Multimodal (audio/text/video) screening and monitoring of mental health conditions |
WO2023235527A1 (en) * | 2022-06-03 | 2023-12-07 | aiberry, Inc. | Multimodal (audio/text/video) screening and monitoring of mental health conditions |
US11854538B1 (en) * | 2019-02-15 | 2023-12-26 | Amazon Technologies, Inc. | Sentiment detection in audio data |
US11861319B2 (en) | 2019-02-13 | 2024-01-02 | Oracle International Corporation | Chatbot conducting a virtual social dialogue |
WO2024026272A1 (en) * | 2022-07-26 | 2024-02-01 | Compass Pathfinder Limited | Predicting response to psilocybin therapy for treatment resistant depression |
US11923048B1 (en) | 2017-10-03 | 2024-03-05 | Cerner Innovation, Inc. | Determining mucopolysaccharidoses and decision support tool |
US11942194B2 (en) | 2018-06-19 | 2024-03-26 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
WO2024068953A1 (en) * | 2022-09-30 | 2024-04-04 | Presage | System for remote monitoring of a potentially elderly individual in an everyday environment |
US12020820B1 (en) | 2017-03-03 | 2024-06-25 | Cerner Innovation, Inc. | Predicting sphingolipidoses (fabry's disease) and decision support |
WO2024130331A1 (en) * | 2022-12-22 | 2024-06-27 | Redenlab Pty. Ltd. | "systems and methods for assessing brain health" |
US20240223705A1 (en) * | 2022-12-28 | 2024-07-04 | Motorola Solutions, Inc. | Device, system, and method to initiate electronic actions on calls and manage call-taking resources |
Families Citing this family (54)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10813572B2 (en) | 2015-12-11 | 2020-10-27 | Electronic Caregiver, Inc. | Intelligent system for multi-function electronic caregiving to facilitate advanced health diagnosis, health monitoring, fall and injury prediction, health maintenance and support, and emergency response |
US20220359091A1 (en) * | 2021-05-04 | 2022-11-10 | Electronic Caregiver, Inc. | Clinical Pathway Integration and Clinical Decision Support |
GB2567826B (en) * | 2017-10-24 | 2023-04-26 | Cambridge Cognition Ltd | System and method for assessing physiological state |
CA3080399A1 (en) | 2017-10-30 | 2019-05-09 | The Research Foundation For The State University Of New York | System and method associated with user authentication based on an acoustic-based echo-signature |
WO2019173283A1 (en) | 2018-03-05 | 2019-09-12 | Marquette University | Method and apparatus for non-invasive hemoglobin level prediction |
WO2019246239A1 (en) | 2018-06-19 | 2019-12-26 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
GB201816532D0 (en) * | 2018-10-10 | 2018-11-28 | Leso Digital Health Ltd | Methods, systems and apparatus for improved therapy delivery and monitoring |
US10847177B2 (en) | 2018-10-11 | 2020-11-24 | Cordio Medical Ltd. | Estimating lung volume by speech analysis |
US11011188B2 (en) * | 2019-03-12 | 2021-05-18 | Cordio Medical Ltd. | Diagnostic techniques based on speech-sample alignment |
US11024327B2 (en) | 2019-03-12 | 2021-06-01 | Cordio Medical Ltd. | Diagnostic techniques based on speech models |
US20230148945A1 (en) * | 2019-05-04 | 2023-05-18 | Intraneuron, Llc | Dynamic neuropsychological assessment tool |
EP4026142A4 (en) * | 2019-09-04 | 2023-09-27 | The Research Institute at Nationwide Children's Hospital | Computerized screening tool for behavioral health |
EP4048140A4 (en) * | 2019-10-25 | 2024-02-28 | Ellipsis Health, Inc. | Acoustic and natural language processing models for speech-based screening and monitoring of behavioral health conditions |
US11417330B2 (en) | 2020-02-21 | 2022-08-16 | BetterUp, Inc. | Determining conversation analysis indicators for a multiparty conversation |
WO2021226372A1 (en) * | 2020-05-08 | 2021-11-11 | Yamaha Motor Corporation, Usa | Progressive individual assessments using collected inputs |
WO2021247792A1 (en) * | 2020-06-04 | 2021-12-09 | Healmed Solutions Llc | Systems and methods for mental health care delivery via artificial intelligence |
US11657058B2 (en) * | 2020-07-15 | 2023-05-23 | Citrix Systems, Inc. | Systems and methods of enhancing mental health and performance |
US11860944B2 (en) | 2020-07-27 | 2024-01-02 | International Business Machines Corporation | State-aware interface |
JP2023536738A (en) * | 2020-08-04 | 2023-08-29 | エスアルファセラピューティクス,インコーポレーテッド | Digital devices and applications for the treatment of social communication disorders |
CN111956244B (en) * | 2020-08-26 | 2024-02-09 | 北京心灵力量科技有限公司 | Psychological test method and psychological test device |
US12009083B2 (en) | 2020-11-16 | 2024-06-11 | Electronic Caregiver, Inc. | Remote physical therapy and assessment of patients |
EP4002384A1 (en) * | 2020-11-16 | 2022-05-25 | Emocog Co., Ltd. | Device and method for voice-based trauma screening using deep-learning |
EP4024395B1 (en) * | 2020-12-30 | 2024-08-07 | audEERING GmbH | Speech analyser and related method |
CA3113414A1 (en) * | 2021-03-29 | 2022-09-29 | Mind Cure Health Inc. | Psychedelics protocol computer systems and methods |
GB202105656D0 (en) * | 2021-04-20 | 2021-06-02 | Syndi Ltd | Systems and methods for improving wellbeing through the generation of personalised app recommendations |
GB2620893A (en) * | 2021-05-06 | 2024-01-24 | Optimum Health Ltd | Systems and methods for real-time determinations of mental health disorders using multi-tier machine learning models based on user interactions with computer |
DE102021205548A1 (en) | 2021-05-31 | 2022-12-01 | VitaFluence.ai GmbH | Software-based, voice-driven, and objective diagnostic tool for use in the diagnosis of a chronic neurological disorder |
EP4109461A1 (en) * | 2021-06-22 | 2022-12-28 | Electronic Caregiver, Inc. | Atmospheric mirroring and dynamically varying three-dimensional assistant addison interface for external environments |
EP4109459A1 (en) * | 2021-06-22 | 2022-12-28 | Electronic Caregiver, Inc. | Atmospheric mirroring and dynamically varying three-dimensional assistant addison interface for behavioral environments |
EP4109460A1 (en) * | 2021-06-22 | 2022-12-28 | Electronic Caregiver, Inc. | Atmospheric mirroring and dynamically varying three-dimensional assistant addison interface for interior environments |
US20230036171A1 (en) * | 2021-07-29 | 2023-02-02 | Peer Collective Inc. | Systems and methods for rapid vetting of counselors via graded simulated exchanges |
WO2023018325A1 (en) * | 2021-08-09 | 2023-02-16 | Naluri Hidup Sdn Bhd | Systems and methods for conducting and assessing remote psychotherapy sessions |
US20230070665A1 (en) * | 2021-09-09 | 2023-03-09 | GenoEmote LLC | Method and system for validation of disease condition reprogramming based on personality to disease condition mapping |
AU2022348455A1 (en) * | 2021-09-15 | 2024-03-28 | OPTT Health, Inc. | Systems and methods for automating delivery of mental health therapy |
WO2023052928A1 (en) * | 2021-09-28 | 2023-04-06 | Sandeep Vohra | A machine learning and artificial intelligence based tool for screening emotional & mental health for an individual or a group or masses |
US20230197293A1 (en) * | 2021-12-22 | 2023-06-22 | Morgan State University | System and method for communications between patients and mental health providers |
US12107699B2 (en) | 2022-03-11 | 2024-10-01 | Read AI, Inc. | Systems and methods for creation and application of interaction analytics |
US11799679B2 (en) * | 2022-03-11 | 2023-10-24 | Read AI, Inc. | Systems and methods for creation and application of interaction analytics |
KR102702389B1 (en) * | 2022-04-13 | 2024-09-04 | 주식회사 제네시스랩 | Method, Server and Computer-readable Medium for Diagnosing Mental Illness based on User Interaction |
US20230411008A1 (en) * | 2022-06-03 | 2023-12-21 | The Covid Detection Foundation D/B/A Virufy | Artificial intelligence and machine learning techniques using input from mobile computing devices to diagnose medical issues |
US20240095445A1 (en) * | 2022-07-14 | 2024-03-21 | Cadence Solutions, Inc. | Systems and methods for language modeling with textual clincal data |
US20240021196A1 (en) * | 2022-07-15 | 2024-01-18 | ItsAllAbout, Inc. | Machine learning-based interactive conversation system |
WO2024026393A1 (en) * | 2022-07-27 | 2024-02-01 | Indr, Inc. | Methods and apparatus for ensemble machine learning models and natural language processing for predicting persona based on input patterns |
GB202211386D0 (en) * | 2022-08-04 | 2022-09-21 | Tutto Ltd | Devices, methods and artificial intelligence systems to monitor and improve physical, mental and financial health |
US20240069858A1 (en) * | 2022-08-26 | 2024-02-29 | ItsAllAbout, Inc. | Machine learning-based interactive conversation system with topic-specific state machines |
JP7550901B2 (en) | 2022-11-09 | 2024-09-13 | キヤノンメディカルシステムズ株式会社 | Clinical support system and clinical support device |
US20240324922A1 (en) * | 2023-02-24 | 2024-10-03 | Worcester Polytechnic Institute | System for detecting health experience from eye movement |
WO2024189917A1 (en) * | 2023-03-16 | 2024-09-19 | 富士通株式会社 | Workflow generation method, information processing device, and workflow generation program |
WO2024205061A1 (en) * | 2023-03-29 | 2024-10-03 | 의료법인 성광의료재단 | Mindfulness-based personalized cognitive therapy device and control method thereof |
CN117409964A (en) * | 2023-04-21 | 2024-01-16 | 云启智慧科技有限公司 | Comprehensive psychological evaluation method based on student in-school behavior analysis |
US11874843B1 (en) | 2023-05-01 | 2024-01-16 | Strategic Coach | Apparatus and methods for tracking progression of measured phenomena |
CN116807476B (en) * | 2023-08-25 | 2023-12-26 | 北京智精灵科技有限公司 | Multi-mode psychological health assessment system and method based on interface type emotion interaction |
CN117373658B (en) * | 2023-12-08 | 2024-03-08 | 北京回龙观医院(北京心理危机研究与干预中心) | Data processing-based auxiliary diagnosis and treatment system and method for depression |
CN118506987B (en) * | 2024-07-17 | 2024-09-20 | 四川大学华西医院 | Psychological assessment method, system, equipment and medium based on machine learning |
Family Cites Families (533)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5961332A (en) * | 1992-09-08 | 1999-10-05 | Joao; Raymond Anthony | Apparatus for processing psychological data and method of use thereof |
US6334778B1 (en) | 1994-04-26 | 2002-01-01 | Health Hero Network, Inc. | Remote psychological diagnosis and monitoring system |
US6206829B1 (en) | 1996-07-12 | 2001-03-27 | First Opinion Corporation | Computerized medical diagnostic and treatment advice system including network access |
US6725209B1 (en) | 1993-12-29 | 2004-04-20 | First Opinion Corporation | Computerized medical diagnostic and treatment advice system and method including mental status examination |
US5660176A (en) | 1993-12-29 | 1997-08-26 | First Opinion Corporation | Computerized medical diagnostic and treatment advice system |
US6578019B1 (en) | 1994-11-08 | 2003-06-10 | Canon Kabushiki Kaisha | Information processing system which understands information and acts accordingly and method therefor |
CN1194045A (en) * | 1995-07-25 | 1998-09-23 | Horus Therapeutics, Inc. | Computer assisted methods for diagnosing diseases |
CA2249646C (en) | 1996-03-27 | 2010-07-27 | Michael Hersh | Application of multi-media technology to psychological and educational assessment tools |
GB9620082D0 (en) | 1996-09-26 | 1996-11-13 | Eyretel Ltd | Signal monitoring apparatus |
US20150199488A1 (en) | 1997-03-14 | 2015-07-16 | Best Doctors, Inc. | Systems and Methods for Interpreting Medical Information |
US6256613B1 (en) | 1997-03-14 | 2001-07-03 | Health Resources And Technology Inc. | Medical consultation management system |
US7756721B1 (en) | 1997-03-14 | 2010-07-13 | Best Doctors, Inc. | Health care management system |
JPH10289006A (en) | 1997-04-11 | 1998-10-27 | Yamaha Motor Co Ltd | Method for controlling object to be controlled using artificial emotion |
US20050005266A1 (en) | 1997-05-01 | 2005-01-06 | Datig William E. | Method of and apparatus for realizing synthetic knowledge processes in devices for useful applications |
FI981508A (en) | 1998-06-30 | 1999-12-31 | Nokia Mobile Phones Ltd | A method, apparatus, and system for evaluating a user's condition |
US20100299154A1 (en) | 1998-11-13 | 2010-11-25 | Anuthep Benja-Athon | Intelligent computer-biological electronic-neural health-care system |
US6466232B1 (en) | 1998-12-18 | 2002-10-15 | Tangis Corporation | Method and system for controlling presentation of information to a user based on the user's condition |
US7073129B1 (en) | 1998-12-18 | 2006-07-04 | Tangis Corporation | Automated selection of appropriate information based on a computer user's context |
US6801223B1 (en) | 1998-12-18 | 2004-10-05 | Tangis Corporation | Managing interactions between computer users' context models |
US7231439B1 (en) | 2000-04-02 | 2007-06-12 | Tangis Corporation | Dynamically swapping modules for determining a computer user's context |
US6842877B2 (en) | 1998-12-18 | 2005-01-11 | Tangis Corporation | Contextual responses based on automated learning techniques |
US7225229B1 (en) | 1998-12-18 | 2007-05-29 | Tangis Corporation | Automated pushing of computer user's context data to clients |
IL128000A0 (en) | 1999-01-11 | 1999-11-30 | Univ Ben Gurion | A method for the diagnosis of thought states by analysis of interword silences |
IL129399A (en) | 1999-04-12 | 2005-03-20 | Liberman Amir | Apparatus and methods for detecting emotions in the human voice |
US7429243B2 (en) | 1999-06-03 | 2008-09-30 | Cardiac Intelligence Corporation | System and method for transacting an automated patient communications session |
US6347261B1 (en) | 1999-08-04 | 2002-02-12 | Yamaha Hatsudoki Kabushiki Kaisha | User-machine interface system for enhanced interaction |
US6665644B1 (en) | 1999-08-10 | 2003-12-16 | International Business Machines Corporation | Conversational data mining |
US7222075B2 (en) | 1999-08-31 | 2007-05-22 | Accenture Llp | Detecting emotions using voice signal analysis |
US6658388B1 (en) | 1999-09-10 | 2003-12-02 | International Business Machines Corporation | Personality generator for conversational systems |
US6523008B1 (en) | 2000-02-18 | 2003-02-18 | Adam Avrunin | Method and system for truth-enabling internet communications via computer voice stress analysis |
AU2001250844A1 (en) | 2000-03-15 | 2001-09-24 | Stephen Faris | Apparatus for and method of assessing, monitoring, and reporting on behavioral health disorders |
US20010049597A1 (en) | 2000-03-16 | 2001-12-06 | Matthew Klipstein | Method and system for responding to a user based on a textual input |
US7917366B1 (en) | 2000-03-24 | 2011-03-29 | Exaudios Technologies | System and method for determining a personal SHG profile by voice analysis |
AU2001249768A1 (en) | 2000-04-02 | 2001-10-15 | Tangis Corporation | Soliciting information based on a computer user's context |
WO2001077952A1 (en) | 2000-04-06 | 2001-10-18 | Bindler Paul R | Automated and intelligent networked-based psychological services |
US7765113B2 (en) | 2000-06-02 | 2010-07-27 | Qualitymetric, Inc. | Method and system for health assessment and monitoring |
AU2001275604B2 (en) | 2000-07-27 | 2004-11-25 | Cogstate, Ltd | Psychological testing method and apparatus |
TWI221574B (en) | 2000-09-13 | 2004-10-01 | Agi Inc | Sentiment sensing method, perception generation method and device thereof and software |
US6731307B1 (en) | 2000-10-30 | 2004-05-04 | Koninklijke Philips Electronics N.V. | User interface/entertainment device that simulates personal interaction and responds to user's mental state and/or personality |
US7953219B2 (en) | 2001-07-19 | 2011-05-31 | Nice Systems, Ltd. | Method apparatus and system for capturing and analyzing interaction based content |
IL144818A (en) | 2001-08-09 | 2006-08-20 | Voicesense Ltd | Method and apparatus for speech analysis |
EP1298645A1 (en) | 2001-09-26 | 2003-04-02 | Sony International (Europe) GmbH | Method for detecting emotions in speech, involving linguistic correlation information |
DE60115653T2 (en) | 2001-10-05 | 2006-08-10 | Sony Deutschland Gmbh | Method for detecting emotions using subgroup specialists |
US20060240393A1 (en) | 2001-11-08 | 2006-10-26 | Reeves Dennis L | System, method, and computer program product for an automated neuropsychological test |
WO2003043483A2 (en) | 2001-11-20 | 2003-05-30 | Avi Peled | System and method for diagnosis of mental disorders |
US7314444B2 (en) | 2002-01-25 | 2008-01-01 | Albert Einstein College Of Medicine Of Yeshiva University | Memory assessment by retrieval speed and uses thereof |
US7315821B2 (en) * | 2002-01-31 | 2008-01-01 | Sanyo Electric Co., Ltd. | System and method for health care information processing based on acoustic features |
EP1336956B1 (en) | 2002-02-13 | 2006-07-19 | Sony Deutschland GmbH | Method, system and computer program for recognizing speech/speaker using emotional state change to govern unsupervised adaptation of the recognition process |
US20030163311A1 (en) | 2002-02-26 | 2003-08-28 | Li Gong | Intelligent social agents |
DE10210799B4 (en) | 2002-03-12 | 2006-04-27 | Siemens Ag | Adaptation of a human-machine interface depending on a psycho-profile and a current state of a user |
US20030180698A1 (en) | 2002-03-22 | 2003-09-25 | Alen Salerian | Mental disorder screening tool and method of screening subjects for mental disorders |
US9049571B2 (en) | 2002-04-24 | 2015-06-02 | Ipventure, Inc. | Method and system for enhanced messaging |
JP2003330490A (en) | 2002-05-15 | 2003-11-19 | Fujitsu Ltd | Audio conversation device |
US7665024B1 (en) | 2002-07-22 | 2010-02-16 | Verizon Services Corp. | Methods and apparatus for controlling a user interface based on the emotional state of a user |
US20040092809A1 (en) | 2002-07-26 | 2004-05-13 | Neurion Inc. | Methods for measurement and analysis of brain activity |
WO2004030532A1 (en) | 2002-10-03 | 2004-04-15 | The University Of Queensland | Method and apparatus for assessing psychiatric or physical disorders |
US6993381B2 (en) | 2002-10-25 | 2006-01-31 | Connolly John F | Linking neurophysiological and neuropsychological measures for cognitive function assessment in a patient |
US8321427B2 (en) | 2002-10-31 | 2012-11-27 | Promptu Systems Corporation | Method and apparatus for generation and augmentation of search terms from external and internal sources |
US7822611B2 (en) | 2002-11-12 | 2010-10-26 | Bezar David B | Speaker intent analysis system |
US7874983B2 (en) | 2003-01-27 | 2011-01-25 | Motorola Mobility, Inc. | Determination of emotional and physiological states of a recipient of a communication |
US7347818B2 (en) | 2003-02-24 | 2008-03-25 | Neurotrax Corporation | Standardized medical cognitive assessment tool |
US7280968B2 (en) | 2003-03-25 | 2007-10-09 | International Business Machines Corporation | Synthetically generated speech responses including prosodic characteristics of speech inputs |
US20040210159A1 (en) | 2003-04-15 | 2004-10-21 | Osman Kibar | Determining a psychological state of a subject |
US20040243443A1 (en) | 2003-05-29 | 2004-12-02 | Sanyo Electric Co., Ltd. | Healthcare support apparatus, health care support system, health care support method and health care support program |
EP1669172B1 (en) | 2003-08-12 | 2013-10-02 | Advanced Telecommunications Research Institute International | Communication robot control system |
KR20050027361A (en) | 2003-09-15 | 2005-03-21 | Pantech & Curitel Communications, Inc. | Method of monitoring the psychological condition of talkers in a communication terminal |
US7272559B1 (en) | 2003-10-02 | 2007-09-18 | Ceie Specs, Inc. | Noninvasive detection of neuro diseases |
US7933226B2 (en) | 2003-10-22 | 2011-04-26 | Palo Alto Research Center Incorporated | System and method for providing communication channels that each comprise at least one property dynamically changeable during social interactions |
US20050209181A1 (en) | 2003-11-05 | 2005-09-22 | Huda Akil | Compositions and methods for diagnosing and treating mental disorders |
US7983920B2 (en) | 2003-11-18 | 2011-07-19 | Microsoft Corporation | Adaptive computing environment |
JP2005157494A (en) | 2003-11-20 | 2005-06-16 | Aruze Corp | Conversation control apparatus and conversation control method |
US7356168B2 (en) * | 2004-04-23 | 2008-04-08 | Hitachi, Ltd. | Biometric verification system and method utilizing a data classifier and fusion model |
US20050246165A1 (en) | 2004-04-29 | 2005-11-03 | Pettinelli Eugene E | System and method for analyzing and improving a discourse engaged in by a number of interacting agents |
US20060052674A1 (en) | 2004-09-04 | 2006-03-09 | Steven Eisenstein | Software method of determining and treating psychiatric disorders |
US7392187B2 (en) | 2004-09-20 | 2008-06-24 | Educational Testing Service | Method and system for the automatic generation of speech features for scoring high entropy speech |
DE102004050785A1 (en) | 2004-10-14 | 2006-05-04 | Deutsche Telekom Ag | Method and arrangement for processing messages in the context of an integrated messaging system |
WO2006059325A1 (en) | 2004-11-30 | 2006-06-08 | Oded Sarel | Method and system of indicating a condition of an individual |
US8214214B2 (en) | 2004-12-03 | 2012-07-03 | Phoenix Solutions, Inc. | Emotion detection device and method for use in distributed systems |
US20060122834A1 (en) | 2004-12-03 | 2006-06-08 | Bennett Ian M | Emotion detection device & method for use in distributed systems |
JP4423327B2 (en) | 2005-02-08 | 2010-03-03 | Nippon Telegraph And Telephone Corporation | Information communication terminal, information communication system, information communication method, information communication program, and recording medium recording the same |
US7398213B1 (en) | 2005-05-17 | 2008-07-08 | Exaudios Technologies | Method and system for diagnosing pathological phenomenon using a voice signal |
US7995717B2 (en) | 2005-05-18 | 2011-08-09 | Mattersight Corporation | Method and system for analyzing separated voice data of a telephonic communication between a customer and a contact center by applying a psychological behavioral model thereto |
KR101248353B1 (en) | 2005-06-09 | 2013-04-02 | Agi Inc | Speech analyzer detecting pitch frequency, speech analyzing method, and speech analyzing program |
US7940897B2 (en) | 2005-06-24 | 2011-05-10 | American Express Travel Related Services Company, Inc. | Word recognition system and method for customer and employee assessment |
WO2007017853A1 (en) | 2005-08-08 | 2007-02-15 | Nice Systems Ltd. | Apparatus and methods for the detection of emotions in audio interactions |
EP1758398A1 (en) | 2005-08-23 | 2007-02-28 | Syneola SA | Multilevel semiotic and fuzzy logic user and metadata interface means for interactive multimedia system having cognitive adaptive capability |
US20070055524A1 (en) | 2005-09-08 | 2007-03-08 | Zhen-Hai Cao | Speech dialog method and device |
US8209182B2 (en) | 2005-11-30 | 2012-06-26 | University Of Southern California | Emotion recognition system |
WO2007072485A1 (en) | 2005-12-22 | 2007-06-28 | Exaudios Technologies Ltd. | System for indicating emotional attitudes through intonation analysis and methods thereof |
US20070192108A1 (en) | 2006-02-15 | 2007-08-16 | Alon Konchitsky | System and method for detection of emotion in telecommunications |
CA2640748C (en) | 2006-02-28 | 2010-04-20 | Phenomenome Discoveries Inc. | Methods for the diagnosis of dementia and other neurological disorders |
WO2007138944A1 (en) | 2006-05-26 | 2007-12-06 | Nec Corporation | Information giving system, information giving method, information giving program, and information giving program recording medium |
US20070288266A1 (en) | 2006-06-02 | 2007-12-13 | Suzanne Sysko | System and methods for chronic disease management and health assessment |
US7849115B2 (en) | 2006-06-05 | 2010-12-07 | Bruce Reiner | Method and apparatus for adapting computer-based systems to end-user profiles |
US7962342B1 (en) | 2006-08-22 | 2011-06-14 | Avaya Inc. | Dynamic user interface for the temporarily impaired based on automatic analysis for speech patterns |
US20160260189A9 (en) | 2006-09-08 | 2016-09-08 | American Well Corporation | Reverse Provider Practice |
US20160210636A9 (en) | 2006-09-08 | 2016-07-21 | American Well Corporation | Verification processing for brokered engagements |
US20160210005A9 (en) | 2006-09-08 | 2016-07-21 | American Well Corporation | Quick-Connection for Brokered Engagements |
WO2008030855A1 (en) | 2006-09-08 | 2008-03-13 | American Well Inc. | Connecting consumers with service providers |
US20090113312A1 (en) | 2006-09-08 | 2009-04-30 | American Well Systems | Connecting Providers of Legal Services |
US7848937B2 (en) | 2006-09-08 | 2010-12-07 | American Well Corporation | Connecting consumers with service providers |
US8719047B2 (en) | 2008-06-17 | 2014-05-06 | American Well Corporation | Patient directed integration of remotely stored medical information with a brokerage system |
US20090138317A1 (en) | 2006-09-08 | 2009-05-28 | Roy Schoenberg | Connecting Providers of Financial Services |
US7590550B2 (en) | 2006-09-08 | 2009-09-15 | American Well Inc. | Connecting consumers with service providers |
AU2012216577B2 (en) | 2006-09-08 | 2013-10-24 | American Well Corporation | Connecting consumers with service providers |
EP2063416B1 (en) | 2006-09-13 | 2011-11-16 | Nippon Telegraph And Telephone Corporation | Feeling detection method, feeling detection device, feeling detection program containing the method, and recording medium containing the program |
WO2008055078A2 (en) | 2006-10-27 | 2008-05-08 | Vivometrics, Inc. | Identification of emotional states using physiological responses |
DE102006055864A1 (en) | 2006-11-22 | 2008-05-29 | Deutsche Telekom Ag | Method for dialogue adaptation and dialogue system for carrying it out |
US8160210B2 (en) | 2007-01-08 | 2012-04-17 | Motorola Solutions, Inc. | Conversation outcome enhancement method and apparatus |
WO2008092473A1 (en) | 2007-01-31 | 2008-08-07 | Telecom Italia S.P.A. | Customizable method and system for emotional recognition |
US20080208015A1 (en) | 2007-02-09 | 2008-08-28 | Morris Margaret E | System, apparatus and method for real-time health feedback on a mobile device based on physiological, contextual and self-monitored indicators of mental and physical health states |
WO2008103827A1 (en) | 2007-02-22 | 2008-08-28 | Welldoc Communications, Inc. | System and method for providing treatment recommendations based on models |
US8838513B2 (en) | 2011-03-24 | 2014-09-16 | WellDoc, Inc. | Adaptive analytical behavioral and health assistant system and related method of use |
US8219406B2 (en) | 2007-03-15 | 2012-07-10 | Microsoft Corporation | Speech-centric multimodal user interface design in mobile technology |
US7844609B2 (en) * | 2007-03-16 | 2010-11-30 | Expanse Networks, Inc. | Attribute combination discovery |
WO2008115927A2 (en) | 2007-03-20 | 2008-09-25 | Cogito Health Inc. | Methods and systems for performing a clinical assessment |
US20080243005A1 (en) | 2007-03-30 | 2008-10-02 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Computational user-health testing |
US20090005654A1 (en) | 2007-03-30 | 2009-01-01 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Computational user-health testing |
US20090024050A1 (en) | 2007-03-30 | 2009-01-22 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Computational user-health testing |
US20080242950A1 (en) | 2007-03-30 | 2008-10-02 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Computational user-health testing |
US20080319276A1 (en) | 2007-03-30 | 2008-12-25 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Computational user-health testing |
US20090018407A1 (en) | 2007-03-30 | 2009-01-15 | Searete Llc, A Limited Liability Corporation Of The State Of Delaware | Computational user-health testing |
US20080294014A1 (en) | 2007-05-21 | 2008-11-27 | Barry Goodfield | Process for diagnosing and treating a psychological condition or assigning a personality classification to an individual |
US7935307B2 (en) | 2007-05-31 | 2011-05-03 | EOS Health, Inc. | Disposable, refillable glucometer with cell phone interface for transmission of results |
US9578152B2 (en) | 2007-06-15 | 2017-02-21 | American Well Corporation | Telephonic-based engagements |
US8166109B2 (en) | 2007-06-21 | 2012-04-24 | Cisco Technology, Inc. | Linking recognized emotions to non-visual representations |
WO2010013228A1 (en) | 2008-07-31 | 2010-02-04 | Ginger Software, Inc. | Automatic context sensitive language generation, correction and enhancement using an internet corpus |
CN101802812B (en) | 2007-08-01 | 2015-07-01 | Ginger Software, Inc. | Automatic context sensitive language correction and enhancement using an internet corpus |
US7945456B2 (en) | 2007-10-01 | 2011-05-17 | American Well Corporation | Documenting remote engagements |
US7933783B2 (en) | 2007-10-01 | 2011-04-26 | American Well Corporation | Medical listener |
US7937275B2 (en) | 2007-10-02 | 2011-05-03 | American Well Corporation | Identifying clinical trial candidates |
US8504382B2 (en) | 2007-10-02 | 2013-08-06 | American Well Corporation | Identifying trusted providers |
US7840418B2 (en) | 2007-10-02 | 2010-11-23 | American Well Corporation | Tracking the availability of service providers across multiple platforms |
US7890351B2 (en) | 2007-10-02 | 2011-02-15 | American Well Corporation | Managing utilization |
US8521553B2 (en) | 2007-10-02 | 2013-08-27 | American Well Corporation | Identification of health risks and suggested treatment actions |
US20090089147A1 (en) | 2007-10-02 | 2009-04-02 | American Well Inc. | Provider supply & consumer demand management |
US20090099848A1 (en) | 2007-10-16 | 2009-04-16 | Moshe Lerner | Early diagnosis of dementia |
US7818183B2 (en) | 2007-10-22 | 2010-10-19 | American Well Corporation | Connecting consumers with service providers |
US9513699B2 (en) | 2007-10-24 | 2016-12-06 | Invention Science Fund I, Llc | Method of selecting a second content based on a user's reaction to a first content |
JP2011502564A (en) * | 2007-11-02 | 2011-01-27 | Siegbert Warkentin | System and method for assessment of brain dysfunction induced by aging brain and brain disease by speech analysis |
US20090150252A1 (en) | 2007-12-10 | 2009-06-11 | American Well Inc. | Connecting Service Providers And Consumers Of Services Independent Of Geographical Location |
EP2245568A4 (en) * | 2008-02-20 | 2012-12-05 | Univ Mcmaster | Expert system for determining patient treatment response |
US8346680B2 (en) | 2008-03-31 | 2013-01-01 | Intuit Inc. | Method and system for dynamic adaptation of user experience in an application |
US7912737B2 (en) | 2008-04-07 | 2011-03-22 | American Well Corporation | Continuity of medical care |
JP5474933B2 (en) | 2008-04-16 | 2014-04-16 | Ginger Software, Inc. | A system for teaching writing based on the user's past writing |
US7890345B2 (en) | 2008-04-18 | 2011-02-15 | American Well Corporation | Establishment of a telephone based engagement |
US8066640B2 (en) | 2008-04-22 | 2011-11-29 | EOS Health, Inc. | Cellular GPRS-communication linked glucometer—pedometer |
US7801686B2 (en) | 2008-04-24 | 2010-09-21 | The Invention Science Fund I, Llc | Combination treatment alteration methods and systems |
WO2009134755A2 (en) | 2008-04-28 | 2009-11-05 | Alexandria Investment Research And Technology, Llc | Adaptive knowledge platform |
EP2124223B1 (en) | 2008-05-16 | 2018-03-28 | Beyond Verbal Communication Ltd. | Methods and systems for diagnosing a pathological phenomenon using a voice signal |
US8055591B2 (en) | 2008-05-23 | 2011-11-08 | The Invention Science Fund I, Llc | Acquisition and association of data indicative of an inferred mental state of an authoring user |
US7904507B2 (en) | 2008-05-23 | 2011-03-08 | The Invention Science Fund I, Llc | Determination of extent of congruity between observation of authoring user and observation of receiving user |
US9192300B2 (en) | 2008-05-23 | 2015-11-24 | Invention Science Fund I, Llc | Acquisition and particular association of data indicative of an inferred mental state of an authoring user |
US8615664B2 (en) | 2008-05-23 | 2013-12-24 | The Invention Science Fund I, Llc | Acquisition and particular association of inference data indicative of an inferred mental state of an authoring user and source identity data |
US9101263B2 (en) | 2008-05-23 | 2015-08-11 | The Invention Science Fund I, Llc | Acquisition and association of data indicative of an inferred mental state of an authoring user |
US8086563B2 (en) | 2008-05-23 | 2011-12-27 | The Invention Science Fund I, Llc | Acquisition and particular association of data indicative of an inferred mental state of an authoring user |
IL192013A (en) | 2008-06-05 | 2015-10-29 | Yoram Levanon | Method and system for diagnosing a patient using a voice-based data analysis |
US8195460B2 (en) | 2008-06-17 | 2012-06-05 | Voicesense Ltd. | Speaker characterization through speech analysis |
US20090313076A1 (en) | 2008-06-17 | 2009-12-17 | Roy Schoenberg | Arranging remote engagements |
US8260729B2 (en) | 2008-11-21 | 2012-09-04 | The Invention Science Fund I, Llc | Soliciting data indicating at least one subjective user state in response to acquisition of data indicating at least one objective occurrence |
WO2010068882A2 (en) | 2008-12-11 | 2010-06-17 | Nortel Networks Limited | Automated text-based messaging interaction using natural language understanding technologies |
US20130035563A1 (en) | 2010-01-26 | 2013-02-07 | Angelides Kimon J | Progressively Personalized Wireless-Based Interactive Diabetes Treatment |
US20100191075A1 (en) | 2009-01-26 | 2010-07-29 | Kimon Angelides | Progressively Personalized Decision-Support Menu for Controlling Diabetes |
US8812244B2 (en) | 2009-01-26 | 2014-08-19 | EOS Health, Inc. | Personalized wireless-based interactive diabetes treatment |
US10332054B2 (en) | 2009-02-09 | 2019-06-25 | Mandometer Ab | Method, generator device, computer program product and system for generating medical advice |
US20100223341A1 (en) | 2009-02-27 | 2010-09-02 | Microsoft Corporation | Electronic messaging tailored to user interest |
US9603564B2 (en) | 2009-02-27 | 2017-03-28 | The Forbes Consulting Group, Llc | Methods and systems for assessing psychological characteristics |
US20110118555A1 (en) | 2009-04-29 | 2011-05-19 | Abhijit Dhumne | System and methods for screening, treating, and monitoring psychological conditions |
US20100292545A1 (en) | 2009-05-14 | 2010-11-18 | Advanced Brain Monitoring, Inc. | Interactive psychophysiological profiler method and system |
US9015609B2 (en) | 2009-05-18 | 2015-04-21 | American Well Corporation | Provider to-provider consultations |
WO2010148141A2 (en) | 2009-06-16 | 2010-12-23 | University Of Florida Research Foundation, Inc. | Apparatus and method for speech analysis |
US8463620B2 (en) | 2009-07-08 | 2013-06-11 | American Well Corporation | Connecting consumers with service providers |
WO2011011413A2 (en) | 2009-07-20 | 2011-01-27 | University Of Florida Research Foundation, Inc. | Method and apparatus for evaluation of a subject's emotional, physiological and/or physical state with the subject's physiological and/or acoustic data |
US20110054985A1 (en) | 2009-08-25 | 2011-03-03 | Cisco Technology, Inc. | Assessing a communication style of a person to generate a recommendation concerning communication by the person in a particular communication environment |
US8777630B2 (en) | 2009-09-16 | 2014-07-15 | Cerebral Assessment Systems, Inc. | Method and system for quantitative assessment of facial emotion sensitivity |
US8500635B2 (en) | 2009-09-17 | 2013-08-06 | Blife Inc. | Mobile system and method for addressing symptoms related to mental health conditions |
US20140247989A1 (en) | 2009-09-30 | 2014-09-04 | F. Scott Deaver | Monitoring the emotional state of a computer user by analyzing screen capture images |
US20110119079A1 (en) | 2009-11-19 | 2011-05-19 | American Well Corporation | Connecting Consumers with Service Providers |
CA2787390A1 (en) | 2010-02-01 | 2011-08-04 | Ginger Software, Inc. | Automatic context sensitive language correction using an internet corpus particularly for small keyboard devices |
US9767470B2 (en) | 2010-02-26 | 2017-09-19 | Forbes Consulting Group, Llc | Emotional survey |
WO2011115956A1 (en) | 2010-03-15 | 2011-09-22 | Mcw Research Foundation, Inc. | Systems and methods for detection and prediction of brain disorders based on neural network interaction |
JP2011209787A (en) | 2010-03-29 | 2011-10-20 | Sony Corp | Information processor, information processing method, and program |
JP5834449B2 (en) | 2010-04-22 | 2015-12-24 | Fujitsu Ltd | Utterance state detection device, utterance state detection program, and utterance state detection method |
US20110263946A1 (en) * | 2010-04-22 | 2011-10-27 | Mit Media Lab | Method and system for real-time and offline analysis, inference, tagging of and responding to person(s) experiences |
WO2011140113A1 (en) | 2010-05-03 | 2011-11-10 | Lark Technologies, Inc. | System and method for providing sleep quality feedback |
US9634855B2 (en) | 2010-05-13 | 2017-04-25 | Alexander Poltorak | Electronic personal interactive device that determines topics of interest using a conversational agent |
US8595005B2 (en) | 2010-05-31 | 2013-11-26 | Simple Emotion, Inc. | System and method for recognizing emotional state from a speech signal |
US20140200463A1 (en) | 2010-06-07 | 2014-07-17 | Affectiva, Inc. | Mental state well being monitoring |
US10628985B2 (en) | 2017-12-01 | 2020-04-21 | Affectiva, Inc. | Avatar image animation using translation vectors |
US20170095192A1 (en) | 2010-06-07 | 2017-04-06 | Affectiva, Inc. | Mental state analysis using web servers |
US10628741B2 (en) | 2010-06-07 | 2020-04-21 | Affectiva, Inc. | Multimodal machine learning for emotion metrics |
US20130115582A1 (en) * | 2010-06-07 | 2013-05-09 | Affectiva, Inc. | Affect based concept testing |
US9058816B2 (en) | 2010-07-06 | 2015-06-16 | Rmit University | Emotional and/or psychiatric state detection |
US9002773B2 (en) * | 2010-09-24 | 2015-04-07 | International Business Machines Corporation | Decision-support application and system for problem solving using a question-answering system |
JP5271330B2 (en) | 2010-09-29 | 2013-08-21 | Toshiba Corporation | Spoken dialogue system, method, and program |
US8784311B2 (en) | 2010-10-05 | 2014-07-22 | University Of Florida Research Foundation, Incorporated | Systems and methods of screening for medical states using speech and other vocal behaviors |
JP5494468B2 (en) | 2010-12-27 | 2014-05-14 | Fujitsu Ltd | Status detection device, status detection method, and program for status detection |
US20130018663A1 (en) | 2011-07-12 | 2013-01-17 | American Well Corporation | Connecting Consumers with Providers |
US20130036153A1 (en) | 2011-08-05 | 2013-02-07 | Roy Schoenberg | Mobile Applications to Interface to a Brokerage System |
US20130046550A1 (en) | 2011-08-17 | 2013-02-21 | American Well Corporation | Tracking Status of Service Providers Across Plural Provider Practices |
US9762719B2 (en) | 2011-09-09 | 2017-09-12 | Qualcomm Incorporated | Systems and methods to enhance electronic communications with emotional context |
US8311973B1 (en) | 2011-09-24 | 2012-11-13 | Zadeh Lotfi A | Methods and systems for applications for Z-numbers |
US8392585B1 (en) | 2011-09-26 | 2013-03-05 | Theranos, Inc. | Methods and systems for facilitating network connectivity |
JP5772448B2 (en) | 2011-09-27 | 2015-09-02 | Fuji Xerox Co., Ltd. | Speech analysis system and speech analysis apparatus |
EP2575064A1 (en) | 2011-09-30 | 2013-04-03 | General Electric Company | Telecare and/or telehealth communication method and system |
US9220453B2 (en) | 2011-10-20 | 2015-12-29 | Cogcubed Corporation | Apparatus for mounting a wireless sensor on a human for diagnosing and treating cognitive disorders |
US9014614B2 (en) | 2011-10-20 | 2015-04-21 | Cogcubed Corporation | Cognitive assessment and treatment platform utilizing a distributed tangible-graphical user interface device |
US9324241B2 (en) | 2011-10-20 | 2016-04-26 | Cogcubed Corporation | Predictive executive functioning models using interactive tangible-graphical interface devices |
US9867562B2 (en) | 2011-10-20 | 2018-01-16 | Teladoc, Inc. | Vector space methods towards the assessment and improvement of neurological conditions |
US20150282752A1 (en) | 2011-10-20 | 2015-10-08 | Cogcubed Corporation | Spatial positioning surface for neurological assessment and treatment |
US9443205B2 (en) | 2011-10-24 | 2016-09-13 | President And Fellows Of Harvard College | Enhancing diagnosis of disorder through artificial intelligence and mobile health technologies without compromising accuracy |
US20130110551A1 (en) | 2011-10-28 | 2013-05-02 | WellDoc, Inc. | Systems and methods for managing chronic conditions |
US20140214442A1 (en) | 2011-11-03 | 2014-07-31 | Sean Patrick Duffy | Systems and Methods for Tracking Participants in a Health Improvement Program |
US20130117040A1 (en) | 2011-11-03 | 2013-05-09 | Omada Health, Inc. | Method and System for Supporting a Health Regimen |
US20140214443A1 (en) | 2011-11-03 | 2014-07-31 | Sean Patrick Duffy | Systems and Methods for Displaying Metrics Associated With a Health Improvement Program |
US20170344726A1 (en) | 2011-11-03 | 2017-11-30 | Omada Health, Inc. | Method and system for supporting a health regimen |
US20140222454A1 (en) | 2011-11-03 | 2014-08-07 | Sean Patrick Duffy | Systems and Methods That Administer a Health Improvement Program and an Adjunct Medical Treatment |
US20130124631A1 (en) | 2011-11-04 | 2013-05-16 | Fidelus Technologies, Llc. | Apparatus, system, and method for digital communications driven by behavior profiles of participants |
US9819711B2 (en) | 2011-11-05 | 2017-11-14 | Neil S. Davey | Online social interaction, education, and health care by analysing affect and cognitive features |
US20130123583A1 (en) * | 2011-11-10 | 2013-05-16 | Erica L. Hill | System and method for analyzing digital media preferences to generate a personality profile |
US10176299B2 (en) | 2011-11-11 | 2019-01-08 | Rutgers, The State University Of New Jersey | Methods for the diagnosis and treatment of neurological disorders |
US20130130789A1 (en) | 2011-11-18 | 2013-05-23 | Sri International | User-interactive application framework for electronic devices |
USD676555S1 (en) | 2011-12-02 | 2013-02-19 | EosHealth, Inc. | Glucometer |
US20130159228A1 (en) | 2011-12-16 | 2013-06-20 | Microsoft Corporation | Dynamic user experience adaptation and services provisioning |
US9208661B2 (en) | 2012-01-06 | 2015-12-08 | Panasonic Corporation Of North America | Context dependent application/event activation for people with various cognitive ability levels |
US8803690B2 (en) | 2012-01-06 | 2014-08-12 | Panasonic Corporation Of North America | Context dependent application/event activation for people with various cognitive ability levels |
US9239989B2 (en) | 2012-03-28 | 2016-01-19 | General Electric Company | Computer-implemented system with adaptive cognitive features and method of using the same |
US20130297536A1 (en) | 2012-05-01 | 2013-11-07 | Bernie Almosni | Mental health digital behavior monitoring support system and method |
US20140121540A1 (en) | 2012-05-09 | 2014-05-01 | Aliphcom | System and method for monitoring the health of a user |
US10116598B2 (en) | 2012-08-15 | 2018-10-30 | Imvu, Inc. | System and method for increasing clarity and expressiveness in network communications |
US9425974B2 (en) | 2012-08-15 | 2016-08-23 | Imvu, Inc. | System and method for increasing clarity and expressiveness in network communications |
US10748645B2 (en) | 2012-08-16 | 2020-08-18 | Ginger.io, Inc. | Method for providing patient indications to an entity |
US10276260B2 (en) | 2012-08-16 | 2019-04-30 | Ginger.io, Inc. | Method for providing therapy to an individual |
US20140052465A1 (en) | 2012-08-16 | 2014-02-20 | Ginger.io, Inc. | Method for modeling behavior and health changes |
US20170004260A1 (en) | 2012-08-16 | 2017-01-05 | Ginger.io, Inc. | Method for providing health therapeutic interventions to a user |
US10068670B2 (en) | 2012-08-16 | 2018-09-04 | Ginger.io, Inc. | Method for modeling behavior and depression state |
US10102341B2 (en) | 2012-08-16 | 2018-10-16 | Ginger.io, Inc. | Method for managing patient quality of life |
US10741285B2 (en) | 2012-08-16 | 2020-08-11 | Ginger.io, Inc. | Method and system for providing automated conversations |
US10265028B2 (en) | 2012-08-16 | 2019-04-23 | Ginger.io, Inc. | Method and system for modeling behavior and heart disease state |
US10068060B2 (en) | 2012-08-16 | 2018-09-04 | Ginger.io, Inc. | Method for modeling behavior and psychotic disorders |
US10740438B2 (en) | 2012-08-16 | 2020-08-11 | Ginger.io, Inc. | Method and system for characterizing and/or treating poor sleep behavior |
US10650920B2 (en) | 2012-08-16 | 2020-05-12 | Ginger.io, Inc. | Method and system for improving care determination |
WO2014037937A2 (en) | 2012-09-06 | 2014-03-13 | Beyond Verbal Communication Ltd | System and method for selection of data according to measurement of physiological parameters |
US9536049B2 (en) | 2012-09-07 | 2017-01-03 | Next It Corporation | Conversational virtual healthcare assistant |
US20150170531A1 (en) | 2012-10-08 | 2015-06-18 | Lark Technologies, Inc. | Method for communicating wellness-related communications to a user |
WO2014058894A1 (en) | 2012-10-08 | 2014-04-17 | Lark Technologies, Inc. | Method for delivering behavior change directives to a user |
US20150294595A1 (en) | 2012-10-08 | 2015-10-15 | Lark Technologies, Inc. | Method for providing wellness-related communications to a user |
US9579056B2 (en) * | 2012-10-16 | 2017-02-28 | University Of Florida Research Foundation, Incorporated | Screening for neurological disease using speech articulation characteristics |
US9202520B1 (en) | 2012-10-17 | 2015-12-01 | Amazon Technologies, Inc. | Systems and methods for determining content preferences based on vocal utterances and/or movement by a user |
US9031293B2 (en) | 2012-10-19 | 2015-05-12 | Sony Computer Entertainment Inc. | Multi-modal sensor based emotion recognition and emotional interface |
JP2016502422A (en) | 2012-10-22 | 2016-01-28 | Cogcubed Corporation | Cognitive assessment treatment platform using distributed tangible graphical user interface devices |
US20140122109A1 (en) | 2012-10-29 | 2014-05-01 | Consuli, Inc. | Clinical diagnosis objects interaction |
US9830423B2 (en) | 2013-03-13 | 2017-11-28 | Abhishek Biswas | Virtual communication platform for healthcare |
WO2014069076A1 (en) | 2012-10-31 | 2014-05-08 | NEC Corporation | Conversation analysis device and conversation analysis method |
US20150304381A1 (en) | 2012-11-02 | 2015-10-22 | Fidelus Technologies, Llc | Apparatus, system, and method for digital communications driven by behavior profiles of participants |
US9679553B2 (en) | 2012-11-08 | 2017-06-13 | Nec Corporation | Conversation-sentence generation device, conversation-sentence generation method, and conversation-sentence generation program |
US9570064B2 (en) | 2012-11-08 | 2017-02-14 | Nec Corporation | Conversation-sentence generation device, conversation-sentence generation method, and conversation-sentence generation program |
KR102011495B1 (en) | 2012-11-09 | 2019-08-16 | Samsung Electronics Co., Ltd. | Apparatus and method for determining user's mental state |
EP2932899A4 (en) | 2012-12-15 | 2016-08-10 | Tokyo Inst Tech | Apparatus for evaluating human mental state |
US20140195255A1 (en) | 2013-01-08 | 2014-07-10 | Robert Bosch Gmbh | System And Method For Assessment Of Patient Health Using Patient Generated Data |
KR20140094336A (en) | 2013-01-22 | 2014-07-30 | Samsung Electronics Co., Ltd. | An electronic device for extracting an emotion of a user and method for extracting an emotion of a user in the electronic device |
US9817949B2 (en) | 2013-02-07 | 2017-11-14 | Christian Poulin | Text based prediction of psychological cohorts |
USD727941S1 (en) | 2013-02-19 | 2015-04-28 | Livongo Health, Inc. | Glucometer with an accelerometer screen graphical user interface |
USD726757S1 (en) | 2013-02-22 | 2015-04-14 | Livongo Health, Inc. | Glucometer with a pattern tracking screen graphical user interface |
USD726205S1 (en) | 2013-02-22 | 2015-04-07 | Livongo Health, Inc. | Glucometer with a sound and alarm screen graphical user interface |
USD726751S1 (en) | 2013-02-22 | 2015-04-14 | Livongo Health, Inc. | Glucometer with a blood glucose check screen graphical user interface |
USD733172S1 (en) | 2013-02-22 | 2015-06-30 | Livongo Health, Inc. | Glucometer screen for checking ketone levels with graphical user interface |
USD726754S1 (en) | 2013-02-22 | 2015-04-14 | Livongo Health, Inc. | Glucometer uploading screen with a graphical user interface |
USD726752S1 (en) | 2013-02-22 | 2015-04-14 | Livongo Health, Inc. | Glucometer with an upload screen graphical user interface |
USD726206S1 (en) | 2013-02-22 | 2015-04-07 | Livongo Health, Inc. | Glucometer with a high blood glucose level screen graphical user interface |
USD726756S1 (en) | 2013-02-22 | 2015-04-14 | Livongo Health, Inc. | Glucometer with a logbook screen graphical user interface |
USD726755S1 (en) | 2013-02-22 | 2015-04-14 | Livongo Health, Inc. | Glucometer with a trends screen graphical user interface |
USD726753S1 (en) | 2013-02-22 | 2015-04-14 | Livongo Health, Inc. | Glucometer with a pedometer check screen graphical user interface |
USD727942S1 (en) | 2013-02-26 | 2015-04-28 | Livongo Health, Inc. | Glucometer with an activity tracking screen graphical user interface |
DE102013101871A1 (en) | 2013-02-26 | 2014-08-28 | PSYWARE GmbH | Word-based speech analysis and speech analysis device |
USD726207S1 (en) | 2013-02-26 | 2015-04-07 | Livongo Health, Inc. | Glucometer with a data tracking screen graphical user interface |
JP6268717B2 (en) | 2013-03-04 | 2018-01-31 | Fujitsu Ltd | State estimation device, state estimation method, and computer program for state estimation |
US20140257852A1 (en) | 2013-03-05 | 2014-09-11 | Clinton Colin Graham Walker | Automated interactive health care application for patient care |
USD726210S1 (en) | 2013-03-06 | 2015-04-07 | Livongo Health, Inc. | Glucometer with a battery charge screen graphical user interface |
USD728601S1 (en) | 2013-03-06 | 2015-05-05 | Livongo Health, Inc. | Glucometer with a data sharing screen graphical user interface |
USD726209S1 (en) | 2013-03-06 | 2015-04-07 | Kimon Angelides | Glucometer with an error screen graphical user interface |
US20140258037A1 (en) | 2013-03-11 | 2014-09-11 | American Well Corporation | Transparency in Processing of Wait Times for Brokered Engagements |
US9191510B2 (en) | 2013-03-14 | 2015-11-17 | Mattersight Corporation | Methods and system for analyzing multichannel electronic communication data |
US20140272897A1 (en) | 2013-03-14 | 2014-09-18 | Oliver W. Cummings | Method and system for blending assessment scores |
US20140378810A1 (en) | 2013-04-18 | 2014-12-25 | Digimarc Corporation | Physiologic data acquisition and analysis |
CN103226606B (en) | 2013-04-28 | 2016-08-10 | Zhejiang Hexin Tonghuashun Network Information Co., Ltd. | Query selection method and system |
JP2014219594A (en) | 2013-05-09 | 2014-11-20 | SoftBank Mobile Corp. | Conversation processing system and program |
US10265012B2 (en) | 2013-05-20 | 2019-04-23 | Beyond Verbal Communication Ltd. | Method and system for determining a pre-multisystem failure condition using time integrated voice analysis |
US20160203729A1 (en) | 2015-01-08 | 2016-07-14 | Happify, Inc. | Dynamic interaction system and method |
US9750433B2 (en) | 2013-05-28 | 2017-09-05 | Lark Technologies, Inc. | Using health monitor data to detect macro and micro habits with a behavioral model |
US20140363797A1 (en) | 2013-05-28 | 2014-12-11 | Lark Technologies, Inc. | Method for providing wellness-related directives to a user |
US9427185B2 (en) * | 2013-06-20 | 2016-08-30 | Microsoft Technology Licensing, Llc | User behavior monitoring on a computerized device |
WO2014210210A1 (en) | 2013-06-25 | 2014-12-31 | Lark Technologies, Inc. | Method for classifying user motion |
US9536053B2 (en) | 2013-06-26 | 2017-01-03 | WellDoc, Inc. | Systems and methods for managing medication adherence |
US10204642B2 (en) | 2013-08-06 | 2019-02-12 | Beyond Verbal Communication Ltd | Emotional survey according to voice categorization |
CN105339926A (en) | 2013-08-06 | 2016-02-17 | Intel Corporation | Emotion-related query processing |
WO2015023952A1 (en) | 2013-08-16 | 2015-02-19 | Affectiva, Inc. | Mental state analysis using an application programming interface |
US20150056595A1 (en) | 2013-08-23 | 2015-02-26 | The Curators Of The University Of Missouri | Systems and methods for diagnosis and treatment of psychiatric disorders |
EP3049961A4 (en) | 2013-09-25 | 2017-03-22 | Intel Corporation | Improving natural language interactions using emotional modulation |
US9298766B2 (en) | 2013-10-09 | 2016-03-29 | International Business Machines Corporation | Empathy injection for question-answering systems |
US10405786B2 (en) | 2013-10-09 | 2019-09-10 | Nedim T. SAHIN | Systems, environment and methods for evaluation and management of autism spectrum disorder using a wearable data collection device |
US9881136B2 (en) | 2013-10-17 | 2018-01-30 | WellDoc, Inc. | Methods and systems for managing patient treatment compliance |
US10561361B2 (en) | 2013-10-20 | 2020-02-18 | Massachusetts Institute Of Technology | Using correlation structure of speech dynamics to detect neurological changes |
US9420970B2 (en) | 2013-10-22 | 2016-08-23 | Mindstrong, LLC | Method and system for assessment of cognitive function based on mobile device usage |
US9474481B2 (en) | 2013-10-22 | 2016-10-25 | Mindstrong, LLC | Method and system for assessment of cognitive function based on electronic device usage |
US9396437B2 (en) | 2013-11-11 | 2016-07-19 | Mera Software Services, Inc. | Interface apparatus and method for providing interaction of a user with network entities |
US9361589B2 (en) | 2013-11-28 | 2016-06-07 | Akademia Gorniczo-Hutnicza Im. Stanislawa Staszica W Krakowie | System and method for providing a dialog with a user |
US20150154721A1 (en) * | 2013-12-02 | 2015-06-04 | Talksession, Inc. | System, apparatus and method for user to obtain service from professional |
US9413891B2 (en) | 2014-01-08 | 2016-08-09 | Callminer, Inc. | Real-time conversational analytics facility |
WO2015116678A1 (en) | 2014-01-28 | 2015-08-06 | Simple Emotion, Inc. | Methods for adaptive voice interaction |
US10567444B2 (en) | 2014-02-03 | 2020-02-18 | Cogito Corporation | Tele-communication system and methods |
WO2015123058A1 (en) | 2014-02-13 | 2015-08-20 | Omada Health, Inc. | Systems and methods for tracking participants in a health improvement program |
EP3109774A4 (en) | 2014-02-19 | 2017-11-01 | Teijin Limited | Information processing device and information processing method |
US20150242593A1 (en) | 2014-02-21 | 2015-08-27 | MAP Health Management, LLC | System and method for generating survey questions |
WO2015130457A1 (en) | 2014-02-25 | 2015-09-03 | Omada Health, Inc. | Systems and methods for displaying metrics associated with a health improvement program |
US9947342B2 (en) | 2014-03-12 | 2018-04-17 | Cogito Corporation | Method and apparatus for speech behavior visualization and gamification |
US9348812B2 (en) | 2014-03-14 | 2016-05-24 | Splice Software Inc. | Method, system and apparatus for assembling a recording plan and data driven dialogs for automated communications |
CN103905296A (en) | 2014-03-27 | 2014-07-02 | Huawei Technologies Co., Ltd. | Emotion information processing method and device |
GB2524583B (en) | 2014-03-28 | 2017-08-09 | Kaizen Reaux-Savonte Corey | System, architecture and methods for an intelligent, self-aware and context-aware digital organism-based telecommunication system |
US9230542B2 (en) | 2014-04-01 | 2016-01-05 | Zoom International S.R.O. | Language-independent, non-semantic speech analytics |
WO2015153127A1 (en) | 2014-04-04 | 2015-10-08 | Omada Health, Inc. | Systems and methods that administer a health improvement program and an adjunct medical treatment |
WO2015167652A1 (en) | 2014-04-29 | 2015-11-05 | Future Life, LLC | Remote assessment of emotional status of a person |
WO2015168606A1 (en) | 2014-05-02 | 2015-11-05 | The Regents Of The University Of Michigan | Mood monitoring of bipolar disorder using speech analysis |
US20150332021A1 (en) | 2014-05-15 | 2015-11-19 | ThoroughCare, Inc. | Guided Patient Interview and Health Management Systems |
US9508360B2 (en) | 2014-05-28 | 2016-11-29 | International Business Machines Corporation | Semantic-free text analysis for identifying traits |
WO2015191562A1 (en) | 2014-06-09 | 2015-12-17 | Revon Systems, Llc | Systems and methods for health tracking and management |
US9390706B2 (en) | 2014-06-19 | 2016-07-12 | Mattersight Corporation | Personality-based intelligent personal assistant system and methods |
WO2015198317A1 (en) * | 2014-06-23 | 2015-12-30 | Intervyo R&D Ltd. | Method and system for analysing subjects |
US9807559B2 (en) | 2014-06-25 | 2017-10-31 | Microsoft Technology Licensing, Llc | Leveraging user signals for improved interactions with digital personal assistant |
GB201411912D0 (en) | 2014-07-03 | 2014-08-20 | Realeyes OÜ | Method of collecting computer user data |
US20160004299A1 (en) | 2014-07-04 | 2016-01-07 | Intelligent Digital Avatars, Inc. | Systems and methods for assessing, verifying and adjusting the affective state of a user |
CN105266775A (en) | 2014-07-11 | 2016-01-27 | ZTE Corporation | Method, system, and terminal for acquiring a user's physical and mental state |
US10874340B2 (en) | 2014-07-24 | 2020-12-29 | Sackett Solutions & Innovations, LLC | Real time biometric recording, information analytics and monitoring systems for behavioral health management |
WO2016028495A1 (en) | 2014-08-22 | 2016-02-25 | Sri International | Systems for speech-based assessment of a patient's state-of-mind |
WO2016031650A1 (en) | 2014-08-26 | 2016-03-03 | Toyobo Co., Ltd. | Method for assessing depressive state and device for assessing depressive state |
US20160063874A1 (en) | 2014-08-28 | 2016-03-03 | Microsoft Corporation | Emotionally intelligent systems |
US20160188792A1 (en) | 2014-08-29 | 2016-06-30 | Washington University In St. Louis | Methods and Compositions for the Detection, Classification, and Diagnosis of Schizophrenia |
USD754705S1 (en) | 2014-08-31 | 2016-04-26 | Livongo Health, Inc. | Glucometer display screen with a reminder screen graphical user interface |
USD754179S1 (en) | 2014-08-31 | 2016-04-19 | Livongo Health, Inc. | Glucometer display screen with a sound control graphical user interface |
US10052056B2 (en) | 2014-09-01 | 2018-08-21 | Beyond Verbal Communication Ltd | System for configuring collective emotional architecture of individual and methods thereof |
WO2016035070A2 (en) | 2014-09-01 | 2016-03-10 | Beyond Verbal Communication Ltd | Social networking and matching communication platform and methods thereof |
KR101641424B1 (en) | 2014-09-11 | 2016-07-20 | LG Electronics Inc. | Terminal and operating method thereof |
US20160086088A1 (en) | 2014-09-24 | 2016-03-24 | Raanan Yonatan Yehezkel | Facilitating dynamic affect-based adaptive representation and reasoning of user behavior on computing devices |
US10311869B2 (en) | 2014-10-21 | 2019-06-04 | Robert Bosch Gmbh | Method and system for automation of response selection and composition in dialog systems |
US9721004B2 (en) | 2014-11-12 | 2017-08-01 | International Business Machines Corporation | Answering questions via a persona-based natural language processing (NLP) system |
US20160164813A1 (en) | 2014-12-04 | 2016-06-09 | Intel Corporation | Conversation agent |
US9786299B2 (en) | 2014-12-04 | 2017-10-10 | Microsoft Technology Licensing, Llc | Emotion type classification for interactive dialog system |
US20160166191A1 (en) | 2014-12-12 | 2016-06-16 | King Fahd University Of Petroleum And Minerals | Process, system and computer program product for evaluating psychological status |
US10176163B2 (en) | 2014-12-19 | 2019-01-08 | International Business Machines Corporation | Diagnosing autism spectrum disorder using natural language processing |
US20160174889A1 (en) | 2014-12-20 | 2016-06-23 | Ziv Yekutieli | Smartphone text analyses |
US20160188822A1 (en) * | 2014-12-30 | 2016-06-30 | Cerner Innovation, Inc. | Clinical decision support rule generation and modification system and methods |
US10223459B2 (en) | 2015-02-11 | 2019-03-05 | Google Llc | Methods, systems, and media for personalizing computerized services based on mood and/or behavior information from multiple data sources |
US9930102B1 (en) | 2015-03-27 | 2018-03-27 | Intuit Inc. | Method and system for using emotional state data to tailor the user experience of an interactive software system |
GB2538698B (en) | 2015-04-02 | 2019-05-15 | Cambridge Cognition Ltd | Systems and methods for assessing cognitive function |
US20160322065A1 (en) | 2015-05-01 | 2016-11-03 | Smartmedical Corp. | Personalized instant mood identification method and system |
WO2016179428A2 (en) | 2015-05-05 | 2016-11-10 | Dart Neuroscience, Llc | Cognitive test execution and control |
US20180184963A1 (en) | 2015-05-19 | 2018-07-05 | Beyond Verbal Communication Ltd | System and method for improving emotional well-being by vagal nerve stimulation |
WO2016195474A1 (en) | 2015-05-29 | 2016-12-08 | Charles Vincent Albert | Method for analysing comprehensive state of a subject |
WO2016193839A1 (en) | 2015-06-03 | 2016-12-08 | Koninklijke Philips N.V. | System and method for generating an adaptive embodied conversational agent configured to provide interactive virtual coaching to a subject |
US10529328B2 (en) | 2015-06-22 | 2020-01-07 | Carnegie Mellon University | Processing speech signals in voice-based profiling |
US20170004269A1 (en) | 2015-06-30 | 2017-01-05 | BWW Holdings, Ltd. | Systems and methods for estimating mental health assessment results |
US9959328B2 (en) | 2015-06-30 | 2018-05-01 | Microsoft Technology Licensing, Llc | Analysis of user text |
US10438593B2 (en) | 2015-07-22 | 2019-10-08 | Google Llc | Individualized hotword detection models |
WO2017027709A1 (en) | 2015-08-11 | 2017-02-16 | Cognoa, Inc. | Methods and apparatus to determine developmental progress with artificial intelligence and user input |
WO2017024553A1 (en) | 2015-08-12 | 2017-02-16 | Zhejiang Hexin Tonghuashun Network Information Co., Ltd. | Information emotion analysis method and system |
US10127929B2 (en) | 2015-08-19 | 2018-11-13 | Massachusetts Institute Of Technology | Assessing disorders through speech and a computational model |
WO2017031350A1 (en) | 2015-08-19 | 2017-02-23 | Massachusetts Institute Of Technology | Assessing disorders through speech and a computational model |
US10709371B2 (en) * | 2015-09-09 | 2020-07-14 | WellBrain, Inc. | System and methods for serving a custom meditation program to a patient |
US20170076630A1 (en) | 2015-09-11 | 2017-03-16 | Livongo Health, Inc. | Optimizing Messages Sent to Diabetic Patients in an Interactive System Based on Estimated HbA1c Levels |
WO2017048730A1 (en) | 2015-09-14 | 2017-03-23 | Cogito Corporation | Systems and methods for identifying human emotions and/or mental health states based on analyses of audio inputs and/or behavioral data collected from computing devices |
US20180077095A1 (en) | 2015-09-14 | 2018-03-15 | X Development Llc | Augmentation of Communications with Emotional Data |
WO2017048729A1 (en) | 2015-09-14 | 2017-03-23 | Cogito Corporation | Systems and methods for managing, analyzing, and providing visualizations of multi-party dialogs |
WO2017057022A1 (en) | 2015-09-28 | 2017-04-06 | Delta Kogyo Co., Ltd. | Biological state estimation device, biological state estimation method, and computer program |
US10572626B2 (en) * | 2015-10-05 | 2020-02-25 | Ricoh Co., Ltd. | Advanced telemedicine system with virtual doctor |
EP3363348A4 (en) | 2015-10-15 | 2019-05-15 | Daikin Industries, Ltd. | Physiological state determination device and physiological state determination method |
CN113612677A (en) | 2015-10-20 | 2021-11-05 | Sony Corporation | Information processing system and information processing method |
PL414836A1 (en) | 2015-11-18 | 2017-05-22 | Assistech Spółka Z Ograniczoną Odpowiedzialnością | Method and system for assisting the assessment of neurological condition and the conduct of neurological rehabilitation, particularly of cognitive and/or language functions |
WO2017085714A2 (en) | 2015-11-19 | 2017-05-26 | Beyond Verbal Communication Ltd | Virtual assistant for generating personal suggestions to a user based on intonation analysis of the user |
US20170143246A1 (en) | 2015-11-20 | 2017-05-25 | Gregory C Flickinger | Systems and methods for estimating and predicting emotional states and affects and providing real time feedback |
US10835168B2 (en) | 2016-11-15 | 2020-11-17 | Gregory Charles Flickinger | Systems and methods for estimating and predicting emotional states and affects and providing real time feedback |
US20170154637A1 (en) | 2015-11-29 | 2017-06-01 | International Business Machines Corporation | Communication pattern monitoring and behavioral cues |
US10325070B2 (en) | 2015-12-14 | 2019-06-18 | The Live Network Inc | Treatment intelligence and interactive presence portal for telehealth |
CN108780663B (en) | 2015-12-18 | 2022-12-13 | Cognoa, Inc. | Digital personalized medical platform and system |
CN106910513A (en) | 2015-12-22 | 2017-06-30 | Microsoft Technology Licensing, Llc | Emotional intelligence chat engine |
US10142483B2 (en) | 2015-12-22 | 2018-11-27 | Intel Corporation | Technologies for dynamic audio communication adjustment |
CN105681546A (en) | 2015-12-30 | 2016-06-15 | Yulong Computer Telecommunication Scientific (Shenzhen) Co., Ltd. | Voice processing method, device and terminal |
US20170193171A1 (en) | 2016-01-05 | 2017-07-06 | Lyra Health, Inc. | Personalized multi-dimensional health care provider-patient matching |
US10268689B2 (en) | 2016-01-28 | 2019-04-23 | DISH Technologies L.L.C. | Providing media content based on user state detection |
US20170221336A1 (en) | 2016-01-28 | 2017-08-03 | Flex Ltd. | Human voice feedback system |
US10452816B2 (en) | 2016-02-08 | 2019-10-22 | Catalia Health Inc. | Method and system for patient engagement |
US10799186B2 (en) * | 2016-02-12 | 2020-10-13 | Newton Howard | Detection of disease conditions and comorbidities |
US20190172363A1 (en) | 2016-02-16 | 2019-06-06 | Nfactorial Analytical Sciences Pvt. Ltd | Real-time assessment of an emotional state |
US20170249434A1 (en) | 2016-02-26 | 2017-08-31 | Daniela Brunner | Multi-format, multi-domain and multi-algorithm metalearner system and method for monitoring human health, and deriving health status and trajectory |
US20170262609A1 (en) | 2016-03-08 | 2017-09-14 | Lyra Health, Inc. | Personalized adaptive risk assessment service |
US9711056B1 (en) | 2016-03-14 | 2017-07-18 | Fuvi Cognitive Network Corp. | Apparatus, method, and system of building and processing personal emotion-based computer readable cognitive sensory memory and cognitive insights for enhancing memorization and decision making skills |
US10548534B2 (en) | 2016-03-21 | 2020-02-04 | Sonde Health Inc. | System and method for anhedonia measurement using acoustic and contextual cues |
US10515629B2 (en) | 2016-04-11 | 2019-12-24 | Sonde Health, Inc. | System and method for activation of voice interactive services based on user state |
US11455985B2 (en) | 2016-04-26 | 2022-09-27 | Sony Interactive Entertainment Inc. | Information processing apparatus |
US11016534B2 (en) | 2016-04-28 | 2021-05-25 | International Business Machines Corporation | System, method, and recording medium for predicting cognitive states of a sender of an electronic message |
WO2017197333A1 (en) | 2016-05-13 | 2017-11-16 | WellDoc, Inc. | Database management and graphical user interfaces for measurements collected by analyzing blood |
WO2017199433A1 (en) | 2016-05-20 | 2017-11-23 | Mitsubishi Electric Corporation | Information provision control device, navigation device, equipment inspection operation assistance device, interactive robot control device, and information provision control method |
CN106055537B (en) | 2016-05-23 | 2019-03-12 | 王立山 | A kind of natural language machine identification method and system |
US9792825B1 (en) | 2016-05-27 | 2017-10-17 | The Affinity Project, Inc. | Triggering a session with a virtual companion |
CN106649421A (en) | 2016-05-29 | 2017-05-10 | Chen Yong | Human-computer conversation platform |
US11011266B2 (en) | 2016-06-03 | 2021-05-18 | Lyra Health, Inc. | Health provider matching service |
JP6132378B1 (en) | 2016-06-09 | 2017-05-24 | Mayumi Inaba | A program for supporting communication by understanding the other person's personality and preferences |
US10593349B2 (en) | 2016-06-16 | 2020-03-17 | The George Washington University | Emotional interaction apparatus |
US9818406B1 (en) | 2016-06-23 | 2017-11-14 | Intuit Inc. | Adjusting user experience based on paralinguistic information |
JP7006597B2 (en) | 2016-07-22 | 2022-01-24 | NEC Corporation | Mental and physical condition measuring device, mental and physical condition measuring method, mental and physical condition measuring program and storage medium |
US20190151603A1 (en) | 2016-07-25 | 2019-05-23 | Pavel Pavlovich Horoshutin | Method of providing remote psychological aid |
US20190180859A1 (en) | 2016-08-02 | 2019-06-13 | Beyond Verbal Communication Ltd. | System and method for creating an electronic database using voice intonation analysis score correlating to human affective states |
US11116403B2 (en) | 2016-08-16 | 2021-09-14 | Koninklijke Philips N.V. | Method, apparatus and system for tailoring at least one subsequent communication to a user |
US10354009B2 (en) | 2016-08-24 | 2019-07-16 | Microsoft Technology Licensing, Llc | Characteristic-pattern analysis of text |
US10832684B2 (en) | 2016-08-31 | 2020-11-10 | Microsoft Technology Licensing, Llc | Personalization of experiences with digital assistants in communal settings through voice and query processing |
US10244975B2 (en) | 2016-10-17 | 2019-04-02 | Morehouse School Of Medicine | Mental health assessment method and kiosk-based system for implementation |
WO2018074996A1 (en) | 2016-10-17 | 2018-04-26 | Morehouse School Of Medicine | Mental health assessment method and kiosk-based system for implementation |
JP6761598B2 (en) | 2016-10-24 | 2020-09-30 | Fuji Xerox Co., Ltd. | Emotion estimation system, emotion estimation model generation system |
US10706964B2 (en) | 2016-10-31 | 2020-07-07 | Lyra Health, Inc. | Constrained optimization for provider groups |
US10475530B2 (en) | 2016-11-10 | 2019-11-12 | Sonde Health, Inc. | System and method for activation and deactivation of cued health assessment |
AU2017371391A1 (en) | 2016-12-05 | 2019-06-27 | Cogniant Pty Ltd | Mental health assessment system and method |
WO2018106481A1 (en) | 2016-12-09 | 2018-06-14 | Basil Leaf Technologies, Llc | Computer-implemented methods, systems, and computer-readable media for diagnosing a condition |
JP6371366B2 (en) | 2016-12-12 | 2018-08-08 | Daikin Industries, Ltd. | Mental illness determination device |
US20180174055A1 (en) | 2016-12-19 | 2018-06-21 | Giridhar S. Tirumale | Intelligent conversation system |
CN107053191B (en) | 2016-12-31 | 2020-05-08 | Huawei Technologies Co., Ltd. | Robot, server and man-machine interaction method |
WO2018132483A1 (en) | 2017-01-10 | 2018-07-19 | Akili Interactive Labs, Inc. | Cognitive platform configured for determining the presence or likelihood of onset of a neuropsychological deficit or disorder |
US10037767B1 (en) | 2017-02-01 | 2018-07-31 | Wipro Limited | Integrated system and a method of identifying and learning emotions in conversation utterances |
US10497360B2 (en) | 2017-02-21 | 2019-12-03 | Sony Corporation | Personalized assistance system based on emotion |
CN106956271B (en) | 2017-02-27 | 2019-11-05 | Huawei Technologies Co., Ltd. | Method and robot for predicting affective state |
EP3379472A1 (en) | 2017-03-21 | 2018-09-26 | Koninklijke Philips N.V. | Method and apparatus for sending a message to a subject |
JP2018161703A (en) | 2017-03-24 | 2018-10-18 | Zensho Holdings Co., Ltd. | Dialogue control device and robot control system |
JP2020099367A (en) | 2017-03-28 | 2020-07-02 | Seltech Co., Ltd. | Emotion recognition device and emotion recognition program |
US10424288B2 (en) | 2017-03-31 | 2019-09-24 | Wipro Limited | System and method for rendering textual messages using customized natural voice |
CN107194151B (en) | 2017-04-20 | 2020-04-03 | Huawei Technologies Co., Ltd. | Method for determining emotion threshold value and artificial intelligence equipment |
US10250532B2 (en) | 2017-04-28 | 2019-04-02 | Microsoft Technology Licensing, Llc | Systems and methods for a personality consistent chat bot |
WO2018204935A1 (en) | 2017-05-05 | 2018-11-08 | Canary Speech, LLC | Medical assessment based on voice |
US20180329984A1 (en) | 2017-05-11 | 2018-11-15 | Gary S. Aviles | Methods and systems for determining an emotional condition of a user |
CN109564783A (en) | 2017-05-11 | 2019-04-02 | Microsoft Technology Licensing, LLC | Assisting psychotherapy in automated chatting |
WO2018212134A1 (en) | 2017-05-15 | 2018-11-22 | Aikomi Co., Ltd. | Dementia care system |
US10623431B2 (en) | 2017-05-15 | 2020-04-14 | Forcepoint Llc | Discerning psychological state from correlated user behavior and contextual information |
US11170184B2 (en) | 2017-05-27 | 2021-11-09 | Mohan Dewan | Computer implemented system and method for automatically generating messages |
WO2018223172A1 (en) | 2017-06-06 | 2018-12-13 | Howarth Gail | A system and a method for determining psychological state of a person |
US10838967B2 (en) | 2017-06-08 | 2020-11-17 | Microsoft Technology Licensing, Llc | Emotional intelligence for a conversational chatbot |
US10665123B2 (en) | 2017-06-09 | 2020-05-26 | International Business Machines Corporation | Smart examination evaluation based on run time challenge response backed by guess detection |
US10276190B2 (en) | 2017-06-19 | 2019-04-30 | International Business Machines Corporation | Sentiment analysis of mental health disorder symptoms |
US11429833B2 (en) | 2017-06-19 | 2022-08-30 | Kyndryl, Inc. | Cognitive communication assistant services |
US20190013092A1 (en) | 2017-07-05 | 2019-01-10 | Koninklijke Philips N.V. | System and method for facilitating determination of a course of action for an individual |
US11315560B2 (en) | 2017-07-14 | 2022-04-26 | Cognigy Gmbh | Method for conducting dialog between human and computer |
TWI684160B (en) | 2017-07-31 | 2020-02-01 | Chen Chao-Wei | System and method for analysis demonstration |
US20190043606A1 (en) | 2017-08-04 | 2019-02-07 | Teladoc, Inc. | Patient-provider healthcare recommender system |
US20190050774A1 (en) | 2017-08-08 | 2019-02-14 | General Electric Company | Methods and apparatus to enhance emotional intelligence using digital technology |
US20190052724A1 (en) | 2017-08-14 | 2019-02-14 | Ivan Tumbocon Dancel | Systems and methods for establishing a safe online communication network and for alerting users of the status of their mental health |
WO2019035007A1 (en) | 2017-08-15 | 2019-02-21 | American Well Corporation | Methods and apparatus for remote camera control with intention based controls and machine learning vision state management |
US11082456B2 (en) | 2017-08-17 | 2021-08-03 | Avctechnologies Usa Inc. | Automated agent for a web communication feature |
CN108346436B (en) | 2017-08-22 | 2020-06-23 | Tencent Technology (Shenzhen) Co., Ltd. | Voice emotion detection method and device, computer equipment and storage medium |
US20200293563A1 (en) | 2017-08-28 | 2020-09-17 | Sony Corporation | Information processor and information processing method |
US11004461B2 (en) | 2017-09-01 | 2021-05-11 | Newton Howard | Real-time vocal features extraction for automated emotional or mental state assessment |
US20190079916A1 (en) | 2017-09-11 | 2019-03-14 | International Business Machines Corporation | Using syntactic analysis for inferring mental health and mental states |
RU2673010C1 (en) | 2017-09-13 | 2018-11-21 | Dmitry Vladimirovich Istomin | Method for monitoring behavior of user during their interaction with content and system for its implementation |
US20180018634A1 (en) | 2017-09-14 | 2018-01-18 | Altimetrik Corp. | Systems and methods for assessing an individual in a computing environment |
EP3681678A4 (en) | 2017-09-18 | 2020-11-18 | Samsung Electronics Co., Ltd. | Method for dynamic interaction and electronic device thereof |
US10410655B2 (en) | 2017-09-18 | 2019-09-10 | Fujitsu Limited | Estimating experienced emotions |
US20200261013A1 (en) | 2017-09-27 | 2020-08-20 | Ilan Ben-Oren | Cognitive and physiological monitoring and analysis for correlation for management of cognitive impairment related conditions |
US20190102511A1 (en) | 2017-10-02 | 2019-04-04 | Blackthorn Therapeutics, Inc. | Methods and tools for detecting, diagnosing, predicting, prognosticating, or treating a neurobehavioral phenotype in a subject |
WO2019069955A1 (en) | 2017-10-03 | 2019-04-11 | Advanced Telecommunications Research Institute International (ATR) | Differentiation device, differentiation method for depression symptoms, determination method for level of depression symptoms, stratification method for depression patients, determination method for effects of treatment of depression symptoms, and brain activity training device |
US10516701B2 (en) | 2017-10-05 | 2019-12-24 | Accenture Global Solutions Limited | Natural language processing artificial intelligence network and data security system |
WO2019068203A1 (en) | 2017-10-06 | 2019-04-11 | Dynamicly Inc. | System and method for a hybrid conversational and graphical user interface |
EP3697288A4 (en) | 2017-10-12 | 2021-11-10 | Moon, Jorlin | Systems and methods for measuring, quantifying, displaying and/or otherwise handling/reporting health status data and/or risks via self-directed health screening, information, and processing information regarding associated professional advice |
CA3021197A1 (en) | 2017-10-17 | 2019-04-17 | Royal Bank Of Canada | Auto-teleinterview solution |
KR20200074951A (en) | 2017-10-17 | 2020-06-25 | Satish Rao | Machine learning-based system for identification and monitoring of nervous system disorders |
US20190122661A1 (en) | 2017-10-23 | 2019-04-25 | GM Global Technology Operations LLC | System and method to detect cues in conversational speech |
GB2567826B (en) | 2017-10-24 | 2023-04-26 | Cambridge Cognition Ltd | System and method for assessing physiological state |
US20190130077A1 (en) | 2017-11-01 | 2019-05-02 | The Curators Of The University Of Missouri | Sensor system and method for cognitive health assessment |
WO2019098423A1 (en) | 2017-11-17 | 2019-05-23 | LINE Corporation | Method and system for identifying conversation flow of message, and non-transitory computer-readable recording medium |
US11663182B2 (en) | 2017-11-21 | 2023-05-30 | Maria Emma | Artificial intelligence platform with improved conversational ability and personality development |
WO2019103484A1 (en) | 2017-11-24 | 2019-05-31 | Genesis Lab, Inc. | Multi-modal emotion recognition device, method and storage medium using artificial intelligence |
US10818396B2 (en) | 2017-12-09 | 2020-10-27 | Jane Doerflinger | Method and system for natural language processing for the evaluation of pathological neurological states |
US10424186B2 (en) | 2017-12-28 | 2019-09-24 | Sony Corporation | System and method for customized message playback |
US11024294B2 (en) | 2017-12-29 | 2021-06-01 | DMAI, Inc. | System and method for dialogue management |
US11222632B2 (en) | 2017-12-29 | 2022-01-11 | DMAI, Inc. | System and method for intelligent initiation of a man-machine dialogue based on multi-modal sensory inputs |
US20190206402A1 (en) | 2017-12-29 | 2019-07-04 | DMAI, Inc. | System and Method for Artificial Intelligence Driven Automated Companion |
WO2019133698A1 (en) | 2017-12-29 | 2019-07-04 | DMAI, Inc. | System and method for personalizing dialogue based on user's appearances |
SG11202004014QA (en) | 2017-12-30 | 2020-05-28 | Kaha Pte Ltd | Method and system for monitoring emotions |
US20180204107A1 (en) | 2018-03-14 | 2018-07-19 | Christopher Allen Tucker | Cognitive-emotional conversational interaction system |
JP7165207B2 (en) | 2018-05-01 | 2022-11-02 | ブラックソーン セラピューティクス インコーポレイテッド | machine learning based diagnostic classifier |
WO2019246239A1 (en) | 2018-06-19 | 2019-12-26 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
US20190385711A1 (en) | 2018-06-19 | 2019-12-19 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
US20200321082A1 (en) | 2019-04-05 | 2020-10-08 | Ellipsis Health, Inc. | Confidence evaluation to measure trust in behavioral health survey results |
US10902351B1 (en) * | 2019-08-05 | 2021-01-26 | Kpn Innovations, Llc | Methods and systems for using artificial intelligence to analyze user activity data |
2019
- 2019-06-19 WO PCT/US2019/037953 patent/WO2019246239A1/en unknown
- 2019-06-19 EP EP19823035.1A patent/EP3811245A4/en active Pending
- 2019-06-19 JP JP2020571611A patent/JP2021529382A/en active Pending

2020
- 2020-12-21 US US17/129,859 patent/US11120895B2/en active Active
- 2020-12-22 US US17/130,649 patent/US20210110895A1/en not_active Abandoned
Cited By (42)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11298062B2 (en) * | 2017-02-01 | 2022-04-12 | Conflu3Nce Ltd | Multi-purpose interactive cognitive platform |
US12020820B1 (en) | 2017-03-03 | 2024-06-25 | Cerner Innovation, Inc. | Predicting sphingolipidoses (fabry's disease) and decision support |
US11335461B1 (en) * | 2017-03-06 | 2022-05-17 | Cerner Innovation, Inc. | Predicting glycogen storage diseases (Pompe disease) and decision support |
US11923048B1 (en) | 2017-10-03 | 2024-03-05 | Cerner Innovation, Inc. | Determining mucopolysaccharidoses and decision support tool |
US20210383921A1 (en) * | 2018-01-26 | 2021-12-09 | Hitachi High-Tech Solutions Corporation | Controlling devices to achieve medical outcomes |
US11538583B2 (en) * | 2018-01-26 | 2022-12-27 | Hitachi High-Tech Solutions Corporation | Controlling devices to achieve medical outcomes |
US11942194B2 (en) | 2018-06-19 | 2024-03-26 | Ellipsis Health, Inc. | Systems and methods for mental health assessment |
US11942086B2 (en) * | 2018-09-27 | 2024-03-26 | Panasonic Intellectual Property Management Co., Ltd. | Description support device and description support method |
US20210104240A1 (en) * | 2018-09-27 | 2021-04-08 | Panasonic Intellectual Property Management Co., Ltd. | Description support device and description support method |
US11562135B2 (en) * | 2018-10-16 | 2023-01-24 | Oracle International Corporation | Constructing conclusive answers for autonomous agents |
US11720749B2 (en) | 2018-10-16 | 2023-08-08 | Oracle International Corporation | Constructing conclusive answers for autonomous agents |
US20220039741A1 (en) * | 2018-12-18 | 2022-02-10 | Szegedi Tudományegyetem | Automatic Detection Of Neurocognitive Impairment Based On A Speech Sample |
US11861319B2 (en) | 2019-02-13 | 2024-01-02 | Oracle International Corporation | Chatbot conducting a virtual social dialogue |
US11854538B1 (en) * | 2019-02-15 | 2023-12-26 | Amazon Technologies, Inc. | Sentiment detection in audio data |
US20200320414A1 (en) * | 2019-04-02 | 2020-10-08 | Kpn Innovations, Llc. | Artificial intelligence advisory systems and methods for vibrant constitutional guidance |
US11763944B2 (en) * | 2019-05-10 | 2023-09-19 | Tencent America LLC | System and method for clinical decision support system with inquiry based on reinforcement learning |
US20200357515A1 (en) * | 2019-05-10 | 2020-11-12 | Tencent America LLC | System and method for clinical decision support system with inquiry based on reinforcement learning |
US20210295172A1 (en) * | 2020-03-20 | 2021-09-23 | International Business Machines Corporation | Automatically Generating Diverse Text |
US11741371B2 (en) * | 2020-03-20 | 2023-08-29 | International Business Machines Corporation | Automatically generating diverse text |
US11889398B2 (en) * | 2020-05-06 | 2024-01-30 | Kant AI Solutions LLC | Artificial intelligence for emergency assistance with human feedback for machine learning |
US20220353664A1 (en) * | 2020-05-06 | 2022-11-03 | Kant AI Solutions LLC | Artificial intelligence for emergency assistance with human feedback for machine learning |
US11395124B2 (en) * | 2020-05-06 | 2022-07-19 | Kant AI Solutions LLC | Artificial intelligence for emergency assistance |
US20220015687A1 (en) * | 2020-07-15 | 2022-01-20 | Seoul National University R&Db Foundation | Method for Screening Psychiatric Disorder Based On Conversation and Apparatus Therefor |
US11487891B2 (en) * | 2020-10-14 | 2022-11-01 | Philip Chidi Njemanze | Method and system for mental performance computing using artificial intelligence and blockchain |
US20220114273A1 (en) * | 2020-10-14 | 2022-04-14 | Philip Chidi Njemanze | Method and System for Mental Performance Computing Using Artificial Intelligence and Blockchain |
US20220165390A1 (en) * | 2020-11-20 | 2022-05-26 | Blue Note Therapeutics, Inc. | Digital therapeutic for treatment of psychological aspects of an oncological condition |
US20220181004A1 (en) * | 2020-12-08 | 2022-06-09 | Happify Inc. | Customizable therapy system and process |
US20220207392A1 (en) * | 2020-12-31 | 2022-06-30 | International Business Machines Corporation | Generating summary and next actions in real-time for multiple users from interaction records in natural language |
US20220223241A1 (en) * | 2021-01-11 | 2022-07-14 | juli, Inc. | Methods and systems for generating personalized recommendations and predictions of a level of effectiveness of the personalized recommendations for a user |
US20220246011A1 (en) * | 2021-02-03 | 2022-08-04 | NC Seven Mountains, LLC | Methods, devices, and systems for round-the-clock health and wellbeing monitoring of incarcerated individuals and/or individuals under twenty-four-hour-seven-day-a-week (24/7) supervision |
WO2022272147A1 (en) * | 2021-06-24 | 2022-12-29 | The Regents Of The University Of California | Artificial intelligence modeling for multi-linguistic diagnostic and screening of medical disorders |
US20230018077A1 (en) * | 2021-07-13 | 2023-01-19 | Canon Medical Systems Corporation | Medical information processing system, medical information processing method, and storage medium |
WO2023096867A1 (en) * | 2021-11-23 | 2023-06-01 | Compass Pathfinder Limited | Intelligent transcription and biomarker analysis |
US20230162835A1 (en) * | 2021-11-24 | 2023-05-25 | Wendy B. Ward | System and Method for Collecting and Analyzing Mental Health Data Using Computer Assisted Qualitative Data Analysis Software |
US11559232B1 (en) | 2022-02-27 | 2023-01-24 | King Abdulaziz University | GRU based real-time mental stress assessment |
US20230317274A1 (en) * | 2022-03-31 | 2023-10-05 | Matrixcare, Inc. | Patient monitoring using artificial intelligence assistants |
WO2023235527A1 (en) * | 2022-06-03 | 2023-12-07 | aiberry, Inc. | Multimodal (audio/text/video) screening and monitoring of mental health conditions |
WO2023235564A1 (en) * | 2022-06-03 | 2023-12-07 | aiberry, Inc. | Multimodal (audio/text/video) screening and monitoring of mental health conditions |
WO2024026272A1 (en) * | 2022-07-26 | 2024-02-01 | Compass Pathfinder Limited | Predicting response to psilocybin therapy for treatment resistant depression |
WO2024068953A1 (en) * | 2022-09-30 | 2024-04-04 | Presage | System for remote monitoring of a potentially elderly individual in an everyday environment |
WO2024130331A1 (en) * | 2022-12-22 | 2024-06-27 | Redenlab Pty. Ltd. | Systems and methods for assessing brain health |
US20240223705A1 (en) * | 2022-12-28 | 2024-07-04 | Motorola Solutions, Inc. | Device, system, and method to initiate electronic actions on calls and manage call-taking resources |
Also Published As
Publication number | Publication date |
---|---|
EP3811245A1 (en) | 2021-04-28 |
JP2021529382A (en) | 2021-10-28 |
US20210110894A1 (en) | 2021-04-15 |
EP3811245A4 (en) | 2022-03-09 |
WO2019246239A1 (en) | 2019-12-26 |
US11120895B2 (en) | 2021-09-14 |
Similar Documents
Publication | Title |
---|---|
US11942194B2 (en) | Systems and methods for mental health assessment |
US11120895B2 (en) | Systems and methods for mental health assessment |
US20220328064A1 (en) | Acoustic and natural language processing models for speech-based screening and monitoring of behavioral health conditions |
US11545173B2 (en) | Automatic speech-based longitudinal emotion and mood recognition for mental health treatment |
US11881221B2 (en) | Health monitoring system and appliance |
Schuller et al. | A review on five recent and near-future developments in computational processing of emotion in the human voice |
US20230052573A1 (en) | System and method for autonomously generating personalized care plans |
US20210345925A1 (en) | A data processing system for detecting health risks and causing treatment responsive to the detection |
Aloshban et al. | What you say or how you say it? Depression detection through joint modeling of linguistic and acoustic aspects of speech |
Constâncio et al. | Deception detection with machine learning: A systematic review and statistical analysis |
US20240087752A1 (en) | Systems and methods for multi-language adaptive mental health risk assessment from spoken and written language |
US20210110924A1 (en) | System and method for monitoring system compliance with measures to improve system health |
US11547345B2 (en) | Dynamic neuropsychological assessment tool |
US20230148945A1 (en) | Dynamic neuropsychological assessment tool |
WO2020206178A1 (en) | Dialogue timing control in health screening dialogues for improved modeling of responsive speech |
CN117412702A (en) | System and method for psychological treatment using artificial intelligence |
KR20230047104A (en) | Digital devices and applications for the treatment of social communication disorders |
Assan et al. | Machine learning for mental health detection |
US20240355470A1 (en) | System for condition tracking and management and a method thereof |
Hesson | Medically speaking: Co-variation as stylistic clustering within physician recommendations |
Legal Events
Code | Title | Description |
---|---|---|
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |