WO2023054632A1 - Dysphagia determination device and determination method - Google Patents
Dysphagia determination device and determination method
- Publication number
- WO2023054632A1 (application PCT/JP2022/036558)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- dysphagia
- analysis
- determination device
- subject
- speech
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/103—Measuring devices for testing the shape, pattern, colour, size or movement of the body or parts thereof, for diagnostic purposes
- A61B5/11—Measuring movement of the entire body or parts thereof, e.g. head or hand tremor or mobility of a limb
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/09—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being zero crossing rates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/15—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being formant information
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/18—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/66—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
Definitions
- This disclosure relates to a device and method for determining dysphagia (excluding medical practice).
- Dysphagia, a condition in which the ability to swallow is weakened, increases the risk of suffocation, in which difficulty in eating leads to choking, and of aspiration, in which swallowed water or food enters the trachea. If dysphagia is left untreated, malnutrition persists, leading to frailty and a need for nursing care, and it may even lead to life-threatening events such as suffocation and aspiration pneumonia. It is therefore desirable to determine dysphagia early and take appropriate measures.
- US Pat. No. 6,200,003 provides a device for determining the presence or absence of dysphagia by positioning a biaxial accelerometer on the neck of a subject.
- The device of Patent Document 1 imposes a physical burden on the subject, for example because an accelerometer must be attached to the subject's neck with double-sided tape.
- It also places a heavy burden on the person in charge of testing the subject, who must, for example, ensure that the mounting position of the sensor device does not shift.
- There is therefore a demand for a device that can determine dysphagia more easily, for example by reducing the burden on subjects and on those in charge of the examination.
- The present disclosure has been made in view of these circumstances, and aims to provide a determination device that can determine dysphagia more easily.
- the inventors discovered that there is a correlation between the speech data of the subject's speech and the degree of progression of the subject's dysphagia. In addition, it was discovered that there is a significant difference in predetermined acoustic feature values between groups classified based on the degree of progression of dysphagia. Based on these findings, the inventors succeeded in creating a device for determining dysphagia based on speech analysis. That is, the details of the determination device of the present disclosure are as follows.
- A determination device for determining dysphagia by voice analysis, comprising: input means for inputting voice data uttered by the subject; analysis means for analyzing the voice data input by the input means; and determination means for determining dysphagia of the subject based on the analysis result of the analysis means.
- The determination device according to any one of [1] to [4], wherein the voice data of the subject is input at least twice using the same phrase, and the degree of progression of dysphagia is determined based on the difference or average value of the analysis results for that phrase.
- The determination device according to any one of [1] to [6], wherein the input means inputs speech data, uttered by the subject, that includes at least one sound of
- A method for determining dysphagia by voice analysis, comprising: an input step of inputting voice data uttered by the subject; an analysis step of analyzing the voice data input in the input step; and a determination step of determining dysphagia of the subject based on the analysis result of the analysis step.
- the determination device determines dysphagia by voice analysis. Further, it is possible to provide a determination method for determining dysphagia by voice analysis. Furthermore, the determination device can determine not only the presence or absence of dysphagia in the subject, but also the degree of progression of dysphagia.
- FIG. 1 is a diagram for explaining the configuration of a determination device.
- FIG. 2 is a flow chart showing an example of processing executed by the determination device.
- FIG. 3 is a diagram showing an example of audio data input to the input/output unit 12.
- FIG. 4 is an ROC curve showing the results of one example of the present invention.
- FIG. 5 is an ROC curve showing the results of one example of the present invention.
- FIG. 6 is a box and whisker diagram showing the results of one example of the present invention.
- FIG. 7 is a diagram showing variations in formant analysis values when the same utterance content is uttered a plurality of times.
- FIG. 8 is a diagram showing the relationship between the value of formant analysis and the degree of progression of dysphagia.
- FIG. 9A is a diagram showing the relationship between the value of 13th-order Mel frequency cepstrum analysis and the degree of progression of dysphagia.
- FIG. 9B is a diagram showing the relationship between the value of the 13th-order Mel-frequency cepstrum analysis and the degree of progression of dysphagia.
- FIG. 10 is a diagram showing the relationship between phoneme classes and phonemes to be activated.
- FIG. 11 shows the results of phoneme analysis of "pa" uttered by a healthy person without dysphagia and "pa" uttered by a person with moderate or higher dysphagia.
- FIG. 12 is a box-and-whisker diagram for verifying the classification performance according to the severity of dysphagia using acoustic features.
- FIG. 13 is a box-and-whisker diagram for verifying the classification performance according to the severity of dysphagia using acoustic features.
- FIG. 14 is a box-and-whisker diagram verifying the classification performance according to the severity of dysphagia using acoustic features.
- FIG. 15 is a box-and-whisker diagram verifying the classification performance according to the severity of dysphagia using acoustic features.
- FIG. 16 is a box-and-whisker diagram for verifying the classification performance according to the severity of dysphagia using acoustic features.
- FIG. 17 is a box-and-whisker diagram for verifying the classification performance according to the severity of dysphagia using acoustic features.
- FIG. 18 is a diagram showing audio data.
- FIG. 19 shows experimental results showing the correlation between the speech intensity analysis results and the dysphagia evaluation results.
- FIG. 20 shows the result of a box-and-whisker diagram of each acoustic feature quantity.
- FIG. 21 shows the result of a box-and-whisker diagram of each acoustic feature quantity.
- FIG. 22 shows the result of the ROC curve when dysphagia of mild or higher severity is determined by a machine learning model using an acoustic feature quantity set created including the method of phoneme analysis and strength analysis.
- FIG. 23 shows the result of the ROC curve when dysphagia of moderate or higher severity was determined by a machine learning model using an acoustic feature quantity set created including the method of phoneme analysis and strength analysis.
- the dysphagia determination device of the present disclosure includes analysis means and determination means as main components.
- the analysis means performs acoustic analysis using an acoustic feature quantity (hereinafter sometimes referred to as "F(a)”) that can analyze the degree of progression of dysphagia.
- F(a) acoustic feature quantity
- the determination means may be subjected to machine learning processing such that, when the analysis result acquired by the analysis means is input, the degree of progression of dysphagia is output.
- The configuration of the determination device (hereinafter sometimes referred to as the "information processing device") will be described using FIG. 1.
- The information processing device 10 includes a control unit 11 that controls the overall operation, an input/output unit 12 that performs various inputs and outputs, a storage unit 13 that stores various data and programs, a communication unit 14 that communicates with the outside, and an internal bus 15 that connects these blocks so that they can communicate with each other.
- The information processing device 10 is, for example, a computer. It may be a device that the subject can carry, such as a smartphone, PDA, tablet, or laptop computer, or a computer that is fixed at an installation position and is not carried by the subject.
- PDA is an abbreviation for Personal Digital Assistant.
- the control unit 11 is a device called, for example, a CPU, MCU, or MPU, and executes programs stored in the storage unit 13, for example.
- CPU is an abbreviation for Central Processing Unit.
- MCU is an abbreviation for Micro Controller Unit.
- MPU is an abbreviation for Micro Processor Unit.
- the input/output unit 12 is a device that performs input/output with respect to the subject who operates the information processing device 10 .
- the input/output unit 12 inputs and outputs information and signals using a display, keyboard, mouse, button, touch panel, printer, microphone, speaker, and the like.
- the input/output unit 12 functions at least as a microphone, and inputs audio data through this microphone.
- the input/output unit 12 serves at least as a display, and displays the determination result of dysphagia, which will be described later, on this display.
- the storage unit 13 is, for example, a device such as ROM, RAM, HDD, or flash memory, and stores programs to be executed by the control unit 11 and various data.
- ROM is an abbreviation for Read Only Memory.
- RAM is an abbreviation for Random Access Memory.
- HDD is an abbreviation for Hard Disk Drive.
- The communication unit 14 communicates with the outside. Communication by the communication unit 14 may be wired or wireless, and any communication method may be used.
- The control unit 11 can transmit and receive various data, such as voice data, through the communication unit 14.
- The control unit 11 may transmit the determination result of dysphagia, which will be described later, to an external device through the communication unit 14.
- In step S201, the control unit 11 inputs voice data of the subject through the input/output unit 12.
- In step S202, the calculation unit (or analysis unit) calculates the acoustic feature amount from the voice data.
- In step S203, the estimation unit (or determination unit) estimates (or determines) the presence or absence of dysphagia and its degree of progression.
- The estimation result (or determination result) is output to the input/output unit 12, and the flow ends.
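- The flow of steps S201 to S203 can be sketched as a small pipeline. This is only an illustrative sketch: the names `run_flow`, `toy_analyze`, `toy_determine`, and `DeterminationResult` are hypothetical, the zero-crossing-rate analysis is a placeholder for the acoustic feature calculation, and the 0.5 cutoff is arbitrary and has no clinical meaning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Features = Dict[str, float]


@dataclass
class DeterminationResult:
    has_dysphagia: bool
    degree: str  # free-form label; the label set is an assumption


def run_flow(
    voice_data: Sequence[float],
    analyze: Callable[[Sequence[float]], Features],
    determine: Callable[[Features], DeterminationResult],
) -> DeterminationResult:
    """Run S201 -> S202 -> S203 and return the result to be output."""
    if len(voice_data) < 2:          # S201: voice data must have been input
        raise ValueError("voice data is empty or too short")
    features = analyze(voice_data)   # S202: calculate acoustic feature amounts
    return determine(features)       # S203: estimate presence / degree


def toy_analyze(voice_data: Sequence[float]) -> Features:
    """Placeholder analysis: zero-crossing rate of the waveform."""
    crossings = sum(
        1 for a, b in zip(voice_data, voice_data[1:]) if (a < 0) != (b < 0)
    )
    return {"zero_crossing_rate": crossings / (len(voice_data) - 1)}


def toy_determine(features: Features) -> DeterminationResult:
    """Placeholder rule with an arbitrary cutoff; not a clinical criterion."""
    flagged = features["zero_crossing_rate"] > 0.5
    return DeterminationResult(flagged, "undetermined")


result = run_flow([0.2, -0.1, 0.3, -0.4], toy_analyze, toy_determine)
print(result)
```

- Passing the analysis and determination steps in as functions mirrors the separation between the analysis means and the determination means, so either can be swapped (for example, for a trained model) without changing the flow.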
- the input/output unit 12 in step S201 may use a microphone.
- the subject speaks into the microphone and inputs voice data. Audio data recorded in advance may be used.
- The phrases selected for voice input are phrases suitable for analyzing, by voice, the degree of progression of the subject's dysphagia. As dysphagia progresses, factors such as the range of tongue movement, the front-back position of the tongue, the degree of jaw opening, the occlusion of the teeth, the number of teeth, the amount of saliva secreted, and the weakening of the muscles change, and the state of internal sound resonance is affected as well.
- a phrase that is suitable for analyzing the degree of progression of dysphagia is a phrase that facilitates discovering, for example, the degree of resonance of sound that changes as the dysphagia progresses.
- FIG. 3 shows an example of audio data input to the input/output unit 12. FIG. 3 is only an example of a compilation of phrases, and the voice input of the present disclosure is not limited to these phrases.
- the subject can select at least one of Phrase01 (Ph01) to Phrase10 (Ph10) shown in FIG. 3 as a phrase to be input by voice. Of course, some of Phrase01 (Ph01) to Phrase10 (Ph10) may be combined for voice input.
- Ph01 is voice data for uttering "pa".
- Ph02 is voice data for uttering "ma".
- Ph03 is voice data for uttering "ta".
- Ph04 is voice data for uttering "ra".
- Ph05 is voice data for uttering "ka".
- Ph06 is voice data for uttering "go".
- Ph07 is voice data for uttering "Panda's Treasure".
- Ph08 is voice data for uttering "egg".
- Ph09 is voice data in which "banana banana banana banana banana banana banana" is repeatedly uttered five times or more as fast as possible.
- Ph10 is voice data in which "kimono" is repeatedly uttered five times or more as quickly as possible. The relationship between phrases Ph01 to Ph10 and dysphagia is described further below.
- Ph01 and Ph02 are the pronunciations of "pa" and "ma", which require a movement that closes the lips. As a swallowing function, this movement is related to the transport of food during swallowing.
- Ph03 and Ph04 are the pronunciations of "ta" and "ra".
- "Ta" is a movement that uses the tip of the tongue; as a swallowing function, it is related to mastication and to the feeding movement (moving water and food in the mouth to the back of the throat).
- "Ra" requires the tip of the tongue to move relatively smoothly, so it is a sound that reveals how smoothly the tongue moves. It uses the tip of the tongue in the same way as "ta", and is related to mastication and to the feeding movement (moving water and food in the mouth to the back of the throat).
- Ph05 and Ph06 are the pronunciations of "ka" and "go". Both are movements that use the back of the tongue; as swallowing functions, they correspond to the feeding movement and to movements that increase intrapharyngeal pressure, and are related to transporting food.
- Ph07 Treasure of the Panda
- Ph08 Egg
- Ph09 banana banana banana banana banana
- Ph10 kimono kimono kimono kimono kimono kimono kimono kimono kimono
- the above phrases are voice-inputted to acquire voice data, and voice analysis is performed in step S202.
- Acoustic features are calculated in a calculation unit (analysis unit) during speech analysis. The acoustic feature amount will be described in detail below.
- the acoustic feature quantity F(a) can be expressed by the following formula.
- g is a linear or nonlinear model that determines the presence or absence of dysphagia and its degree of progression
- x_n is a coefficient specific to the phrase input as voice data
- f(n) is an acoustic parameter: one or more selected from the group consisting of formant frequency, mel-frequency cepstrum, frequency spectrum, speech envelope, waveform variation information, zero-crossing rate, Hurst exponent, and the time from closure-opening to the onset of vocal fold vibration.
- the mean value or difference can be included
- the variation (variance or standard deviation) or median can be included.
- When the acoustic feature amounts differ greatly in numerical value, each may be normalized.
- the feature amount may be divided into two or more.
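- The formula for F(a) is not reproduced in this text, so the following is only a sketch of one form that is consistent with the definitions of g, x_n, and f(n): a model g applied to a coefficient-weighted sum of normalized acoustic parameters. The weighted-sum form, the logistic choice for g, z-score normalization, and the function names `zscore` and `feature_quantity` are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev
from typing import Callable, Sequence


def zscore(value: float, reference: Sequence[float]) -> float:
    """Normalize one acoustic parameter against a set of reference values."""
    sd = pstdev(reference)
    return 0.0 if sd == 0 else (value - mean(reference)) / sd


def logistic(s: float) -> float:
    """One possible choice of g (assumed, not taken from the source)."""
    return 1.0 / (1.0 + math.exp(-s))


def feature_quantity(
    params: Sequence[float],          # f(1) ... f(n), already normalized
    coeffs: Sequence[float],          # x_1 ... x_n, phrase-specific
    g: Callable[[float], float] = logistic,
) -> float:
    """Assumed form: F(a) = g(sum_n x_n * f(n))."""
    if len(params) != len(coeffs):
        raise ValueError("params and coeffs must have the same length")
    return g(sum(x * f for x, f in zip(coeffs, params)))


# Two normalized parameters with hypothetical coefficients.
score = feature_quantity([zscore(3.0, [1.0, 2.0, 3.0]), 0.0], [1.0, 2.0])
print(round(score, 3))
```

- Normalizing before weighting keeps a parameter measured in hertz from dominating one measured as a rate between 0 and 1, which is the situation the normalization remark addresses.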
- the types of acoustic parameters are as follows.
- Distribution statistics within an utterance (1st quartile, median, 3rd quartile, 95th percentile, 98th percentile, arithmetic mean, geometric mean, difference between the 3rd quartile and the median, etc.) of an arbitrary formant frequency (1st formant, 2nd formant, 3rd formant, 4th formant, 5th formant, 6th formant, etc.)
- Arbitrary formant frequencies (1st formant, 2nd formant, 3rd formant, 4th formant, 5th formant, 6th formant, .
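- As an illustration of the within-utterance distribution statistics listed above, the sketch below summarizes a formant track (one frequency value per analysis frame) using only the Python standard library. The function name, the input values, and the "inclusive" quantile method are assumptions; the source does not say how the percentiles are interpolated or how the formant track is extracted.

```python
from statistics import geometric_mean, mean, quantiles
from typing import Dict, Sequence


def formant_distribution_stats(track_hz: Sequence[float]) -> Dict[str, float]:
    """Summarize one formant's per-frame values within a single utterance."""
    q1, med, q3 = quantiles(track_hz, n=4, method="inclusive")
    pct = quantiles(track_hz, n=100, method="inclusive")  # 99 cut points
    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "p95": pct[94],                 # 95th percentile
        "p98": pct[97],                 # 98th percentile
        "mean": mean(track_hz),
        "geometric_mean": geometric_mean(track_hz),
        "q3_minus_median": q3 - med,
    }


# Hypothetical first-formant values in Hz, one per frame.
stats = formant_distribution_stats([500, 600, 700, 800, 900])
print(stats["median"], stats["q3_minus_median"])
```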
- In step S203, determination processing is executed.
- An example of determination using the above acoustic feature amount will be described with reference to FIGS. 4 to 6.
- Example 1: FIG. 4 is an ROC curve verifying the classification performance, for the presence or absence of dysphagia, of a specific program created using 7 of the acoustic parameters (1) to (14) described above, from voice data in which the subjects read out each of the 10 phrases shown in FIG. 3 twice.
- the horizontal axis indicates "1-specificity" and the vertical axis indicates sensitivity.
- AUC was 0.941, confirming sufficient classification performance.
- Example 2: FIG. 5 is an ROC curve verifying the classification performance, regarding the degree of progression of dysphagia (whether or not dysphagia is moderate or more severe), of a specific program created using the average value of the values calculated from voice data obtained by reading out the 10 phrases shown in FIG. 3.
- the horizontal axis indicates "1-specificity" and the vertical axis indicates sensitivity.
- AUC was 0.981, confirming sufficient classification performance.
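- For readers unfamiliar with the metric, the sketch below shows how ROC points ("1-specificity" against sensitivity) and the AUC can be computed from classifier scores. The scores are made-up illustrative values, not the data behind FIG. 4 or FIG. 5, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(
    pos: Sequence[float], neg: Sequence[float]
) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs for every score threshold."""
    points = [(0.0, 0.0)]
    for t in sorted(set(pos) | set(neg), reverse=True):
        sensitivity = sum(s >= t for s in pos) / len(pos)
        false_positive_rate = sum(s >= t for s in neg) / len(neg)
        points.append((false_positive_rate, sensitivity))
    return points


def roc_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """AUC as the probability that a positive case outscores a negative one."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical scores: higher means "more likely dysphagia".
dysphagia_scores = [0.9, 0.8, 0.4]
healthy_scores = [0.7, 0.3, 0.2]
auc = roc_auc(dysphagia_scores, healthy_scores)
print(roc_points(dysphagia_scores, healthy_scores)[-1], auc)
```

- An AUC of 1.0 means every subject with dysphagia scores above every healthy subject, and 0.5 corresponds to chance, which is the scale on which the reported AUC values should be read.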
- The presence or absence of dysphagia is first determined using the program according to FIG. 4.
- The group determined to have dysphagia can then be further evaluated for the degree of progression (mild, or moderate or more) using the program according to FIG. 5.
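- The two-stage use of the programs can be sketched as a simple cascade. The two models are passed in as functions that return a score between 0 and 1; the function name, the label strings, and the 0.5 cutoffs are hypothetical and would in practice be chosen from the ROC curves.

```python
from typing import Callable, Dict

Features = Dict[str, float]
Model = Callable[[Features], float]


def staged_determination(
    features: Features,
    presence_model: Model,
    severity_model: Model,
    presence_cutoff: float = 0.5,   # assumed operating point
    severity_cutoff: float = 0.5,   # assumed operating point
) -> str:
    """Stage 1: presence or absence. Stage 2: mild vs. moderate or more."""
    if presence_model(features) < presence_cutoff:
        return "healthy"
    if severity_model(features) < severity_cutoff:
        return "mild"
    return "moderate_or_higher"


# Stand-in models that read precomputed scores from the feature dict.
label = staged_determination(
    {"presence_score": 0.8, "severity_score": 0.3},
    presence_model=lambda f: f["presence_score"],
    severity_model=lambda f: f["severity_score"],
)
print(label)
```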
- FIG. 7 is a diagram showing the relationship between the results of formant analysis of speech data and the degree of progression of dysphagia of the subject, and variations when the same utterance content is uttered a plurality of times.
- voice data of Ph07 "Panda's Treasure” is used as the utterance content.
- the horizontal axis is the time axis for the utterances of subjects who are healthy, have mild dysphagia, and have moderate or higher dysphagia
- the vertical axis is the value of the first formant f1.
- each is grouped into multiple utterances, and the order is plotted along the time axis.
- From the voice data input in step S201, it can be determined whether the current subject is a healthy subject, a person with mild dysphagia, or a person with moderate or higher dysphagia.
- f3 or f5 of a healthy subject is stored in advance, and if f3 or f5 of a subject deviates from that of the healthy subject by a threshold value or more, it can be determined that the subject has dysphagia. FIG. 7 shows only f1 of Ph07, but other voice data, other formant analysis results (for example, any of f2 to f5), and other frequency analyses can likewise be used. Also, FIG. 7 targets the difference in the value of f1 between two utterances, but other numbers of utterances can be used. For example, the degree of dysphagia (presence or absence of dysphagia and its severity) may be determined from the difference between the maximum and minimum values of the formant analysis results of three or more utterances.
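- The two checks described here, deviation from a stored healthy value and spread across repeated utterances, can be sketched as follows. The function names and all numeric values (reference frequency, threshold, utterance values) are hypothetical placeholders; the source does not give concrete thresholds.

```python
from typing import Sequence


def deviates_from_healthy(
    subject_hz: float, healthy_hz: float, threshold_hz: float
) -> bool:
    """True if the subject's formant differs from the stored healthy value
    by the threshold or more."""
    return abs(subject_hz - healthy_hz) >= threshold_hz


def utterance_spread(values_hz: Sequence[float]) -> float:
    """Difference between the maximum and minimum formant value across
    repeated utterances of the same phrase."""
    if len(values_hz) < 2:
        raise ValueError("at least two utterances are needed")
    return max(values_hz) - min(values_hz)


# Hypothetical f3 values in Hz.
flag = deviates_from_healthy(subject_hz=2900.0, healthy_hz=2600.0,
                             threshold_hz=250.0)
# Hypothetical f1 values from three utterances of the same phrase.
spread = utterance_spread([510.0, 540.0, 495.0])
print(flag, spread)
```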
- FIG. 8 is a table comparing, for each voice data shown in FIG. 3, the values of the acoustic feature based on the formant frequency across degrees of progression of dysphagia. The columns are "healthy vs. mild," "healthy vs. moderate or higher," and "mild vs. moderate or higher." The rows indicate the content of the utterance. In the table, "***" indicates P value < 0.01, "**" indicates P value < 0.03333, "*" indicates P value < 0.05, and "ns" indicates no significant difference. In the present application, a difference is considered significant if the P value is less than 0.1.
- f1 is the first formant
- f2 is the second formant
- f3 is the third formant
- f4 is the fourth formant
- f5 is the fifth formant.
- A t-test (unpaired, one-sided) was used for the comparisons.
- Bonferroni's multiple comparison test was used for multiple comparisons among three or more groups, with the significance level set at 10%. Any of a parametric test (including the t-test used here), a non-parametric test, a test based on ratios, a test based on variance ratios, and the like may be used to evaluate significant differences in group comparisons.
- Ph02 showed a significant difference in f4 and f5 in the "healthy vs. mild" item. In addition, a significant difference was shown between f3 and f5 in the item "healthy versus moderate or higher”. In addition, a significant difference was shown in f3 and f4 in the item of "mild vs. moderate or higher".
- Ph03 showed a significant difference in f3 in the "healthy vs. mild" item.
- a significant difference was shown in f1 and f2 in the item of "healthy versus moderate or higher”.
- a significant difference was shown in f1 to f4 in the item of "mild vs. moderate or more”.
- Ph04 showed a significant difference between f2 and f4 in the item "healthy vs moderate or higher”. In addition, a significant difference was shown in f1 to f4 in the item of "mild vs. moderate or more". For Ph04, none of the formants showed a significant difference in the item “Healthy vs. Mild", but it is possible to judge "Healthy vs. Mild” by comparing the other two groups.
- Ph05 showed a significant difference in f2 and f5 in the "healthy vs. mild" item.
- a significant difference was shown in f1 in the item of "healthy vs. moderate or higher”.
- a significant difference was shown for f1 and f2 in the item of "mild vs. moderate or higher”.
- Ph06 showed a significant difference in f3 and f5 in the "healthy vs. mild" item. In addition, a significant difference was shown in f5 in the item "healthy vs. moderate or higher”. Ph06 showed no significant difference in any of the formants in the item of "mild vs. moderate or higher", but it is possible to judge “mild vs. moderate or higher” by comparing the other two groups.
- Ph07 showed a significant difference in f3 and f5 in the "healthy vs. mild" item.
- a significant difference was shown in f1 in the item of "healthy vs. moderate or higher”.
- a significant difference was shown in f1 and f3 in the item of "mild vs. moderate or higher”.
- Ph08 showed a significant difference in f1, f3, and f5 in the "healthy vs. mild" item. In addition, a significant difference was shown in f1 in the item of "healthy vs. moderate or higher”. In addition, a significant difference was shown in f1, f3, and f4 in the item of "mild vs. moderate or higher.”
- Ph09 showed a significant difference between f2 and f4 in the "healthy vs. mild" item. In addition, significant differences were shown in f1, f4, and f5 in the item “healthy vs. moderate or higher”. In addition, a significant difference was shown in f3 and f4 in the item of "mild vs. moderate or higher".
- Ph10 showed a significant difference between f3 and f5 in the "healthy vs. mild" item. In addition, significant differences were shown in f1, f3, and f5 in the item "healthy versus moderate or higher”. In addition, a significant difference was shown in f1 and f3 in the item of "mild vs. moderate or higher".
- Ph01 to Ph10 can distinguish between healthy and mild dysphagia, healthy and moderate or more dysphagia, and mild and moderate or more dysphagia in all phrases. It was therefore confirmed that these groups can be distinguished when the subject inputs speech data containing at least one of the sounds in these phrases.
- Example 6: FIGS. 9A and 9B are tables showing feature amounts that are effective in determining the degree of dysphagia (presence or absence of dysphagia, and its severity).
- the table in FIG. 9A uses the average value of two utterances of the audio data shown in FIG. 3 as acoustic parameters.
- the table of FIG. 9B uses the difference between two utterances of the voice data obtained by uttering each voice data shown in FIG. 3 twice as acoustic parameters.
- The columns indicate the average value, maximum value, minimum value, range value, average minimum value, and slope of the 13th-order mel-frequency cepstrum coefficients or dynamic feature quantities. The rows indicate the content of the utterance.
- A circle indicates that the feature is effective for determining the presence or absence of dysphagia.
- A double circle indicates that the feature is effective in determining the degree of progression of dysphagia.
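- The column items can be illustrated with a short sketch that summarizes the per-frame values of one cepstrum coefficient and then combines two utterances by average (as in FIG. 9A) and by difference (as in FIG. 9B). Several points are assumptions: the MFCC values are taken as already computed, "slope" is interpreted as the least-squares slope over the frame index, and the "average minimum value" item is omitted because the text does not define it. The function names are hypothetical.

```python
from statistics import mean
from typing import Dict, Sequence, Tuple

Stats = Dict[str, float]


def coefficient_stats(track: Sequence[float]) -> Stats:
    """Summarize one cepstrum coefficient over the frames of an utterance."""
    n = len(track)
    if n < 2:
        raise ValueError("at least two frames are needed")
    t_mean = (n - 1) / 2
    y_mean = mean(track)
    # Least-squares slope of the coefficient against the frame index.
    num = sum((i - t_mean) * (y - y_mean) for i, y in enumerate(track))
    den = sum((i - t_mean) ** 2 for i in range(n))
    return {
        "mean": y_mean,
        "max": max(track),
        "min": min(track),
        "range": max(track) - min(track),
        "slope": num / den,
    }


def combine_two_utterances(first: Stats, second: Stats) -> Tuple[Stats, Stats]:
    """Return (average, difference) of each statistic across two utterances."""
    average = {k: (first[k] + second[k]) / 2 for k in first}
    difference = {k: first[k] - second[k] for k in first}
    return average, difference


# Hypothetical per-frame values of one coefficient for two utterances.
first = coefficient_stats([1.0, 2.0, 3.0, 4.0])
second = coefficient_stats([2.0, 2.0, 2.0, 2.0])
average, difference = combine_two_utterances(first, second)
print(average["mean"], difference["slope"])
```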
- the determination device of the second embodiment differs from the first embodiment in that it uses phoneme analysis when determining dysphagia of a subject.
- the analysis means of the determination device of the second embodiment creates acoustic features by performing phoneme analysis on the voice data uttered by the subject. Then, the determination means of the determination device of the second embodiment determines the degree of progression of dysphagia by executing voice analysis using the acoustic feature amount that is the analysis result of the analysis means.
- FIG. 10 is a diagram showing the relationship between phoneme classes and activated phonemes. Even the same phoneme may belong to a plurality of phoneme classes. For example, as shown in FIG. 10, the phoneme /a/ belongs to "vocal", "back", "open", and "voiced", and the phoneme /p/ belongs to "consonantal", "stop", and "labial".
- FIG. 11 shows the results of phoneme analysis of "pa" uttered by a healthy person without dysphagia and "pa" uttered by a person with moderate or higher dysphagia.
- (A), (C), (E), (G), and (I) of FIG. 11 are the phoneme analysis results for "pa" uttered by a healthy person
- (B), (D), (F), (H), and (J) are the phoneme analysis results for "pa" uttered by a person with moderate or higher dysphagia.
- the horizontal axis represents time (s)
- the vertical axis represents phoneme posterior probabilities (phonological posteriors) of each phoneme class.
- the central waveform shown in FIG. 11 corresponds to the voice data of the voice uttered by the subject.
- (A) and (B) of FIG. 11 are the phoneme analysis results when the phoneme classes are “vocal”, “back”, “consonantal”, and “anterior”.
- (C) and (D) of FIG. 11 are the results of phoneme analysis when the phoneme classes are "open”, “nasal”, “close”, and "stop”.
- (E) and (F) in FIG. 11 are the results of phoneme analysis when the phoneme classes are “continuant”, “flap”, “lateral” and “trill”.
- (G) and (H) of FIG. 11 are the results of phoneme analysis when the phoneme classes are "voice", “labial”, “strident” and “dental”.
- (I) and (J) in FIG. 11 are the phoneme analysis results when the phoneme class is "velar" and "pause”.
- the phoneme analysis result is considered to be useful as an acoustic feature amount when determining the presence or absence of dysphagia.
- Figures 12 to 17 show box-and-whisker diagrams that verify the classification performance of the presence or absence of dysphagia by each acoustic feature amount, which is the phoneme analysis result of the phoneme class.
- "mean" appended after a phonological class name in the figures (for example, "consonantal", "close", "dental", "velar", "stop", "anterior", "back", "continuant", "open", "labial") represents the mean
- “median” represents the median
- “std” represents the standard deviation.
- “healthy” represents healthy subjects
- “mild” represents people with mild dysphagia
- “severe” represents people with moderate or more dysphagia.
- the vertical axes in FIGS. 12 to 17 represent values of each statistic such as mean, median, and standard deviation. Specifically, the vertical axes in FIGS. 12 to 17 represent the average, median, standard deviation, etc. of the values of the acoustic features of the phoneme class at each time when the subject utters a certain phrase.
- Fig. 12 shows the statistics of the acoustic features when the subject utters "pa”.
- "close_std", the standard deviation of the phoneme class "close",
- "consonantal_std", the standard deviation of the phoneme class "consonantal",
- and "close_median", the median value of the phoneme class "close", are useful acoustic features.
- Figures 14 and 15 show the statistics of the acoustic features when the subject utters "ra". As shown in FIG. 14, when classifying healthy subjects, those with mild dysphagia, and those with moderate or more severe dysphagia, "close_mean" (the mean of the phoneme class "close"), "close_median" (its median), "dental_std" (the standard deviation of the phoneme class "dental"), and "stop_mean" (the mean of the phoneme class "stop") turn out to be useful acoustic features.
- taking the phoneme class "velar" as an example, it can be seen that "velar_mean", which is the mean of the phoneme class "velar", and "velar_median", which is its median, are useful acoustic features.
- Figures 16 and 17 are the statistics of the acoustic feature amount when the subject utters "egg". Although detailed description of FIGS. 16 and 17 is omitted, the acoustic feature amount of each phoneme class is useful for determining the degree of dysphagia as in FIGS. 12 to 15 .
- the statistics of each phoneme class are useful for determining the degree of dysphagia. It can therefore be confirmed that healthy subjects can be distinguished from those with mild dysphagia, healthy subjects from those with moderate or more severe dysphagia, and those with mild dysphagia from those with moderate or more severe dysphagia.
- the analysis means of the determination device of the second embodiment identifies the phonemes contained in the speech data by performing phoneme analysis on the speech data of the speech uttered by the subject. It then identifies the phoneme classes to which those phonemes belong. A single phoneme may belong to a plurality of phoneme classes. For example, as shown in FIG. 10, the phoneme /a/ belongs to "vocal", "back", "open", and "voiced", and the phoneme /p/ belongs to "consonantal", "stop", and "labial".
- the analysis means of the determination device of the second embodiment calculates the phoneme posterior probabilities (phonological posteriors) of each phoneme class at each time of the speech data, which is time-series data. The determination means of the determination device of the second embodiment then determines the degree of dysphagia of the subject using statistics of the phoneme posterior probabilities of the phoneme class at each time obtained by the analysis means (for example, the mean, median, and standard deviation of the phoneme posterior probabilities at each time). Therefore, according to the determination device of the second embodiment, the degree of dysphagia of the subject can be determined with high accuracy by performing phoneme analysis on the voice data of the voice uttered by the subject.
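The statistics described for the second embodiment can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `posterior_statistics`, the input layout (one list of per-frame posterior probabilities per phoneme class), the use of the population standard deviation, and the example numbers are all assumptions. Only the feature naming pattern (`close_mean`, `close_median`, `close_std`) follows the figures.

```python
from statistics import mean, median, pstdev


def posterior_statistics(posteriors):
    """Summarize per-frame phoneme-class posteriors into acoustic features.

    posteriors: dict mapping a phoneme class name (e.g. "close") to a list of
    posterior probabilities, one value per time frame of the utterance.
    Returns a dict with keys such as "close_mean", "close_median", "close_std".
    """
    features = {}
    for phoneme_class, values in posteriors.items():
        features[f"{phoneme_class}_mean"] = mean(values)
        features[f"{phoneme_class}_median"] = median(values)
        # Assumption: population standard deviation; the source does not say
        # whether the population or sample form is used.
        features[f"{phoneme_class}_std"] = pstdev(values)
    return features


# Made-up posteriors for a four-frame utterance (illustrative values only).
example = {
    "close": [0.10, 0.20, 0.90, 0.80],
    "stop": [0.70, 0.60, 0.10, 0.20],
}
features = posterior_statistics(example)
```

The resulting dictionary can be passed to whatever classifier performs the determination; the posterior probabilities themselves would come from a phoneme recognizer that is outside the scope of this sketch.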
- the determination device of the third embodiment differs from the first and second embodiments in that it uses voice intensity analysis when determining dysphagia of a subject.
- the analysis means of the determination device of the third embodiment creates an acoustic feature amount by performing a voice intensity analysis on the voice data uttered by the subject. Then, the determination means of the determination device of the third embodiment determines the degree of progression of the dysphagia by executing voice analysis using the acoustic feature amount that is the analysis result of the analysis means.
- FIG. 18 is a diagram showing audio data.
- the horizontal axis of FIG. 18 represents the time, and the vertical axis represents the strength of the sound.
- the black circles in FIG. 18 represent peak points of audio data.
- The upper four rows of FIG. 19 are experimental results showing the correlation between the speech intensity analysis results and the dysphagia evaluation results for the phrases Ph09 "banana banana...banana" and Ph10 "kimono kimono kimono...kimono" uttered by the subject.
- peak_ave represents the average of the intensity peak points of the audio data
- peak_sd represents the standard deviation of the intensity peak points of the audio data
- peak_span_ave represents the average of the intervals between the intensity peak points of the audio data
- peak_span_sd represents the standard deviation of the interval between intensity peak points of the audio data.
- FIG. 19 also shows, among the results of tests by the Bonferroni method using "peak_ave", "peak_sd", "peak_span_ave", and "peak_span_sd", the result of separating healthy subjects from those with mild or more severe dysphagia, and the result of separating healthy subjects and those with mild dysphagia from those with moderate or more severe dysphagia.
- The left side of FIG. 20 is a box-and-whisker diagram of the acoustic feature "peak_sd" for the phrase Ph09.
- FIG. 21 is a box-and-whisker diagram of the acoustic feature "peak_sd" for the phrase Ph10.
- the intensity analysis results of the voice data are useful for determining the degree of dysphagia. It can therefore be confirmed that healthy subjects can be distinguished from those with mild dysphagia, healthy subjects from those with moderate or more severe dysphagia, and those with mild dysphagia from those with moderate or more severe dysphagia.
- the analysis means of the determination device of the third embodiment generates acoustic features relating to intensity analysis by performing intensity analysis on the voice data of the voice uttered by the subject.
- the analysis means of the determination device of the third embodiment generates data such as those described above as acoustic feature amounts relating to intensity analysis.
- the determination means of the determination device of the third embodiment determines the degree of dysphagia of the subject based on the acoustic features obtained by the analysis means. Therefore, according to the determination device of the third embodiment, the degree of dysphagia of the subject can be determined with high accuracy by analyzing the intensity of the voice data of the voice uttered by the subject.
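The four intensity features named for FIG. 19 can be sketched as below. This is an illustration under stated assumptions: the source does not say how peak points are detected, so a peak is taken here to be a strict local maximum of an intensity sequence; the function names, the `frame_period` parameter, and the sample envelope are hypothetical. Only the feature names `peak_ave`, `peak_sd`, `peak_span_ave`, and `peak_span_sd` come from the text.

```python
from statistics import mean, pstdev


def find_peaks(intensity):
    """Return indices of strict local maxima (assumed peak definition)."""
    return [
        i
        for i in range(1, len(intensity) - 1)
        if intensity[i - 1] < intensity[i] > intensity[i + 1]
    ]


def peak_features(intensity, frame_period=1.0):
    """Compute peak height and peak interval statistics.

    intensity: sequence of intensity values over time.
    frame_period: time between consecutive values (hypothetical parameter).
    """
    peaks = find_peaks(intensity)
    if len(peaks) < 2:
        raise ValueError("at least two peaks are needed to measure intervals")
    heights = [intensity[i] for i in peaks]
    spans = [(b - a) * frame_period for a, b in zip(peaks, peaks[1:])]
    return {
        "peak_ave": mean(heights),        # average of the peak intensities
        "peak_sd": pstdev(heights),       # spread of the peak intensities
        "peak_span_ave": mean(spans),     # average interval between peaks
        "peak_span_sd": pstdev(spans),    # spread of the intervals
    }


# Made-up intensity envelope with four peaks (illustrative values only).
envelope = [0, 2, 0, 4, 0, 3, 0, 0, 5, 0]
feats = peak_features(envelope)
```

For a repeated phrase such as Ph09, each repetition would be expected to contribute peaks, so an irregular rhythm shows up as a larger `peak_span_sd` and uneven loudness as a larger `peak_sd`.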
- phrases Ph09 "banana banana...banana" and Ph10 "kimono kimono kimono...kimono" were given as examples of phrases uttered by the subject, but the phrase is not limited to these.
- phrases such as “patakapatakapataka”, “papapapa”, and “tatatata” may be used.
- Such phrases are phrases that generate nasal resonance (on the nose) and are considered useful for determining the degree of dysphagia.
- with phrases that include repetition, it is possible to assess the consistency of the rhythm generated by the repetition, so such phrases are also considered useful for determining the degree of dysphagia.
- the degree of dysphagia of the subject may be determined by combining the phoneme analysis of the second embodiment and the intensity analysis of the third embodiment.
- in this case, the determination device determines the degree of dysphagia of the subject using both the acoustic features created by performing the phoneme analysis on the speech data and the acoustic features created by performing the intensity analysis on the speech data.
- the degree of dysphagia of the subject can be determined with higher accuracy than when using only one of the phoneme analysis and the intensity analysis.
- the combination of acoustic features is not limited to the combination of the acoustic features for phoneme analysis and the acoustic features for intensity analysis. Any combination that includes the zero-crossing rate, the Hurst exponent, or the time from the release of closure to the onset of vocal cord vibration may be used.
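One of the additional features mentioned above can be illustrated briefly. The sketch assumes that the rate referred to is the conventional zero-crossing rate, that is, the fraction of adjacent sample pairs whose signs differ; the function name and the test signals are hypothetical, and the source does not specify framing or normalization.

```python
def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs in which the sign changes.

    Zero is treated as non-negative, which is one common convention
    (an assumption; the source does not define the computation).
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(samples) - 1)


# Illustrative signals: one alternating every sample, one crossing once.
alternating = zero_crossing_rate([1, -1, 1, -1, 1])
single = zero_crossing_rate([1, 2, 3, -1, -2])
```

A value computed this way could be appended to the phoneme and intensity features to form one feature vector per utterance.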
- FIGS. 22 and 23 are ROC curve diagrams for determining the degree of dysphagia of the subject using acoustic features created by performing phoneme analysis on speech data and acoustic features created by performing intensity analysis on speech data.
- FIGS. 22 and 23 show the ROC curves obtained when mild or more severe dysphagia (FIG. 22) or moderate or more severe dysphagia (FIG. 23) is determined by a machine learning model using an acoustic feature set. More specifically, the feature set used in FIGS. 22 and 23 comprises acoustic features created based on formant frequencies, acoustic features created based on the Mel-frequency cepstrum, acoustic features created by performing phoneme analysis, and acoustic features created by performing intensity analysis.
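For readers unfamiliar with ROC curves, the sketch below shows how such a curve and its area can be computed from classifier scores. It is a generic illustration, not the evaluation code behind FIGS. 22 and 23: the function names, labels, and scores are made up, and the machine learning model that would produce the scores is not specified in the source.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 for subjects with the condition, 0 otherwise.
    scores: classifier outputs, higher meaning more likely positive.
    The decision threshold is swept from the highest score downward.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Process all samples sharing this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels and scores for six subjects (illustrative values only).
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(labels, scores)
auc = area_under_curve(curve)
```

An area close to 1 indicates that the scores separate the two groups well, while 0.5 corresponds to chance-level separation.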
- The object of the present disclosure is also achieved by supplying a storage medium storing program code (a computer program) that implements the functions of the above-described embodiments to a system or device, and having a computer of that system or device read and execute the program code stored in the storage medium.
- the program code itself read from the storage medium implements the functions of the above-described embodiments, and the storage medium storing the program code constitutes the present disclosure.
- the computer may function as each processing unit by executing the program.
- the present disclosure is not limited to the particular examples described. It includes permutations of the configurations of the examples, and various modifications and changes are possible within the spirit of the disclosure as recited in the claims.
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Pathology (AREA)
- Surgery (AREA)
- Dentistry (AREA)
- Biomedical Technology (AREA)
- Heart & Thoracic Surgery (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Animal Behavior & Ethology (AREA)
- Physiology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Veterinary Medicine (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Epidemiology (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2023551872A JP7754432B2 (ja) | 2021-09-29 | 2022-09-29 | Dysphagia determination device and determination method |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2021-159606 | 2021-09-29 | | |
| JP2021159606 | 2021-09-29 | | |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023054632A1 (ja) | 2023-04-06 |
Family
ID=85782906
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/JP2022/036558 Ceased WO2023054632A1 (ja) | 2021-09-29 | 2022-09-29 | Dysphagia determination device and determination method |
Country Status (2)
| Country | Link |
|---|---|
| JP (1) | JP7754432B2 |
| WO (1) | WO2023054632A1 |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
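| JPWO2023203962A1 * | 2022-04-18 | 2023-10-26 | | |
| CN120256885A (zh) * | 2025-06-05 | 2025-07-04 | The First Affiliated Hospital of Nanchang University | Method and system for predicting swallowing effect based on a swallowing training device |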
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
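| WO2019225242A1 (ja) * | 2018-05-23 | 2019-11-28 | Panasonic Intellectual Property Management Co., Ltd. | Eating and swallowing function evaluation method, program, eating and swallowing function evaluation device, and eating and swallowing function evaluation system |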
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US5182773A (en) * | 1991-03-22 | 1993-01-26 | International Business Machines Corporation | Speaker-independent label coding apparatus |
| JP3450411B2 (ja) * | 1994-03-22 | 2003-09-22 | Canon Inc. | Speech information processing method and apparatus |
2022
- 2022-09-29 WO PCT/JP2022/036558 patent/WO2023054632A1/ja not_active Ceased
- 2022-09-29 JP JP2023551872A patent/JP7754432B2/ja active Active
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
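| JPWO2023203962A1 * | 2022-04-18 | 2023-10-26 | | |
| JP7637922B2 (ja) | 2022-04-18 | 2025-03-03 | Panasonic Intellectual Property Management Co., Ltd. | Oral function evaluation device, oral function evaluation system, and oral function evaluation method |
| CN120256885A (zh) * | 2025-06-05 | 2025-07-04 | The First Affiliated Hospital of Nanchang University | Method and system for predicting swallowing effect based on a swallowing training device |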
Also Published As
| Publication number | Publication date |
|---|---|
| JPWO2023054632A1 | 2023-04-06 |
| JP7754432B2 (ja) | 2025-10-15 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22876495; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023551872; Country of ref document: JP |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 22876495; Country of ref document: EP; Kind code of ref document: A1 |