WO2024116254A1 - Information processing device, information processing method, information processing system, and information processing program - Google Patents
Information processing device, information processing method, information processing system, and information processing program Download PDFInfo
- Publication number
- WO2024116254A1 (PCT/JP2022/043832)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- information processing
- user
- time
- voice data
- Prior art date
Links
- 230000010365 information processing Effects 0.000 title claims abstract description 88
- 238000003672 processing method Methods 0.000 title claims description 5
- 201000010099 disease Diseases 0.000 claims abstract description 94
- 208000037265 diseases, disorders, signs and symptoms Diseases 0.000 claims abstract description 94
- 238000012545 processing Methods 0.000 claims abstract description 74
- 208000024891 symptom Diseases 0.000 claims abstract description 42
- 238000000034 method Methods 0.000 claims description 54
- 238000005070 sampling Methods 0.000 claims description 24
- 238000004364 calculation method Methods 0.000 claims description 14
- 238000004891 communication Methods 0.000 claims description 2
- 238000010586 diagram Methods 0.000 description 23
- 238000007781 pre-processing Methods 0.000 description 15
- 238000013500 data storage Methods 0.000 description 12
- 208000020016 psychiatric disease Diseases 0.000 description 12
- 238000003860 storage Methods 0.000 description 11
- 230000008602 contraction Effects 0.000 description 9
- 208000012902 Nervous system disease Diseases 0.000 description 8
- 208000018737 Parkinson disease Diseases 0.000 description 7
- 230000000994 depressogenic effect Effects 0.000 description 7
- 208000024714 major depressive disease Diseases 0.000 description 7
- 208000024827 Alzheimer disease Diseases 0.000 description 6
- 239000000284 extract Substances 0.000 description 6
- 238000010606 normalization Methods 0.000 description 6
- 230000000694 effects Effects 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 5
- 208000028698 Cognitive impairment Diseases 0.000 description 3
- 206010060860 Neurological symptom Diseases 0.000 description 3
- 208000010877 cognitive disease Diseases 0.000 description 3
- 238000009826 distribution Methods 0.000 description 3
- 238000011156 evaluation Methods 0.000 description 3
- 238000000692 Student's t-test Methods 0.000 description 2
- 238000000605 extraction Methods 0.000 description 2
- 239000011159 matrix material Substances 0.000 description 2
- 239000004065 semiconductor Substances 0.000 description 2
- 238000012353 t test Methods 0.000 description 2
- 238000012360 testing method Methods 0.000 description 2
- 208000025966 Neurological disease Diseases 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000004519 manufacturing process Methods 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 208000023504 respiratory system disease Diseases 0.000 description 1
- 239000007787 solid Substances 0.000 description 1
- 230000001052 transient effect Effects 0.000 description 1
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B10/00—Other methods or instruments for diagnosis, e.g. instruments for taking a cell sample, for biopsy, for vaccination diagnosis; Sex determination; Ovulation-period determination; Throat striking implements
Definitions
- the disclosed technology relates to an information processing device, an information processing method, an information processing system, and an information processing program.
- WO 2020/013296 discloses a device for predicting whether a user has a psychiatric or neurological disorder. This device calculates various acoustic parameters from the user's voice data and uses these acoustic parameters to predict whether the user has a psychiatric or neurological disorder.
- the device disclosed in International Publication No. 2020/013296 estimates diseases using acoustic parameters calculated from voice data, but there is room for improvement in terms of accuracy.
- the disclosed technology has been made in consideration of the above circumstances, and provides an information processing device, information processing method, information processing system, and information processing program that can accurately estimate whether a user has a specified disease or symptom by applying a dynamic time warping method to voice data, which is time-series data of voice uttered by a user.
- a first aspect of the present disclosure is an information processing device including an acquisition unit that acquires voice data, which is time-series data of voice uttered by a user; a processing unit that generates preprocessed voice data representing data from the voice data acquired by the acquisition unit that is a first time or later after the start point of the voice data and a second time or earlier before the end point of the voice data; a generation unit that generates processing result data by applying dynamic time warping to the preprocessed voice data generated by the processing unit; a calculation unit that calculates a score representing the degree to which the user has a predetermined disease or symptom based on the processing result data generated by the generation unit; and an estimation unit that estimates whether or not the user has the predetermined disease or symptom based on the score calculated by the calculation unit.
- a second aspect of the present disclosure is an information processing method that causes a computer to execute the following processes: acquire voice data that is time-series data of voice uttered by a user; generate preprocessed voice data representing data from the acquired voice data that is a first time or later after the start point of the voice data and a second time or earlier before the end point of the voice data; generate processing result data by applying dynamic time warping to the generated preprocessed voice data; calculate a score representing the degree to which the user has a specified disease or symptom based on the generated processing result data; and estimate whether or not the user has the specified disease or symptom based on the calculated score.
- a third aspect of the present disclosure is an information processing program for causing a computer to execute a process of acquiring voice data, which is time-series data of a voice uttered by a user, generating preprocessed voice data representing data from the acquired voice data that is a first time or later after a start point of the voice data and a second time or earlier before an end point of the voice data, generating processing result data by applying dynamic time warping to the generated preprocessed voice data, calculating a score representing the degree to which the user has a predetermined disease or symptom based on the generated processing result data, and estimating whether or not the user has the predetermined disease or symptom based on the calculated score.
- the disclosed technology has the effect of being able to accurately estimate whether a user has a specific disease or specific symptoms by applying a dynamic time warping method to voice data, which is time-series data of the voice uttered by the user.
- FIG. 1 is a diagram illustrating an example of a schematic configuration of an information processing system according to a first embodiment.
- FIG. 2 is a diagram for explaining an overview of the first embodiment.
- FIG. 3 is a diagram illustrating a schematic diagram of audio data for a predetermined period.
- FIG. 4 is a diagram for explaining a shift process for audio data.
- FIG. 5 is a diagram for explaining a sampling process for audio data.
- FIG. 6 is a diagram illustrating an example of a usage form of the information processing system according to the first embodiment.
- FIG. 7 illustrates an example of a computer constituting an information processing device.
- FIG. 8 is a diagram illustrating an example of a process executed by the information processing apparatus of the first embodiment.
- FIG. 9 is a diagram for explaining an overview of a second embodiment.
- FIG. 10 is a diagram illustrating an example of a usage form of an information processing system according to the second embodiment.
- FIG. 11 is a diagram illustrating an example of a usage form of the information processing system according to the second embodiment.
- FIG. 12 is a diagram showing experimental results according to an embodiment.
- FIG. 13 is a diagram showing experimental results according to an embodiment.
- FIG. 14 is a diagram showing experimental results according to an embodiment.
- FIG. 15 is a diagram showing experimental results according to an embodiment.
- FIG. 16 is a diagram showing experimental results according to an embodiment.
- FIG. 17 is a diagram showing experimental results according to an embodiment.
- FIG. 18 is a diagram showing experimental results according to an embodiment.
- FIG. 19 is a diagram showing experimental results according to an embodiment.
- FIG. 20 is a diagram showing experimental results according to an embodiment.
- FIG. 21 is a diagram showing experimental results according to an embodiment.
- FIG. 1 shows an information processing system 10 according to the first embodiment.
- the information processing system 10 according to the first embodiment includes a microphone 12, an information processing device 14, and a display device 16.
- the information processing system 10 estimates whether or not the user has a specified disease or a specified symptom (hereinafter simply referred to as "disease, etc.") based on the user's voice collected by the microphone 12. Note that the information processing system 10 of this embodiment estimates whether or not the user has a psychiatric disease or a neurological disease, or a mental disorder symptom or a cognitive impairment symptom, as an example of a specified disease or a specified symptom.
- the information processing device 14 of the information processing system 10 of the first embodiment performs a predetermined preprocessing on the voice data, which is time-series data of the voice uttered by the user, to generate preprocessed data. Then, the information processing device 14 determines whether or not the user has a disease, etc., based on the result of applying dynamic time warping to the preprocessed data.
- in dynamic time warping, the distance between one time series and another time series is calculated.
- the processing result data obtained by dynamic time warping is used to estimate whether the user has a disease or the like.
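- for reference, the sketch below is a minimal, illustrative implementation of dynamic time warping for two one-dimensional NumPy sequences; the patent does not prescribe any particular DTW implementation or library.

```python
import numpy as np

def dtw_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Return the DTW distance between two 1-D time series x and y."""
    n, m = len(x), len(y)
    # cost[i, j] = minimum cumulative distance aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])              # local distance between samples
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])

# Example: two similar waveforms that are locally stretched in time
t = np.linspace(0.0, 1.0, 200)
a = np.sin(2 * np.pi * 5 * t)
b = np.sin(2 * np.pi * 5 * t ** 1.1)
print(dtw_distance(a, b))
```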
- the information processing device 14 functionally comprises an acquisition unit 20, a voice data storage unit 22, a reference data storage unit 24, a processing unit 26, a generation unit 28, a calculation unit 30, an estimation unit 32, and an output unit 34.
- the information processing device 14 is realized by a computer as described below.
- the acquisition unit 20 acquires voice data, which is time-series data of the voice uttered by the user.
- the acquisition unit 20 then stores the voice data in the voice data storage unit 22.
- the voice data storage unit 22 stores the voice data acquired by the acquisition unit 20.
- the reference data storage unit 24 stores voice data of reference users for whom it is known whether or not they have a disease, etc.
- the processing unit 26 reads out the voice data stored in the voice data storage unit 22. The processing unit 26 then performs a predetermined preprocessing on the voice data to generate preprocessed voice data. The method for generating the preprocessed voice data will be described in detail below.
- Figure 2 shows a diagram for explaining the preprocessed voice data.
- when estimating whether a user has a disease or the like based on voice data uttered by the user, it is preferable to use voice data in which the user's speech is stable.
- in this regard, the initial part of the time-series data represented by the voice data is data from the time when the user started speaking, so it is often not desirable to use that part to estimate diseases, etc. For example, if a user suddenly starts speaking after not making any sound, it is expected that the user's voice will become hoarse or the volume will become low because the user's speech is unstable, and accurate results cannot be expected even if such data is used. Furthermore, it is often also not desirable to use the part of the time-series data near the end point; for example, when a user utters a long sound, the user may run out of breath and be unable to keep the voice going, or the end of the utterance may become unclear.
- the processing unit 26 of the information processing device 14 in this embodiment therefore extracts central data from the audio data, which is time-series data.
- specifically, as shown in FIG. 2, the processing unit 26 generates data D2 representing data from the voice data D1 that is a first time T1 or later after the start point of the voice data D1 and a second time T2 or earlier before the end point of the voice data D1.
- the data D2 is data that corresponds to a time period T3 in the voice data D1. This generates data for the central portion where the user's speech is stable.
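- as an illustrative sketch (not part of the patent), the central-portion extraction described above could be written as follows, assuming a NumPy waveform and example values for T1 and T2.

```python
import numpy as np

def extract_center(voice: np.ndarray, sr: int, t1_sec: float, t2_sec: float) -> np.ndarray:
    """Keep samples at least t1_sec after the start and at least t2_sec before the end."""
    start = int(t1_sec * sr)
    stop = len(voice) - int(t2_sec * sr)
    return voice[start:stop]

sr = 16_000                               # assumed sampling rate
voice = np.random.randn(5 * sr)           # stand-in for 5 s of recorded speech (D1)
center = extract_center(voice, sr, t1_sec=1.0, t2_sec=1.0)   # corresponds to D2 over span T3
print(len(center) / sr, "seconds kept")
```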
- Fig. 3 is a diagram for explaining the predetermined period of data.
- the audio data Df is time-series data, and a predetermined signal is repeated.
- a similar signal waveform is repeated for each time interval T.
- as described below, when estimating whether or not a user has a disease or the like, the voice data uttered by the target user may be compared with the voice data of a reference user for whom it is known whether or not they have the disease or the like. For this reason, it is preferable that the predetermined period of data extracted from the target user's voice data is aligned with the corresponding predetermined period of data in the reference user's voice data. Therefore, for example, the processing unit 26 extracts, from the extracted central portion of data, data for a predetermined period that is the same as the period of the reference user's voice data. This predetermined period is, for example, set in advance. Alternatively, the predetermined period may be changed depending on the type of data.
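- a minimal sketch of cutting out a predetermined number of cycles is shown below; it assumes the repetition period is already known or estimated, since the patent does not prescribe how the period is obtained.

```python
import numpy as np

def extract_cycles(center: np.ndarray, sr: int, period_sec: float, n_cycles: int) -> np.ndarray:
    """Return the first n_cycles whole periods of the signal."""
    samples_per_cycle = int(round(period_sec * sr))
    needed = samples_per_cycle * n_cycles
    if needed > len(center):
        raise ValueError("signal shorter than the requested number of cycles")
    return center[:needed]

sr = 16_000
period_sec = 0.008                        # e.g. a 125 Hz sustained vowel (assumed value)
center = np.random.randn(2 * sr)          # stand-in for the extracted central portion
segment = extract_cycles(center, sr, period_sec, n_cycles=10)
print(segment.shape)
```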
- FIG. 4 shows a diagram for explaining the shift of data in the time axis direction.
- as shown in FIG. 4, consider a case where the voice data Ds contains a signal that repeats with a period Ts and the reference user's voice data D Ref contains a signal that repeats with a period T Ref . In this case, if the start portion P1 of the segment cut out from the voice data Ds and the start portion P2 of the reference user's voice data D Ref are not aligned, the value representing the distance between the two calculated by dynamic time warping may become large even if the voice data Ds and the reference user's voice data D Ref are similar.
- the processing unit 26 shifts the extracted data for a predetermined period in the time axis direction.
- the processing unit 26 shifts the data for a predetermined period shown in FIG. 4 in the time axis direction represented by the arrow S by a predetermined amount.
- the amount of shift for this predetermined amount of time is set in advance.
- the amount of shift may be changed depending on the type of data.
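- the sketch below illustrates one possible way to realize the time-axis shift; the patent only states that the shift amount is set in advance, so choosing it by cross-correlation against the reference data is an assumption of this example.

```python
import numpy as np

def align_by_shift(segment: np.ndarray, reference: np.ndarray, max_shift: int) -> np.ndarray:
    """Drop up to max_shift leading samples so the segment best matches the reference start."""
    window = min(len(reference), len(segment) - max_shift)
    best_shift, best_corr = 0, -np.inf
    for s in range(max_shift + 1):
        corr = float(np.dot(segment[s:s + window], reference[:window]))
        if corr > best_corr:
            best_shift, best_corr = s, corr
    return segment[best_shift:]

sr = 16_000
t = np.arange(sr) / sr
reference = np.sin(2 * np.pi * 125 * t)              # stand-in for the reference user's data
segment = np.sin(2 * np.pi * 125 * (t + 0.003))      # same waveform with a phase offset
aligned = align_by_shift(segment, reference, max_shift=int(0.008 * sr))
print(len(segment) - len(aligned), "leading samples trimmed")
```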
- the processing unit 26 extracts sampling data obtained by sampling from the data shifted in the time axis direction.
- as described above, when estimating whether or not a user has a disease or the like, the voice data uttered by the target user may be compared with the voice data of a reference user for whom it is known whether or not they have the disease or the like. Therefore, it is preferable that the sampling rate for the target user's voice data and the sampling rate for the reference user's voice data are the same.
- for example, consider sampled data D A and sampled data D B obtained by sampling the same audio data D at two different sampling rates.
- if the distance between sampled data D A generated at a sampling rate A and sampled data D B generated at a sampling rate B is calculated using the dynamic time warping method, a certain non-zero distance value is calculated even though the original audio data D is the same.
- the processing unit 26 generates sampling data extracted at the same sampling rate as the sampling rate of the reference user's voice data.
- This sampling rate is set in advance.
- the sampling rate may be changed depending on the type of data. For example, 200 sampling points per cycle are extracted from the data for the predetermined period.
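- as an illustration, resampling to a fixed number of points per cycle could look like the following sketch; the 200 points per cycle follow the example above, while the use of linear interpolation is an assumption.

```python
import numpy as np

def resample_per_cycle(segment: np.ndarray, n_cycles: int, points_per_cycle: int = 200) -> np.ndarray:
    """Resample the segment to n_cycles * points_per_cycle evenly spaced points."""
    target_len = n_cycles * points_per_cycle
    old_x = np.linspace(0.0, 1.0, num=len(segment))
    new_x = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_x, old_x, segment)           # linear interpolation (assumed)

segment = np.random.randn(12_800)                     # stand-in: 10 cycles of raw samples
resampled = resample_per_cycle(segment, n_cycles=10)
print(resampled.shape)                                # (2000,)
```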
- the processing unit 26 performs a time-axis expansion/contraction process on the sampling data obtained by sampling from the voice data.
- the voice data uttered by the target user whose disease or the like is to be estimated may be compared with the voice data of a reference user for whom it is known whether or not they have the disease or the like.
- the processing unit 26 executes a predetermined expansion/contraction process in the time axis direction on the data D3 shown in FIG. 2.
- the method of the predetermined expansion/contraction process is set in advance. Alternatively, for example, the method of expansion/contraction process may be changed depending on the type of data.
- the processing unit 26 performs expansion/contraction processing in the amplitude direction on the data that has been subjected to the expansion/contraction processing in the time axis direction.
- voice data uttered by a user whose disease or the like is to be estimated may be compared with the voice data of a reference user for whom it is known whether or not they have the disease or the like.
- the processing unit 26 executes a predetermined stretching process in the amplitude direction on the data D4 shown in FIG. 2.
- the method of the predetermined stretching process is set in advance. Alternatively, for example, the method of stretching process may be changed depending on the type of data.
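- the sketch below illustrates the two expansion/contraction steps together, stretching the time axis to a common length and normalizing the amplitude; the concrete target length and peak normalization are assumptions, since the patent only says the methods are set in advance.

```python
import numpy as np

def stretch_time(x: np.ndarray, target_len: int) -> np.ndarray:
    """Expand or contract x along the time axis to target_len samples."""
    return np.interp(np.linspace(0.0, 1.0, target_len),
                     np.linspace(0.0, 1.0, len(x)), x)

def normalize_amplitude(x: np.ndarray) -> np.ndarray:
    """Scale x so its maximum absolute amplitude is 1 (peak normalization, assumed)."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

d3 = np.random.randn(1_800)                           # stand-in for data D3 after sampling
d5 = normalize_amplitude(stretch_time(d3, target_len=2_000))   # preprocessed voice data D5
print(d5.shape, float(np.max(np.abs(d5))))
```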
- the processing unit 26 generates preprocessed audio data by performing multiple preprocessing processes as described above on the audio data.
- the generating unit 28 generates processing result data by applying dynamic time warping to the preprocessed audio data generated by the processing unit 26.
- the processing result data obtained by applying dynamic time warping is calculated as a distance matrix representing the distance between each point of one time series data and each point of another time series data.
- the generation unit 28 reads out the voice data of the reference user stored in the reference data storage unit 24. Then, as shown in Fig. 2, the generation unit 28 applies a dynamic time warping method to the preprocessed voice data D5 and the voice data D Ref of the reference user to generate processing result data representing the distance between the preprocessed voice data D5 and the voice data D Ref of the reference user. Note that the voice data of the reference user may also be subjected to the preprocessing as described above.
- the generating unit 28 may generate the processing result data using only the preprocessed audio data. For example, the generating unit 28 may apply a dynamic time warping method to first audio data representing data in a first time interval in the preprocessed audio data and second audio data representing data in a second time interval in the preprocessed audio data, thereby generating processing result data representing the distance between the first audio data and the second audio data.
- the generation unit 28 applies a dynamic time warping method to first audio data D5-1 representing data in a first time interval in the preprocessed audio data D5, and second audio data D5-2 representing data in a second time interval in the preprocessed audio data D5, thereby generating processing result data representing the distance between the first audio data D5-1 and the second audio data D5-2.
- the generation unit 28 applies a dynamic time warping method to second audio data D5-2 representing data in a second time interval in the preprocessed audio data D5, and third audio data D5-3 representing data in a third time interval in the preprocessed audio data D5, thereby generating processing result data representing the distance between the second audio data D5-2 and the third audio data D5-3.
- the generation unit 28 generates processing result data representing the distance between the first audio data D5-1 and the third audio data D5-3 by applying the dynamic time warping method to the first audio data D5-1 and the third audio data D5-3. In this way, the generation unit 28 generates processing result data for each pair of audio data D5-1 to D5-9 within a predetermined time period.
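- a minimal sketch of this intra-person pairing is shown below; it reuses the dtw_distance function from the earlier DTW sketch and assumes nine equal-length segments, matching the segments D5-1 to D5-9 described above.

```python
import itertools
import numpy as np

def intra_person_distances(d5: np.ndarray, n_segments: int = 9) -> np.ndarray:
    """Return DTW distances for every pair of segments of the preprocessed data d5."""
    segments = np.array_split(d5, n_segments)          # D5-1 ... D5-9
    return np.array([dtw_distance(a, b)                # dtw_distance from the earlier sketch
                     for a, b in itertools.combinations(segments, 2)])

d5 = np.random.randn(1_800)                            # stand-in for preprocessed voice data
dists = intra_person_distances(d5)
print(dists.shape)                                     # (36,) = 9 choose 2 pairs
```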
- the calculation unit 30, which will be described later, may calculate a score representing the degree to which the user has a disease or the like based on the processing result data generated in this manner using only the preprocessed voice data D5.
- the calculation unit 30 calculates a score representing the degree to which the user has a disease or the like based on the processing result data generated by the generation unit 28. For example, the calculation unit 30 calculates a score representing the degree to which the user has the specified disease or symptom by a known method using the average value, maximum value, minimum value, standard deviation, and median value of the elements of the distance matrix generated by the generation unit 28.
- the estimation unit 32 estimates whether or not the user has a disease, etc., based on the score calculated by the calculation unit 30. For example, if the score is equal to or greater than a predetermined threshold, the estimation unit 32 estimates that the user has a disease, etc., and if the score is less than the predetermined threshold, the estimation unit 32 estimates that the user does not have a disease, etc.
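- as one hypothetical illustration of the score calculation and estimation, the sketch below combines summary statistics of the DTW distances into a score and compares it with a threshold; the specific weights and threshold are assumptions, since the patent only refers to known methods.

```python
import numpy as np

def score_from_distances(distances: np.ndarray) -> float:
    """Combine summary statistics of the DTW distances into a single score."""
    feats = np.array([distances.mean(), distances.max(), distances.min(),
                      distances.std(), np.median(distances)])
    weights = np.array([0.4, 0.2, 0.1, 0.2, 0.1])      # assumed weights for illustration
    return float(feats @ weights)

def estimate(score: float, threshold: float = 1.0) -> bool:
    """Return True if the user is estimated to have the predetermined disease or symptom."""
    return score >= threshold                          # threshold value is assumed

dists = np.abs(np.random.randn(36))                    # stand-in for processing result data
s = score_from_distances(dists)
print(s, estimate(s))
```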
- the output unit 34 outputs the estimation result estimated by the estimation unit 32. Note that the output unit 34 may output the score itself as the estimation result.
- the display device 16 displays the estimation results output from the estimation unit 32.
- the medical professional or user who operates the information processing device 14 checks the estimation results output from the display device 16 and confirms what disease or symptoms the user may have.
- the information processing system 10 of this embodiment is expected to be used, for example, under conditions such as those shown in FIG. 6.
- a medical professional H such as a doctor holds a tablet terminal, which is an example of the information processing system 10.
- the medical professional H uses a microphone (not shown) provided on the tablet terminal to collect voice data from a user U, who is a subject.
- the tablet terminal estimates whether or not the user U has any disease or symptom based on the voice data of the user U, and outputs the estimation result to a display unit (not shown).
- the medical professional H refers to the estimation result displayed on the display unit (not shown) of the tablet terminal to determine whether or not the user U has any disease or symptom.
- the information processing device 14 can be realized, for example, by a computer 50 shown in FIG. 7.
- the computer 50 has a CPU 51, a memory 52 as a temporary storage area, and a non-volatile storage unit 53.
- the computer 50 also has an input/output interface (I/F) 54 to which input devices and output devices are connected, and a read/write (R/W) unit 55 that controls reading and writing of data to a recording medium.
- the computer 50 also has a network I/F 56 that is connected to a network such as the Internet.
- the CPU 51, memory 52, storage unit 53, input/output I/F 54, R/W unit 55, and network I/F 56 are connected to each other via a bus 57.
- the storage unit 53 can be realized by a Hard Disk Drive (HDD), a Solid State Drive (SSD), a flash memory, etc.
- the storage unit 53 as a storage medium stores programs for causing the computer 50 to function.
- the CPU 51 reads the programs from the storage unit 53, expands them into the memory 52, and sequentially executes the processes contained in the programs.
- the information processing device 14 of the information processing system 10 executes each process shown in FIG. 8.
- step S100 the acquisition unit 20 acquires the user's voice data collected by the microphone 12. Then, the acquisition unit 20 stores the voice data in the voice data storage unit 22.
- step S102 the processing unit 26 reads out the voice data stored in the voice data storage unit 22. Then, the processing unit 26 extracts the central part of the voice data, which is data within a predetermined time period, from the voice data.
- step S104 the processing unit 26 extracts a predetermined period of data from the central portion of the audio data acquired in step S102.
- step S105 the processing unit 26 performs a shift process on the audio data for the predetermined period acquired in step S104.
- step S106 the processing unit 26 generates sampling data by performing a predetermined sampling process on the shifted data for a predetermined period obtained in step S105.
- step S108 the processing unit 26 performs an amplitude stretching process on the sampling data generated in step S106.
- step S110 the processing unit 26 performs expansion/contraction processing in the time axis direction on the sampling data that has been expanded/contracted in the amplitude direction and obtained in step S108.
- in this way, preprocessed audio data is generated by performing the above preprocessing on the audio data.
- step S112 the estimation unit 32 applies dynamic time warping to the preprocessed voice data and the reference user's voice data stored in the reference data storage unit 24, thereby generating processing result data representing the distance between the preprocessed voice data and the reference user's voice data.
- reference users who have a specified disease, etc. and reference users who do not have the specified disease are set as reference users.
- the estimation unit 32 generates processing result data between the preprocessed voice data and the voice data of a reference user who has a disease or the like, which is stored in the reference data storage unit 24.
- the estimation unit 32 generates processing result data between the preprocessed voice data and the voice data of a reference user who does not have a disease or the like, which is stored in the reference data storage unit 24.
- step S114 the calculation unit 30 calculates a score representing the degree to which the user has a disease or the like, based on the processing result data generated in step S112 above.
- the score may be, for example, a value that becomes larger as the degree to which the user has a disease or the like becomes higher.
- alternatively, the score may be a value that becomes smaller as the degree to which the user has a disease or the like becomes higher.
- for example, when the distance between the preprocessed voice data and the voice data of a reference user who has the disease or the like is small, the calculation unit 30 calculates the score so that the degree to which the user has the disease, etc. is high.
- when the distance between the preprocessed voice data and the voice data of a reference user who has the disease or the like is large, the calculation unit 30 calculates the score so that the degree to which the user has the disease, etc. is low.
- when the distance between the preprocessed voice data and the voice data of a reference user who does not have the disease or the like is small, the calculation unit 30 calculates the score so that the degree to which the user has the disease, etc. is low.
- when the distance between the preprocessed voice data and the voice data of a reference user who does not have the disease or the like is large, the calculation unit 30 calculates the score so that the degree to which the user has the disease, etc. is high.
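- the sketch below illustrates this score logic in a minimal form; combining the two distances by subtraction is an assumption, since the description above only fixes the direction of the relationship.

```python
def score_against_references(dist_to_sick: float, dist_to_healthy: float) -> float:
    """Higher score = higher estimated degree of having the disease (subtraction is assumed)."""
    return dist_to_healthy - dist_to_sick

print(score_against_references(dist_to_sick=0.8, dist_to_healthy=2.4))   # high degree
print(score_against_references(dist_to_sick=2.4, dist_to_healthy=0.8))   # low degree
```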
- step S116 the estimation unit 32 estimates whether or not the user has a disease, etc., based on the score calculated in step S114 above. For example, if the score is equal to or greater than a predetermined threshold, the estimation unit 32 estimates that the user has a disease, etc., and if the score is less than the predetermined threshold, the estimation unit 32 estimates that the user does not have a disease, etc.
- the estimation unit 32 may also estimate which disease, etc. the user has based on the processing result data for each of the voice data of the reference user having disease A, the voice data of the reference user having disease B, and the voice data of the reference user having disease C.
- step S118 the output unit 34 outputs the estimation result estimated in step S116.
- the display device 16 displays the inference results output from the output unit 34.
- the medical professional or user operating the information processing device 14 checks the inference results output from the display device 16 and confirms what disease or symptoms the user is likely to have.
- the information processing system 10 of the first embodiment acquires voice data, which is time-series data of voice uttered by the user, and generates preprocessed data.
- the information processing device 14 then generates processing result data by applying dynamic time warping to the generated preprocessed voice data, and calculates a score representing the degree to which the user has a specified disease or symptom based on the generated processing result data.
- the information processing device 14 estimates whether or not the user has a specified disease or symptom based on the calculated score. In this way, by applying dynamic time warping to the voice data, which is time-series data of voice uttered by the user, it is possible to accurately estimate whether or not the user has a specified disease or specified symptom.
- the preprocessed voice data is the central portion of the acquired voice data, that is, data that is a first time or later after the start point of the voice data and a second time or earlier before the end point of the voice data.
- the preprocessed voice data is also data for a predetermined period.
- the preprocessed voice data is also data obtained by shifting data in the time axis direction.
- the preprocessed voice data is also data obtained by performing a predetermined sampling process.
- the preprocessed voice data is also data obtained by performing a process to expand and contract the voice data in the time axis direction.
- the preprocessed voice data is also data obtained by performing a process to expand and contract the voice data in the amplitude direction.
- FIG. 9 shows an information processing system 310 according to the second embodiment.
- the information processing system 310 includes a user terminal 18 and an information processing device 314.
- the information processing device 314 further includes a communication unit 36.
- the information processing device 314 of the information processing system 310 estimates whether the user has a disease or the like based on the user's voice collected by the microphone 12 provided on the user terminal 18.
- the information processing system 310 of the second embodiment is expected to be used, for example, under the conditions shown in Figures 10 and 11.
- a medical professional H such as a doctor operates an information processing device 314, and a user U, who is a subject, operates a user terminal 18.
- the user U collects his/her own voice data "XXXX" using the microphone 12 of the user terminal 18 that he/she operates.
- the user terminal 18 then transmits the voice data to the information processing device 314 via a network 19 such as the Internet.
- the information processing device 314 receives the voice data "XXX" of the user U transmitted from the user terminal 18. The information processing device 314 then estimates whether or not the user U has any disease or symptom based on the received voice data, and outputs the estimation result to the display unit 315 of the information processing device 314.
- the medical worker H refers to the estimation result displayed on the display unit 315 of the information processing device 314 and determines whether or not the user U has any disease or symptom.
- the subject user U collects his/her own voice data using the microphone 12 of the user terminal 18 that he/she operates.
- the user terminal 18 then transmits the voice data to the information processing device 314 via a network 19 such as the Internet.
- the information processing device 314 receives the user U's voice data transmitted from the user terminal 18.
- the information processing device 314 estimates whether or not the user U has any disease or symptom based on the received voice data, and transmits the estimation result to the user terminal 18.
- the user terminal 18 receives the estimation result transmitted from the information processing device 314, and displays the estimation result on a display unit (not shown). The user U checks the estimation result and confirms what disease or symptom he or she is likely to have.
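- as a hypothetical illustration of this client-server flow, the user terminal could upload the recorded voice and receive the estimation result as follows; the endpoint URL and response format are assumptions, since the patent does not define a transport protocol.

```python
import requests

def request_estimation(wav_path: str,
                       url: str = "https://example.com/api/estimate") -> dict:
    """Send a voice recording to a (hypothetical) estimation endpoint and return its reply."""
    with open(wav_path, "rb") as f:
        response = requests.post(url, files={"voice": f}, timeout=30)
    response.raise_for_status()
    return response.json()    # e.g. {"score": 0.92, "estimated": true} (assumed format)

# result = request_estimation("user_voice.wav")   # path and endpoint are placeholders
# print(result)
```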
- the information processing device 314 executes an information processing routine similar to that shown in FIG. 8 above.
- the information processing system of the second embodiment can estimate whether a user has a psychiatric disorder, a neurological disorder, or symptoms thereof, using the information processing device 314 deployed in the cloud.
- FIG. 12 is a graph plotting speech data obtained from subjects evaluated as depressed patients (indicated by square marks in FIG. 12) and speech data obtained from subjects evaluated as healthy subjects (indicated by circles in FIG. 12).
- FIG. 12 shows data obtained using the preprocessing and DTW of this embodiment.
- the horizontal axis dist2 of the graph in FIG. 12 represents the distance from the average reference for healthy subjects, and the vertical axis dist3 represents the distance from the average reference for depressed patients.
- the data represented by square marks, which is the speech data of depressed patients, tends to have a long distance from the average reference for healthy subjects and a short distance from the average reference for depressed patients.
- FIG. 12 shows the ROC curve and the AUC value.
- the AUC value is 1.0 when depression is judged by combining the distance dist2 from the average reference for healthy subjects and the distance dist3 from the average reference for depressed patients.
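- the sketch below illustrates, on toy data, how the two distances dist2 and dist3 could be combined and evaluated with an ROC/AUC analysis; the combination rule and the synthetic data are assumptions, while the AUC of 1.0 above refers to the patent's actual experiment.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
# Toy data: healthy subjects (label 0) lie near the healthy reference,
# depressed patients (label 1) lie near the depressed reference.
dist2 = np.concatenate([rng.normal(1.0, 0.3, 20), rng.normal(3.0, 0.3, 20)])  # to healthy ref
dist3 = np.concatenate([rng.normal(3.0, 0.3, 20), rng.normal(1.0, 0.3, 20)])  # to depressed ref
labels = np.array([0] * 20 + [1] * 20)

combined_score = dist2 - dist3        # larger = more like a depressed patient (assumed rule)
print(roc_auc_score(labels, combined_score))
```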
- FIG. 13 is a table of the data shown in FIG. 12 and other experimental results.
- HAMD in FIG. 13 represents the score for depression evaluation.
- HAMD ≥ 7 indicates that only subjects with a HAMD score of 7 or higher were included in the evaluation.
- MDD represents patients with depression, PD represents patients with Parkinson's disease, AD represents patients with Alzheimer's disease, and HE represents healthy individuals.
- Intra-Person DTW represents the case where a pair is generated between a certain section and another section in the speech data of one subject, features are generated, and DTW is performed.
- the AUC value when distinguishing between HE and MDD is 0.9643
- the AUC value when distinguishing between HE and PD is 0.9173.
- FIG. 14 shows the experimental results indicating the effects of the various pre-processing methods used in this embodiment.
- the results in the top row of FIG. 14 are the baseline results.
- the results from the second row onwards indicate the effect of applying each pre-processing method to the data, and it can be seen that each pre-processing method contributes to the depression assessment performance. Note that in the data in the bottom row of the table in FIG. 14, the performance evaluation (AUC) values are reversed, and this is explained below.
- Figure 15 shows the difference in distance calculated by DTW when no amplitude adjustment is performed (labeled "without amplitude normalization" in Figure 15) and when amplitude adjustment is performed (labeled "with amplitude normalization" in Figure 15).
- the vertical axis of the graph shown in Figure 15 is the distance calculated by DTW.
- Figure 15 shows the distance values calculated by DTW for HE_HospitalA, which represents data obtained from multiple healthy subjects at Hospital A, multiple depressed patients MDD, and HE_HospitalB, which represents data obtained from multiple healthy subjects at Hospital B.
- Figure 16 shows various conditions and the results of discrimination between healthy individuals (HE) and patients (Sick) suffering from major depressive disorder (MDD), Alzheimer's disease (AD), and Parkinson's disease (PD) using the Intra-Person DTW of this embodiment.
- Figure 17 shows the ROC curve corresponding to the performance evaluation AUC shown in Figure 16 above.
- Figure 18 shows the DTW value calculated under the conditions shown in Figure 16 above.
- Figure 19 shows the actual symptoms (labeled "Actual" in Figure 19) and the prediction results using the method of this embodiment (labeled "Prediction" in Figure 19).
- Figure 20 shows the results of a multiple comparison test.
- the AUC value when distinguishing between healthy individuals (HE) and patients suffering from some disease (Sick) is 0.8486.
- a multiple comparison test of the average DTW values shows that the distributions of healthy individuals (HE) and those with each disease (MDD, AD, PD) are different (significant difference in the average: p < 0.01).
- the symbol E in the table stands for "×10", and the number following E is the exponent (power of ten).
- the program can also be provided by storing it on a computer-readable recording medium.
- the processing that the CPU performs by reading and executing software may instead be executed by various processors other than a CPU.
- examples of such processors include a PLD (Programmable Logic Device) such as an FPGA (Field-Programmable Gate Array), whose circuit configuration can be changed after manufacture, and a dedicated electric circuit such as an ASIC (Application Specific Integrated Circuit), which is a processor having a circuit configuration designed exclusively to execute a specific process.
- a GPGPU (General-Purpose Graphics Processing Unit) may also be used.
- Each process may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (e.g., multiple FPGAs, a combination of a CPU and an FPGA, etc.). More specifically, the hardware structure of these various processors is an electric circuit that combines circuit elements such as semiconductor elements.
- the program is described as being pre-stored (installed) in storage, but this is not limiting.
- the program may be provided in a form stored in a non-transient storage medium such as a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), or a USB (Universal Serial Bus) memory.
- the program may also be downloaded from an external device via a network.
- each process of this embodiment may be implemented by a computer or server equipped with a general-purpose processor and storage device, and each process may be executed by a program.
- This program is stored in a storage device, and can be recorded on a recording medium such as a magnetic disk, optical disk, or semiconductor memory, or can be provided via a network.
- the components described above do not have to be implemented by a single computer or server, and may be distributed across multiple computers connected by a network.
- in the above embodiments, the case where a psychiatric disease or a nervous system disease, or a psychiatric disorder symptom or a cognitive impairment symptom, is estimated as an example of the predetermined disease or predetermined symptom has been described, but the present invention is not limited to this.
- the predetermined disease or predetermined symptom may be of any kind. It is assumed that various diseases or symptoms are reflected in the voice data; for example, not only respiratory diseases and their symptoms but also psychiatric diseases and the like are reflected in the voice data.
- accordingly, any disease or symptom may be estimated as long as its effect is reflected in the voice data.
- in the above embodiment, all of the multiple preprocessing processes described above are executed when generating the preprocessed audio data, but this is not limiting.
- the preprocessed audio data may be generated using at least one of the preprocessing processes described above.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Medical Informatics (AREA)
- Engineering & Computer Science (AREA)
- Biomedical Technology (AREA)
- Heart & Thoracic Surgery (AREA)
- Pathology (AREA)
- Molecular Biology (AREA)
- Surgery (AREA)
- Animal Behavior & Ethology (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
Provided is an information processing device that acquires speech data, which is time-series data of speech uttered by a user, and generates preprocessed speech data representing the data that, from out of the acquired speech data, is no earlier than a first time from the start point of the speech data and no later than a second time from the end of the speech data. Additionally, this information processing device generates processing result data by applying dynamic time warping to the preprocessed speech data that is generated. The information processing device calculates a score representing the degree to which the user has a predetermined disease or symptom on the basis of the generated processing result data, and estimates whether or not the user has the predetermined disease or symptom on the basis of the calculated score.
Description
開示の技術は、情報処理装置、情報処理方法、情報処理システム、及び情報処理プログラムに関する。
The disclosed technology relates to an information processing device, an information processing method, an information processing system, and an information processing program.
国際公開第2020/013296号公報には、精神系疾患又は神経系疾患を推定する装置が開示されている。この装置は、ユーザの音声データから各種の音響パラメータを算出し、それらの音響パラメータを用いて、ユーザが精神系疾患又は神経系疾患であるか否かを推定する。
International Publication No. WO 2020/013296 discloses a device for predicting whether a user has a psychiatric or neurological disorder. This device calculates various acoustic parameters from the user's voice data and uses these acoustic parameters to predict whether the user has a psychiatric or neurological disorder.
上記国際公開第2020/013296号公報に開示されている装置は、音声データから算出される音響パラメータを用いて疾患を推定するものの、その精度に関しては改善の余地がある。
The device disclosed in International Publication No. 2020/013296 estimates diseases using acoustic parameters calculated from voice data, but there is room for improvement in terms of accuracy.
開示の技術は、上記の事情を鑑みてなされたものであり、ユーザが発した音声の時系列データである音声データに対して動的時間伸縮法を適用することにより、ユーザが所定の疾患又は症状を有しているか否かを精度良く推定することができる、情報処理装置、情報処理方法、情報処理システム、及び情報処理プログラムを提供する。
The disclosed technology has been made in consideration of the above circumstances, and provides an information processing device, information processing method, information processing system, and information processing program that can accurately estimate whether a user has a specified disease or symptom by applying a dynamic time warping method to voice data, which is time-series data of voice uttered by a user.
上記の目的を達成するために本開示の第1態様は、ユーザが発した音声の時系列データである音声データを取得する取得部と、前記取得部により取得された前記音声データのうちの、前記音声データの開始点から第1時間以後のデータであって、かつ前記音声データの終了点よりも第2時間以前のデータを表す前処理済み音声データを生成する処理部と、前記処理部によって生成された前記前処理済み音声データに対して動的時間伸縮法(Dynamic Time Warping)を適用することにより、処理結果データを生成する生成部と、前記生成部により生成された前記処理結果データに基づいて、前記ユーザが所定の疾患又は症状を有している度合いを表すスコアを算出する算出部と、前記算出部により算出された前記スコアに基づいて、前記ユーザが所定の疾患又は症状を有しているか否かを推定する推定部と、を含む情報処理装置である。
In order to achieve the above object, a first aspect of the present disclosure is an information processing device including an acquisition unit that acquires voice data, which is time-series data of voice uttered by a user; a processing unit that generates preprocessed voice data representing data from the voice data acquired by the acquisition unit that is data that is a first hour or later from the start point of the voice data and that is a second hour or earlier than the end point of the voice data; a generation unit that generates processing result data by applying dynamic time warping to the preprocessed voice data generated by the processing unit; a calculation unit that calculates a score representing the degree to which the user has a predetermined disease or symptom based on the processing result data generated by the generation unit; and an estimation unit that estimates whether or not the user has a predetermined disease or symptom based on the score calculated by the calculation unit.
本開示の第2態様は、ユーザが発した音声の時系列データである音声データを取得し、取得された前記音声データのうちの、前記音声データの開始点から第1時間以後のデータであって、かつ前記音声データの終了点よりも第2時間以前のデータを表す前処理済み音声データを生成し、生成された前記前処理済み音声データに対して動的時間伸縮法(Dynamic Time Warping)を適用することにより、処理結果データを生成し、生成された前記処理結果データに基づいて、前記ユーザが所定の疾患又は症状を有している度合いを表すスコアを算出し、算出された前記スコアに基づいて、前記ユーザが所定の疾患又は症状を有しているか否かを推定する、処理をコンピュータに実行させる情報処理方法である。
A second aspect of the present disclosure is an information processing method that causes a computer to execute the following processes: acquire voice data that is time-series data of voice uttered by a user; generate preprocessed voice data representing data from the acquired voice data that is a first hour or later from the start point of the voice data and a second hour or earlier than the end point of the voice data; generate processing result data by applying dynamic time warping to the generated preprocessed voice data; calculate a score representing the degree to which the user has a specified disease or symptom based on the generated processing result data; and estimate whether or not the user has the specified disease or symptom based on the calculated score.
本開示の第3態様は、ユーザが発した音声の時系列データである音声データを取得し、取得された前記音声データのうちの、前記音声データの開始点から第1時間以後のデータであって、かつ前記音声データの終了点よりも第2時間以前のデータを表す前処理済み音声データを生成し、生成された前記前処理済み音声データに対して動的時間伸縮法(Dynamic Time Warping)を適用することにより、処理結果データを生成し、生成された前記処理結果データに基づいて、前記ユーザが所定の疾患又は症状を有している度合いを表すスコアを算出し、算出された前記スコアに基づいて、前記ユーザが所定の疾患又は症状を有しているか否かを推定する、処理をコンピュータに実行させるための情報処理プログラムである。
A third aspect of the present disclosure is an information processing program for causing a computer to execute a process of acquiring voice data, which is time-series data of a voice uttered by a user, generating preprocessed voice data representing data from the acquired voice data that is a first hour or later from a start point of the voice data and a second hour or earlier than an end point of the voice data, generating processing result data by applying dynamic time warping to the generated preprocessed voice data, calculating a score representing the degree to which the user has a predetermined disease or symptom based on the generated processing result data, and inferring whether or not the user has the predetermined disease or symptom based on the calculated score.
開示の技術によれば、ユーザが発した音声の時系列データである音声データに対して動的時間伸縮法を適用することにより、ユーザが所定の疾患又は所定の症状を有しているか否かを精度良く推定することができる、という効果が得られる。
The disclosed technology has the effect of being able to accurately estimate whether a user has a specific disease or specific symptoms by applying a dynamic time warping method to voice data, which is time-series data of the voice uttered by the user.
以下、図面を参照して開示の技術の実施形態を詳細に説明する。
Below, an embodiment of the disclosed technology will be described in detail with reference to the drawings.
<第1実施形態の情報処理システム>
<First embodiment of information processing system>
図1に、第1実施形態に係る情報処理システム10を示す。図1に示されるように、第1実施形態の情報処理システム10は、マイク12と、情報処理装置14と、表示装置16とを備えている。
FIG. 1 shows an information processing system 10 according to the first embodiment. As shown in FIG. 1, the information processing system 10 according to the first embodiment includes a microphone 12, an information processing device 14, and a display device 16.
情報処理システム10は、マイク12により集音されたユーザの音声に基づいて、ユーザが所定の疾患又は所定の症状(以下、単に「疾患等」と称する。)を有しているか否かを推定する。なお、本実施形態の情報処理システム10は、所定の疾患又は所定の症状の一例として、精神系疾患若しくは神経系疾患、又は、精神障害症状若しくは認知機能障害症状を有しているか否かを推定する。
The information processing system 10 estimates whether or not the user has a specified disease or a specified symptom (hereinafter simply referred to as "disease, etc.") based on the user's voice collected by the microphone 12. Note that the information processing system 10 of this embodiment estimates whether or not the user has a psychiatric disease or a neurological disease, or a mental disorder symptom or a cognitive impairment symptom, as an example of a specified disease or a specified symptom.
第1実施形態の情報処理システム10の情報処理装置14は、ユーザが発した音声の時系列データである音声データに対して所定の前処理を施し、前処理済みのデータを生成する。そして、情報処理装置14は、前処理済みのデータに対して動的時間伸縮法(Dynamic Time Warping)を適用した結果に基づいて、ユーザが疾患等を有しているか否かを判定する。
The information processing device 14 of the information processing system 10 of the first embodiment performs a predetermined preprocessing on the voice data, which is time-series data of the voice uttered by the user, to generate preprocessed data. Then, the information processing device 14 determines whether or not the user has a disease, etc., based on the result of applying dynamic time warping to the preprocessed data.
動的時間伸縮法では、ある時系列データと別の時系列データとの間の距離が計算される。本実施形態では、動的時間伸縮法によって得られる処理結果データを用いて、ユーザが疾患等を有しているか否かを推定する。
In dynamic time warping, the distance between one time series data and another time series data is calculated. In this embodiment, the processing result data obtained by dynamic time warping is used to estimate whether the user has a disease or the like.
以下、具体的に説明する。
The details are explained below.
図1に示されるように、情報処理装置14は、機能的には、取得部20と、音声データ記憶部22と、参照データ記憶部24と、処理部26と、生成部28と、算出部30と、推定部32と、出力部34とを備えている。情報処理装置14は、後述するようなコンピュータにより実現される。
As shown in FIG. 1, the information processing device 14 functionally comprises an acquisition unit 20, a voice data storage unit 22, a reference data storage unit 24, a processing unit 26, a generation unit 28, a calculation unit 30, an estimation unit 32, and an output unit 34. The information processing device 14 is realized by a computer as described below.
取得部20は、ユーザが発した音声の時系列データである音声データを取得する。そして、取得部20は、音声データを音声データ記憶部22へ格納する。
The acquisition unit 20 acquires voice data, which is time-series data of the voice uttered by the user. The acquisition unit 20 then stores the voice data in the voice data storage unit 22.
音声データ記憶部22には、取得部20により取得された音声データが格納される。
The voice data storage unit 22 stores the voice data acquired by the acquisition unit 20.
参照データ記憶部24には、疾患等を有しているか否かが既知である参照用ユーザの音声データが格納されている。
The reference data storage unit 24 stores voice data of reference users who are known to have or have not had a disease, etc.
処理部26は、音声データ記憶部22に記憶されている音声データを読み出す。そして、処理部26は、音声データに対して所定の前処理を施し、前処理済み音声データを生成する。前処理済み音声データの生成方法について、以下、具体的に説明する。図2に、前処理済み音声データを説明するための図を示す。
The processing unit 26 reads out the voice data stored in the voice data storage unit 22. The processing unit 26 then performs a predetermined preprocessing on the voice data to generate preprocessed voice data. The method for generating the preprocessed voice data will be described in detail below. Figure 2 shows a diagram for explaining the preprocessed voice data.
(音声データの中心部分の抽出)
(Extracting the central part of the audio data)
ユーザが発した音声データに基づいて当該ユーザが疾患等を有しているか否かを推定する際には、ユーザの発声が安定している音声データを用いる方が好ましい。
When estimating whether a user has a disease or the like based on voice data produced by the user, it is preferable to use voice data in which the user's speech is stable.
この点、音声データが表す時系列データのうちの初期の箇所は、ユーザが音声を発し始めた時刻のデータであるため、その箇所のデータを疾患等の推定に利用するのは好ましくない場合が多い。例えば、ユーザが声を発していない状態からいきなり声を発する場合、ユーザの発声が安定しないことにより、声がかすれてしまったり、声量が小さくなってしまうといった事態が予想される。このようなデータを疾患等の推定に利用したとしても、精度の良い結果は得られないことが予想される。
In this regard, since the initial part of the time series data represented by the voice data is data from the time when the user started speaking, it is often not desirable to use the data from that part to infer diseases, etc. For example, if a user suddenly starts speaking after not making any sound, it is expected that the user's voice will become hoarse or the volume will become low due to the user's unstable speech. Even if such data is used to infer diseases, etc., it is expected that accurate results will not be obtained.
さらに、音声データが表す時系列データのうちの終点に近い箇所も、疾患等の推定に利用するのは好ましくない場合が多い。例えば、ユーザが長い発音の声を発した場合にユーザが息切れをしてしまい声が続かなかったり、語尾があいまいな発音となってしまうといった事態が予想される。
Furthermore, it is often not desirable to use the points near the end of the time series data represented by the voice data to infer illnesses, etc. For example, if a user speaks a long pronunciation, it is expected that the user may run out of breath and not be able to continue speaking, or the pronunciation of the end of the word may become unclear.
そこで、本実施形態の情報処理装置14の処理部26は、時系列データである音声データから中心部分のデータを抽出する。
The processing unit 26 of the information processing device 14 in this embodiment therefore extracts central data from the audio data, which is time-series data.
具体的には、処理部26は、図2に示されるように、音声データD1のうちの、音声データD1の開始点から第1時間T1以後のデータであって、かつ音声データD1の終了点よりも第2時間T2以前のデータを表すデータD2を生成する。データD2は、音声データD1のうちの時間区間T3に相当するデータである。これにより、ユーザの発声が安定している中心部分のデータが生成される。
Specifically, as shown in FIG. 2, the processing unit 26 generates data D2 representing data from the voice data D1 that is data from a first time T1 onward from the start point of the voice data D1 and that is data from a second time T2 onward from the end point of the voice data D1. The data D2 is data that corresponds to a time period T3 in the voice data D1. This generates data for the central portion where the user's speech is stable.
(所定周期分のデータの抽出)
さらに、処理部26は、抽出された中心部分のデータから所定周期分のデータを抽出する。図3に、所定周期分のデータを説明するための図を示す。図3に示されるように、音声データDfは時系列データであり、所定信号の繰り返しが存在する。例えば、図3に示される例では、時間区間T毎に、同様の信号波形が繰り返されている。 (Extraction of data for a given period)
Furthermore, the processing unit 26 extracts a predetermined period of data from the extracted central portion of data. Fig. 3 is a diagram for explaining the predetermined period of data. As shown in Fig. 3, the audio data Df is time-series data, and a predetermined signal is repeated. For example, in the example shown in Fig. 3, a similar signal waveform is repeated for each time interval T.
さらに、処理部26は、抽出された中心部分のデータから所定周期分のデータを抽出する。図3に、所定周期分のデータを説明するための図を示す。図3に示されるように、音声データDfは時系列データであり、所定信号の繰り返しが存在する。例えば、図3に示される例では、時間区間T毎に、同様の信号波形が繰り返されている。 (Extraction of data for a given period)
Furthermore, the processing unit 26 extracts a predetermined period of data from the extracted central portion of data. Fig. 3 is a diagram for explaining the predetermined period of data. As shown in Fig. 3, the audio data Df is time-series data, and a predetermined signal is repeated. For example, in the example shown in Fig. 3, a similar signal waveform is repeated for each time interval T.
後述するように、ユーザが疾患等を有しているか否かを推定する際には、疾患等を推定する対象のユーザが発した音声データと、疾患等を有しているか否かが既知である参照用ユーザの音声データとが比較される場合がある。そのため、疾患等を推定する対象のユーザが発した音声データから切り出される所定周期分のデータと、参照用ユーザの音声データにおける所定周期分のデータとは揃えられている方が好ましい。このため、例えば、処理部26は、抽出された中心部分のデータから、参照用ユーザの音声データの周期と同一の所定周期分のデータを抽出する。この所定周期は、例えば、予め設定される。または、例えば、データの種類に応じて、所定周期を変化させるようにしてもよい。
As described below, when estimating whether or not a user has a disease, voice data uttered by the user whose disease is to be estimated may be compared with voice data of a reference user whose disease is known to be present. For this reason, it is preferable that a predetermined period of data extracted from the voice data uttered by the user whose disease is to be estimated is aligned with a predetermined period of data in the voice data of the reference user. For this reason, for example, the processing unit 26 extracts data of a predetermined period that is the same as the period of the voice data of the reference user from the extracted central portion of data. This predetermined period is, for example, set in advance. Alternatively, for example, the predetermined period may be changed depending on the type of data.
(Extracting data by shifting along the time axis)
Next, the processing unit 26 shifts the extracted data for the predetermined period along the time axis. FIG. 4 is a diagram for explaining this shift. As shown in FIG. 4, consider a case where the voice data Ds contains a signal that repeats with period Ts, and the reference user's voice data DRef contains a signal that repeats with period TRef. In this case, as shown in FIG. 4, if the start P1 of the cut-out of the voice data Ds and the start P2 of the reference user's voice data DRef are not aligned, the value representing the distance between Ds and DRef calculated by dynamic time warping may become large even if the two signals are similar.
Therefore, the processing unit 26 shifts the extracted data for the predetermined period along the time axis. For example, the processing unit 26 shifts the data for the predetermined period shown in FIG. 4 by a predetermined amount of time in the direction indicated by the arrow S. This shift amount is, for example, set in advance. Alternatively, the shift amount may be changed depending on the type of data.
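The description only states that the shift amount is preset or chosen by data type. As one hypothetical way such a shift could be determined, the sketch below slides the cut-out over a small range of lags and keeps the lag whose start best correlates with the reference; the variable names and the correlation criterion are assumptions.

```python
import numpy as np

def shift_to_align(segment: np.ndarray, reference: np.ndarray, max_shift: int) -> np.ndarray:
    """Drop up to max_shift leading samples so that the segment's start lines up with the reference."""
    n = min(len(segment) - max_shift, len(reference))
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_shift + 1):
        corr = float(np.dot(segment[lag:lag + n], reference[:n]))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return segment[best_lag:]
```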
(Extraction of data by sampling at a predetermined sampling rate)
Next, the processing unit 26 extracts sampling data obtained by sampling the data that has been shifted along the time axis. As described above, in this embodiment, when estimating whether or not a user has a disease or the like, the voice data uttered by the user to be evaluated may be compared with the voice data of a reference user whose disease status is known. Therefore, it is preferable that the sampling rate applied to the target user's voice data and the sampling rate applied to the reference user's voice data be the same.
For example, as shown in FIG. 5, consider sampled data DA and DB obtained by sampling the same voice data D. If the distance between the sampled data DA generated at sampling rate A and the sampled data DB generated at sampling rate B is calculated using dynamic time warping, a non-zero distance value is obtained even though the underlying voice data D is identical.
For this reason, for example, the processing unit 26 generates sampling data extracted at the same sampling rate as that of the reference user's voice data. This sampling rate is set in advance. Alternatively, the sampling rate may be changed depending on the type of data. For example, 200 sampling points per period are extracted from the data for the predetermined period.
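A minimal resampling sketch is shown below; it linearly interpolates the cut-out so that every period is represented by the same number of points (200 per cycle in the example above). The use of linear interpolation is an assumption; the description only requires that the effective sampling rate match that of the reference data.

```python
import numpy as np

def resample_per_period(periods: np.ndarray, n_periods: int, points_per_period: int = 200) -> np.ndarray:
    """Resample so that each of the n_periods repetitions is represented by points_per_period samples."""
    target_len = n_periods * points_per_period
    old_x = np.linspace(0.0, 1.0, num=len(periods))
    new_x = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new_x, old_x, periods)
```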
(Expansion and contraction of data along the time axis)
Next, the processing unit 26 performs an expansion/contraction process along the time axis on the sampling data obtained by sampling the voice data. As described above, in this embodiment, when estimating whether or not a user has a disease or the like, the voice data uttered by the user to be evaluated may be compared with the voice data of a reference user whose disease status is known.
For this reason, it is preferable that the time-axis intervals of the voice data uttered by the target user be aligned with those of the reference user's voice data. Therefore, for example, the processing unit 26 performs a predetermined expansion/contraction process along the time axis on the data D3 shown in FIG. 2. The method of this expansion/contraction process is set in advance. Alternatively, the method may be changed depending on the type of data.
(Data expansion and contraction in the amplitude direction)
Next, the processing unit 26 performs an expansion/contraction process in the amplitude direction on the data that has undergone the expansion/contraction process along the time axis. As described above, in this embodiment, when estimating whether or not a user has a disease or the like, the voice data uttered by the user to be evaluated may be compared with the voice data of a reference user whose disease status is known.
For this reason, it is preferable that the amplitude of the voice data uttered by the target user be aligned with the amplitude of the reference user's voice data. Therefore, for example, the processing unit 26 performs a predetermined expansion/contraction process in the amplitude direction on the data D4 shown in FIG. 2. The method of this expansion/contraction process is set in advance. Alternatively, the method may be changed depending on the type of data.
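The description leaves the exact amplitude adjustment open. As one common choice, and the one suggested by the "amplitude normalization" experiments described later, the sketch below rescales the waveform to a fixed peak amplitude; treating peak scaling as the method is an assumption.

```python
import numpy as np

def normalize_amplitude(x: np.ndarray, target_peak: float = 1.0) -> np.ndarray:
    """Stretch or shrink the waveform in the amplitude direction to a fixed peak value."""
    peak = float(np.max(np.abs(x)))
    return x.copy() if peak == 0.0 else x * (target_peak / peak)
```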
The processing unit 26 generates preprocessed audio data by performing multiple preprocessing processes as described above on the audio data.
The generating unit 28 generates processing result data by applying dynamic time warping to the preprocessed audio data generated by the processing unit 26. The processing result data obtained by applying dynamic time warping is calculated as a distance matrix representing the distance between each point of one time series data and each point of another time series data.
Specifically, the generation unit 28 reads out the voice data of the reference user stored in the reference data storage unit 24. Then, as shown in FIG. 2, the generation unit 28 applies a dynamic time warping method to the preprocessed voice data D5 and the voice data DRef of the reference user to generate processing result data representing the distance between the preprocessed voice data D5 and the voice data DRef of the reference user. Note that the voice data of the reference user may also be subjected to the preprocessing described above.
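For reference, a textbook dynamic-time-warping sketch is shown below: the local cost |a_i - b_j| over all point pairs corresponds to the distance matrix mentioned above, and the accumulated value at the end of the optimal warping path serves as the distance between the two series. This is a generic formulation, not necessarily the exact one used in the embodiment.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance between two 1-D series with absolute-difference local cost."""
    n, m = len(a), len(b)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])   # one element of the point-to-point distance matrix
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return float(acc[n, m])

# Example: distance between a user's preprocessed segment and a reference segment (dummy data).
d5 = np.sin(np.linspace(0, 2 * np.pi, 200))
d_ref = np.sin(np.linspace(0, 2 * np.pi, 200) + 0.1)
print(dtw_distance(d5, d_ref))
```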
The generating unit 28 may generate the processing result data using only the preprocessed audio data. For example, the generating unit 28 may apply a dynamic time warping method to first audio data representing data in a first time interval in the preprocessed audio data and second audio data representing data in a second time interval in the preprocessed audio data, thereby generating processing result data representing the distance between the first audio data and the second audio data.
More specifically, for example, as shown in FIG. 2, the generation unit 28 applies a dynamic time warping method to first audio data D5-1 representing data in a first time interval in the preprocessed audio data D5 and second audio data D5-2 representing data in a second time interval in the preprocessed audio data D5, thereby generating processing result data representing the distance between the first audio data D5-1 and the second audio data D5-2.
Next, as shown in FIG. 2, the generation unit 28 applies a dynamic time warping method to second audio data D5-2 representing data in the second time interval in the preprocessed audio data D5 and third audio data D5-3 representing data in a third time interval in the preprocessed audio data D5, thereby generating processing result data representing the distance between the second audio data D5-2 and the third audio data D5-3.
Furthermore, the generation unit 28 generates processing result data representing the distance between the first audio data D5-1 and the third audio data D5-3 by applying the dynamic time warping method to the first audio data D5-1 and the third audio data D5-3. In this way, the generation unit 28 generates processing result data for each pair of audio data D5-1 to D5-9 within a predetermined time period.
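A sketch of this pairwise ("Intra-Person DTW") computation over the segments D5-1 to D5-9 might look as follows; it reuses the dtw_distance helper from the sketch above, and segment extraction is assumed to have been done already.

```python
import numpy as np

def pairwise_dtw(segments: list[np.ndarray]) -> np.ndarray:
    """DTW distance for every pair of segments taken from the same speaker's preprocessed data."""
    k = len(segments)
    dist = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d = dtw_distance(segments[i], segments[j])   # dtw_distance defined in the earlier sketch
            dist[i, j] = dist[j, i] = d
    return dist

# Example with nine dummy segments standing in for D5-1 ... D5-9.
segments = [np.sin(np.linspace(0, 2 * np.pi, 200) + 0.05 * i) for i in range(9)]
intra_person = pairwise_dtw(segments)
```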
The calculation unit 30, which will be described later, may calculate a score representing the degree to which the user has a disease or the like based on the processing result data generated in this manner using only the preprocessed voice data D5.
The calculation unit 30 calculates a score representing the degree to which the user has a disease or the like based on the processing result data generated by the generation unit 28. For example, the calculation unit 30 uses the mean, maximum, minimum, standard deviation, and median of the elements of the distance matrix generated by the generation unit 28 to calculate, by a known method, a score representing the degree to which the user has a specified disease or symptom.
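A hypothetical sketch of deriving such summary statistics from a pairwise distance matrix is shown below; how these statistics are turned into the final score is described only as "a known method", so that last step is not shown.

```python
import numpy as np

def distance_statistics(dist: np.ndarray) -> dict:
    """Mean, max, min, standard deviation, and median of the upper-triangle distance values."""
    vals = dist[np.triu_indices_from(dist, k=1)]
    return {
        "mean": float(np.mean(vals)),
        "max": float(np.max(vals)),
        "min": float(np.min(vals)),
        "std": float(np.std(vals)),
        "median": float(np.median(vals)),
    }
```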
The estimation unit 32 estimates whether or not the user has a disease, etc., based on the score calculated by the calculation unit 30. For example, if the score is equal to or greater than a predetermined threshold, the estimation unit 32 estimates that the user has a disease, etc., and if the score is less than the predetermined threshold, the estimation unit 32 estimates that the user does not have a disease, etc.
The output unit 34 outputs the estimation result estimated by the estimation unit 32. Note that the output unit 34 may output the score itself as the estimation result.
The display device 16 displays the estimation results output from the estimation unit 32.
The medical professional or user who operates the information processing device 14 checks the estimation results output from the display device 16 and confirms what disease or symptoms the user may have.
The information processing system 10 of this embodiment is expected to be used, for example, under conditions such as those shown in FIG. 6.
In the example of FIG. 6, a medical professional H, such as a doctor, holds a tablet terminal, which is an example of the information processing system 10. The medical professional H uses a microphone (not shown) provided on the tablet terminal to collect voice data from a user U, who is a subject. The tablet terminal then estimates whether or not the user U has any disease or symptom based on the voice data of the user U, and outputs the estimation result to a display unit (not shown). The medical professional H refers to the estimation result displayed on the display unit (not shown) of the tablet terminal to determine whether or not the user U has any disease or symptom.
The information processing device 14 can be realized, for example, by a computer 50 shown in FIG. 7. The computer 50 has a CPU 51, a memory 52 as a temporary storage area, and a non-volatile storage unit 53. The computer 50 also has an input/output interface (I/F) 54 to which external devices and output devices are connected, and a read/write (R/W) unit 55 that controls reading and writing of data to the recording medium. The computer 50 also has a network I/F 56 that is connected to a network such as the Internet. The CPU 51, memory 52, storage unit 53, input/output I/F 54, R/W unit 55, and network I/F 56 are connected to each other via a bus 57.
The storage unit 53 can be realized by a Hard Disk Drive (HDD), a Solid State Drive (SSD), a flash memory, etc. The storage unit 53 as a storage medium stores programs for causing the computer 50 to function. The CPU 51 reads the programs from the storage unit 53, expands them into the memory 52, and sequentially executes the processes contained in the programs.
[Operation of the information processing system of the first embodiment]
Next, the specific operation of the information processing system 10 of the first embodiment will be described. The information processing device 14 of the information processing system 10 executes each process shown in FIG. 8.
First, in step S100, the acquisition unit 20 acquires the user's voice data collected by the microphone 12. Then, the acquisition unit 20 stores the voice data in the voice data storage unit 22.
Next, in step S102, the processing unit 26 reads out the voice data stored in the voice data storage unit 22. Then, the processing unit 26 extracts the central part of the voice data, which is data within a predetermined time period, from the voice data.
In step S104, the processing unit 26 extracts a predetermined period of data from the central portion of the audio data acquired in step S102.
In step S105, the processing unit 26 performs a shift process on the audio data for the predetermined period acquired in step S104.
In step S106, the processing unit 26 generates sampling data by performing a predetermined sampling process on the shifted data for a predetermined period obtained in step S105.
In step S108, the processing unit 26 performs an amplitude stretching process on the sampling data generated in step S106.
In step S110, the processing unit 26 performs expansion/contraction processing in the time axis direction on the sampling data that has been expanded/contracted in the amplitude direction and obtained in step S108.
By executing each process from step S102 to step S110, preprocessed audio data is generated by performing preprocessing on the audio data.
In step S112, the estimation unit 32 applies dynamic time warping to the preprocessed voice data and the reference user's voice data stored in the reference data storage unit 24, thereby generating processing result data representing the distance between the preprocessed voice data and the reference user's voice data.
In addition, reference users who have a specified disease, etc. and reference users who do not have the specified disease are set as reference users.
For this reason, for example, the estimation unit 32 generates processing result data between the preprocessed voice data and the voice data of a reference user who has a disease or the like, which is stored in the reference data storage unit 24. Alternatively, for example, the estimation unit 32 generates processing result data between the preprocessed voice data and the voice data of a reference user who does not have a disease or the like, which is stored in the reference data storage unit 24.
In step S114, the calculation unit 30 calculates a score representing the degree to which the user has a disease or the like, based on the processing result data generated in step S112. For example, the score may take a larger value as the degree to which the user has the disease increases. Alternatively, the score may take a smaller value as that degree increases.
For example, consider a case where the score takes a larger value as the degree to which the user has the disease increases. In this case, when the distance between the preprocessed voice data and the voice data of a reference user who has the disease is small, the calculation unit 30 calculates the score so that the degree to which the user has the disease is high. Conversely, when that distance is large, the calculation unit 30 calculates the score so that the degree to which the user has the disease is low.
Also, for example, when the distance between the preprocessed voice data and the voice data of a reference user who does not have a disease, etc. is small, the calculation unit 30 calculates the score so that the degree to which the user has a disease, etc. is low. On the other hand, when the distance between the preprocessed voice data and the voice data of a reference user who does not have a disease, etc. is large, the calculation unit 30 calculates the score so that the degree to which the user has a disease, etc. is high.
In step S116, the estimation unit 32 estimates whether or not the user has a disease, etc., based on the score calculated in step S114 above. For example, if the score is equal to or greater than a predetermined threshold, the estimation unit 32 estimates that the user has a disease, etc., and if the score is less than the predetermined threshold, the estimation unit 32 estimates that the user does not have a disease, etc.
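A toy sketch of this scoring and thresholding logic is shown below. The way the two distances are combined into a score, the sign convention, and the threshold value are all assumptions; the description only states that the score rises as the user's data gets closer to the diseased reference and farther from the healthy reference, and that the result is compared against a preset threshold.

```python
def estimate_from_distances(dist_to_healthy: float, dist_to_patient: float,
                            threshold: float = 0.0) -> bool:
    """Return True (disease suspected) when the user's voice is relatively closer to the
    patient reference than to the healthy reference."""
    score = dist_to_healthy - dist_to_patient   # larger when closer to the patient reference
    return score >= threshold

# Example: DTW distance 3.2 to the healthy reference, 1.1 to the patient reference.
print(estimate_from_distances(3.2, 1.1))   # True under this toy rule
```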
The estimation unit 32 may also estimate which disease, etc. the user has based on the processing result data for each of the voice data of the reference user having disease A, the voice data of the reference user having disease B, and the voice data of the reference user having disease C.
In step S118, the output unit 34 outputs the estimation result estimated in step S116.
The display device 16 displays the inference results output from the output unit 34. The medical professional or user operating the information processing device 14 checks the inference results output from the display device 16 and confirms what disease or symptoms the user is likely to have.
As described above, the information processing system 10 of the first embodiment acquires voice data, which is time-series data of voice uttered by the user, and generates preprocessed data. The information processing device 14 then generates processing result data by applying dynamic time warping to the generated preprocessed voice data, and calculates a score representing the degree to which the user has a specified disease or symptom based on the generated processing result data. The information processing device 14 then estimates whether or not the user has a specified disease or symptom based on the calculated score. In this way, by applying dynamic time warping to the voice data, which is time-series data of voice uttered by the user, it is possible to accurately estimate whether or not the user has a specified disease or specified symptom.
The preprocessed voice data is the central portion of the acquired voice data, that is, the data from a first time after the start point of the voice data up to a second time before the end point of the voice data. By using the central portion of the voice data as the preprocessed voice data, the stable central part of the user's utterance can be used to accurately estimate whether the user has a specified disease or symptom.
The preprocessed voice data is also data for a predetermined period. It is also data obtained by shifting the data along the time axis, data obtained by performing a predetermined sampling process, data obtained by an expansion/contraction process along the time axis, and data obtained by an expansion/contraction process in the amplitude direction. By performing these preprocessing steps on the voice data, the voice data can be put into a form suited to estimating diseases and the like, and it is possible to accurately estimate whether or not the user has a disease or the like.
<Second embodiment of information processing system>
Next, the second embodiment will be described. Note that, among the configurations of the information processing system of the second embodiment, the parts that are similar to those of the first embodiment will be given the same reference numerals and the description will be omitted.
FIG. 9 shows an information processing system 310 according to the second embodiment. As shown in FIG. 9, the information processing system 310 includes a user terminal 18 and an information processing device 314. The information processing device 314 further includes a communication unit 36.
The information processing device 314 of the information processing system 310 estimates whether the user has a disease or the like based on the user's voice collected by the microphone 12 provided on the user terminal 18.
The information processing system 310 of the second embodiment is expected to be used, for example, under the conditions shown in FIG. 10 and FIG. 11.
In the example of FIG. 10, a medical professional H such as a doctor operates an information processing device 314, and a user U, who is a subject, operates a user terminal 18. The user U collects his/her own voice data "XXXX" using the microphone 12 of the user terminal 18 that he/she operates. The user terminal 18 then transmits the voice data to the information processing device 314 via a network 19 such as the Internet.
The information processing device 314 receives the voice data "XXXX" of the user U transmitted from the user terminal 18. The information processing device 314 then estimates whether or not the user U has any disease or symptom based on the received voice data, and outputs the estimation result to the display unit 315 of the information processing device 314. The medical worker H refers to the estimation result displayed on the display unit 315 of the information processing device 314 and determines whether or not the user U has any disease or symptom.
On the other hand, in the example of FIG. 11, the subject, user U, collects his/her own voice data using the microphone 12 of the user terminal 18 that he/she operates. The user terminal 18 then transmits the voice data to the information processing device 314 via a network 19 such as the Internet. The information processing device 314 receives the user U's voice data transmitted from the user terminal 18. The information processing device 314 then estimates whether or not the user U has any disease or symptom based on the received voice data, and transmits the estimation result to the user terminal 18. The user terminal 18 receives the estimation result transmitted from the information processing device 314, and displays it on a display unit (not shown). The user checks the estimation result and confirms what disease or symptom he or she is likely to have.
The information processing device 314 executes an information processing routine similar to that shown in FIG. 8 above.
As described above, the information processing system of the second embodiment can estimate whether a user has a psychiatric disorder, a neurological disorder, or symptoms thereof, using the information processing device 314 installed on the cloud.
Next, an example will be described. In this example, experimental results regarding the effect of the preprocessing described in this embodiment are shown.
FIG. 12 is a graph plotting speech data obtained from subjects evaluated as depressed patients (indicated by square marks in FIG. 12) and speech data obtained from subjects evaluated as healthy subjects (indicated by circles in FIG. 12). FIG. 12 shows data obtained using the preprocessing and DTW of this embodiment. The horizontal axis dist2 of the graph in FIG. 12 represents the distance from the healthy-subject average reference, and the vertical axis dist3 represents the distance from the depressed-patient average reference. As shown in FIG. 12, the square-mark data from depressed patients tend to have a long distance from the healthy-subject average reference and a short distance from the depressed-patient average reference. Conversely, the circle-mark data from healthy subjects tend to have a short distance from the healthy-subject average reference and a long distance from the depressed-patient average reference. FIG. 12 also shows the ROC curve and the AUC value. As shown in FIG. 12, the AUC value is 1.0 when depression is judged by combining the distance dist2 from the healthy-subject average reference and the distance dist3 from the depressed-patient average reference.
FIG. 13 is a table of the data shown in FIG. 12 and other experimental results. HAMD in FIG. 13 represents the score for depression evaluation. HAMD≧7 represents that only those with a score of 7 or more were evaluated. MDD represents patients with depression, PD represents patients with Parkinson's disease, AD represents patients with Alzheimer's disease, and HE represents healthy individuals. MDD=20 represents the use of data from 20 patients with depression, and HE=14 represents the use of data from 14 healthy individuals. Intra-Person DTW represents the case where a pair is generated between a certain section and another section in the speech data of one subject, features are generated, and DTW is performed. FIG. 13 shows the performance when the period adjustment, which is the preprocessing of this embodiment, is not performed, and the performance value is AUC=0.7893, which is a lower performance value than when preprocessing is performed. As shown in the results at the top of FIG. 13, the AUC value when distinguishing between HE and MDD is 0.9643, and as shown in the results at the bottom, the AUC value when distinguishing between HE and PD is 0.9173.
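For orientation only, the following sketch shows how an AUC of the kind reported here could be computed from per-subject distances dist2 (to the healthy-average reference) and dist3 (to the depressed-average reference). The dummy labels, the dist2 - dist3 combination, and the use of scikit-learn are assumptions, not the exact evaluation procedure behind the reported figures.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Dummy per-subject values: label 1 = depressed patient, 0 = healthy subject.
labels = np.array([1, 1, 1, 0, 0, 0])
dist2 = np.array([0.9, 0.8, 0.7, 0.2, 0.3, 0.1])   # distance to healthy-average reference
dist3 = np.array([0.1, 0.2, 0.3, 0.8, 0.9, 0.7])   # distance to depressed-average reference

scores = dist2 - dist3          # higher when farther from healthy and closer to depressed
print(roc_auc_score(labels, scores))
```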
FIG. 14 shows the experimental results indicating the effects of the various preprocessing methods used in this embodiment. The results in the top row of FIG. 14 are the baseline results. The results from the second row onwards indicate the effect of applying each preprocessing method to the data, and it can be seen that each preprocessing method contributes to the depression assessment performance. Note that in the data in the bottom row of the table in FIG. 14, the performance evaluation (AUC) values are reversed, and this is explained below.
FIG. 15 shows the difference in distance calculated by DTW when no amplitude adjustment is performed (labeled "without amplitude normalization" in FIG. 15) and when amplitude adjustment is performed (labeled "with amplitude normalization" in FIG. 15). The vertical axis of the graph shown in FIG. 15 is the distance calculated by DTW.
FIG. 15 shows the distance values calculated by DTW for HE_HospitalA, which represents data obtained from multiple healthy subjects at Hospital A, multiple depressed patients MDD, and HE_HospitalB, which represents data obtained from multiple healthy subjects at Hospital B.
In "without amplitude normalization" in FIG. 15, it can be seen that there is a large difference between HE_HospitalA, which represents data obtained from multiple healthy subjects at Hospital A, and HE_HospitalB, which represents data obtained from multiple healthy subjects at Hospital B. This is thought to be due to differences in the recording environment and recording settings of the voice data. When the sound pressure of the recorded voice data is low, the distance value calculated by DTW is small, while when the sound pressure of the voice data is high, the distance value calculated by DTW is large. As a result, as shown in "without amplitude normalization" in FIG. 15, there is a difference between the data distributions of HE_HospitalA and HE_HospitalB, which were recorded in different environments (there is a significant difference in the mean values by t-test: p<0.01).
In contrast, in the case of "with amplitude normalization" in FIG. 15, the difference between the recording conditions of HE_HospitalA and HE_HospitalB is corrected, and no difference is observed between the data distributions of HE_HospitalA and HE_HospitalB, which were recorded in different environments (no significant difference in the mean values by t-test: p>0.1). In this way, by incorporating amplitude normalization into the preprocessing, it is possible to correctly classify whether a user has a psychiatric disorder, a neurological disorder, or symptoms thereof, without being affected by differences in recording conditions.
Note that diseases such as Alzheimer's disease and Parkinson's disease can also be predicted in a similar manner. FIG. 16 shows various conditions and the results of discrimination between healthy individuals (HE) and patients (Sick) suffering from major depressive disorder (MDD), Alzheimer's disease (AD), and Parkinson's disease (PD) using the Intra-Person DTW of this embodiment. FIG. 17 shows the ROC curve corresponding to the performance evaluation AUC shown in FIG. 16. FIG. 18 shows the DTW values calculated under the conditions shown in FIG. 16. FIG. 19 shows the actual symptoms (labeled "Actual" in FIG. 19) and the prediction results using the method of this embodiment (labeled "Prediction" in FIG. 19). FIG. 20 shows the results of a multiple comparison test.
As shown in FIG. 16, the AUC value when distinguishing between healthy individuals (HE) and patients suffering from some disease (Sick) is 0.8486. Also, as shown in FIG. 20, a multiple comparison test of the average DTW values shows that the distributions of healthy individuals (HE) and those with each disease (MDD, AD, PD) are different (significant difference in the means: p<0.01). Note that "E" in the table stands for "×10" and the number next to it stands for the exponent.
The results shown in FIG. 16 to FIG. 20 also show that the method of this embodiment can accurately estimate whether a user has a psychiatric disorder, a neurological disorder, or symptoms thereof.
The technology disclosed herein is not limited to the above-described embodiments, and various modifications and applications are possible without departing from the spirit and scope of the invention.
For example, although the present specification has described an embodiment in which the program is pre-installed, the program can also be provided by storing it on a computer-readable recording medium.
The processing that the CPU performs in the above embodiments by reading and executing software (a program) may be executed by various processors other than the CPU. Examples of such processors include a PLD (Programmable Logic Device) such as an FPGA (Field-Programmable Gate Array), whose circuit configuration can be changed after manufacture, and a dedicated electric circuit such as an ASIC (Application Specific Integrated Circuit), which is a processor having a circuit configuration designed specifically to execute a specific process. Alternatively, a GPGPU (General-purpose graphics processing unit) may be used as the processor. Each process may be executed by one of these various processors, or by a combination of two or more processors of the same or different types (e.g., multiple FPGAs, or a combination of a CPU and an FPGA). More specifically, the hardware structure of these various processors is an electric circuit that combines circuit elements such as semiconductor elements.
In each of the above embodiments, the program has been described as being stored (installed) in advance in storage, but this is not limiting. The program may be provided in a form stored on a non-transitory storage medium such as a CD-ROM (Compact Disk Read Only Memory), a DVD-ROM (Digital Versatile Disk Read Only Memory), or a USB (Universal Serial Bus) memory. The program may also be downloaded from an external device via a network.
Furthermore, each process of this embodiment may be implemented by a computer or server equipped with a general-purpose processor and storage device, and each process may be executed by a program. This program is stored in a storage device, and can be recorded on a recording medium such as a magnetic disk, optical disk, or semiconductor memory, or can be provided via a network. Of course, any other components do not have to be implemented by a single computer or server, and may be distributed across multiple computers connected by a network.
In each of the above embodiments, the case of estimating whether the user has a psychiatric or neurological disease, or a psychiatric disorder symptom or a cognitive impairment symptom, has been described as an example of the predetermined disease or symptom, but the present disclosure is not limited to this. The predetermined disease or symptom may be of any kind. Voice data is assumed to reflect a wide variety of diseases and symptoms: not only respiratory diseases and symptoms but also psychiatric diseases and the like leave their effects in the voice data. Therefore, any disease or symptom may be estimated as long as its effects appear in the voice data.
In each of the above embodiments, the case where all of the multiple preprocessing steps are executed when generating the preprocessed voice data has been described as an example, but this is not limiting. The preprocessed voice data may be generated using at least one of the preprocessing steps described above.
All publications, patent applications, and technical standards described in this specification are incorporated by reference into this specification to the same extent as if each individual publication, patent application, and technical standard was specifically and individually indicated to be incorporated by reference.
Claims (11)
- An information processing device comprising: an acquisition unit that acquires voice data, which is time-series data of a voice uttered by a user; a processing unit that generates preprocessed voice data representing data of the acquired voice data that is a first time or later after the start point of the voice data and a second time or earlier before the end point of the voice data; a generation unit that generates processing result data by applying dynamic time warping to the preprocessed voice data generated by the processing unit; a calculation unit that calculates a score representing a degree to which the user has a predetermined disease or symptom, based on the processing result data generated by the generation unit; and an estimation unit that estimates whether or not the user has the predetermined disease or symptom, based on the score calculated by the calculation unit.
- The information processing device according to claim 1, wherein the processing unit generates, as the preprocessed voice data, data for a predetermined period among the data that is a first time or later after the start point of the voice data and a second time or earlier before the end point of the voice data.
- The information processing device according to claim 1 or claim 2, wherein the processing unit generates, as the preprocessed voice data, data obtained by performing a predetermined sampling process on the data that is a first time or later after the start point of the voice data and a second time or earlier before the end point of the voice data.
- The information processing device according to any one of claims 1 to 3, wherein the processing unit generates the preprocessed voice data by performing a process of expanding or contracting, in the time axis direction, the data that is a first time or later after the start point of the voice data and a second time or earlier before the end point of the voice data.
- The information processing device according to any one of claims 1 to 4, wherein the processing unit generates the preprocessed voice data by performing a process of expanding or contracting, in the amplitude direction, the data that is a first time or later after the start point of the voice data and a second time or earlier before the end point of the voice data.
- The information processing device according to any one of claims 1 to 5, wherein the processing unit generates the preprocessed voice data by shifting, in the time axis direction, the data that is a first time or later after the start point of the voice data and a second time or earlier before the end point of the voice data.
- The information processing device according to any one of claims 1 to 6, wherein the generation unit generates the processing result data representing a distance between the preprocessed voice data and the voice data of a reference user, for whom it is known whether or not the reference user has the predetermined disease or symptom, by applying the dynamic time warping method to the preprocessed voice data and the voice data of the reference user.
- The information processing device according to any one of claims 1 to 6, wherein the generation unit generates the processing result data representing a distance between first voice data representing data in a first time interval in the preprocessed voice data and second voice data representing data in a second time interval in the preprocessed voice data, by applying the dynamic time warping method to the first voice data and the second voice data.
- An information processing system including a user terminal equipped with a microphone and the information processing device according to any one of claims 1 to 8, wherein the user terminal transmits the voice data acquired by the microphone to the information processing device, the acquisition unit of the information processing device acquires the voice data transmitted from the user terminal, a communication unit of the information processing device transmits an estimation result estimated by the estimation unit to the user terminal, and the user terminal receives the estimation result transmitted from the information processing device.
- An information processing method for causing a computer to execute processing comprising: acquiring voice data, which is time-series data of a voice uttered by a user; generating preprocessed voice data representing data of the acquired voice data that is a first time or later after the start point of the voice data and a second time or earlier before the end point of the voice data; generating processing result data by applying dynamic time warping to the generated preprocessed voice data; calculating a score representing a degree to which the user has a predetermined disease or symptom, based on the generated processing result data; and estimating whether or not the user has the predetermined disease or symptom, based on the calculated score.
- An information processing program for causing a computer to execute processing comprising: acquiring voice data, which is time-series data of a voice uttered by a user; generating preprocessed voice data representing data of the acquired voice data that is a first time or later after the start point of the voice data and a second time or earlier before the end point of the voice data; generating processing result data by applying dynamic time warping to the generated preprocessed voice data; calculating a score representing a degree to which the user has a predetermined disease or symptom, based on the generated processing result data; and estimating whether or not the user has the predetermined disease or symptom, based on the calculated score.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/043832 WO2024116254A1 (en) | 2022-11-28 | 2022-11-28 | Information processing device, information processing method, information processing system, and information processing program |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2022/043832 WO2024116254A1 (en) | 2022-11-28 | 2022-11-28 | Information processing device, information processing method, information processing system, and information processing program |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024116254A1 true WO2024116254A1 (en) | 2024-06-06 |
Family
ID=91323408
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2022/043832 WO2024116254A1 (en) | 2022-11-28 | 2022-11-28 | Information processing device, information processing method, information processing system, and information processing program |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024116254A1 (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019188405A1 (en) * | 2018-03-29 | 2019-10-03 | パナソニックIpマネジメント株式会社 | Cognitive function evaluation device, cognitive function evaluation system, cognitive function evaluation method and program |
WO2020013296A1 (en) * | 2018-07-13 | 2020-01-16 | Pst株式会社 | Apparatus for estimating mental/neurological disease |
JP2021113965A (en) * | 2020-01-16 | 2021-08-05 | 國立中正大學 | Device and method for generating synchronous voice |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6337362B1 (en) | Cognitive function evaluation apparatus and cognitive function evaluation system | |
Tsanas et al. | Accurate telemonitoring of Parkinson’s disease progression by non-invasive speech tests | |
JP2019084249A (en) | Dementia diagnosis apparatus, dementia diagnosis method, and dementia diagnosis program | |
JP6604113B2 (en) | Eating and drinking behavior detection device, eating and drinking behavior detection method, and eating and drinking behavior detection computer program | |
JP6515670B2 (en) | Sleep depth estimation device, sleep depth estimation method, and program | |
TW201923735A (en) | Cognitive function evaluation device, cognitive function evaluation system, cognitive function evaluation method and program | |
WO2020151155A1 (en) | Method and device for building alzheimer's disease detection model | |
JP6845404B2 (en) | Sleep stage determination method, sleep stage determination device, and sleep stage determination program | |
EP3866687A1 (en) | A method and apparatus for diagnosis of maladies from patient sounds | |
JP5803125B2 (en) | Suppression state detection device and program by voice | |
JP7430398B2 (en) | Information processing device, information processing method, information processing system, and information processing program | |
WO2024116254A1 (en) | Information processing device, information processing method, information processing system, and information processing program | |
TW201742053A (en) | Estimation method, estimation program, estimation device, and estimation system | |
Akafi et al. | Assessment of hypernasality for children with cleft palate based on cepstrum analysis | |
WO2021132289A1 (en) | Pathological condition analysis system, pathological condition analysis device, pathological condition analysis method, and pathological condition analysis program | |
JPWO2016207951A1 (en) | Shunt sound analysis device, shunt sound analysis method, computer program, and recording medium | |
Morales et al. | Glottal Airflow Estimation Using Neck Surface Acceleration and Low-Order Kalman Smoothing | |
JP6925056B2 (en) | Sleep stage determination method, sleep stage determination device, and sleep stage determination program | |
JP7246664B1 (en) | Information processing device, information processing method, information processing system, and information processing program | |
den Brinker et al. | Performance requirements for cough classifiers in real-world applications | |
JP6782940B2 (en) | Tongue position / tongue habit judgment device, tongue position / tongue habit judgment method and program | |
JP6627625B2 (en) | Response support device, response support method, response support program, response evaluation device, response evaluation method, and response evaluation program | |
JP2021519122A (en) | Detection of subjects with respiratory disabilities | |
WO2024209534A1 (en) | Cochlear nerve feature amount extraction device, hearing ability estimation device, cochlear nerve feature amount extraction method, and program | |
JP7497023B2 (en) | Hearing test system and hearing test method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22967089; Country of ref document: EP; Kind code of ref document: A1 |