WO2018117171A1 - Bioacoustic analysis method, program, storage medium, and bioacoustic analysis device - Google Patents

Bioacoustic analysis method, program, storage medium, and bioacoustic analysis device

Info

Publication number
WO2018117171A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
sound
learning
abnormal
teacher
Application number
PCT/JP2017/045777
Other languages
French (fr)
Japanese (ja)
Inventor
隆真 亀谷
Original Assignee
Pioneer Corporation (パイオニア株式会社)
Application filed by Pioneer Corporation
Priority to JP2018558046A (JP6672478B2)
Publication of WO2018117171A1

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/08 Detecting, measuring or recording devices for evaluating the respiratory organs
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B7/00 Instruments for auscultation
    • A61B7/02 Stethoscopes
    • A61B7/04 Electric stethoscopes

Definitions

  • The present invention relates to the technical field of biological sound analysis methods, programs, storage media, and biological sound analysis devices for analyzing biological sounds such as respiratory sounds.
  • Patent Document 1 describes a technique for detecting a plurality of abnormal sounds (adventitious sounds, or "sub-noises") contained in respiratory sounds by separating them by sound type.
  • When analyzing abnormal sounds contained in a biological sound, the occurrence of an abnormal sound can be determined by comparing the acquired biological sound information with abnormal sound information stored in advance (specifically, biological sound information recorded while an abnormal sound was actually occurring).
  • However, because body sound information varies with individual differences, measurement environments, and so on, it is difficult to determine accurately whether an abnormal sound has occurred simply by comparing the information. Unless an appropriate judgment criterion is set, there is the technical problem that an abnormal sound may go undetected even though it has occurred, or may be detected erroneously even though it has not.
  • An object of the present invention is to provide a body sound analysis method, a program, a storage medium, and a body sound analysis apparatus that can suitably analyze abnormal sounds included in body sounds.
  • The biological sound analysis method for solving the above problem is a biological sound analysis method used in a biological sound analysis device that analyzes biological sounds, and includes: a first acquisition step of acquiring first information related to a biological sound; a second acquisition step of acquiring second information indicating the timing at which an abnormal sound occurs in the first information; a learning step of learning the correspondence between the first information and the second information; and a discrimination step of discriminating an abnormal sound included in input body sound information based on the learning result of the learning step.
  • The program for solving the above problem causes the biological sound analysis apparatus to execute the above-described biological sound analysis method.
  • The storage medium for solving the above problem stores the above-described program.
  • The biological sound analysis apparatus for solving the above problem includes a first acquisition unit that acquires biological sound information related to a biological sound, and a determination unit that determines an abnormal sound included in the biological sound based on a learning result, where the learning result is obtained by learning the correspondence between first information about the biological sound and second information indicating the timing at which an abnormal sound occurs in the biological sound.
  • The body sound analysis method according to the present embodiment is a body sound analysis method used in a body sound analysis apparatus that analyzes body sounds, and includes: a first acquisition step of acquiring first information related to a body sound; a second acquisition step of acquiring second information indicating the timing at which an abnormal sound occurs in the first information; a learning step of learning the correspondence between the first information and the second information; and a discrimination step of discriminating an abnormal sound included in input body sound information based on the learning result of the learning step.
  • According to this method, first information related to the body sound (for example, a respiratory sound) is acquired, and second information indicating the timing at which an abnormal sound (for example, an adventitious sound) occurs in the first information is acquired.
  • The first information is acquired as information showing the change of the biological sound over time (for example, a time-axis waveform of the biological sound).
  • The second information is desirably information that accurately indicates the timing at which abnormal sounds occur in the first information; for this reason, it is preferably prepared in advance using the first information.
  • To make the learning result more accurate, the learning step is preferably executed multiple times using multiple sets of first and second information.
  • The "input body sound information" is information about the body sound to be analyzed by the body sound analysis method according to the present embodiment, and is input separately from the first and second information described above. In the present embodiment in particular, since the correspondence between the first information and the second information has been learned in advance, the timing at which an abnormal sound occurs can be determined accurately from the input body sound information, and abnormal sounds included in the body sound information can therefore be appropriately discriminated.
  • In one aspect, the method further includes a first generation step of generating, based on the first information, feature amount information indicating a feature amount of the first information, and the learning step learns the correspondence between the feature amount information and the second information instead of the correspondence between the first information and the second information.
  • According to this aspect, when the first information is acquired, feature amount information indicating the feature amount of the first information is generated.
  • The "feature amount" is a value indicating the magnitude (degree) of a feature that can be used to discriminate an abnormal sound included in a body sound.
  • In this aspect in particular, the correspondence between the feature amount information and the second information is learned instead of the correspondence between the first information and the second information, so a learning result better suited to discriminating abnormal sounds in the input body sound information is obtained.
  • In another aspect, the method further includes a dividing step of dividing the first information and the second information into predetermined frame units, and the learning step performs learning in those frame units.
  • According to this aspect, the first information and the second information are divided into predetermined frame units for learning.
  • The predetermined frame unit is set to a period over which an appropriate learning result is more easily obtained, so performing learning in units of these frames yields a better learning result.
  • In another aspect, third information indicating whether an abnormal sound occurs in the first information is acquired. The third information is desirably information that accurately indicates whether an abnormal sound occurs in the first information; for this reason, it is preferably prepared in advance using the first information.
  • Further, fourth information indicating the ratio of the period during which the abnormal sound occurs to the period over which the first information was acquired is calculated.
  • Specifically, the fourth information is calculated by analyzing the first information using the learning result.
  • Then, based on the third information and the fourth information, a threshold for judging whether the input body sound information contains an abnormal sound is determined.
  • This threshold is used, when discriminating abnormal sounds in the input body sound information, to judge whether an abnormal sound is actually present; specifically, it is compared with the ratio of the period during which an abnormal sound occurs to the period over which the input body sound information was acquired.
  • Using this threshold, it can be judged, for example, that an abnormal sound has occurred when the abnormal-sound ratio of the input body sound information is equal to or greater than the threshold, and that no abnormal sound has occurred when the ratio is below the threshold.
  • In particular, since the threshold is determined based on the fourth information calculated using the learning result, the presence or absence of an abnormal sound can be judged more accurately.
  • The program according to the present embodiment causes the biological sound analysis apparatus to execute the biological sound analysis method described above.
  • Since the program can execute the biological sound analysis method according to the present embodiment, abnormal sounds included in body sound information can be suitably discriminated.
  • The storage medium according to the present embodiment stores the above-described program.
  • Since the storage medium allows the program according to the present embodiment to be executed, abnormal sounds included in body sound information can be suitably discriminated.
  • The biological sound analysis apparatus includes a first acquisition unit that acquires biological sound information related to a biological sound, and a determination unit that determines abnormal sounds included in the biological sound based on a learning result.
  • The learning result is obtained by learning the correspondence between first information about the body sound and second information indicating the timing at which an abnormal sound occurs in the body sound.
  • With this biological sound analysis apparatus, abnormal sounds included in body sound information can be suitably discriminated based on the learning result, as with the biological sound analysis method described above.
  • The biological sound analysis apparatus can also adopt the same various aspects as the biological sound analysis method according to the present embodiment described above.
  • Teacher data consists of sets of three pieces of information (a teacher voice signal, teacher frame information, and overall teacher information), and a plurality of such sets are prepared in advance.
  • The teacher voice signal is a signal indicating the temporal change of a breathing sound (for example, a time-axis waveform).
  • The teacher frame information indicates, for each sound type, the timing at which abnormal sounds occur in the teacher voice signal.
  • The overall teacher information indicates, for each sound type, whether an abnormal sound occurs anywhere in the teacher voice signal.
  • The teacher data is used for the learning operation described later; the more sets there are, the higher the learning effect (in other words, the accuracy of the biological sound analysis).
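As a concrete illustration, one set of teacher data could be represented as in the minimal sketch below; the class and field names are assumptions for illustration, not identifiers from the patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TeacherSet:
    """One set of teacher data (illustrative names, not from the patent)."""
    voice_signal: np.ndarray   # time-axis waveform of a recorded breathing sound
    frame_info: np.ndarray     # shape (n_sound_types, n_frames); 1 where that abnormal sound occurs
    overall_info: np.ndarray   # shape (n_sound_types,); 1 if that abnormal sound occurs anywhere

# The training corpus is simply a collection of such sets prepared in advance.
corpus: list[TeacherSet] = []
```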
  • Frame determination learning is a learning operation for increasing the accuracy of the frame determination process, which determines the occurrence of abnormal sounds in units of frames.
  • FIG. 1 is a block diagram illustrating a configuration of a frame determination learner according to the present embodiment.
  • The frame determination learner includes a teacher voice signal input unit 110, a teacher frame information input unit 120, a processing unit 200, and a learning result output unit 300.
  • The teacher voice signal input unit 110 is configured to acquire the teacher voice signal included in the teacher data and output it to the processing unit 200.
  • The teacher frame information input unit 120 is configured to acquire the teacher frame information included in the teacher data and output it to the processing unit 200.
  • The processing unit 200 includes a plurality of arithmetic circuits, memories, and the like.
  • The processing unit 200 comprises a frame dividing unit 210, a first local feature amount calculation unit 220, a frequency analysis unit 230, a second local feature amount calculation unit 240, and a learning unit 250.
  • The frame dividing unit 210 is configured to execute a division process that divides the teacher speech signal input from the teacher speech signal input unit 110 into a plurality of frames.
  • The respiratory sound signal divided by the frame dividing unit 210 is output to the first local feature amount calculation unit 220 and the frequency analysis unit 230.
  • The first local feature amount calculation unit 220 is configured to calculate the first local feature amount based on the waveform of the teacher speech signal. The processing it executes is described in detail later.
  • The first local feature amount calculated by the first local feature amount calculation unit 220 is output to the learning unit 250.
  • The frequency analysis unit 230 is configured to execute a time-frequency analysis process (for example, FFT processing) on the teacher audio signal input from the teacher audio signal input unit 110.
  • The analysis result of the frequency analysis unit 230 is output to the second local feature amount calculation unit 240.
  • The second local feature amount calculation unit 240 is configured to calculate the second local feature amount based on the analysis result of the frequency analysis unit 230. The processing it executes is described in detail later.
  • The second local feature amount calculated by the second local feature amount calculation unit 240 is output to the learning unit 250.
  • The learning unit 250 is configured to learn the correspondence between the local feature amounts calculated by the first and second local feature amount calculation units 220 and 240 and the teacher frame information input from the teacher frame information input unit 120. The processing it executes is described in detail later. The learning result of the learning unit 250 is output to the learning result output unit 300.
  • The learning result output unit 300 is configured to output the learning result of the learning unit 250 in a form that can be used for the analysis of body sounds.
  • For example, the learning result output unit 300 is configured to output the learning result of the learning unit 250 to a memory or the like of the biological sound analyzer.
  • FIG. 2 is a flowchart illustrating the flow of the frame determination learning operation according to the present embodiment.
  • As shown in FIG. 2, the learning unit 250 is first initialized (step S100). Subsequently, a teacher voice signal is acquired by the teacher voice signal input unit 110 (step S101), which outputs the acquired signal to the processing unit 200.
  • The teacher voice signal is a specific example of the "first information".
  • FIG. 3 is a conceptual diagram showing a frame division process of the teacher voice signal.
  • As shown in FIG. 3, the teacher voice signal is divided into a plurality of frames at a predetermined interval.
  • Each frame is set as a processing unit for suitably executing the local feature amount calculation described later; the period per frame is, for example, 12 msec.
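A minimal sketch of this division step is shown below. The sampling rate and the use of non-overlapping frames are assumptions; the text only gives the 12 msec frame length as an example.

```python
import numpy as np

def split_into_frames(signal: np.ndarray, fs: int, frame_ms: float = 12.0) -> np.ndarray:
    """Split a 1-D audio signal into consecutive, non-overlapping frames."""
    frame_len = int(fs * frame_ms / 1000)        # samples per frame (12 ms by default)
    n_frames = len(signal) // frame_len          # drop the trailing remainder
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)

# One second of audio at 44.1 kHz yields roughly 83 frames of 12 ms each.
frames = split_into_frames(np.random.randn(44100), fs=44100)
```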
  • The teacher audio signal divided into frames is input to the first local feature amount calculation unit 220, where the first local feature amount is calculated (step S103). The frame-divided teacher voice signal is also frequency-analyzed by the frequency analysis unit 230 and input to the second local feature amount calculation unit 240, which calculates the second local feature amount based on the frequency-analyzed teacher voice signal (for example, a spectrum indicating its frequency characteristics).
  • FIG. 4 is a flowchart showing the calculation process of the first local feature quantity.
  • FIG. 5 is a flowchart showing the calculation process of the second local feature quantity.
  • FIG. 6 is a diagram illustrating a local feature vector obtained from a waveform and a spectrum.
  • As shown in FIG. 4, the waveform of the teacher speech signal is acquired (step S201), and pre-filtering is performed (step S202).
  • The pre-filtering is, for example, processing with a high-pass filter, and removes unnecessary components from the teacher voice signal.
  • Next, a local variance value is calculated from the pre-filtered teacher voice signal (step S203).
  • The local variance is calculated as, for example, a first variance value indicating the variation of the teacher speech signal in a first period w1, and a second variance value indicating its variation in a second period w2 that includes the first period w1.
  • The local variance calculated in this way functions as a local feature amount for discriminating intermittent rales (for example, bubbling sounds) among the abnormal sounds.
  • The maximum value of the local variance in each frame of the teacher speech signal is then calculated (step S204) and output as the first local feature amount (step S205).
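The sketch below illustrates this two-scale local variance feature. The window lengths w1 and w2, and the choice to report the per-frame maximum at each scale separately, are assumptions; the text does not specify how the two variance values are combined.

```python
import numpy as np

def sliding_variance(x: np.ndarray, win: int) -> np.ndarray:
    """Variance of x over a sliding window of `win` samples, one value per sample."""
    pad = win // 2
    xp = np.pad(x, pad, mode="edge")
    c1 = np.cumsum(np.insert(xp, 0, 0.0))        # running sum of x
    c2 = np.cumsum(np.insert(xp ** 2, 0, 0.0))   # running sum of x^2
    mean = (c1[win:] - c1[:-win]) / win
    mean_sq = (c2[win:] - c2[:-win]) / win
    return np.maximum(mean_sq - mean ** 2, 0.0)[: len(x)]

def first_local_feature(frame: np.ndarray, w1: int = 32, w2: int = 128) -> np.ndarray:
    """Per-frame first local feature: the maximum local variance at each scale
    (w2 contains w1, matching the short/long period pair in the text)."""
    return np.array([sliding_variance(frame, w1).max(),
                     sliding_variance(frame, w2).max()])
```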
  • As shown in FIG. 5, a CMN (Cepstral Mean Normalization) process is applied to the frequency-analyzed teacher speech signal.
  • A liftering process for extracting the envelope component (step S303) and a liftering process for extracting the fine component (step S304) are then performed on the CMN-processed teacher speech signal.
  • The liftering process cuts a predetermined quefrency component from the cepstrum.
  • The CMN and liftering processes make it easier to discriminate continuous rales (for example, rhonchus-like sounds, whistle-like sounds, and fine crackling sounds) that are buried in other biological sounds. Since both are existing techniques, a detailed description is omitted here.
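A compact sketch of these cepstral steps follows. The quefrency cutoff of 20 bins is an assumed value for illustration; the patent does not give one.

```python
import numpy as np

def cepstra(frames: np.ndarray) -> np.ndarray:
    """Real cepstrum of each frame (one row per frame)."""
    spec = np.abs(np.fft.rfft(frames, axis=1)) + 1e-12   # avoid log(0)
    return np.fft.irfft(np.log(spec), axis=1)

def cmn(ceps: np.ndarray) -> np.ndarray:
    """Cepstral Mean Normalization: subtract the average cepstrum over frames."""
    return ceps - ceps.mean(axis=0, keepdims=True)

def lifter(ceps: np.ndarray, cutoff: int = 20, keep: str = "low") -> np.ndarray:
    """Cut a quefrency band: 'low' keeps the envelope component (step S303),
    'high' keeps the fine component (step S304)."""
    out = ceps.copy()
    if keep == "low":
        out[:, cutoff:] = 0.0
    else:
        out[:, :cutoff] = 0.0
    return out
```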
  • Next, an enhancement process using the KL (Kullback-Leibler) information amount is executed to calculate the feature quantity.
  • The KL information amount is a parameter calculated from an observed value P and a reference value Q (for example, a theoretical value, model value, or predicted value); it becomes large when an observed value P that is distinctive with respect to the reference value Q appears. Processing using the KL information amount emphasizes and clarifies the tone components contained in the teacher speech signal (that is, the components used to discriminate continuous rales).
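The sketch below shows the underlying quantity. Treating the observed and reference spectra as probability distributions is an assumption about the concrete formulation; the patent only states that P is an observation and Q a reference such as a model value.

```python
import numpy as np

def kl_information(observed: np.ndarray, reference: np.ndarray) -> float:
    """KL information of an observed spectrum P against a reference spectrum Q.
    Grows when a tonal component absent from the reference appears."""
    eps = 1e-12
    p = observed / observed.sum()
    q = reference / reference.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```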
  • A HAAR-like feature is also calculated from the spectrum obtained by the frequency analysis (step S306).
  • The HAAR-like feature is a feature mainly used in the field of image processing.
  • Here, the HAAR-like feature is calculated from the spectrum by an analogous method, treating the amplitude value at each frequency like a pixel value in image processing. Since the calculation of HAAR-like features is an existing technique, a detailed description is omitted here.
  • Finally, an average value is calculated for each frequency band (step S307) and output as the second local feature amount.
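A one-dimensional analogue of the Haar-like computation is sketched below, together with the band averaging. The window widths and the number of bands are assumptions for illustration.

```python
import numpy as np

def haar_like_features(spectrum: np.ndarray, widths=(4, 8, 16)) -> np.ndarray:
    """1-D Haar-like features: contrast between adjacent frequency bands,
    with spectral amplitude playing the role of pixel intensity."""
    cs = np.cumsum(np.insert(spectrum, 0, 0.0))   # "integral image" of the spectrum
    feats = []
    for w in widths:
        for i in range(0, len(spectrum) - 2 * w + 1, w):
            left = cs[i + w] - cs[i]
            right = cs[i + 2 * w] - cs[i + w]
            feats.append(right - left)
    return np.asarray(feats)

def band_averages(spectrum: np.ndarray, n_bands: int = 8) -> np.ndarray:
    """Average amplitude per frequency band (step S307)."""
    return np.array([b.mean() for b in np.array_split(spectrum, n_bands)])
```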
  • Through the processing described above, a plurality of types of local feature quantities (that is, the first and second local feature quantities) are obtained from the waveform and spectrum of the respiratory sound signal, and are output as a local feature vector for each frame.
  • Next, teacher frame information is acquired by the teacher frame information input unit 120 (step S106).
  • The teacher frame information is a specific example of the "second information".
  • The acquired teacher frame information is output to the learning unit 250 together with the local feature amounts.
  • FIG. 7 is a diagram illustrating teacher frame information associated with a local feature amount.
  • As shown in FIG. 7, the local feature amount and the teacher frame information are paired for each frame (step S107). The local feature values are thereby associated with the occurrence timing of abnormal sounds.
  • That is, a local feature vector associated with teacher frame information indicating that an abnormal sound is occurring is learned as a local feature amount for when an abnormal sound occurs.
  • Conversely, a local feature vector associated with teacher frame information indicating that no abnormal sound is occurring is learned as a local feature amount for when no abnormal sound occurs.
  • In step S108, it is determined whether the pairing has been completed for all frames. If not (step S108: NO), the processes from steps S103 and S104 onward are repeated for the remaining frames.
  • In step S109, it is determined whether processing has been completed for all of the teacher data sets. If not (step S109: NO), the processing from step S101 onward is executed again. By repeating in this way, the association is made for all frames of all teacher data.
  • After that, the learning process by the learning unit 250 is actually executed (step S110).
  • The learning process is performed using a machine learning algorithm such as AdaBoost.
  • Since an existing method can be appropriately employed for the learning process, a detailed description is omitted here.
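For concreteness, the per-frame learning could look like the sketch below, using scikit-learn's AdaBoost implementation; the patent names AdaBoost only as one example, and one classifier is trained per sound type. The feature dimension and data here are placeholders.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# X: one local feature vector per frame; y: teacher frame information for one
# sound type (1 = abnormal sound occurring in that frame, 0 = not occurring).
X = np.random.randn(1000, 24)                    # placeholder feature vectors
y = (np.random.rand(1000) > 0.9).astype(int)     # placeholder frame labels

clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X, y)                                    # frame determination learning (step S110)

# The fitted model is the "learning result" output to the body sound analyzer;
# at analysis time it yields per-frame abnormal-sound decisions.
frame_decisions = clf.predict(X)
```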
  • Since the learning process described above is performed separately for each abnormal sound type, it is determined after the process ends whether learning has been completed for all sound types (step S111). If not (step S111: NO), the learning process of step S110 is executed again for the remaining sound types.
  • When the learning process has been completed for all sound types (step S111: YES), the learning result is output to the body sound analyzer (step S112).
  • The learning result of the frame determination learning described above is used when the body sound analyzer analyzes respiratory sounds.
  • Specifically, the same processing as applied to the teacher speech signal in the frame determination learning operation is executed on the respiratory sound signal to be analyzed.
  • That is, the input respiratory sound signal is divided into frames, and a local feature amount is calculated for each frame.
  • Since the correspondence between the local feature amounts and the occurrence of abnormal sounds has been learned, the timing at which an abnormal sound occurs (in other words, the temporal position of the frames in which it occurs) can be detected using the local feature amounts calculated from the respiratory sound.
  • Abnormal sounds can therefore be discriminated extremely accurately compared with the case where no learning operation is performed in advance.
  • The optimum threshold determination operation determines, as an optimum value, the threshold applied to the ratio of the period judged to contain abnormal sound when deciding whether an abnormal sound has actually occurred.
  • FIG. 8 is a block diagram illustrating the configuration of the threshold value determination unit according to the present embodiment.
  • The threshold determination unit includes a frame determination result input unit 410, an overall teacher information input unit 420, a determination unit 500, and a threshold output unit 600.
  • The frame determination result input unit 410 is configured to acquire the result of the frame determination process performed on the teacher audio signal using the learning result (that is, the per-frame determination of whether an abnormal sound occurs) and to output it to the determination unit 500.
  • The overall teacher information input unit 420 is configured to acquire the overall teacher information included in the teacher data and output it to the determination unit 500.
  • The determination unit 500 includes a global feature amount calculation unit 510, an ROC (Receiver Operating Characteristic) analysis unit 520, and an optimum threshold calculation unit 530.
  • The global feature amount calculation unit 510 is configured to calculate, based on the result of the frame determination process, the ratio of the time during which abnormal sounds occur to the period over which the teacher voice signal was input. The calculated ratio is output to the ROC analysis unit 520.
  • The ROC analysis unit 520 is configured to execute an ROC analysis that, based on the relationship between the abnormal-sound occurrence-time ratio and the overall teacher information, obtains the relationship between candidate thresholds for that ratio and the resulting discrimination performance as an ROC curve.
  • Since ROC analysis is an existing technique, a detailed description is omitted here.
  • The analysis result of the ROC analysis unit 520 is output to the optimum threshold calculation unit 530.
  • Using the analysis result of the ROC analysis unit 520, the optimum threshold calculation unit 530 can calculate an optimum threshold for judging, from the ratio of the time during which abnormal sounds occur, whether an abnormal sound has actually occurred (the threshold giving the point on the ROC curve closest to the reference point (0, 1); see FIG. 9). The calculated threshold is output to the threshold output unit 600.
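The closest-point criterion can be sketched as follows with scikit-learn; treating the ROC axes as (false positive rate, true positive rate) follows the usual convention, and the reference point (0, 1) follows FIG. 9.

```python
import numpy as np
from sklearn.metrics import roc_curve

def optimal_threshold(global_features: np.ndarray, overall_labels: np.ndarray) -> float:
    """ROC analysis over per-recording abnormal-frame ratios: return the
    threshold whose (FPR, TPR) point lies closest to the ideal point (0, 1)."""
    fpr, tpr, thresholds = roc_curve(overall_labels, global_features)
    dist = np.hypot(fpr, tpr - 1.0)      # Euclidean distance to (0, 1)
    return float(thresholds[np.argmin(dist)])
```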
  • The threshold output unit 600 is configured to output the threshold calculated by the optimum threshold calculation unit 530 in a form that can be used for the analysis of body sounds.
  • For example, the threshold output unit 600 is configured to output the threshold calculated by the optimum threshold calculation unit 530 to a memory or the like of the biological sound analyzer.
  • FIG. 10 is a flowchart showing the flow of the optimum threshold value determining operation according to the present embodiment.
  • As shown in FIG. 10, the frame determination result of the teacher voice signal is first acquired by the frame determination result input unit 410 (step S201). It is then determined whether the frame determination results for all frames have been acquired (step S202); if not (step S202: NO), step S201 is executed again for the remaining frames.
  • When the frame determination results for all frames have been acquired (step S202: YES), they are output to the global feature amount calculation unit 510, and the ratio of the number of frames judged to contain an abnormal sound to the number of frames in the entire section of the teacher speech signal (that is, the global feature amount) is calculated (step S203).
  • The global feature amount is a specific example of the "fourth information".
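Computed over binary per-frame decisions, the global feature amount reduces to a mean, as in this small sketch:

```python
import numpy as np

def global_feature(frame_decisions: np.ndarray) -> float:
    """Ratio of frames judged abnormal to all frames in the section
    (the 'fourth information'); decisions are 0/1 per frame."""
    return float(np.mean(frame_decisions))
```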
  • Next, the overall teacher information included in the teacher data is acquired by the overall teacher information input unit 420 (step S204).
  • The overall teacher information is a specific example of the "third information".
  • The overall teacher information is output to the ROC analysis unit 520 together with the global feature amount, and the two are paired (step S205). That is, the ratio of frames judged to contain abnormal sound is associated with the information indicating whether an abnormal sound actually occurred. It is then determined whether this pairing has been completed for all teacher data (step S206); if not (step S206: NO), the processing from step S201 onward is executed again for the remaining teacher data.
  • When the pairing has been completed for all teacher data (step S206: YES), ROC analysis is performed by the ROC analysis unit 520, and the relationship between threshold and discrimination performance is obtained as an ROC curve (step S207).
  • The optimum threshold calculation unit 530 then calculates the optimum threshold according to the ROC analysis result (step S208).
  • After that, it is determined whether the processing has been completed for all sound types (step S209). If not (step S209: NO), the processing of steps S207 and S208 is executed for the remaining sound types. If the processing has been completed for all sound types (step S209: YES), the threshold is output (step S210), and the series of processes ends.
  • The threshold determined as described above is used when the body sound analyzer analyzes respiratory sounds. For example, when analyzing a respiratory sound, the same processing as applied to the teacher speech signal in the frame determination learning operation is executed on the respiratory sound signal to be analyzed. Specifically, the ratio of the period during which abnormal sounds occur to the respiratory sound acquisition period (that is, the global feature amount) is calculated from the frame determination results of the input respiratory sound signal. When this ratio is large, it can be judged that an abnormal sound is actually occurring; when it is small, it can be judged that, although abnormal sounds were detected in some frames, no abnormal sound is actually occurring.
  • This judgment of the presence or absence of abnormal sound is realized by comparison with the threshold determined by the optimum threshold determination operation described above. Specifically, an abnormal sound is judged to be occurring when the ratio of the period containing abnormal sound is larger than the threshold, and not to be occurring when the ratio is smaller than the threshold.
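Reusing `global_feature` from the sketch above, the final per-recording decision is a single comparison (per sound type, against that type's learned threshold):

```python
def abnormal_sound_present(frame_decisions, threshold: float) -> bool:
    """Report an abnormal sound only when the abnormal-frame ratio
    exceeds the optimum threshold learned for this sound type."""
    return global_feature(frame_decisions) > threshold
```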
  • In this example in particular, the optimum threshold is calculated based on the relationship between the global feature amount and the overall teacher information (that is, the information indicating the presence or absence of abnormal sound).
  • The presence or absence of abnormal sound can therefore be determined extremely accurately based on the global feature amount calculated from the respiratory sound.
  • The present invention is not limited to the embodiments described above and can be changed as appropriate without departing from the spirit of the invention that can be read from the claims and the entire specification; biological sound analysis methods, programs, storage media, and biological sound analysis devices involving such changes are also included in the technical scope of the present invention.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Acoustics & Sound (AREA)
  • Veterinary Medicine (AREA)
  • Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Heart & Thoracic Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Pulmonology (AREA)
  • Physiology (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

Provided is a bioacoustic analysis method that is used in a bioacoustic analysis device for analyzing biological sound. This bioacoustic analysis method includes the following: a first acquisition step (S101) for acquiring first information relating to biological sound; a second acquisition step (S106) for acquiring second information indicating the timing, in the first information, at which an abnormal sound has arisen; a learning step (S110) for learning the correspondence between the first information and the second information; and a discrimination step for discriminating the abnormal sound contained in the inputted bioacoustic information on the basis of the learning results obtained from the learning step. According to this bioacoustic analysis method, the correspondence between the first information and the second information is learned in advance, thus making it possible to suitably distinguish abnormal sounds contained in biological sound.

Description

Body sound analysis method, program, storage medium, and body sound analysis apparatus
The present invention relates to the technical field of biological sound analysis methods, programs, storage media, and biological sound analysis devices for analyzing biological sounds such as respiratory sounds.
Apparatuses that detect abnormal sounds (that is, sounds different from normal respiratory sounds) contained in the respiratory sounds of a living body detected by an electronic stethoscope or the like are known. For example, Patent Document 1 describes a technique for detecting a plurality of abnormal sounds (adventitious sounds, or "sub-noises") contained in respiratory sounds by separating them by sound type.
International Publication No. 2016/002004
When analyzing abnormal sounds contained in a biological sound, the occurrence of an abnormal sound can be determined by comparing the acquired biological sound information with abnormal sound information stored in advance (specifically, biological sound information recorded while an abnormal sound was actually occurring). However, because body sound information varies with individual differences, measurement environments, and so on, it is difficult to determine accurately whether an abnormal sound has occurred simply by comparing the information. Unless an appropriate judgment criterion is set, there is the technical problem that an abnormal sound may go undetected even though it has occurred, or may be detected erroneously even though it has not.
The problems to be solved by the present invention include the above as one example. An object of the present invention is to provide a body sound analysis method, a program, a storage medium, and a body sound analysis apparatus that can suitably analyze abnormal sounds included in body sounds.
The biological sound analysis method for solving the above problem is a biological sound analysis method used in a biological sound analysis device that analyzes biological sounds, and includes: a first acquisition step of acquiring first information related to a biological sound; a second acquisition step of acquiring second information indicating the timing at which an abnormal sound occurs in the first information; a learning step of learning the correspondence between the first information and the second information; and a discrimination step of discriminating an abnormal sound included in input body sound information based on the learning result of the learning step.
The program for solving the above problem causes the biological sound analysis apparatus to execute the above-described biological sound analysis method.
The storage medium for solving the above problem stores the above-described program.
The biological sound analysis apparatus for solving the above problem includes a first acquisition unit that acquires biological sound information related to a biological sound, and a determination unit that determines an abnormal sound included in the biological sound based on a learning result, where the learning result is obtained by learning the correspondence between first information about the biological sound and second information indicating the timing at which an abnormal sound occurs in the biological sound.
FIG. 1 is a block diagram showing the configuration of the frame determination learner according to the embodiment.
FIG. 2 is a flowchart showing the flow of the frame determination learning operation according to the embodiment.
FIG. 3 is a conceptual diagram showing the frame division processing of the teacher voice signal.
FIG. 4 is a flowchart showing the calculation processing of the first local feature amount.
FIG. 5 is a flowchart showing the calculation processing of the second local feature amount.
FIG. 6 is a diagram showing the local feature vector obtained from the waveform and spectrum.
FIG. 7 is a diagram showing the teacher frame information associated with the local feature amounts.
FIG. 8 is a block diagram showing the configuration of the threshold determination unit according to the embodiment.
FIG. 9 is a conceptual diagram showing an example of the processing content of ROC analysis.
FIG. 10 is a flowchart showing the flow of the optimum threshold determination operation according to the embodiment.
<1>
The body sound analysis method according to the present embodiment is a body sound analysis method used in a body sound analysis apparatus that analyzes body sounds, and includes: a first acquisition step of acquiring first information related to a body sound; a second acquisition step of acquiring second information indicating the timing at which an abnormal sound occurs in the first information; a learning step of learning the correspondence between the first information and the second information; and a discrimination step of discriminating an abnormal sound included in input body sound information based on the learning result of the learning step.
According to the body sound analysis method of the present embodiment, first information related to a body sound (for example, a respiratory sound) is acquired, and second information indicating the timing at which an abnormal sound (for example, an adventitious sound) occurs in the first information is acquired. The first information is acquired as information showing the change of the biological sound over time (for example, a time-axis waveform of the biological sound). The second information is desirably information that accurately indicates the timing at which abnormal sounds occur in the first information, and is therefore preferably prepared in advance using the first information.
When the first information and the second information have been acquired, the correspondence between them is learned. Specifically, parameters for deriving the second information from the first information are learned; there may be several kinds of such parameters. To make the learning result more accurate, the learning step is preferably executed multiple times using multiple sets of first and second information.
After the learning step, abnormal sounds included in input body sound information are discriminated based on the learning result. The "input body sound information" is information about the body sound to be analyzed by the body sound analysis method according to the present embodiment, and is input separately from the first and second information described above. In the present embodiment in particular, since the correspondence between the first information and the second information has been learned in advance, the timing at which an abnormal sound occurs can be determined accurately from the input body sound information, and abnormal sounds included in the body sound information can be appropriately discriminated.
<2>
In one aspect of the body sound analysis method according to the present embodiment, the method further includes a first generation step of generating, based on the first information, feature amount information indicating a feature amount of the first information, and the learning step learns the correspondence between the feature amount information and the second information instead of the correspondence between the first information and the second information.
According to this aspect, when the first information is acquired, feature amount information indicating the feature amount of the first information is generated. The "feature amount" is a value indicating the magnitude (degree) of a feature that can be used to discriminate an abnormal sound included in a body sound.
In this aspect in particular, the correspondence between the feature amount information and the second information is learned instead of the correspondence between the first information and the second information, so a learning result better suited to discriminating abnormal sounds in the input body sound information is obtained.
<3>
In one aspect of the body sound analysis method according to the present embodiment, the method further includes a dividing step of dividing the first information and the second information into predetermined frame units, and the learning step performs learning in those frame units.
According to this aspect, the first information and the second information are divided into predetermined frame units for learning. The predetermined frame unit is set to a period over which an appropriate learning result is more easily obtained, so performing learning in units of these frames yields a better learning result.
<4>
In one aspect of the body sound analysis method according to the present embodiment, the method further includes: a third acquisition step of acquiring third information indicating whether an abnormal sound occurs in the first information; a calculation step of calculating, based on the first information and the learning result of the learning step, fourth information indicating the ratio of the period during which the abnormal sound occurs to the period over which the first information was acquired; and a determination step of determining, based on the third information and the fourth information, a threshold for judging whether the input body sound information contains an abnormal sound.
According to this aspect, third information indicating whether an abnormal sound occurs in the first information is acquired. The third information is desirably information that accurately indicates whether an abnormal sound occurs in the first information, and is therefore preferably prepared in advance using the first information.
In this aspect, fourth information indicating the ratio of the period during which an abnormal sound occurs to the period over which the first information was acquired is further calculated based on the first information and the learning result; specifically, the fourth information is calculated by analyzing the first information using the learning result.
When the third information has been acquired and the fourth information calculated, a threshold for judging whether the input body sound information contains an abnormal sound is determined based on them. This threshold is used, when discriminating abnormal sounds in the input body sound information, to judge whether an abnormal sound is actually present; specifically, it is compared with the ratio of the period during which an abnormal sound occurs to the period over which the input body sound information was acquired.
Using this threshold, it can be judged, for example, that an abnormal sound has occurred when the abnormal-sound ratio of the input body sound information is equal to or greater than the threshold, and that no abnormal sound has occurred when the ratio is below the threshold.
In this aspect in particular, since the threshold is determined based on the fourth information calculated using the learning result, the presence or absence of an abnormal sound can be judged more accurately.
<5>
The program according to the present embodiment causes the biological sound analysis apparatus to execute the biological sound analysis method described above.
According to the program of the present embodiment, the body sound analysis method according to the present embodiment can be executed, so abnormal sounds included in body sound information can be suitably discriminated.
<6>
The storage medium according to the present embodiment stores the program described above.
According to the storage medium of the present embodiment, the program according to the present embodiment can be executed, so abnormal sounds included in body sound information can be suitably discriminated.
<7>
The biological sound analysis apparatus according to the present embodiment includes a first acquisition unit that acquires biological sound information related to a biological sound, and a determination unit that determines abnormal sounds included in the biological sound based on a learning result, where the learning result is obtained by learning the correspondence between first information about the biological sound and second information indicating the timing at which an abnormal sound occurs in the biological sound.
According to the biological sound analysis apparatus of the present embodiment, abnormal sounds included in body sound information can be suitably discriminated based on the learning result, as with the body sound analysis method described above.
The biological sound analysis apparatus according to the present embodiment can also adopt the same various aspects as the biological sound analysis method according to the present embodiment described above.
The operation and other advantages of the body sound analysis method, program, storage medium, and body sound analysis apparatus according to the present embodiment will be described in more detail in the examples below.
Hereinafter, examples of the body sound analysis method, program, storage medium, and body sound analysis apparatus will be described in detail with reference to the drawings, taking a body sound analysis method that analyzes respiratory sounds as an example.
<Teacher data>
First, the teacher data used in the body sound analysis method according to this example will be described.
Teacher data consists of sets of three pieces of information (a teacher voice signal, teacher frame information, and overall teacher information), and a plurality of such sets are prepared in advance.
The teacher voice signal is a signal indicating the temporal change of a breathing sound (for example, a time-axis waveform). The teacher frame information indicates, for each sound type, the timing at which abnormal sounds occur in the teacher voice signal. The overall teacher information indicates, for each sound type, whether an abnormal sound occurs anywhere in the teacher voice signal.
The teacher data is used for the learning operation described later; the more sets there are, the higher the learning effect (in other words, the accuracy of the biological sound analysis).
<Frame determination learning>
Next, the frame determination learning of the body sound analysis method according to this example will be described with reference to FIGS. 1 to 7. Frame determination learning is a learning operation for increasing the accuracy of the frame determination process, which determines the occurrence of abnormal sounds in units of frames.
<Configuration of the learner>
First, the configuration of the frame determination learner used for frame determination learning will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the frame determination learner according to this example.
 図1に示すように、本実施例に係るフレーム判定学習器は、教師音声信号入力部110と、教師フレーム情報入力部120と、処理部200と、学習結果出力部300とを備えて構成されている。 As shown in FIG. 1, the frame determination learner according to the present embodiment includes a teacher speech signal input unit 110, a teacher frame information input unit 120, a processing unit 200, and a learning result output unit 300. ing.
 教師音声信号入力部110は、教師データに含まれる教師音声信号を取得して、処理部200に出力可能に構成されている。 The teacher voice signal input unit 110 is configured to acquire a teacher voice signal included in the teacher data and output it to the processing unit 200.
 教師フレーム情報入力部120は、教師データに含まれる教師フレーム情報を取得して、処理部200に出力可能に構成されている。 The teacher frame information input unit 120 is configured to acquire teacher frame information included in the teacher data and to output it to the processing unit 200.
 The processing unit 200 includes a plurality of arithmetic circuits, memories, and the like, and comprises a frame dividing unit 210, a first local feature calculation unit 220, a frequency analysis unit 230, a second local feature calculation unit 240, and a learning unit 250.
 The frame dividing unit 210 is configured to execute a dividing process that divides the teacher sound signal input from the teacher sound signal input unit 110 into a plurality of frames. The respiratory sound signal divided by the frame dividing unit 210 is output to the first local feature calculation unit 220 and the frequency analysis unit 230.
 The first local feature calculation unit 220 is configured to calculate a first local feature based on the waveform of the teacher sound signal. The processing executed by the first local feature calculation unit 220 is described in detail later. The calculated first local feature is output to the learning unit 250.
 The frequency analysis unit 230 is configured to execute a time-frequency analysis process (for example, FFT processing) on the teacher sound signal input from the teacher sound signal input unit 110. The analysis result of the frequency analysis unit 230 is output to the second local feature calculation unit 240.
 The second local feature calculation unit 240 is configured to calculate a second local feature based on the analysis result of the frequency analysis unit 230. The processing executed by the second local feature calculation unit 240 is described in detail later. The calculated second local feature is output to the learning unit 250.
 The learning unit 250 is configured to learn the correspondence relationship between the local features calculated by the first local feature calculation unit 220 and the second local feature calculation unit 240 and the teacher frame information input from the teacher frame information input unit 120. The processing executed by the learning unit 250 is described in detail later. The learning result of the learning unit 250 is output to the learning result output unit 300.
 The learning result output unit 300 is configured to output the learning result of the learning unit 250 in a form usable for biological sound analysis. For example, the learning result output unit 300 can output the learning result to a memory or the like of the biological sound analysis device.
 <Description of operation>
 Next, the flow of the frame determination learning operation executed by the frame determination learner described above will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the flow of the frame determination learning operation according to the present example.
 As shown in FIG. 2, in the frame determination learning operation according to the present example, the learning unit 250 is first initialized (step S100). Subsequently, a teacher sound signal is acquired by the teacher sound signal input unit 110 (step S101) and output to the processing unit 200. The teacher sound signal is a specific example of the "first information".
 Subsequently, the respiratory sound is divided into a plurality of frames by the frame dividing unit 210 (step S102). The frame division of the respiratory sound signal is described concretely below with reference to FIG. 3. FIG. 3 is a conceptual diagram showing the frame division process of the teacher sound signal.
 As shown in FIG. 3, the teacher sound signal is divided into a plurality of frames at predetermined intervals. Each frame is set as a processing unit for suitably executing the local feature calculation described later; the duration of one frame is, for example, 12 msec.
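 A frame division along these lines could be sketched as follows; the 8 kHz sampling rate and the use of non-overlapping frames are assumptions made for illustration (the embodiment specifies only an example frame length of 12 msec).

    import numpy as np

    def split_into_frames(signal, fs=8000, frame_ms=12.0):
        """Divide a sound signal into fixed-length frames (step S102)."""
        frame_len = int(fs * frame_ms / 1000)   # 96 samples at 8 kHz
        num_frames = len(signal) // frame_len   # trailing remainder dropped
        return signal[:num_frames * frame_len].reshape(num_frames, frame_len)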
 Returning to FIG. 2, the frame-divided teacher sound signal is input to the first local feature calculation unit 220, and the first local feature is calculated (step S103). The frame-divided teacher sound signal is also frequency-analyzed by the frequency analysis unit 230 and input to the second local feature calculation unit 240, which calculates the second local feature based on the frequency-analyzed teacher sound signal (for example, a spectrum indicating its frequency characteristics).
 The calculation of the first local feature by the first local feature calculation unit 220 and the calculation of the second local feature by the second local feature calculation unit 240 are described in detail below with reference to FIGS. 4 to 6. FIG. 4 is a flowchart showing the calculation of the first local feature. FIG. 5 is a flowchart showing the calculation of the second local feature. FIG. 6 is a diagram showing the local feature vector obtained from the waveform and the spectrum.
 As shown in FIG. 4, when the first local feature is calculated, the waveform of the teacher sound signal is first acquired (step S201) and subjected to a pre-filtering process (step S202). The pre-filtering process is, for example, processing using a high-pass filter, and can remove unwanted components contained in the teacher sound signal.
 Subsequently, local variance values are calculated from the pre-filtered teacher sound signal (step S203). The local variance values are calculated as, for example, a first variance indicating the variation of the teacher sound signal within a first period w1 and a second variance indicating its variation within a second period w2 that contains the first period w1. The local variance calculated in this way functions as a local feature for detecting, among abnormal sounds, intermittent rales (for example, coarse crackles).
 Once the local variance values have been calculated, the maximum local variance within each frame of the teacher sound signal is computed (step S204) and output as the first local feature (step S205).
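 The chain of steps S201 to S205 might look like the following sketch; the filter order, the 100 Hz cutoff, the window lengths w1 and w2, and the combination of the two variances by their maximum are all illustrative assumptions not fixed by the embodiment.

    import numpy as np
    from scipy.signal import butter, lfilter

    def first_local_features(signal, fs=8000, frame_len=96, w1=16, w2=64):
        # Step S202: pre-filtering with a high-pass filter.
        b, a = butter(2, 100 / (fs / 2), btype="highpass")
        x = lfilter(b, a, signal)

        # Step S203: local variance in a short window w1 and in a longer
        # window w2 that contains it, at every sample position.
        def sliding_var(sig, w):
            return np.array([np.var(sig[i:i + w]) for i in range(len(sig) - w + 1)])
        v = np.maximum(sliding_var(x, w1)[:len(x) - w2 + 1], sliding_var(x, w2))

        # Steps S204-S205: the maximum local variance inside each frame is
        # output as that frame's first local feature.
        n = len(v) // frame_len
        return v[:n * frame_len].reshape(n, frame_len).max(axis=1)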
 As shown in FIG. 5, when the second local feature is calculated, the spectrum obtained by the frequency analysis is first acquired (step S301), and CMN (Cepstral Mean Normalization) processing is executed (step S302). The CMN processing can remove characteristics that are stationarily convolved into the teacher sound signal, such as those of the sensor or the measurement environment.
 The CMN-processed teacher sound signal is further subjected to a liftering process for extracting the envelope component (step S303) and a liftering process for extracting the fine component (step S304). Liftering is a process of cutting predetermined quefrency components from the cepstrum.
 The CMN and liftering processes described above make it easier to discriminate continuous rales (for example, rhonchi, wheezes, fine crackles, etc.) that would otherwise be buried in other biological sounds. Since the CMN and liftering processes are existing techniques, a more detailed description is omitted here.
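 A minimal sketch of this cepstral chain (steps S301 to S304) is given below; the log-magnitude real cepstrum and the quefrency cut index are assumptions, since the embodiment does not fix these details.

    import numpy as np

    def cmn_and_liftering(frames, cut=20):
        # Step S301: spectrum of each frame by frequency analysis (FFT).
        spec = np.abs(np.fft.rfft(frames, axis=1)) + 1e-12
        cep = np.fft.irfft(np.log(spec), axis=1)   # real cepstrum per frame

        # Step S302: CMN - subtracting the mean cepstrum over time removes
        # characteristics stationarily convolved in (sensor, environment).
        cep = cep - cep.mean(axis=0, keepdims=True)

        # Steps S303/S304: liftering cuts predetermined quefrency components;
        # low quefrencies carry the envelope component, high quefrencies the
        # fine component.
        envelope = cep.copy()
        envelope[:, cut:] = 0.0
        fine = cep.copy()
        fine[:, :cut] = 0.0
        return envelope, fine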
 For the teacher sound signal that has undergone the liftering process for extracting the fine component, an emphasis process using the KL divergence is executed to calculate a feature. The KL divergence is a parameter calculated from an observed value P and a reference value Q (for example, a theoretical value, model value, or predicted value); when an observed value P that is distinctive relative to the reference value Q appears, the KL divergence is calculated as a large value. The processing using the KL divergence emphasizes and clarifies the tonal components contained in the teacher sound signal (that is, the components used to discriminate continuous rales).
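 The emphasis could be realized, for example, as below; using the time-averaged spectrum as the reference value Q and taking the per-bin KL contribution as the emphasized feature are assumptions (the embodiment also allows a theoretical value, model value, predicted value, etc. for Q).

    import numpy as np

    def kl_emphasis(spec):
        """spec: (num_frames, num_bins) magnitude spectra, e.g. rebuilt
        from the fine cepstral component."""
        # Normalize each observed frame into a distribution P and the
        # reference spectrum into a distribution Q.
        p = spec / spec.sum(axis=1, keepdims=True)
        q = spec.mean(axis=0)
        q = q / q.sum()
        # Per-bin contribution to KL(P||Q): large where the observation P
        # is distinctive relative to Q, which emphasizes tonal components.
        return p * np.log((p + 1e-12) / (q + 1e-12))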
 Meanwhile, HAAR-like features are also calculated from the spectrum obtained by the frequency analysis (step S306). HAAR-like features are a technique used mainly in the field of image processing; here they are calculated from the spectrum in the same manner by associating the amplitude value at each frequency with a pixel value in image processing. Since the calculation of HAAR-like features is an existing technique, a detailed description is omitted here.
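 For example, a two-rectangle difference filter, the simplest Haar-like form, could be applied to the spectrum as follows; the filter shape and the widths are assumptions, as the embodiment does not specify which Haar-like filters are used.

    import numpy as np

    def haar_like_features(spec_frame, widths=(4, 8, 16)):
        """Haar-like features over one frame's spectrum; the amplitude at
        each frequency bin plays the role of the pixel value."""
        feats = []
        for w in widths:
            for start in range(0, len(spec_frame) - 2 * w + 1, w):
                left = spec_frame[start:start + w].sum()
                right = spec_frame[start + w:start + 2 * w].sum()
                # Difference of adjacent rectangular sums, as in the
                # classic Haar-like feature of image processing.
                feats.append(left - right)
        return np.array(feats)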
 When the various processes above have been applied to the teacher sound signal and a plurality of features have been calculated for each frequency, an average value is calculated for each frequency band and output as the second local feature (step S307).
 As shown in FIG. 6, the processing described above yields a plurality of types of local features (that is, the first local feature and the second local feature) from the waveform and spectrum of the respiratory sound signal. These are output as a local feature vector for each frame.
 Returning again to FIG. 2, once the local features have been calculated, the teacher frame information is acquired by the teacher frame information input unit 120 (step S106). The teacher frame information is a specific example of the "second information". The acquired teacher frame information is output to the learning unit 250 together with the local features.
 The learning operation in the learning unit 250 is described below with reference to FIG. 7. FIG. 7 is a diagram showing the teacher frame information associated with the local features.
 As shown in FIG. 7, the learning unit 250 sets the local features and the teacher frame information as a pair of data for each frame (step S107). The local features are thereby associated with the timing at which abnormal sounds occur.
 More specifically, a local feature vector associated with teacher frame information indicating a timing at which an abnormal sound occurs is learned as a local feature observed when an abnormal sound is present. Conversely, a local feature vector associated with teacher frame information indicating a timing at which no abnormal sound occurs is learned as a local feature observed when no abnormal sound is present.
 Returning again to FIG. 2, since the association between the local features and the teacher frame information is executed frame by frame, it is determined whether the setting has been completed for all frames (step S108). If the setting has not been completed for all frames (step S108: NO), the processing from steps S103 and S104 onward is repeated for the frames not yet set.
 On the other hand, when the setting has been completed for all frames (step S108: YES), it is determined whether the processing has been completed for all of the plurality of teacher data sets (step S109). If it is determined that the processing has not been completed for all teacher data (step S109: NO), the processing from step S101 onward is executed again. By repeating the processing in this way, the association is performed for every frame of every teacher data set.
 Thereafter, the learning process by the learning unit 250 is actually executed (step S110). The learning process is performed using a machine learning algorithm such as AdaBoost. Since existing methods other than AdaBoost can also be adopted for the learning process as appropriate, a detailed description is omitted here.
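 A sketch of this step with scikit-learn's AdaBoost implementation follows; the library choice, the number of estimators, and the argument layout are assumptions. One classifier is trained per abnormal sound type, mirroring steps S110 and S111.

    from sklearn.ensemble import AdaBoostClassifier

    def train_frame_classifiers(feature_vectors, frame_labels):
        """feature_vectors: {sound_type: (num_frames, num_features)} local
        feature vectors gathered from all teacher data; frame_labels:
        {sound_type: (num_frames,)} teacher frame information (1 = abnormal
        sound occurring in the frame, 0 = not occurring)."""
        classifiers = {}
        for sound_type, X in feature_vectors.items():
            clf = AdaBoostClassifier(n_estimators=100)
            clf.fit(X, frame_labels[sound_type])   # step S110 per sound type
            classifiers[sound_type] = clf
        return classifiers

 At analysis time, classifiers[sound_type].predict(features) would then yield the frame-by-frame judgments used below; this usage, too, is illustrative.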
 Since the learning process described above is performed for each abnormal sound type, it is determined after the process ends whether the learning process has been completed for all sound types (step S111). If the learning process has not been completed for all sound types (step S111: NO), the learning process of step S110 is executed again for the remaining sound types.
 On the other hand, when the learning process has been completed for all sound types (step S111: YES), the learning result is output to the biological sound analysis device (step S112).
 <Effects of frame determination learning>
 The learning result of the frame determination learning described above is used for the analysis of respiratory sounds by the biological sound analysis device. When a respiratory sound is analyzed, the same processing that was applied to the teacher sound signal in the frame determination learning operation is executed on the respiratory sound signal to be analyzed. Specifically, the input respiratory sound signal is divided into frames, and a local feature vector is calculated for each frame.
 In the present example, since the relationship between the local features and the timing of abnormal sounds has been learned through the learning operation described above, the timing at which abnormal sounds occur (in other words, the temporal position of the frames in which an abnormal sound is present) can be suitably detected using the local features calculated from the respiratory sound. Abnormal sounds can therefore be discriminated far more accurately than when no learning operation is performed in advance.
 <Determination of the optimum threshold>
 Next, the optimum threshold determination operation according to the present example will be described with reference to FIGS. 8 to 10. The optimum threshold determination operation is an operation for determining, as an optimum value, the threshold used to judge whether an abnormal sound has actually occurred, based on the proportion of the period determined to contain abnormal sounds.
 <Configuration of the threshold determination unit>
 First, the configuration of the threshold determination unit that executes the optimum threshold determination operation will be described with reference to FIG. 8. FIG. 8 is a block diagram showing the configuration of the threshold determination unit according to the present example.
 As shown in FIG. 8, the threshold determination unit according to the present example comprises a frame determination result input unit 410, an overall teacher information input unit 420, a determination unit 500, and a threshold output unit 600.
 The frame determination result input unit 410 is configured to acquire the result of the frame determination process applied to the teacher sound signal using the learning result (that is, the process of determining, for each frame, whether an abnormal sound is present) and output it to the determination unit 500.
 The overall teacher information input unit 420 is configured to acquire the overall teacher information included in the teacher data and output it to the determination unit 500.
 The determination unit 500 comprises a global feature calculation unit 510, an ROC analysis (Receiver Operating Characteristic analysis) unit 520, and an optimum threshold calculation unit 530.
 The global feature calculation unit 510 is configured to calculate, based on the determination results of the frame determination process, the proportion of time in which abnormal sounds occur relative to the period over which the teacher sound signal was input. The information indicating this proportion, calculated by the global feature calculation unit 510, is output to the ROC analysis unit 520.
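 Since the frame determination results are binary per frame, this global feature reduces to a simple ratio, as in the following sketch (the 0/1 array representation is an assumption):

    import numpy as np

    def global_feature(frame_judgments):
        """Proportion of frames judged to contain an abnormal sound,
        relative to all frames of the input section."""
        return float(np.mean(np.asarray(frame_judgments)))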
 The ROC analysis unit 520 is configured to execute an ROC analysis that, based on the relationship between the information indicating the proportion of abnormal-sound time and the overall teacher information, obtains as an ROC curve the relationship between a given threshold on that proportion and the discrimination performance achieved when that threshold is used. Since ROC analysis is an existing technique, a detailed description is omitted here. The analysis result of the ROC analysis unit 520 is output to the optimum threshold calculation unit 530.
 The optimum threshold calculation unit 530 is configured to use the analysis result of the ROC analysis unit 520 to calculate, from the proportion of time in which abnormal sounds occur, the optimum threshold for judging whether an abnormal sound has actually occurred (the threshold giving the point on the ROC curve closest to the reference point (0, 1); see FIG. 9). The threshold calculated by the optimum threshold calculation unit 530 is output to the threshold output unit 600.
 The threshold output unit 600 is configured to output the threshold calculated by the optimum threshold calculation unit 530 in a form usable for biological sound analysis. For example, the threshold output unit 600 can output the threshold to a memory or the like of the biological sound analysis device.
 <Description of operation>
 Next, the flow of the optimum threshold determination operation executed by the threshold determination unit described above will be described with reference to FIG. 10. FIG. 10 is a flowchart showing the flow of the optimum threshold determination operation according to the present example.
 As shown in FIG. 10, in the optimum threshold determination operation according to the present example, the frame determination results of the teacher sound signal are first acquired by the frame determination result input unit 410 (step S201). It is then determined whether the frame determination results have been acquired for all frames (step S202). If the frame determination results have not been acquired for all frames (step S202: NO), the processing of step S201 is executed again for the remaining frames.
 On the other hand, when the frame determination results have been acquired for all frames (step S202: YES), the acquired frame determination results are output to the global feature calculation unit 510, and the proportion of frames determined to contain an abnormal sound relative to the number of frames in the entire section of the teacher sound signal (that is, the global feature) is calculated (step S203). The global feature is a specific example of the "fourth information".
 Subsequently, the overall teacher information included in the teacher data is acquired by the overall teacher information input unit 420 (step S204). The overall teacher information is a specific example of the "third information".
 The overall teacher information is output to the ROC analysis unit 520 together with the global feature, and the global feature and the overall teacher information are set as a pair of data (step S205). That is, the proportion of frames determined to contain an abnormal sound is associated with the information indicating whether an abnormal sound actually occurred. It is then determined whether the setting has been completed for all teacher data (step S206). If the setting has not been completed for all teacher data (step S206: NO), the processing from step S201 onward is executed again for the teacher data not yet set.
 On the other hand, when the setting has been completed for all teacher data (step S206: YES), an ROC analysis is executed by the ROC analysis unit 520, and the relationship between the threshold and the discrimination performance is obtained as an ROC curve (step S207). When the ROC analysis ends, the optimum threshold calculation unit 530 calculates the optimum threshold according to the ROC analysis result (step S208).
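 Steps S207 and S208 for one sound type could be sketched as follows; the selection of the point closest to (0, 1) follows the description above, while the use of scikit-learn's roc_curve and the argument layout are assumptions.

    import numpy as np
    from sklearn.metrics import roc_curve

    def optimum_threshold(global_feats, overall_labels):
        """global_feats: global feature (abnormal-frame ratio) per teacher
        data set; overall_labels: overall teacher information (1 = the set
        actually contains the abnormal sound, 0 = it does not)."""
        # Step S207: ROC curve relating candidate thresholds on the global
        # feature to the discrimination performance.
        fpr, tpr, thresholds = roc_curve(overall_labels, global_feats)
        # Step S208: the optimum threshold gives the ROC point closest to
        # the reference point (0, 1), as in FIG. 9.
        dist = np.hypot(fpr, tpr - 1.0)
        return float(thresholds[np.argmin(dist)])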
 Since the ROC analysis and threshold calculation described above are executed for each abnormal sound type, it is determined after their completion whether the processing has been completed for all sound types (step S209). If the processing has not been completed for all sound types (step S209: NO), the processing of steps S207 and S208 is executed for the remaining sound types. When the processing has been completed for all sound types (step S209: YES), the thresholds are output (step S210) and the series of processing ends.
 <Effects of optimum threshold determination>
 The threshold determined as described above is used for the analysis of respiratory sounds by the biological sound analysis device. For example, when a respiratory sound is analyzed, the same processing that was applied to the teacher sound signal in the frame determination learning operation is executed on the respiratory sound signal to be analyzed. Specifically, from the frame determination results of the input respiratory sound signal, the proportion of the period containing abnormal sounds relative to the acquisition period of the respiratory sound (that is, the global feature) is calculated. As a result, when the proportion of the period containing abnormal sounds is large, it can be determined that abnormal sounds are actually present; conversely, when that proportion is small, it can be determined that, although abnormal sounds were detected in individual frames, no abnormal sound is actually present.
 This judgment of the presence or absence of abnormal sounds is realized by comparison with the threshold determined by the optimum threshold determination operation described above: when the proportion of the period containing abnormal sounds is larger than the threshold, it can be determined that an abnormal sound is present, and when it is smaller than the threshold, that no abnormal sound is present. In the present example in particular, the optimum threshold determination operation associates the global feature with the overall teacher information (that is, the information indicating whether an abnormal sound occurred) and calculates the optimum threshold as a result. The presence or absence of abnormal sounds can therefore be judged extremely accurately based on the global feature calculated from the respiratory sound.
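 The final judgment per sound type then amounts to a comparison of the global feature with the stored optimum threshold, as in this minimal sketch:

    def judge_abnormal(global_feat, threshold):
        """True when the proportion of the period containing the abnormal
        sound exceeds the optimum threshold for that sound type."""
        return global_feat > threshold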
 The present invention is not limited to the embodiment described above and may be modified as appropriate within a scope that does not contradict the gist or spirit of the invention readable from the claims and the specification as a whole; biological sound analysis methods, programs, storage media, and biological sound analysis devices involving such modifications are also included within the technical scope of the present invention.
 DESCRIPTION OF SYMBOLS
 110 Teacher sound signal input unit
 120 Teacher frame information input unit
 200 Processing unit
 210 Frame dividing unit
 220 First local feature calculation unit
 230 Frequency analysis unit
 240 Second local feature calculation unit
 250 Learning unit
 300 Learning result output unit
 410 Frame determination result input unit
 420 Overall teacher information input unit
 500 Determination unit
 510 Global feature calculation unit
 520 ROC analysis unit
 530 Optimum threshold calculation unit
 600 Threshold output unit

Claims (7)

  1.  A biological sound analysis method used in a biological sound analysis device for analyzing a biological sound, the method comprising:
     a first acquisition step of acquiring first information related to a biological sound;
     a second acquisition step of acquiring second information indicating a timing at which an abnormal sound occurs in the first information;
     a learning step of learning a correspondence relationship between the first information and the second information; and
     a discrimination step of discriminating an abnormal sound included in input biological sound information based on a learning result of the learning step.
  2.  The biological sound analysis method according to claim 1, further comprising a first generation step of generating, based on the first information, feature information indicating a feature amount of the first information,
     wherein the learning step learns a correspondence relationship between the feature information and the second information instead of the correspondence relationship between the first information and the second information.
  3.  The biological sound analysis method according to claim 1 or 2, further comprising a division step of dividing the first information and the second information into predetermined frame units,
     wherein the learning step performs learning in the predetermined frame units.
  4.  The biological sound analysis method according to any one of claims 1 to 3, further comprising:
     a third acquisition step of acquiring third information indicating whether the abnormal sound occurs in the first information;
     a calculation step of calculating, based on the first information and the learning result of the learning step, fourth information indicating a ratio of a period in which the abnormal sound occurs to a period over which the first information was acquired; and
     a determination step of determining, based on the third information and the fourth information, a threshold for judging whether an abnormal sound is included in the input biological sound information.
  5.  A program for causing the biological sound analysis device to execute the biological sound analysis method according to claim 1.
  6.  A storage medium storing the program according to claim 5.
  7.  A biological sound analysis device comprising:
     a first acquisition unit that acquires biological sound information related to a biological sound; and
     a discrimination unit that discriminates an abnormal sound included in the biological sound based on a learning result,
     wherein the learning result is a result of learning a correspondence relationship between first information related to a biological sound and second information indicating a timing at which an abnormal sound occurs in the biological sound.