WO2018117171A1 - Bioacoustic analysis method, program, storage medium, and bioacoustic analysis device - Google Patents

Bioacoustic analysis method, program, storage medium, and bioacoustic analysis device

Info

Publication number
WO2018117171A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
sound
learning
abnormal
teacher
Application number
PCT/JP2017/045777
Other languages
French (fr)
Japanese (ja)
Inventor
隆真 亀谷
Original Assignee
Pioneer Corporation (パイオニア株式会社)
Application filed by Pioneer Corporation
Priority to JP2018558046A (JP6672478B2)
Publication of WO2018117171A1

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/08 Detecting, measuring or recording devices for evaluating the respiratory organs
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B7/00 Instruments for auscultation
    • A61B7/02 Stethoscopes
    • A61B7/04 Electric stethoscopes

Definitions

  • The present invention relates to the technical field of biological sound analysis methods, programs, storage media, and biological sound analysis devices for analyzing biological sounds such as respiratory sounds.
  • Patent Document 1 describes a technique for detecting a plurality of abnormal sounds (adventitious sounds, or "sub-noises") contained in respiratory sounds by separating them by sound type.
  • When analyzing abnormal sounds contained in a biological sound, the occurrence of an abnormal sound can be determined by comparing the acquired biological sound information with abnormal sound information stored in advance (specifically, biological sound information recorded while an abnormal sound was actually occurring).
  • However, because body sound information varies with individual differences, measurement environments, and so on, it is difficult to determine accurately whether an abnormal sound has occurred simply by comparing the information. Unless an appropriate judgment criterion is set, there is the technical problem that an abnormal sound may go undetected even though it has occurred, or may be detected erroneously even though it has not.
  • An object of the present invention is to provide a body sound analysis method, a program, a storage medium, and a body sound analysis apparatus that can suitably analyze abnormal sounds included in body sounds.
  • The biological sound analysis method for solving the above problem is a biological sound analysis method used in a biological sound analysis device that analyzes biological sounds, and includes: a first acquisition step of acquiring first information related to a biological sound; a second acquisition step of acquiring second information indicating the timing at which an abnormal sound occurs in the first information; a learning step of learning the correspondence between the first information and the second information; and a discrimination step of discriminating an abnormal sound included in input body sound information based on the learning result of the learning step.
  • The program for solving the above problem causes the biological sound analysis apparatus to execute the above-described biological sound analysis method.
  • The storage medium for solving the above problem stores the above-described program.
  • The biological sound analysis apparatus for solving the above problem includes a first acquisition unit that acquires biological sound information related to a biological sound, and a determination unit that determines an abnormal sound included in the biological sound based on a learning result, where the learning result is obtained by learning the correspondence between first information about the biological sound and second information indicating the timing at which an abnormal sound occurs in the biological sound.
  • The body sound analysis method according to the present embodiment is a body sound analysis method used in a body sound analysis apparatus that analyzes body sounds, and includes: a first acquisition step of acquiring first information related to a body sound; a second acquisition step of acquiring second information indicating the timing at which an abnormal sound occurs in the first information; a learning step of learning the correspondence between the first information and the second information; and a discrimination step of discriminating an abnormal sound included in input body sound information based on the learning result of the learning step.
  • According to this method, first information related to the body sound (for example, a respiratory sound) is acquired, and second information indicating the timing at which an abnormal sound (for example, an adventitious sound) occurs in the first information is acquired.
  • The first information is acquired as information showing the change of the biological sound over time (for example, a time-axis waveform of the biological sound).
  • The second information is desirably information that accurately indicates the timing at which abnormal sounds occur in the first information; for this reason, it is preferably prepared in advance using the first information.
  • To make the learning result more accurate, the learning step is preferably executed multiple times using multiple sets of first and second information.
  • The "input body sound information" is information about the body sound to be analyzed by the body sound analysis method according to the present embodiment, and is input separately from the first and second information described above. In the present embodiment in particular, since the correspondence between the first information and the second information has been learned in advance, the timing at which an abnormal sound occurs can be determined accurately from the input body sound information, and abnormal sounds included in the body sound information can therefore be appropriately discriminated.
  • In one aspect, the method further includes a first generation step of generating, based on the first information, feature amount information indicating a feature amount of the first information, and the learning step learns the correspondence between the feature amount information and the second information instead of the correspondence between the first information and the second information.
  • According to this aspect, when the first information is acquired, feature amount information indicating the feature amount of the first information is generated.
  • The "feature amount" is a value indicating the magnitude (degree) of a feature that can be used to discriminate an abnormal sound included in a body sound.
  • In this aspect in particular, the correspondence between the feature amount information and the second information is learned instead of the correspondence between the first information and the second information, so a learning result better suited to discriminating abnormal sounds in the input body sound information is obtained.
  • In another aspect, the method further includes a dividing step of dividing the first information and the second information into predetermined frame units, and the learning step performs learning in those frame units.
  • According to this aspect, the first information and the second information are divided into predetermined frame units for learning.
  • The predetermined frame unit is set to a period over which an appropriate learning result is more easily obtained, so performing learning in units of these frames yields a better learning result.
  • In another aspect, third information indicating whether an abnormal sound occurs in the first information is acquired. The third information is desirably information that accurately indicates whether an abnormal sound occurs in the first information; for this reason, it is preferably prepared in advance using the first information.
  • Further, fourth information indicating the ratio of the period during which the abnormal sound occurs to the period over which the first information was acquired is calculated.
  • Specifically, the fourth information is calculated by analyzing the first information using the learning result.
  • Then, based on the third information and the fourth information, a threshold for judging whether the input body sound information contains an abnormal sound is determined.
  • This threshold is used, when discriminating abnormal sounds in the input body sound information, to judge whether an abnormal sound is actually present; specifically, it is compared with the ratio of the period during which an abnormal sound occurs to the period over which the input body sound information was acquired.
  • Using this threshold, it can be judged, for example, that an abnormal sound has occurred when the abnormal-sound ratio of the input body sound information is equal to or greater than the threshold, and that no abnormal sound has occurred when the ratio is below the threshold.
  • In particular, since the threshold is determined based on the fourth information calculated using the learning result, the presence or absence of an abnormal sound can be judged more accurately.
  • The program according to the present embodiment causes the biological sound analysis apparatus to execute the biological sound analysis method described above.
  • Since the program can execute the biological sound analysis method according to the present embodiment, abnormal sounds included in body sound information can be suitably discriminated.
  • The storage medium according to the present embodiment stores the above-described program.
  • Since the storage medium allows the program according to the present embodiment to be executed, abnormal sounds included in body sound information can be suitably discriminated.
  • The biological sound analysis apparatus includes a first acquisition unit that acquires biological sound information related to a biological sound, and a determination unit that determines abnormal sounds included in the biological sound based on a learning result.
  • The learning result is obtained by learning the correspondence between first information about the body sound and second information indicating the timing at which an abnormal sound occurs in the body sound.
  • With this biological sound analysis apparatus, abnormal sounds included in body sound information can be suitably discriminated based on the learning result, as with the biological sound analysis method described above.
  • The biological sound analysis apparatus can also adopt the same various aspects as the biological sound analysis method according to the present embodiment described above.
  • Teacher data consists of sets of three pieces of information (a teacher voice signal, teacher frame information, and overall teacher information), and a plurality of such sets are prepared in advance.
  • The teacher voice signal is a signal indicating the temporal change of a breathing sound (for example, a time-axis waveform).
  • The teacher frame information indicates, for each sound type, the timing at which abnormal sounds occur in the teacher voice signal.
  • The overall teacher information indicates, for each sound type, whether an abnormal sound occurs anywhere in the teacher voice signal.
  • The teacher data is used for the learning operation described later; the more sets there are, the higher the learning effect (in other words, the accuracy of the biological sound analysis).
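As a concrete illustration, one set of teacher data could be represented as in the minimal sketch below; the class and field names are assumptions for illustration, not identifiers from the patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TeacherSet:
    """One set of teacher data (illustrative names, not from the patent)."""
    voice_signal: np.ndarray   # time-axis waveform of a recorded breathing sound
    frame_info: np.ndarray     # shape (n_sound_types, n_frames); 1 where that abnormal sound occurs
    overall_info: np.ndarray   # shape (n_sound_types,); 1 if that abnormal sound occurs anywhere

# The training corpus is simply a collection of such sets prepared in advance.
corpus: list[TeacherSet] = []
```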
  • Frame determination learning is a learning operation for increasing the accuracy of the frame determination process, which determines the occurrence of abnormal sounds in units of frames.
  • FIG. 1 is a block diagram illustrating a configuration of a frame determination learner according to the present embodiment.
  • The frame determination learner includes a teacher voice signal input unit 110, a teacher frame information input unit 120, a processing unit 200, and a learning result output unit 300.
  • The teacher voice signal input unit 110 is configured to acquire the teacher voice signal included in the teacher data and output it to the processing unit 200.
  • The teacher frame information input unit 120 is configured to acquire the teacher frame information included in the teacher data and output it to the processing unit 200.
  • The processing unit 200 includes a plurality of arithmetic circuits, memories, and the like.
  • The processing unit 200 comprises a frame dividing unit 210, a first local feature amount calculation unit 220, a frequency analysis unit 230, a second local feature amount calculation unit 240, and a learning unit 250.
  • The frame dividing unit 210 is configured to execute a division process that divides the teacher speech signal input from the teacher speech signal input unit 110 into a plurality of frames.
  • The respiratory sound signal divided by the frame dividing unit 210 is output to the first local feature amount calculation unit 220 and the frequency analysis unit 230.
  • The first local feature amount calculation unit 220 is configured to calculate the first local feature amount based on the waveform of the teacher speech signal. The processing it executes is described in detail later.
  • The first local feature amount calculated by the first local feature amount calculation unit 220 is output to the learning unit 250.
  • The frequency analysis unit 230 is configured to execute a time-frequency analysis process (for example, FFT processing) on the teacher audio signal input from the teacher audio signal input unit 110.
  • The analysis result of the frequency analysis unit 230 is output to the second local feature amount calculation unit 240.
  • The second local feature amount calculation unit 240 is configured to calculate the second local feature amount based on the analysis result of the frequency analysis unit 230. The processing it executes is described in detail later.
  • The second local feature amount calculated by the second local feature amount calculation unit 240 is output to the learning unit 250.
  • The learning unit 250 is configured to learn the correspondence between the local feature amounts calculated by the first and second local feature amount calculation units 220 and 240 and the teacher frame information input from the teacher frame information input unit 120. The processing it executes is described in detail later. The learning result of the learning unit 250 is output to the learning result output unit 300.
  • The learning result output unit 300 is configured to output the learning result of the learning unit 250 in a form that can be used for the analysis of body sounds.
  • For example, the learning result output unit 300 is configured to output the learning result of the learning unit 250 to a memory or the like of the biological sound analyzer.
  • FIG. 2 is a flowchart illustrating the flow of the frame determination learning operation according to the present embodiment.
  • As shown in FIG. 2, the learning unit 250 is first initialized (step S100). Subsequently, a teacher voice signal is acquired by the teacher voice signal input unit 110 (step S101), which outputs the acquired signal to the processing unit 200.
  • The teacher voice signal is a specific example of the "first information".
  • FIG. 3 is a conceptual diagram showing a frame division process of the teacher voice signal.
  • As shown in FIG. 3, the teacher voice signal is divided into a plurality of frames at a predetermined interval.
  • Each frame is set as a processing unit for suitably executing the local feature amount calculation described later; the period per frame is, for example, 12 msec.
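A minimal sketch of this division step is shown below. The sampling rate and the use of non-overlapping frames are assumptions; the text only gives the 12 msec frame length as an example.

```python
import numpy as np

def split_into_frames(signal: np.ndarray, fs: int, frame_ms: float = 12.0) -> np.ndarray:
    """Split a 1-D audio signal into consecutive, non-overlapping frames."""
    frame_len = int(fs * frame_ms / 1000)        # samples per frame (12 ms by default)
    n_frames = len(signal) // frame_len          # drop the trailing remainder
    return signal[:n_frames * frame_len].reshape(n_frames, frame_len)

# One second of audio at 44.1 kHz yields roughly 83 frames of 12 ms each.
frames = split_into_frames(np.random.randn(44100), fs=44100)
```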
  • The teacher audio signal divided into frames is input to the first local feature amount calculation unit 220, where the first local feature amount is calculated (step S103). The frame-divided teacher voice signal is also frequency-analyzed by the frequency analysis unit 230 and input to the second local feature amount calculation unit 240, which calculates the second local feature amount based on the frequency-analyzed teacher voice signal (for example, a spectrum indicating its frequency characteristics).
  • FIG. 4 is a flowchart showing the calculation process of the first local feature quantity.
  • FIG. 5 is a flowchart showing the calculation process of the second local feature quantity.
  • FIG. 6 is a diagram illustrating a local feature vector obtained from a waveform and a spectrum.
  • As shown in FIG. 4, the waveform of the teacher speech signal is acquired (step S201), and pre-filtering is performed (step S202).
  • The pre-filtering is, for example, processing with a high-pass filter, and removes unnecessary components from the teacher voice signal.
  • Next, a local variance value is calculated from the pre-filtered teacher voice signal (step S203).
  • The local variance is calculated as, for example, a first variance value indicating the variation of the teacher speech signal in a first period w1, and a second variance value indicating its variation in a second period w2 that includes the first period w1.
  • The local variance calculated in this way functions as a local feature amount for discriminating intermittent rales (for example, bubbling sounds) among the abnormal sounds.
  • The maximum value of the local variance in each frame of the teacher speech signal is then calculated (step S204) and output as the first local feature amount (step S205).
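The sketch below illustrates this two-scale local variance feature. The window lengths w1 and w2, and the choice to report the per-frame maximum at each scale separately, are assumptions; the text does not specify how the two variance values are combined.

```python
import numpy as np

def sliding_variance(x: np.ndarray, win: int) -> np.ndarray:
    """Variance of x over a sliding window of `win` samples, one value per sample."""
    pad = win // 2
    xp = np.pad(x, pad, mode="edge")
    c1 = np.cumsum(np.insert(xp, 0, 0.0))        # running sum of x
    c2 = np.cumsum(np.insert(xp ** 2, 0, 0.0))   # running sum of x^2
    mean = (c1[win:] - c1[:-win]) / win
    mean_sq = (c2[win:] - c2[:-win]) / win
    return np.maximum(mean_sq - mean ** 2, 0.0)[: len(x)]

def first_local_feature(frame: np.ndarray, w1: int = 32, w2: int = 128) -> np.ndarray:
    """Per-frame first local feature: the maximum local variance at each scale
    (w2 contains w1, matching the short/long period pair in the text)."""
    return np.array([sliding_variance(frame, w1).max(),
                     sliding_variance(frame, w2).max()])
```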
  • As shown in FIG. 5, a CMN (Cepstral Mean Normalization) process is applied to the frequency-analyzed teacher speech signal.
  • A liftering process for extracting the envelope component (step S303) and a liftering process for extracting the fine component (step S304) are then performed on the CMN-processed teacher speech signal.
  • The liftering process cuts a predetermined quefrency component from the cepstrum.
  • The CMN and liftering processes make it easier to discriminate continuous rales (for example, rhonchus-like sounds, whistle-like sounds, and fine crackling sounds) that are buried in other biological sounds. Since both are existing techniques, a detailed description is omitted here.
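A compact sketch of these cepstral steps follows. The quefrency cutoff of 20 bins is an assumed value for illustration; the patent does not give one.

```python
import numpy as np

def cepstra(frames: np.ndarray) -> np.ndarray:
    """Real cepstrum of each frame (one row per frame)."""
    spec = np.abs(np.fft.rfft(frames, axis=1)) + 1e-12   # avoid log(0)
    return np.fft.irfft(np.log(spec), axis=1)

def cmn(ceps: np.ndarray) -> np.ndarray:
    """Cepstral Mean Normalization: subtract the average cepstrum over frames."""
    return ceps - ceps.mean(axis=0, keepdims=True)

def lifter(ceps: np.ndarray, cutoff: int = 20, keep: str = "low") -> np.ndarray:
    """Cut a quefrency band: 'low' keeps the envelope component (step S303),
    'high' keeps the fine component (step S304)."""
    out = ceps.copy()
    if keep == "low":
        out[:, cutoff:] = 0.0
    else:
        out[:, :cutoff] = 0.0
    return out
```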
  • Next, an enhancement process using the KL (Kullback-Leibler) information amount is executed to calculate the feature quantity.
  • The KL information amount is a parameter calculated from an observed value P and a reference value Q (for example, a theoretical value, model value, or predicted value); it becomes large when an observed value P that is distinctive with respect to the reference value Q appears. Processing using the KL information amount emphasizes and clarifies the tone components contained in the teacher speech signal (that is, the components used to discriminate continuous rales).
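The sketch below shows the underlying quantity. Treating the observed and reference spectra as probability distributions is an assumption about the concrete formulation; the patent only states that P is an observation and Q a reference such as a model value.

```python
import numpy as np

def kl_information(observed: np.ndarray, reference: np.ndarray) -> float:
    """KL information of an observed spectrum P against a reference spectrum Q.
    Grows when a tonal component absent from the reference appears."""
    eps = 1e-12
    p = observed / observed.sum()
    q = reference / reference.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```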
  • A HAAR-like feature is also calculated from the spectrum obtained by the frequency analysis (step S306).
  • The HAAR-like feature is a feature mainly used in the field of image processing.
  • Here, the HAAR-like feature is calculated from the spectrum by an analogous method, treating the amplitude value at each frequency like a pixel value in image processing. Since the calculation of HAAR-like features is an existing technique, a detailed description is omitted here.
  • Finally, an average value is calculated for each frequency band (step S307) and output as the second local feature amount.
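A one-dimensional analogue of the Haar-like computation is sketched below, together with the band averaging. The window widths and the number of bands are assumptions for illustration.

```python
import numpy as np

def haar_like_features(spectrum: np.ndarray, widths=(4, 8, 16)) -> np.ndarray:
    """1-D Haar-like features: contrast between adjacent frequency bands,
    with spectral amplitude playing the role of pixel intensity."""
    cs = np.cumsum(np.insert(spectrum, 0, 0.0))   # "integral image" of the spectrum
    feats = []
    for w in widths:
        for i in range(0, len(spectrum) - 2 * w + 1, w):
            left = cs[i + w] - cs[i]
            right = cs[i + 2 * w] - cs[i + w]
            feats.append(right - left)
    return np.asarray(feats)

def band_averages(spectrum: np.ndarray, n_bands: int = 8) -> np.ndarray:
    """Average amplitude per frequency band (step S307)."""
    return np.array([b.mean() for b in np.array_split(spectrum, n_bands)])
```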
  • Through the processing described above, a plurality of types of local feature quantities (that is, the first and second local feature quantities) are obtained from the waveform and spectrum of the respiratory sound signal, and are output as a local feature vector for each frame.
  • Next, teacher frame information is acquired by the teacher frame information input unit 120 (step S106).
  • The teacher frame information is a specific example of the "second information".
  • The acquired teacher frame information is output to the learning unit 250 together with the local feature amounts.
  • FIG. 7 is a diagram illustrating teacher frame information associated with a local feature amount.
  • As shown in FIG. 7, the local feature amount and the teacher frame information are paired for each frame (step S107). The local feature values are thereby associated with the occurrence timing of abnormal sounds.
  • That is, a local feature vector associated with teacher frame information indicating that an abnormal sound is occurring is learned as a local feature amount for when an abnormal sound occurs.
  • Conversely, a local feature vector associated with teacher frame information indicating that no abnormal sound is occurring is learned as a local feature amount for when no abnormal sound occurs.
  • In step S108, it is determined whether the pairing has been completed for all frames. If not (step S108: NO), the processes from steps S103 and S104 onward are repeated for the remaining frames.
  • In step S109, it is determined whether processing has been completed for all of the teacher data sets. If not (step S109: NO), the processing from step S101 onward is executed again. By repeating in this way, the association is made for all frames of all teacher data.
  • After that, the learning process by the learning unit 250 is actually executed (step S110).
  • The learning process is performed using a machine learning algorithm such as AdaBoost.
  • Since an existing method can be appropriately employed for the learning process, a detailed description is omitted here.
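For concreteness, the per-frame learning could look like the sketch below, using scikit-learn's AdaBoost implementation; the patent names AdaBoost only as one example, and one classifier is trained per sound type. The feature dimension and data here are placeholders.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

# X: one local feature vector per frame; y: teacher frame information for one
# sound type (1 = abnormal sound occurring in that frame, 0 = not occurring).
X = np.random.randn(1000, 24)                    # placeholder feature vectors
y = (np.random.rand(1000) > 0.9).astype(int)     # placeholder frame labels

clf = AdaBoostClassifier(n_estimators=100)
clf.fit(X, y)                                    # frame determination learning (step S110)

# The fitted model is the "learning result" output to the body sound analyzer;
# at analysis time it yields per-frame abnormal-sound decisions.
frame_decisions = clf.predict(X)
```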
  • Since the learning process described above is performed separately for each abnormal sound type, it is determined after the process ends whether learning has been completed for all sound types (step S111). If not (step S111: NO), the learning process of step S110 is executed again for the remaining sound types.
  • When the learning process has been completed for all sound types (step S111: YES), the learning result is output to the body sound analyzer (step S112).
  • The learning result of the frame determination learning described above is used when the body sound analyzer analyzes respiratory sounds.
  • Specifically, the same processing as applied to the teacher speech signal in the frame determination learning operation is executed on the respiratory sound signal to be analyzed.
  • That is, the input respiratory sound signal is divided into frames, and a local feature amount is calculated for each frame.
  • Since the correspondence between the local feature amounts and the occurrence of abnormal sounds has been learned, the timing at which an abnormal sound occurs (in other words, the temporal position of the frames in which it occurs) can be detected using the local feature amounts calculated from the respiratory sound.
  • Abnormal sounds can therefore be discriminated extremely accurately compared with the case where no learning operation is performed in advance.
  • The optimum threshold determination operation determines, as an optimum value, the threshold applied to the ratio of the period judged to contain abnormal sound when deciding whether an abnormal sound has actually occurred.
  • FIG. 8 is a block diagram illustrating the configuration of the threshold value determination unit according to the present embodiment.
  • The threshold determination unit includes a frame determination result input unit 410, an overall teacher information input unit 420, a determination unit 500, and a threshold output unit 600.
  • The frame determination result input unit 410 is configured to acquire the result of the frame determination process performed on the teacher audio signal using the learning result (that is, the per-frame determination of whether an abnormal sound occurs) and to output it to the determination unit 500.
  • The overall teacher information input unit 420 is configured to acquire the overall teacher information included in the teacher data and output it to the determination unit 500.
  • The determination unit 500 includes a global feature amount calculation unit 510, an ROC (Receiver Operating Characteristic) analysis unit 520, and an optimum threshold calculation unit 530.
  • The global feature amount calculation unit 510 is configured to calculate, based on the result of the frame determination process, the ratio of the time during which abnormal sounds occur to the period over which the teacher voice signal was input. The calculated ratio is output to the ROC analysis unit 520.
  • The ROC analysis unit 520 is configured to execute an ROC analysis that, based on the relationship between the abnormal-sound occurrence-time ratio and the overall teacher information, obtains the relationship between candidate thresholds for that ratio and the resulting discrimination performance as an ROC curve.
  • Since ROC analysis is an existing technique, a detailed description is omitted here.
  • The analysis result of the ROC analysis unit 520 is output to the optimum threshold calculation unit 530.
  • Using the analysis result of the ROC analysis unit 520, the optimum threshold calculation unit 530 can calculate an optimum threshold for judging, from the ratio of the time during which abnormal sounds occur, whether an abnormal sound has actually occurred (the threshold giving the point on the ROC curve closest to the reference point (0, 1); see FIG. 9). The calculated threshold is output to the threshold output unit 600.
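The closest-point criterion can be sketched as follows with scikit-learn; treating the ROC axes as (false positive rate, true positive rate) follows the usual convention, and the reference point (0, 1) follows FIG. 9.

```python
import numpy as np
from sklearn.metrics import roc_curve

def optimal_threshold(global_features: np.ndarray, overall_labels: np.ndarray) -> float:
    """ROC analysis over per-recording abnormal-frame ratios: return the
    threshold whose (FPR, TPR) point lies closest to the ideal point (0, 1)."""
    fpr, tpr, thresholds = roc_curve(overall_labels, global_features)
    dist = np.hypot(fpr, tpr - 1.0)      # Euclidean distance to (0, 1)
    return float(thresholds[np.argmin(dist)])
```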
  • The threshold output unit 600 is configured to output the threshold calculated by the optimum threshold calculation unit 530 in a form that can be used for the analysis of body sounds.
  • For example, the threshold output unit 600 is configured to output the threshold calculated by the optimum threshold calculation unit 530 to a memory or the like of the biological sound analyzer.
  • FIG. 10 is a flowchart showing the flow of the optimum threshold value determining operation according to the present embodiment.
  • As shown in FIG. 10, the frame determination result of the teacher voice signal is first acquired by the frame determination result input unit 410 (step S201). It is then determined whether the frame determination results for all frames have been acquired (step S202); if not (step S202: NO), step S201 is executed again for the remaining frames.
  • When the frame determination results for all frames have been acquired (step S202: YES), they are output to the global feature amount calculation unit 510, and the ratio of the number of frames judged to contain an abnormal sound to the number of frames in the entire section of the teacher speech signal (that is, the global feature amount) is calculated (step S203).
  • The global feature amount is a specific example of the "fourth information".
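Computed over binary per-frame decisions, the global feature amount reduces to a mean, as in this small sketch:

```python
import numpy as np

def global_feature(frame_decisions: np.ndarray) -> float:
    """Ratio of frames judged abnormal to all frames in the section
    (the 'fourth information'); decisions are 0/1 per frame."""
    return float(np.mean(frame_decisions))
```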
  • Next, the overall teacher information included in the teacher data is acquired by the overall teacher information input unit 420 (step S204).
  • The overall teacher information is a specific example of the "third information".
  • The overall teacher information is output to the ROC analysis unit 520 together with the global feature amount, and the two are paired (step S205). That is, the ratio of frames judged to contain abnormal sound is associated with the information indicating whether an abnormal sound actually occurred. It is then determined whether this pairing has been completed for all teacher data (step S206); if not (step S206: NO), the processing from step S201 onward is executed again for the remaining teacher data.
  • When the pairing has been completed for all teacher data (step S206: YES), ROC analysis is performed by the ROC analysis unit 520, and the relationship between threshold and discrimination performance is obtained as an ROC curve (step S207).
  • The optimum threshold calculation unit 530 then calculates the optimum threshold according to the ROC analysis result (step S208).
  • After that, it is determined whether the processing has been completed for all sound types (step S209). If not (step S209: NO), the processing of steps S207 and S208 is executed for the remaining sound types. If the processing has been completed for all sound types (step S209: YES), the threshold is output (step S210), and the series of processes ends.
  • The threshold determined as described above is used when the body sound analyzer analyzes respiratory sounds. For example, when analyzing a respiratory sound, the same processing as applied to the teacher speech signal in the frame determination learning operation is executed on the respiratory sound signal to be analyzed. Specifically, the ratio of the period during which abnormal sounds occur to the respiratory sound acquisition period (that is, the global feature amount) is calculated from the frame determination results of the input respiratory sound signal. When this ratio is large, it can be judged that an abnormal sound is actually occurring; when it is small, it can be judged that, although abnormal sounds were detected in some frames, no abnormal sound is actually occurring.
  • This judgment of the presence or absence of abnormal sound is realized by comparison with the threshold determined by the optimum threshold determination operation described above. Specifically, an abnormal sound is judged to be occurring when the ratio of the period containing abnormal sound is larger than the threshold, and not to be occurring when the ratio is smaller than the threshold.
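Reusing `global_feature` from the sketch above, the final per-recording decision is a single comparison (per sound type, against that type's learned threshold):

```python
def abnormal_sound_present(frame_decisions, threshold: float) -> bool:
    """Report an abnormal sound only when the abnormal-frame ratio
    exceeds the optimum threshold learned for this sound type."""
    return global_feature(frame_decisions) > threshold
```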
  • In this example in particular, the optimum threshold is calculated based on the relationship between the global feature amount and the overall teacher information (that is, the information indicating the presence or absence of abnormal sound).
  • The presence or absence of abnormal sound can therefore be determined extremely accurately based on the global feature amount calculated from the respiratory sound.
  • The present invention is not limited to the embodiments described above and can be changed as appropriate without departing from the spirit of the invention that can be read from the claims and the entire specification; biological sound analysis methods, programs, storage media, and biological sound analysis devices involving such changes are also included in the technical scope of the present invention.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Medical Informatics (AREA)
  • Acoustics & Sound (AREA)
  • Veterinary Medicine (AREA)
  • Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Heart & Thoracic Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Pulmonology (AREA)
  • Physiology (AREA)
  • Pathology (AREA)
  • Biophysics (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

Provided is a bioacoustic analysis method that is used in a bioacoustic analysis device for analyzing biological sound. This bioacoustic analysis method includes the following: a first acquisition step (S101) for acquiring first information relating to biological sound; a second acquisition step (S106) for acquiring second information indicating the timing, in the first information, at which an abnormal sound has arisen; a learning step (S110) for learning the correspondence between the first information and the second information; and a discrimination step for discriminating the abnormal sound contained in the inputted bioacoustic information on the basis of the learning results obtained from the learning step. According to this bioacoustic analysis method, the correspondence between the first information and the second information is learned in advance, thus making it possible to suitably distinguish abnormal sounds contained in biological sound.

Description

Body sound analysis method, program, storage medium, and body sound analysis apparatus
The present invention relates to the technical field of biological sound analysis methods, programs, storage media, and biological sound analysis devices for analyzing biological sounds such as respiratory sounds.
Apparatuses that detect abnormal sounds (that is, sounds different from normal respiratory sounds) contained in the respiratory sounds of a living body detected by an electronic stethoscope or the like are known. For example, Patent Document 1 describes a technique for detecting a plurality of abnormal sounds (adventitious sounds, or "sub-noises") contained in respiratory sounds by separating them by sound type.
International Publication No. 2016/002004
When analyzing abnormal sounds contained in a biological sound, the occurrence of an abnormal sound can be determined by comparing the acquired biological sound information with abnormal sound information stored in advance (specifically, biological sound information recorded while an abnormal sound was actually occurring). However, because body sound information varies with individual differences, measurement environments, and so on, it is difficult to determine accurately whether an abnormal sound has occurred simply by comparing the information. Unless an appropriate judgment criterion is set, there is the technical problem that an abnormal sound may go undetected even though it has occurred, or may be detected erroneously even though it has not.
The problems to be solved by the present invention include the above as one example. An object of the present invention is to provide a body sound analysis method, a program, a storage medium, and a body sound analysis apparatus that can suitably analyze abnormal sounds included in body sounds.
The biological sound analysis method for solving the above problem is a biological sound analysis method used in a biological sound analysis device that analyzes biological sounds, and includes: a first acquisition step of acquiring first information related to a biological sound; a second acquisition step of acquiring second information indicating the timing at which an abnormal sound occurs in the first information; a learning step of learning the correspondence between the first information and the second information; and a discrimination step of discriminating an abnormal sound included in input body sound information based on the learning result of the learning step.
The program for solving the above problem causes the biological sound analysis apparatus to execute the above-described biological sound analysis method.
The storage medium for solving the above problem stores the above-described program.
The biological sound analysis apparatus for solving the above problem includes a first acquisition unit that acquires biological sound information related to a biological sound, and a determination unit that determines an abnormal sound included in the biological sound based on a learning result, where the learning result is obtained by learning the correspondence between first information about the biological sound and second information indicating the timing at which an abnormal sound occurs in the biological sound.
FIG. 1 is a block diagram showing the configuration of the frame determination learner according to the embodiment.
FIG. 2 is a flowchart showing the flow of the frame determination learning operation according to the embodiment.
FIG. 3 is a conceptual diagram showing the frame division processing of the teacher voice signal.
FIG. 4 is a flowchart showing the calculation processing of the first local feature amount.
FIG. 5 is a flowchart showing the calculation processing of the second local feature amount.
FIG. 6 is a diagram showing the local feature vector obtained from the waveform and spectrum.
FIG. 7 is a diagram showing the teacher frame information associated with the local feature amounts.
FIG. 8 is a block diagram showing the configuration of the threshold determination unit according to the embodiment.
FIG. 9 is a conceptual diagram showing an example of the processing content of ROC analysis.
FIG. 10 is a flowchart showing the flow of the optimum threshold determination operation according to the embodiment.
<1>
The body sound analysis method according to the present embodiment is a body sound analysis method used in a body sound analysis apparatus that analyzes body sounds, and includes: a first acquisition step of acquiring first information related to a body sound; a second acquisition step of acquiring second information indicating the timing at which an abnormal sound occurs in the first information; a learning step of learning the correspondence between the first information and the second information; and a discrimination step of discriminating an abnormal sound included in input body sound information based on the learning result of the learning step.
According to the body sound analysis method of the present embodiment, first information related to a body sound (for example, a respiratory sound) is acquired, and second information indicating the timing at which an abnormal sound (for example, an adventitious sound) occurs in the first information is acquired. The first information is acquired as information showing the change of the biological sound over time (for example, a time-axis waveform of the biological sound). The second information is desirably information that accurately indicates the timing at which abnormal sounds occur in the first information, and is therefore preferably prepared in advance using the first information.
When the first information and the second information have been acquired, the correspondence between them is learned. Specifically, parameters for deriving the second information from the first information are learned; there may be several kinds of such parameters. To make the learning result more accurate, the learning step is preferably executed multiple times using multiple sets of first and second information.
After the learning step, abnormal sounds included in input body sound information are discriminated based on the learning result. The "input body sound information" is information about the body sound to be analyzed by the body sound analysis method according to the present embodiment, and is input separately from the first and second information described above. In the present embodiment in particular, since the correspondence between the first information and the second information has been learned in advance, the timing at which an abnormal sound occurs can be determined accurately from the input body sound information, and abnormal sounds included in the body sound information can be appropriately discriminated.
<2>
In one aspect of the body sound analysis method according to the present embodiment, the method further includes a first generation step of generating, based on the first information, feature amount information indicating a feature amount of the first information, and the learning step learns the correspondence between the feature amount information and the second information instead of the correspondence between the first information and the second information.
According to this aspect, when the first information is acquired, feature amount information indicating the feature amount of the first information is generated. The "feature amount" is a value indicating the magnitude (degree) of a feature that can be used to discriminate an abnormal sound included in a body sound.
In this aspect in particular, the correspondence between the feature amount information and the second information is learned instead of the correspondence between the first information and the second information, so a learning result better suited to discriminating abnormal sounds in the input body sound information is obtained.
<3>
In one aspect of the body sound analysis method according to the present embodiment, the method further includes a dividing step of dividing the first information and the second information into predetermined frame units, and the learning step performs learning in those frame units.
According to this aspect, the first information and the second information are divided into predetermined frame units for learning. The predetermined frame unit is set to a period over which an appropriate learning result is more easily obtained, so performing learning in units of these frames yields a better learning result.
<4>
In one aspect of the body sound analysis method according to the present embodiment, the method further includes: a third acquisition step of acquiring third information indicating whether an abnormal sound occurs in the first information; a calculation step of calculating, based on the first information and the learning result of the learning step, fourth information indicating the ratio of the period during which the abnormal sound occurs to the period over which the first information was acquired; and a determination step of determining, based on the third information and the fourth information, a threshold for judging whether the input body sound information contains an abnormal sound.
According to this aspect, third information indicating whether an abnormal sound occurs in the first information is acquired. The third information is desirably information that accurately indicates whether an abnormal sound occurs in the first information, and is therefore preferably prepared in advance using the first information.
In this aspect, fourth information indicating the ratio of the period during which an abnormal sound occurs to the period over which the first information was acquired is further calculated based on the first information and the learning result; specifically, the fourth information is calculated by analyzing the first information using the learning result.
When the third information has been acquired and the fourth information calculated, a threshold for judging whether the input body sound information contains an abnormal sound is determined based on them. This threshold is used, when discriminating abnormal sounds in the input body sound information, to judge whether an abnormal sound is actually present; specifically, it is compared with the ratio of the period during which an abnormal sound occurs to the period over which the input body sound information was acquired.
Using this threshold, it can be judged, for example, that an abnormal sound has occurred when the abnormal-sound ratio of the input body sound information is equal to or greater than the threshold, and that no abnormal sound has occurred when the ratio is below the threshold.
In this aspect in particular, since the threshold is determined based on the fourth information calculated using the learning result, the presence or absence of an abnormal sound can be judged more accurately.
<5>
The program according to the present embodiment causes the biological sound analysis apparatus to execute the biological sound analysis method described above.
According to the program of the present embodiment, the body sound analysis method according to the present embodiment can be executed, so abnormal sounds included in body sound information can be suitably discriminated.
<6>
The storage medium according to the present embodiment stores the program described above.
According to the storage medium of the present embodiment, the program according to the present embodiment can be executed, so abnormal sounds included in body sound information can be suitably discriminated.
<7>
The biological sound analysis apparatus according to the present embodiment includes a first acquisition unit that acquires biological sound information related to a biological sound, and a determination unit that determines abnormal sounds included in the biological sound based on a learning result, where the learning result is obtained by learning the correspondence between first information about the biological sound and second information indicating the timing at which an abnormal sound occurs in the biological sound.
According to the biological sound analysis apparatus of the present embodiment, abnormal sounds included in body sound information can be suitably discriminated based on the learning result, as with the body sound analysis method described above.
The biological sound analysis apparatus according to the present embodiment can also adopt the same various aspects as the biological sound analysis method according to the present embodiment described above.
The operation and other advantages of the body sound analysis method, program, storage medium, and body sound analysis apparatus according to the present embodiment will be described in more detail in the examples below.
Hereinafter, examples of the body sound analysis method, program, storage medium, and body sound analysis apparatus will be described in detail with reference to the drawings, taking a body sound analysis method that analyzes respiratory sounds as an example.
<Teacher data>
First, the teacher data used in the body sound analysis method according to this example will be described.
Teacher data consists of sets of three pieces of information (a teacher voice signal, teacher frame information, and overall teacher information), and a plurality of such sets are prepared in advance.
The teacher voice signal is a signal indicating the temporal change of a breathing sound (for example, a time-axis waveform). The teacher frame information indicates, for each sound type, the timing at which abnormal sounds occur in the teacher voice signal. The overall teacher information indicates, for each sound type, whether an abnormal sound occurs anywhere in the teacher voice signal.
The teacher data is used for the learning operation described later; the more sets there are, the higher the learning effect (in other words, the accuracy of the biological sound analysis).
<Frame determination learning>
Next, the frame determination learning of the body sound analysis method according to this example will be described with reference to FIGS. 1 to 7. Frame determination learning is a learning operation for increasing the accuracy of the frame determination process, which determines the occurrence of abnormal sounds in units of frames.
<Configuration of the learner>
First, the configuration of the frame determination learner used for frame determination learning will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the frame determination learner according to this example.
 図1に示すように、本実施例に係るフレーム判定学習器は、教師音声信号入力部110と、教師フレーム情報入力部120と、処理部200と、学習結果出力部300とを備えて構成されている。 As shown in FIG. 1, the frame determination learner according to the present embodiment includes a teacher speech signal input unit 110, a teacher frame information input unit 120, a processing unit 200, and a learning result output unit 300. ing.
 教師音声信号入力部110は、教師データに含まれる教師音声信号を取得して、処理部200に出力可能に構成されている。 The teacher voice signal input unit 110 is configured to acquire a teacher voice signal included in the teacher data and output it to the processing unit 200.
 教師フレーム情報入力部120は、教師データに含まれる教師フレーム情報を取得して、処理部200に出力可能に構成されている。 The teacher frame information input unit 120 is configured to acquire teacher frame information included in the teacher data and to output it to the processing unit 200.
 The processing unit 200 includes a plurality of arithmetic circuits, memories, and the like, and comprises a frame dividing unit 210, a first local feature calculation unit 220, a frequency analysis unit 230, a second local feature calculation unit 240, and a learning unit 250.
 The frame dividing unit 210 is configured to execute a dividing process that divides the teacher sound signal input from the teacher sound signal input unit 110 into a plurality of frames. The respiratory sound signal divided by the frame dividing unit 210 is output to the first local feature calculation unit 220 and the frequency analysis unit 230.
 The first local feature calculation unit 220 is configured to calculate a first local feature based on the waveform of the teacher sound signal. The processing executed by the first local feature calculation unit 220 is described in detail later. The calculated first local feature is output to the learning unit 250.
 The frequency analysis unit 230 is configured to execute a time-frequency analysis process (for example, FFT processing) on the teacher sound signal input from the teacher sound signal input unit 110. The analysis result of the frequency analysis unit 230 is output to the second local feature calculation unit 240.
 The second local feature calculation unit 240 is configured to calculate a second local feature based on the analysis result of the frequency analysis unit 230. The processing executed by the second local feature calculation unit 240 is described in detail later. The calculated second local feature is output to the learning unit 250.
 The learning unit 250 is configured to learn the correspondence relationship between the local features calculated by the first local feature calculation unit 220 and the second local feature calculation unit 240 and the teacher frame information input from the teacher frame information input unit 120. The processing executed by the learning unit 250 is described in detail later. The learning result of the learning unit 250 is output to the learning result output unit 300.
 The learning result output unit 300 is configured to output the learning result of the learning unit 250 in a form usable for biological sound analysis. For example, the learning result output unit 300 can output the learning result to a memory or the like of the biological sound analysis device.
 <Description of operation>
 Next, the flow of the frame determination learning operation executed by the frame determination learner described above will be described with reference to FIG. 2. FIG. 2 is a flowchart showing the flow of the frame determination learning operation according to the present example.
 As shown in FIG. 2, in the frame determination learning operation according to the present example, the learning unit 250 is first initialized (step S100). Subsequently, a teacher sound signal is acquired by the teacher sound signal input unit 110 (step S101) and output to the processing unit 200. The teacher sound signal is a specific example of the "first information".
 Subsequently, the respiratory sound is divided into a plurality of frames by the frame dividing unit 210 (step S102). The frame division of the respiratory sound signal is described concretely below with reference to FIG. 3. FIG. 3 is a conceptual diagram showing the frame division process of the teacher sound signal.
 As shown in FIG. 3, the teacher sound signal is divided into a plurality of frames at predetermined intervals. Each frame is set as a processing unit for suitably executing the local feature calculation described later; the duration of one frame is, for example, 12 msec.
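 A frame division along these lines could be sketched as follows; the 8 kHz sampling rate and the use of non-overlapping frames are assumptions made for illustration (the embodiment specifies only an example frame length of 12 msec).

    import numpy as np

    def split_into_frames(signal, fs=8000, frame_ms=12.0):
        """Divide a sound signal into fixed-length frames (step S102)."""
        frame_len = int(fs * frame_ms / 1000)   # 96 samples at 8 kHz
        num_frames = len(signal) // frame_len   # trailing remainder dropped
        return signal[:num_frames * frame_len].reshape(num_frames, frame_len)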
 Returning to FIG. 2, the frame-divided teacher sound signal is input to the first local feature calculation unit 220, and the first local feature is calculated (step S103). The frame-divided teacher sound signal is also frequency-analyzed by the frequency analysis unit 230 and input to the second local feature calculation unit 240, which calculates the second local feature based on the frequency-analyzed teacher sound signal (for example, a spectrum indicating its frequency characteristics).
 The calculation of the first local feature by the first local feature calculation unit 220 and the calculation of the second local feature by the second local feature calculation unit 240 are described in detail below with reference to FIGS. 4 to 6. FIG. 4 is a flowchart showing the calculation of the first local feature. FIG. 5 is a flowchart showing the calculation of the second local feature. FIG. 6 is a diagram showing the local feature vector obtained from the waveform and the spectrum.
 As shown in FIG. 4, when the first local feature is calculated, the waveform of the teacher sound signal is first acquired (step S201) and subjected to a pre-filtering process (step S202). The pre-filtering process is, for example, processing using a high-pass filter, and can remove unwanted components contained in the teacher sound signal.
 Subsequently, local variance values are calculated from the pre-filtered teacher sound signal (step S203). The local variance values are calculated as, for example, a first variance indicating the variation of the teacher sound signal within a first period w1 and a second variance indicating its variation within a second period w2 that contains the first period w1. The local variance calculated in this way functions as a local feature for detecting, among abnormal sounds, intermittent rales (for example, coarse crackles).
 Once the local variance values have been calculated, the maximum local variance within each frame of the teacher sound signal is computed (step S204) and output as the first local feature (step S205).
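 The chain of steps S201 to S205 might look like the following sketch; the filter order, the 100 Hz cutoff, the window lengths w1 and w2, and the combination of the two variances by their maximum are all illustrative assumptions not fixed by the embodiment.

    import numpy as np
    from scipy.signal import butter, lfilter

    def first_local_features(signal, fs=8000, frame_len=96, w1=16, w2=64):
        # Step S202: pre-filtering with a high-pass filter.
        b, a = butter(2, 100 / (fs / 2), btype="highpass")
        x = lfilter(b, a, signal)

        # Step S203: local variance in a short window w1 and in a longer
        # window w2 that contains it, at every sample position.
        def sliding_var(sig, w):
            return np.array([np.var(sig[i:i + w]) for i in range(len(sig) - w + 1)])
        v = np.maximum(sliding_var(x, w1)[:len(x) - w2 + 1], sliding_var(x, w2))

        # Steps S204-S205: the maximum local variance inside each frame is
        # output as that frame's first local feature.
        n = len(v) // frame_len
        return v[:n * frame_len].reshape(n, frame_len).max(axis=1)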
 As shown in FIG. 5, when the second local feature is calculated, the spectrum obtained by the frequency analysis is first acquired (step S301), and CMN (Cepstral Mean Normalization) processing is executed (step S302). The CMN processing can remove characteristics that are stationarily convolved into the teacher sound signal, such as those of the sensor or the measurement environment.
 The CMN-processed teacher sound signal is further subjected to a liftering process for extracting the envelope component (step S303) and a liftering process for extracting the fine component (step S304). Liftering is a process of cutting predetermined quefrency components from the cepstrum.
 The CMN and liftering processes described above make it easier to discriminate continuous rales (for example, rhonchi, wheezes, fine crackles, etc.) that would otherwise be buried in other biological sounds. Since the CMN and liftering processes are existing techniques, a more detailed description is omitted here.
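 A minimal sketch of this cepstral chain (steps S301 to S304) is given below; the log-magnitude real cepstrum and the quefrency cut index are assumptions, since the embodiment does not fix these details.

    import numpy as np

    def cmn_and_liftering(frames, cut=20):
        # Step S301: spectrum of each frame by frequency analysis (FFT).
        spec = np.abs(np.fft.rfft(frames, axis=1)) + 1e-12
        cep = np.fft.irfft(np.log(spec), axis=1)   # real cepstrum per frame

        # Step S302: CMN - subtracting the mean cepstrum over time removes
        # characteristics stationarily convolved in (sensor, environment).
        cep = cep - cep.mean(axis=0, keepdims=True)

        # Steps S303/S304: liftering cuts predetermined quefrency components;
        # low quefrencies carry the envelope component, high quefrencies the
        # fine component.
        envelope = cep.copy()
        envelope[:, cut:] = 0.0
        fine = cep.copy()
        fine[:, :cut] = 0.0
        return envelope, fine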
 For the teacher sound signal that has undergone the liftering process for extracting the fine component, an emphasis process using the KL divergence is executed to calculate a feature. The KL divergence is a parameter calculated from an observed value P and a reference value Q (for example, a theoretical value, model value, or predicted value); when an observed value P that is distinctive relative to the reference value Q appears, the KL divergence is calculated as a large value. The processing using the KL divergence emphasizes and clarifies the tonal components contained in the teacher sound signal (that is, the components used to discriminate continuous rales).
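 The emphasis could be realized, for example, as below; using the time-averaged spectrum as the reference value Q and taking the per-bin KL contribution as the emphasized feature are assumptions (the embodiment also allows a theoretical value, model value, predicted value, etc. for Q).

    import numpy as np

    def kl_emphasis(spec):
        """spec: (num_frames, num_bins) magnitude spectra, e.g. rebuilt
        from the fine cepstral component."""
        # Normalize each observed frame into a distribution P and the
        # reference spectrum into a distribution Q.
        p = spec / spec.sum(axis=1, keepdims=True)
        q = spec.mean(axis=0)
        q = q / q.sum()
        # Per-bin contribution to KL(P||Q): large where the observation P
        # is distinctive relative to Q, which emphasizes tonal components.
        return p * np.log((p + 1e-12) / (q + 1e-12))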
 Meanwhile, HAAR-like features are also calculated from the spectrum obtained by the frequency analysis (step S306). HAAR-like features are a technique used mainly in the field of image processing; here they are calculated from the spectrum in the same manner by associating the amplitude value at each frequency with a pixel value in image processing. Since the calculation of HAAR-like features is an existing technique, a detailed description is omitted here.
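 For example, a two-rectangle difference filter, the simplest Haar-like form, could be applied to the spectrum as follows; the filter shape and the widths are assumptions, as the embodiment does not specify which Haar-like filters are used.

    import numpy as np

    def haar_like_features(spec_frame, widths=(4, 8, 16)):
        """Haar-like features over one frame's spectrum; the amplitude at
        each frequency bin plays the role of the pixel value."""
        feats = []
        for w in widths:
            for start in range(0, len(spec_frame) - 2 * w + 1, w):
                left = spec_frame[start:start + w].sum()
                right = spec_frame[start + w:start + 2 * w].sum()
                # Difference of adjacent rectangular sums, as in the
                # classic Haar-like feature of image processing.
                feats.append(left - right)
        return np.array(feats)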
 When the various processes above have been applied to the teacher sound signal and a plurality of features have been calculated for each frequency, an average value is calculated for each frequency band and output as the second local feature (step S307).
 As shown in FIG. 6, the processing described above yields a plurality of types of local features (that is, the first local feature and the second local feature) from the waveform and spectrum of the respiratory sound signal. These are output as a local feature vector for each frame.
 Returning again to FIG. 2, once the local features have been calculated, the teacher frame information is acquired by the teacher frame information input unit 120 (step S106). The teacher frame information is a specific example of the "second information". The acquired teacher frame information is output to the learning unit 250 together with the local features.
 The learning operation in the learning unit 250 is described below with reference to FIG. 7. FIG. 7 is a diagram showing the teacher frame information associated with the local features.
 As shown in FIG. 7, the learning unit 250 sets the local features and the teacher frame information as a pair of data for each frame (step S107). The local features are thereby associated with the timing at which abnormal sounds occur.
 More specifically, a local feature vector associated with teacher frame information indicating a timing at which an abnormal sound occurs is learned as a local feature observed when an abnormal sound is present. Conversely, a local feature vector associated with teacher frame information indicating a timing at which no abnormal sound occurs is learned as a local feature observed when no abnormal sound is present.
 Returning again to FIG. 2, since the association between the local features and the teacher frame information is executed frame by frame, it is determined whether the setting has been completed for all frames (step S108). If the setting has not been completed for all frames (step S108: NO), the processing from steps S103 and S104 onward is repeated for the frames not yet set.
 On the other hand, when the setting has been completed for all frames (step S108: YES), it is determined whether the processing has been completed for all of the plurality of teacher data sets (step S109). If it is determined that the processing has not been completed for all teacher data (step S109: NO), the processing from step S101 onward is executed again. By repeating the processing in this way, the association is performed for every frame of every teacher data set.
 Thereafter, the learning process by the learning unit 250 is actually executed (step S110). The learning process is performed using a machine learning algorithm such as AdaBoost. Since existing methods other than AdaBoost can also be adopted for the learning process as appropriate, a detailed description is omitted here.
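 A sketch of this step with scikit-learn's AdaBoost implementation follows; the library choice, the number of estimators, and the argument layout are assumptions. One classifier is trained per abnormal sound type, mirroring steps S110 and S111.

    from sklearn.ensemble import AdaBoostClassifier

    def train_frame_classifiers(feature_vectors, frame_labels):
        """feature_vectors: {sound_type: (num_frames, num_features)} local
        feature vectors gathered from all teacher data; frame_labels:
        {sound_type: (num_frames,)} teacher frame information (1 = abnormal
        sound occurring in the frame, 0 = not occurring)."""
        classifiers = {}
        for sound_type, X in feature_vectors.items():
            clf = AdaBoostClassifier(n_estimators=100)
            clf.fit(X, frame_labels[sound_type])   # step S110 per sound type
            classifiers[sound_type] = clf
        return classifiers

 At analysis time, classifiers[sound_type].predict(features) would then yield the frame-by-frame judgments used below; this usage, too, is illustrative.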
 Since the learning process described above is performed for each abnormal sound type, it is determined after the process ends whether the learning process has been completed for all sound types (step S111). If the learning process has not been completed for all sound types (step S111: NO), the learning process of step S110 is executed again for the remaining sound types.
 On the other hand, when the learning process has been completed for all sound types (step S111: YES), the learning result is output to the biological sound analysis device (step S112).
 <Effects of frame determination learning>
 The learning result of the frame determination learning described above is used for the analysis of respiratory sounds by the biological sound analysis device. When a respiratory sound is analyzed, the same processing that was applied to the teacher sound signal in the frame determination learning operation is executed on the respiratory sound signal to be analyzed. Specifically, the input respiratory sound signal is divided into frames, and a local feature vector is calculated for each frame.
 In the present example, since the relationship between the local features and the timing of abnormal sounds has been learned through the learning operation described above, the timing at which abnormal sounds occur (in other words, the temporal position of the frames in which an abnormal sound is present) can be suitably detected using the local features calculated from the respiratory sound. Abnormal sounds can therefore be discriminated far more accurately than when no learning operation is performed in advance.
 <Determination of the optimum threshold>
 Next, the optimum threshold determination operation according to the present example will be described with reference to FIGS. 8 to 10. The optimum threshold determination operation is an operation for determining, as an optimum value, the threshold used to judge whether an abnormal sound has actually occurred, based on the proportion of the period determined to contain abnormal sounds.
 <Configuration of the threshold determination unit>
 First, the configuration of the threshold determination unit that executes the optimum threshold determination operation will be described with reference to FIG. 8. FIG. 8 is a block diagram showing the configuration of the threshold determination unit according to the present example.
 As shown in FIG. 8, the threshold determination unit according to the present example comprises a frame determination result input unit 410, an overall teacher information input unit 420, a determination unit 500, and a threshold output unit 600.
 The frame determination result input unit 410 is configured to acquire the result of the frame determination process applied to the teacher sound signal using the learning result (that is, the process of determining, for each frame, whether an abnormal sound is present) and output it to the determination unit 500.
 The overall teacher information input unit 420 is configured to acquire the overall teacher information included in the teacher data and output it to the determination unit 500.
 The determination unit 500 comprises a global feature calculation unit 510, an ROC analysis (Receiver Operating Characteristic analysis) unit 520, and an optimum threshold calculation unit 530.
 The global feature calculation unit 510 is configured to calculate, based on the determination results of the frame determination process, the proportion of time in which abnormal sounds occur relative to the period over which the teacher sound signal was input. The information indicating this proportion, calculated by the global feature calculation unit 510, is output to the ROC analysis unit 520.
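 Since the frame determination results are binary per frame, this global feature reduces to a simple ratio, as in the following sketch (the 0/1 array representation is an assumption):

    import numpy as np

    def global_feature(frame_judgments):
        """Proportion of frames judged to contain an abnormal sound,
        relative to all frames of the input section."""
        return float(np.mean(np.asarray(frame_judgments)))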
 The ROC analysis unit 520 is configured to execute an ROC analysis that, based on the relationship between the information indicating the proportion of abnormal-sound time and the overall teacher information, obtains as an ROC curve the relationship between a given threshold on that proportion and the discrimination performance achieved when that threshold is used. Since ROC analysis is an existing technique, a detailed description is omitted here. The analysis result of the ROC analysis unit 520 is output to the optimum threshold calculation unit 530.
 The optimum threshold calculation unit 530 is configured to use the analysis result of the ROC analysis unit 520 to calculate, from the proportion of time in which abnormal sounds occur, the optimum threshold for judging whether an abnormal sound has actually occurred (the threshold giving the point on the ROC curve closest to the reference point (0, 1); see FIG. 9). The threshold calculated by the optimum threshold calculation unit 530 is output to the threshold output unit 600.
 The threshold output unit 600 is configured to output the threshold calculated by the optimum threshold calculation unit 530 in a form usable for biological sound analysis. For example, the threshold output unit 600 can output the threshold to a memory or the like of the biological sound analysis device.
 <Description of operation>
 Next, the flow of the optimum threshold determination operation executed by the threshold determination unit described above will be described with reference to FIG. 10. FIG. 10 is a flowchart showing the flow of the optimum threshold determination operation according to the present example.
 As shown in FIG. 10, in the optimum threshold determination operation according to the present example, the frame determination results of the teacher sound signal are first acquired by the frame determination result input unit 410 (step S201). It is then determined whether the frame determination results have been acquired for all frames (step S202). If the frame determination results have not been acquired for all frames (step S202: NO), the processing of step S201 is executed again for the remaining frames.
 On the other hand, when the frame determination results have been acquired for all frames (step S202: YES), the acquired frame determination results are output to the global feature calculation unit 510, and the proportion of frames determined to contain an abnormal sound relative to the number of frames in the entire section of the teacher sound signal (that is, the global feature) is calculated (step S203). The global feature is a specific example of the "fourth information".
 Subsequently, the overall teacher information included in the teacher data is acquired by the overall teacher information input unit 420 (step S204). The overall teacher information is a specific example of the "third information".
 The overall teacher information is output to the ROC analysis unit 520 together with the global feature, and the global feature and the overall teacher information are set as a pair of data (step S205). That is, the proportion of frames determined to contain an abnormal sound is associated with the information indicating whether an abnormal sound actually occurred. It is then determined whether the setting has been completed for all teacher data (step S206). If the setting has not been completed for all teacher data (step S206: NO), the processing from step S201 onward is executed again for the teacher data not yet set.
 On the other hand, when the setting has been completed for all teacher data (step S206: YES), an ROC analysis is executed by the ROC analysis unit 520, and the relationship between the threshold and the discrimination performance is obtained as an ROC curve (step S207). When the ROC analysis ends, the optimum threshold calculation unit 530 calculates the optimum threshold according to the ROC analysis result (step S208).
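 Steps S207 and S208 for one sound type could be sketched as follows; the selection of the point closest to (0, 1) follows the description above, while the use of scikit-learn's roc_curve and the argument layout are assumptions.

    import numpy as np
    from sklearn.metrics import roc_curve

    def optimum_threshold(global_feats, overall_labels):
        """global_feats: global feature (abnormal-frame ratio) per teacher
        data set; overall_labels: overall teacher information (1 = the set
        actually contains the abnormal sound, 0 = it does not)."""
        # Step S207: ROC curve relating candidate thresholds on the global
        # feature to the discrimination performance.
        fpr, tpr, thresholds = roc_curve(overall_labels, global_feats)
        # Step S208: the optimum threshold gives the ROC point closest to
        # the reference point (0, 1), as in FIG. 9.
        dist = np.hypot(fpr, tpr - 1.0)
        return float(thresholds[np.argmin(dist)])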
 Since the ROC analysis and threshold calculation described above are executed for each abnormal sound type, it is determined after their completion whether the processing has been completed for all sound types (step S209). If the processing has not been completed for all sound types (step S209: NO), the processing of steps S207 and S208 is executed for the remaining sound types. When the processing has been completed for all sound types (step S209: YES), the thresholds are output (step S210) and the series of processing ends.
 <Effects of optimum threshold determination>
 The threshold determined as described above is used for the analysis of respiratory sounds by the biological sound analysis device. For example, when a respiratory sound is analyzed, the same processing that was applied to the teacher sound signal in the frame determination learning operation is executed on the respiratory sound signal to be analyzed. Specifically, from the frame determination results of the input respiratory sound signal, the proportion of the period containing abnormal sounds relative to the acquisition period of the respiratory sound (that is, the global feature) is calculated. As a result, when the proportion of the period containing abnormal sounds is large, it can be determined that abnormal sounds are actually present; conversely, when that proportion is small, it can be determined that, although abnormal sounds were detected in individual frames, no abnormal sound is actually present.
 This judgment of the presence or absence of abnormal sounds is realized by comparison with the threshold determined by the optimum threshold determination operation described above: when the proportion of the period containing abnormal sounds is larger than the threshold, it can be determined that an abnormal sound is present, and when it is smaller than the threshold, that no abnormal sound is present. In the present example in particular, the optimum threshold determination operation associates the global feature with the overall teacher information (that is, the information indicating whether an abnormal sound occurred) and calculates the optimum threshold as a result. The presence or absence of abnormal sounds can therefore be judged extremely accurately based on the global feature calculated from the respiratory sound.
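 The final judgment per sound type then amounts to a comparison of the global feature with the stored optimum threshold, as in this minimal sketch:

    def judge_abnormal(global_feat, threshold):
        """True when the proportion of the period containing the abnormal
        sound exceeds the optimum threshold for that sound type."""
        return global_feat > threshold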
 The present invention is not limited to the embodiment described above and may be modified as appropriate within a scope that does not contradict the gist or spirit of the invention readable from the claims and the specification as a whole; biological sound analysis methods, programs, storage media, and biological sound analysis devices involving such modifications are also included within the technical scope of the present invention.
 DESCRIPTION OF SYMBOLS
 110 Teacher sound signal input unit
 120 Teacher frame information input unit
 200 Processing unit
 210 Frame dividing unit
 220 First local feature calculation unit
 230 Frequency analysis unit
 240 Second local feature calculation unit
 250 Learning unit
 300 Learning result output unit
 410 Frame determination result input unit
 420 Overall teacher information input unit
 500 Determination unit
 510 Global feature calculation unit
 520 ROC analysis unit
 530 Optimum threshold calculation unit
 600 Threshold output unit

Claims (7)

  1.  A biological sound analysis method used in a biological sound analysis device for analyzing a biological sound, the method comprising:
     a first acquisition step of acquiring first information related to a biological sound;
     a second acquisition step of acquiring second information indicating a timing at which an abnormal sound occurs in the first information;
     a learning step of learning a correspondence relationship between the first information and the second information; and
     a discrimination step of discriminating an abnormal sound included in input biological sound information based on a learning result of the learning step.
  2.  The biological sound analysis method according to claim 1, further comprising a first generation step of generating, based on the first information, feature information indicating a feature amount of the first information,
     wherein the learning step learns a correspondence relationship between the feature information and the second information instead of the correspondence relationship between the first information and the second information.
  3.  The biological sound analysis method according to claim 1 or 2, further comprising a division step of dividing the first information and the second information into predetermined frame units,
     wherein the learning step performs learning in the predetermined frame units.
  4.  The biological sound analysis method according to any one of claims 1 to 3, further comprising:
     a third acquisition step of acquiring third information indicating whether the abnormal sound occurs in the first information;
     a calculation step of calculating, based on the first information and the learning result of the learning step, fourth information indicating a ratio of a period in which the abnormal sound occurs to a period over which the first information was acquired; and
     a determination step of determining, based on the third information and the fourth information, a threshold for judging whether an abnormal sound is included in the input biological sound information.
  5.  A program for causing the biological sound analysis device to execute the biological sound analysis method according to claim 1.
  6.  A storage medium storing the program according to claim 5.
  7.  A biological sound analysis device comprising:
     a first acquisition unit that acquires biological sound information related to a biological sound; and
     a discrimination unit that discriminates an abnormal sound included in the biological sound based on a learning result,
     wherein the learning result is a result of learning a correspondence relationship between first information related to a biological sound and second information indicating a timing at which an abnormal sound occurs in the biological sound.