JP6672478B2

JP6672478B2 - Body sound analysis method, program, storage medium, and body sound analysis device

Info

Publication number: JP6672478B2
Application number: JP2018558046A
Authority: JP
Inventors: 隆真亀谷
Original assignee: Pioneer Corp
Current assignee: Pioneer Corp
Priority date: 2016-12-20
Filing date: 2017-12-20
Publication date: 2020-03-25
Anticipated expiration: 2037-12-20
Also published as: WO2018117171A1; JPWO2018117171A1

Description

The present invention relates to the technical field of body sound analysis methods, programs, storage media, and body sound analysis devices that analyze body sounds such as breathing sounds.

Devices are known that attempt to detect abnormal sounds (that is, sounds that differ from normal breathing sounds) contained in the breathing sounds of a living body detected by an electronic stethoscope or the like. For example, Patent Literature 1 describes a technique that decomposes the multiple abnormal sounds (adventitious sounds) contained in breathing sounds by sound type and detects them.

Patent Literature 1: International Publication No. WO 2016/002004

When analyzing abnormal sounds contained in a body sound, the occurrence of an abnormal sound can be judged by comparing the acquired body sound information with abnormal sound information stored in advance (specifically, body sound information recorded when an abnormal sound actually occurs). However, body sound information varies with individual differences, the measurement environment, and the like, so it is difficult to judge accurately whether an abnormal sound is occurring simply by comparing the two. Unless an appropriate judgment criterion is set, there arises the technical problem that an abnormal sound that is occurring goes undetected, or that an abnormal sound is erroneously detected when none is occurring.

The problems to be solved by the present invention include, for example, those described above. An object of the present invention is to provide a body sound analysis method, a program, a storage medium, and a body sound analysis device capable of suitably analyzing abnormal sounds contained in body sounds.

A body sound analysis method for solving the above problem is a body sound analysis method used in a body sound analysis device that analyzes body sounds, and includes: a first acquisition step of acquiring first information on a body sound; a second acquisition step of acquiring second information indicating the timing at which an abnormal sound occurs in the first information; a learning step of learning the correspondence between the first information and the second information; and a determination step of determining, based on the learning result of the learning step, an abnormal sound contained in input body sound information.

A program for solving the above problem causes the body sound analysis device to execute the body sound analysis method described above.

A storage medium for solving the above problem stores the program described above.

A body sound analysis device for solving the above problem includes a first acquisition unit that acquires body sound information on a body sound, and a determination unit that determines, based on a learning result, an abnormal sound contained in the body sound. The learning result is obtained by learning the correspondence between first information on a body sound and second information indicating the timing at which an abnormal sound occurs in that body sound, based on the first information and the second information.

FIG. 1 is a block diagram showing the configuration of the frame determination learning device according to the embodiment.
FIG. 2 is a flowchart showing the flow of the frame determination learning operation according to the embodiment.
FIG. 3 is a conceptual diagram showing the frame division process applied to the teacher voice signal.
FIG. 4 is a flowchart showing the calculation process for the first local feature amount.
FIG. 5 is a flowchart showing the calculation process for the second local feature amount.
FIG. 6 is a diagram showing the local feature vector obtained from the waveform and the spectrum.
FIG. 7 is a diagram showing the teacher frame information associated with the local feature amounts.
FIG. 8 is a block diagram showing the configuration of the threshold determination unit according to the embodiment.
FIG. 9 is a conceptual diagram showing an example of the processing performed in ROC analysis.
FIG. 10 is a flowchart showing the flow of the optimal threshold determination operation according to the embodiment.

<1>
The body sound analysis method according to the present embodiment is a body sound analysis method used in a body sound analysis device that analyzes body sounds, and includes: a first acquisition step of acquiring first information on a body sound; a second acquisition step of acquiring second information indicating the timing at which an abnormal sound occurs in the first information; a learning step of learning the correspondence between the first information and the second information; and a determination step of determining, based on the learning result of the learning step, an abnormal sound contained in input body sound information.

According to the body sound analysis method of the present embodiment, first information on a body sound (for example, a breathing sound) is acquired, together with second information indicating the timing at which an abnormal sound (for example, an adventitious sound) occurs in the first information. The first information is acquired as information indicating the change of the body sound over time (for example, a time-axis waveform of the body sound). The second information, on the other hand, should accurately indicate the timing at which the abnormal sound occurs in the first information. For this reason, the second information is preferably prepared in advance using the first information.

Once the first information and the second information have been acquired, the correspondence between them is learned. Specifically, parameters for deriving the second information from the first information are learned. There may be multiple types of such parameters. To make the learning result more accurate, the learning step is preferably executed multiple times using multiple pieces of first information and second information.

After the learning step, an abnormal sound contained in input body sound information is determined based on the learning result. Here, the "input body sound information" is information on the body sound to be analyzed by the body sound analysis method according to the present embodiment, and is input separately from the first information and the second information described above. In the present embodiment in particular, because the correspondence between the first information and the second information has been learned in advance, the timing at which an abnormal sound occurs can be determined accurately from the input body sound information. The abnormal sound contained in the body sound information can therefore be suitably determined.

<2>
One aspect of the body sound analysis method according to the present embodiment further includes a first generation step of generating, based on the first information, feature amount information indicating a feature amount in the first information, and the learning step learns the correspondence between the feature amount information and the second information instead of the correspondence between the first information and the second information.

According to this aspect, when the first information is acquired, feature amount information indicating a feature amount in the first information is generated. Here, a "feature amount" is a value indicating the magnitude (degree) of a feature that can be used to determine an abnormal sound contained in a body sound.

In this aspect in particular, the correspondence between the feature amount information and the second information is learned instead of the correspondence between the first information and the second information. A learning result better suited to determining abnormal sounds contained in the input body sound information is therefore obtained.

<3>
One aspect of the body sound analysis method according to the present embodiment further includes a division step of dividing the first information and the second information into predetermined frame units, and the learning step performs learning in the predetermined frame units.

According to this aspect, the first information and the second information are divided into predetermined frame units when learning is performed. The predetermined frame unit is set as a period that makes it easier to obtain an appropriate learning result. Learning in the predetermined frame units therefore yields a more suitable learning result.

<4>
One aspect of the body sound analysis method according to the present embodiment further includes: a third acquisition step of acquiring third information indicating whether the abnormal sound occurs in the first information; a calculation step of calculating, based on the first information and the learning result of the learning step, fourth information indicating the ratio of the period in which the abnormal sound occurs to the period over which the first information was acquired; and a decision step of deciding, based on the third information and the fourth information, a threshold for determining whether the input body sound information contains an abnormal sound.

According to this aspect, third information indicating whether an abnormal sound occurs in the first information is acquired. The third information should accurately indicate whether an abnormal sound occurs in the first information. For this reason, the third information is preferably prepared in advance using the first information.

In this aspect, fourth information is further calculated based on the first information and the learning result. The fourth information indicates the ratio of the period in which the abnormal sound occurs to the period over which the first information was acquired. Specifically, the fourth information is calculated by analyzing the first information using the learning result.

Once the third information has been acquired and the fourth information has been calculated, a threshold for determining whether the input body sound information contains an abnormal sound is decided based on the third information and the fourth information. This threshold is used, when determining abnormal sounds in the input body sound information, to decide whether an abnormal sound is actually present. Specifically, it is compared with a value indicating the ratio of the period in which the abnormal sound occurs to the period over which the input body sound information was acquired.

Using this threshold, it can be determined that an abnormal sound is occurring when, for example, the ratio of the period in which an abnormal sound occurs in the input body sound information is equal to or greater than the decided threshold. Conversely, when that ratio is less than the decided threshold, it can be determined that no abnormal sound is occurring.
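The comparison described above can be sketched as follows. This is a minimal illustration, assuming that per-frame abnormal/normal decisions are already available as a list of booleans and that the ratio is measured in frames; the function names `abnormal_ratio` and `contains_abnormal_sound` are hypothetical and do not appear in the source.

```python
def abnormal_ratio(frame_flags):
    """Fraction of frames flagged as abnormal (assumed frame-based ratio)."""
    if not frame_flags:
        raise ValueError("at least one frame is required")
    return sum(1 for flag in frame_flags if flag) / len(frame_flags)


def contains_abnormal_sound(frame_flags, threshold):
    """True when the abnormal ratio is equal to or greater than the threshold."""
    return abnormal_ratio(frame_flags) >= threshold


# Two of four frames are flagged, so the ratio is 0.5.
flags = [True, False, False, True]
print(abnormal_ratio(flags))
print(contains_abnormal_sound(flags, threshold=0.5))
```

Because the comparison uses "equal to or greater than", a ratio exactly at the threshold counts as abnormal, which matches the wording of the text.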

In this aspect in particular, because the threshold is decided based on the fourth information calculated using the learning result, whether an abnormal sound occurs can be determined more accurately.

<5>
The program according to the present embodiment causes the body sound analysis device to execute the body sound analysis method described above.

The program according to the present embodiment makes it possible to execute the body sound analysis method according to the present embodiment described above, so abnormal sounds contained in body sound information can be suitably determined.

<6>
The storage medium according to the present embodiment stores the program described above.

The storage medium according to the present embodiment makes it possible to execute the program according to the present embodiment described above, so abnormal sounds contained in body sound information can be suitably determined.

<7>
The body sound analysis device according to the present embodiment includes a first acquisition unit that acquires body sound information on a body sound, and a determination unit that determines, based on a learning result, an abnormal sound contained in the body sound. The learning result is obtained by learning the correspondence between first information on a body sound and second information indicating the timing at which an abnormal sound occurs in that body sound, based on the first information and the second information.

As with the body sound analysis method described above, the body sound analysis device according to the present embodiment can suitably determine abnormal sounds contained in body sound information based on the learning result.

The body sound analysis device according to the present embodiment can also adopt various aspects similar to those of the body sound analysis method according to the present embodiment described above.

The operation and other advantages of the body sound analysis method, program, storage medium, and body sound analysis device according to the present embodiment are described in more detail in the examples below.

Examples of the body sound analysis method, program, storage medium, and body sound analysis device are described in detail below with reference to the drawings. The description uses a body sound analysis method that analyzes breathing sounds as an example.

<Teacher data>
First, the teacher data used in the body sound analysis method according to the present example is described.

Teacher data consists of sets of three pieces of information: a teacher voice signal, teacher frame information, and teacher overall information. Multiple sets are prepared in advance.

The teacher voice signal is a signal indicating the change of a breathing sound over time (for example, a time-axis waveform). The teacher frame information indicates, for each sound type, the timing at which an abnormal sound occurs in the teacher voice signal. The teacher overall information indicates, for each sound type, whether an abnormal sound occurs in the teacher voice signal.
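One way to picture a single teacher data set is as a small record type. The sketch below is only an illustration: the class name `TeacherData`, the field names, the use of (start, end) intervals in seconds for the timing, and the sound type labels are all assumptions, since the source does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TeacherData:
    # Teacher voice signal: time-axis waveform samples of the breathing sound.
    signal: List[float]
    # Teacher frame information: per sound type, when the abnormal sound occurs
    # (assumed here to be (start_s, end_s) intervals).
    frame_info: Dict[str, List[Tuple[float, float]]]
    # Teacher overall information: per sound type, whether it occurs at all.
    overall_info: Dict[str, bool]


def overall_from_frame_info(frame_info):
    """Consistency check (an assumption): a type is present if it has any interval."""
    return {sound_type: len(spans) > 0 for sound_type, spans in frame_info.items()}


example = TeacherData(
    signal=[0.0] * 10,
    frame_info={"coarse_crackle": [(0.1, 0.2)], "wheeze": []},
    overall_info={"coarse_crackle": True, "wheeze": False},
)
print(overall_from_frame_info(example.frame_info) == example.overall_info)
```

Keeping the per-frame timing and the overall presence as separate fields mirrors the text, where the two are used by different learning stages.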

The teacher data is used in the learning operations described later. The more teacher data there is, the greater the learning effect (in other words, the higher the accuracy of the body sound analysis).

<Frame determination learning>
Next, the frame determination learning of the body sound analysis method according to the present example is described with reference to FIGS. 1 to 7. Frame determination learning is a learning operation for improving the accuracy of the frame determination process, which determines the occurrence of abnormal sounds frame by frame.

<Configuration of the learning device>
First, the configuration of the frame determination learning device used for frame determination learning is described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the frame determination learning device according to the present example.

As shown in FIG. 1, the frame determination learning device according to the present example includes a teacher voice signal input unit 110, a teacher frame information input unit 120, a processing unit 200, and a learning result output unit 300.

The teacher voice signal input unit 110 is configured to acquire the teacher voice signal contained in the teacher data and output it to the processing unit 200.

The teacher frame information input unit 120 is configured to acquire the teacher frame information contained in the teacher data and output it to the processing unit 200.

The processing unit 200 includes multiple arithmetic circuits, memories, and the like. The processing unit 200 includes a frame division unit 210, a first local feature amount calculation unit 220, a frequency analysis unit 230, a second local feature amount calculation unit 240, and a learning unit 250.

The frame division unit 210 is configured to execute a division process that divides the teacher voice signal input from the teacher voice signal input unit 110 into multiple frames. The respiratory sound signal divided by the frame division unit 210 is output to the first local feature amount calculation unit 220 and the frequency analysis unit 230.

The first local feature amount calculation unit 220 is configured to calculate the first local feature amount based on the waveform of the teacher voice signal. The processing it executes is described in detail later. The first local feature amount calculated by the first local feature amount calculation unit 220 is output to the learning unit 250.

The frequency analysis unit 230 is configured to execute a time-frequency analysis process (for example, FFT processing) on the teacher voice signal input from the teacher voice signal input unit 110. The analysis result of the frequency analysis unit 230 is output to the second local feature amount calculation unit 240.

The second local feature amount calculation unit 240 is configured to calculate the second local feature amount based on the analysis result of the frequency analysis unit 230. The processing it executes is described in detail later. The second local feature amount calculated by the second local feature amount calculation unit 240 is output to the learning unit 250.

The learning unit 250 is configured to learn the correspondence between the local feature amounts calculated by the first local feature amount calculation unit 220 and the second local feature amount calculation unit 240 and the teacher frame information input from the teacher frame information input unit 120. The processing it executes is described in detail later. The learning result of the learning unit 250 is output to the learning result output unit 300.

The learning result output unit 300 is configured to output the learning result of the learning unit 250 in a form that can be used for body sound analysis. For example, the learning result output unit 300 is configured to output the learning result of the learning unit 250 to a memory or the like of the body sound analysis device.

<Operation description>
Next, the flow of the frame determination learning operation executed by the frame determination learning device described above is described with reference to FIG. 2. FIG. 2 is a flowchart showing the flow of the frame determination learning operation according to the present example.

As shown in FIG. 2, in the frame determination learning operation according to the present example, the learning unit 250 is first initialized (step S100). The teacher voice signal input unit 110 then acquires a teacher voice signal (step S101) and outputs it to the processing unit 200. The teacher voice signal is a specific example of the "first information".

Next, the frame division unit 210 divides the respiratory sound into multiple frames (step S102). The frame division of the respiratory sound signal is described concretely below with reference to FIG. 3. FIG. 3 is a conceptual diagram showing the frame division process applied to the teacher voice signal.

As shown in FIG. 3, the teacher voice signal is divided into multiple frames at a predetermined interval. The frame is set as the processing unit for suitably executing the local feature amount calculation described later, and the period of one frame is, for example, 12 msec.
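The frame division can be sketched as below. This is a minimal illustration that assumes non-overlapping frames and a known sampling rate; the source gives only the example frame period of 12 msec, so the function name `split_into_frames`, the sampling rate, and the choice to drop a trailing partial frame are assumptions.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=12.0):
    """Split a waveform into consecutive, non-overlapping frames of frame_ms each."""
    frame_len = int(round(sample_rate_hz * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # Assumption: a trailing partial frame is discarded.
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


# At an assumed 1000 Hz, a 12 msec frame holds 12 samples.
signal = list(range(30))
frames = split_into_frames(signal, sample_rate_hz=1000)
print(len(frames), len(frames[0]))
```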

Returning to FIG. 2, the frame-divided teacher voice signal is input to the first local feature amount calculation unit 220, which calculates the first local feature amount (step S103). The frame-divided teacher voice signal is also frequency-analyzed by the frequency analysis unit 230 and input to the second local feature amount calculation unit 240. The second local feature amount calculation unit 240 calculates the second local feature amount based on the frequency-analyzed teacher voice signal (for example, a spectrum indicating its frequency characteristics).

The calculation of the first local feature amount by the first local feature amount calculation unit 220 and the calculation of the second local feature amount by the second local feature amount calculation unit 240 are described in detail below with reference to FIGS. 4 to 6. FIG. 4 is a flowchart showing the calculation process for the first local feature amount. FIG. 5 is a flowchart showing the calculation process for the second local feature amount. FIG. 6 is a diagram showing the local feature vector obtained from the waveform and the spectrum.

As shown in FIG. 4, when the first local feature amount is calculated, the waveform of the teacher voice signal is first acquired (step S201) and pre-filtered (step S202). The pre-filtering is, for example, processing with a high-pass filter, and can remove unneeded components contained in the teacher voice signal.

Next, local variance values are calculated from the pre-filtered teacher voice signal (step S203). The local variance values are calculated, for example, as a first variance value indicating the variation of the teacher voice signal in a first period w1, and a second variance value indicating the variation of the teacher voice signal in a second period w2 that contains the first period w1. The local variance values calculated in this way function as a local feature amount for determining, among abnormal sounds, discontinuous rales in particular (for example, coarse crackles).

Once the local variance values have been calculated, the maximum local variance value in each frame of the teacher voice signal is calculated (step S204) and output as the first local feature amount (step S205).
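The local variance and per-frame maximum can be sketched as follows. The source does not say how the first and second variance values are combined, so this sketch assumes a ratio of the short-window variance (w1) to the long-window variance (w2). The function names, the half-window sizes in samples, and the small constant `eps` are likewise hypothetical.

```python
from statistics import pvariance


def local_variance_ratio(samples, center, w1, w2, eps=1e-12):
    """Variance in a short window around center divided by variance in a longer one.

    w1 and w2 are half-widths in samples with w1 < w2, so the long window
    contains the short one. Combining them as a ratio is an assumption.
    """
    def window(half_width):
        lo = max(0, center - half_width)
        hi = min(len(samples), center + half_width + 1)
        return samples[lo:hi]

    return pvariance(window(w1)) / (pvariance(window(w2)) + eps)


def first_local_feature(samples, frame_start, frame_len, w1=2, w2=10):
    """Maximum local variance value over the sample positions of one frame."""
    positions = range(frame_start, frame_start + frame_len)
    return max(local_variance_ratio(samples, c, w1, w2) for c in positions)


# A short burst in an otherwise silent signal yields a large value in its frame.
signal = [0.0] * 50
signal[25], signal[26] = 1.0, -1.0
print(first_local_feature(signal, 20, 10) > first_local_feature(signal, 0, 10))
```

A short, sharp burst raises the variance in the short window much more than in the long one, which is why a measure of this kind can respond to crackle-like events.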

As shown in FIG. 5, when the second local feature amount is calculated, the spectrum obtained by the frequency analysis is first acquired (step S301), and CMN (Cepstral Mean Normalization) processing is executed (step S302). CMN processing can remove from the teacher voice signal characteristics that are stationarily convolved into it, such as those of the sensor and the environment.
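CMN in its common form subtracts the time average of each cepstral coefficient. A stationary convolved characteristic becomes an additive constant in the cepstral domain, so the subtraction cancels it. The sketch below assumes the cepstra have already been computed per frame; the function name and input layout are hypothetical, and the source does not state which CMN variant is used.

```python
def cepstral_mean_normalize(cepstra):
    """Subtract the per-coefficient mean over all frames.

    cepstra: list of frames, each a list of cepstral coefficients of equal length.
    """
    if not cepstra:
        raise ValueError("at least one frame is required")
    n_frames = len(cepstra)
    dim = len(cepstra[0])
    mean = [sum(frame[k] for frame in cepstra) / n_frames for k in range(dim)]
    return [[frame[k] - mean[k] for k in range(dim)] for frame in cepstra]


# Two frames with two coefficients each; the per-coefficient mean is [2.0, 4.0].
print(cepstral_mean_normalize([[1.0, 2.0], [3.0, 6.0]]))
```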

The teacher voice signal that has undergone CMN processing is further subjected to a liftering process for extracting the envelope component (step S303) and a liftering process for extracting the fine component (step S304). Liftering is a process that cuts predetermined quefrency components from the cepstrum.
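The two liftering passes can be illustrated with a simple cut in the quefrency domain. This sketch assumes a one-sided cepstrum indexed by quefrency and a single cutoff index. Keeping the low-quefrency part gives the spectral envelope and keeping the high-quefrency part gives the fine structure. The function name `lifter` and the cutoff value are hypothetical.

```python
def lifter(cepstrum, cutoff, keep="low"):
    """Zero out quefrency components on one side of the cutoff index.

    keep="low"  retains indices below cutoff (envelope component).
    keep="high" retains indices at or above cutoff (fine component).
    """
    if keep not in ("low", "high"):
        raise ValueError("keep must be 'low' or 'high'")
    if keep == "low":
        return [c if q < cutoff else 0.0 for q, c in enumerate(cepstrum)]
    return [c if q >= cutoff else 0.0 for q, c in enumerate(cepstrum)]


cepstrum = [1.0, 2.0, 3.0, 4.0]
envelope = lifter(cepstrum, cutoff=2, keep="low")
fine = lifter(cepstrum, cutoff=2, keep="high")
print(envelope, fine)
```

With a shared cutoff the two outputs sum back to the original cepstrum. The source does not say whether the two passes share a cutoff.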

The CMN processing and liftering described above make continuous rales (for example, rhonchi, wheezes, and fine crackles), which would otherwise be buried in other body sounds, easier to distinguish. CMN processing and liftering are existing techniques, so a more detailed description is omitted here.

The teacher voice signal that has undergone the liftering for extracting the fine component is subjected to an emphasis process using the KL information amount, and a feature amount is calculated. The KL information amount is a parameter calculated from an observed value P and a reference value Q (for example, a theoretical value, a model value, or a predicted value). When an observed value P that is distinctive relative to the reference value Q appears, the KL information amount takes a large value. Processing with the KL information amount emphasizes and clarifies the tonal component contained in the teacher voice signal (that is, the component used to determine continuous rales).
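The KL information amount (Kullback-Leibler divergence) between an observation P and a reference Q is the sum of P(i)·log(P(i)/Q(i)) over all bins i. The sketch below assumes that P and Q are non-negative spectra normalized to sum to one; how the source applies the quantity for emphasis is not specified, so the helper names and the normalization step are assumptions.

```python
import math


def normalize(values, eps=1e-12):
    """Turn non-negative values into a probability distribution (eps avoids zeros)."""
    total = sum(values) + eps * len(values)
    return [(v + eps) / total for v in values]


def kl_information(p, q):
    """Kullback-Leibler divergence of observed distribution p from reference q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


flat = normalize([1.0, 1.0, 1.0, 1.0])    # reference without a tonal peak
peaked = normalize([1.0, 1.0, 10.0, 1.0])  # observation with a tonal peak
print(kl_information(peaked, flat) > kl_information(flat, flat))
```

The value is zero when the observation matches the reference and grows as the observation departs from it, which is the behaviour the text relies on to make tonal components stand out.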

Meanwhile, HAAR-LIKE features are also calculated from the spectrum obtained by the frequency analysis (step S306). HAAR-LIKE features are mainly used in the field of image processing; here they are calculated from the spectrum by the same method, with the amplitude value at each frequency treated as the counterpart of a pixel value in image processing. Since the calculation of HAAR-LIKE features is an existing technique, a detailed description is omitted here.
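One possible way to apply a Haar-like filter to a one-dimensional spectrum is sketched below. The three-segment layout (a center window compared with its two neighbors) and the name `haar_like_1d` are assumptions; the specification does not say which Haar-like patterns are used.

```python
def haar_like_1d(spectrum, center, width):
    """Three-segment Haar-like response: center window mean minus the mean of its two neighbors."""
    lo = center - width // 2
    hi = lo + width
    if lo - width < 0 or hi + width > len(spectrum):
        raise ValueError("window does not fit inside the spectrum")
    mid = sum(spectrum[lo:hi]) / width
    left = sum(spectrum[lo - width:lo]) / width
    right = sum(spectrum[hi:hi + width]) / width
    return mid - 0.5 * (left + right)

# Amplitude per frequency bin, with a narrow peak at bins 4 and 5.
spectrum = [1, 1, 1, 1, 5, 5, 1, 1, 1, 1, 1, 1]
peak_response = haar_like_1d(spectrum, center=5, width=2)
flat_response = haar_like_1d(spectrum, center=9, width=2)
```

The response is large where a narrow band stands out from its surroundings and zero over a flat region, much as a Haar-like filter responds to a bright stripe in an image.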

Once the various processes described above have been applied to the teacher voice signal and a plurality of feature amounts have been calculated for each frequency, an average value is calculated for each frequency band and output as the second local feature amount (step S307).
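The band-wise averaging of step S307 can be illustrated with a short sketch. The band boundaries and the name `band_means` are hypothetical; the specification does not give the actual band layout.

```python
def band_means(values, band_edges):
    """Average per-frequency feature values within each [start, end) range of bin indices."""
    return [sum(values[a:b]) / (b - a) for a, b in band_edges]

per_bin_feature = [0.0, 2.0, 4.0, 6.0, 1.0, 1.0, 3.0, 5.0]
bands = [(0, 2), (2, 4), (4, 8)]  # hypothetical band boundaries (bin indices)
second_local_feature = band_means(per_bin_feature, bands)
```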

As shown in FIG. 6, the processing described above yields a plurality of types of local feature amounts (that is, the first local feature amount and the second local feature amount) from the waveform and spectrum of the respiratory sound signal. These are output as a local feature amount vector for each frame.

Returning to FIG. 2, once the local feature amounts have been calculated, the teacher frame information input unit 120 acquires the teacher frame information (step S106). The teacher frame information is a specific example of the "second information". The acquired teacher frame information is output to the learning unit 250 together with the local feature amounts.

The learning operation in the learning unit 250 is described below with reference to FIG. 7. FIG. 7 is a diagram showing the teacher frame information associated with the local feature amounts.

As shown in FIG. 7, the learning unit 250 sets the local feature amount and the teacher frame information as paired data for each frame (step S107). The local feature amounts are thereby associated with the timing at which the abnormal sound occurs.

More specifically, a local feature amount vector associated with teacher frame information indicating a timing at which an abnormal sound is occurring is learned as a local feature amount for the case where an abnormal sound is occurring. Conversely, a local feature amount vector associated with teacher frame information indicating a timing at which no abnormal sound is occurring is learned as a local feature amount for the case where no abnormal sound is occurring.
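The frame-by-frame pairing described above can be pictured with the following sketch. The +1/-1 label encoding and the name `pair_frames` are assumptions made for illustration; the specification only states that the two pieces of data are set as a pair for each frame.

```python
def pair_frames(feature_vectors, teacher_frame_info):
    """Pair each frame's local feature vector with its label (+1 abnormal, -1 normal)."""
    if len(feature_vectors) != len(teacher_frame_info):
        raise ValueError("one teacher label is required per frame")
    return [(vec, 1 if abnormal else -1)
            for vec, abnormal in zip(feature_vectors, teacher_frame_info)]

features = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1]]  # one local feature vector per frame
labels = [False, True, False]                    # teacher frame information per frame
dataset = pair_frames(features, labels)
```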

Returning to FIG. 2, because the association between the local feature amount and the teacher frame information is performed frame by frame, once a frame has been set it is determined whether setting has been completed for all frames (step S108). If it has not (step S108: NO), the processing from steps S103 and S104 onward is repeated for the frames not yet set.

On the other hand, when setting has been completed for all frames (step S108: YES), it is determined whether processing has been completed for all of the plurality of teacher data (step S109). If it is determined that processing has not been completed for all teacher data (step S109: NO), the processing from step S101 onward is executed again. By repeating the processing in this way, the association is performed for every frame of every teacher data item.

The learning unit 250 then executes the actual learning process (step S110). The learning process is performed using a machine learning algorithm such as AdaBoost. Since existing methods other than AdaBoost can also be adopted as appropriate, a detailed description is omitted here.
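For readers unfamiliar with AdaBoost, a minimal sketch of discrete AdaBoost with threshold stumps is given below. It only illustrates the general algorithm named in the text; the weak learner, the number of rounds, the function names, and the toy data are assumptions and do not reflect the actual learning configuration of the embodiment.

```python
import math

def train_adaboost(X, y, rounds=5):
    """Discrete AdaBoost with threshold stumps; labels in y must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    model = []  # entries: (alpha, feature index, threshold, polarity)
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for thr in sorted({x[j] for x in X}):
                for pol in (1, -1):
                    pred = [pol if x[j] >= thr else -pol for x in X]
                    err = sum(wi for wi, p, t in zip(w, pred, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, thr, pol))
        # Increase the weight of misclassified frames, then renormalize.
        w = [wi * math.exp(-alpha * t * p) for wi, t, p in zip(w, y, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, x):
    """Return +1 (abnormal) or -1 (normal) for one local feature vector."""
    score = sum(a * (pol if x[j] >= thr else -pol) for a, j, thr, pol in model)
    return 1 if score >= 0 else -1

# Toy data: one feature per frame, abnormal frames have larger values.
X = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
y = [-1, -1, -1, 1, 1, 1]
model = train_adaboost(X, y)
```

Since the text states that learning is performed per sound type, one such model would be trained for each type of abnormal sound.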

Because the learning process described above is performed for each type of abnormal sound, after it finishes it is determined whether learning has been completed for all sound types (step S111). If it has not (step S111: NO), the learning process of step S110 is executed again for the sound types not yet completed.

On the other hand, when the learning process has been completed for all sound types (step S111: YES), the learning result is output to the body sound analysis device (step S112).

<Effects of frame determination learning>
The learning result of the frame determination learning described above is used when the body sound analysis device analyzes respiratory sounds. During analysis, the same processing that was applied to the teacher voice signal in the frame determination learning operation is applied to the respiratory sound signal to be analyzed. Specifically, the input respiratory sound signal is divided into frames, and a local feature amount is calculated for each frame.

In this embodiment, the relationship between the local feature amounts and the timing at which an abnormal sound occurs has been learned through the learning operation described above. The local feature amounts calculated from a respiratory sound can therefore be used to detect the timing at which an abnormal sound occurs (in other words, the temporal position of the frames in which the abnormal sound is occurring). As a result, abnormal sounds can be discriminated very accurately compared with the case where no learning operation is performed in advance.

<Determination of optimal threshold>
Next, the optimal threshold determination operation according to this embodiment is described with reference to FIGS. 8 and 10. The optimal threshold determination operation determines the optimal value of the threshold that is used to judge, from the proportion of the period in which an abnormal sound was determined to be occurring, whether an abnormal sound has actually occurred.

<Configuration of threshold determination unit>
First, the configuration of the threshold determination unit that executes the optimal threshold determination operation is described with reference to FIG. 8. FIG. 8 is a block diagram showing the configuration of the threshold determination unit according to this embodiment.

As shown in FIG. 8, the threshold determination unit according to this embodiment includes a frame determination result input unit 410, an overall teacher information input unit 420, a determination unit 500, and a threshold output unit 600.

The frame determination result input unit 410 is configured to acquire the result of the frame determination process performed on the teacher voice signal using the learning result (that is, the process of determining, for each frame, whether an abnormal sound is occurring) and to output it to the determination unit 500.

The overall teacher information input unit 420 is configured to acquire the overall teacher information included in the teacher data and to output it to the determination unit 500.

The determination unit 500 includes a global feature amount calculation unit 510, an ROC (Receiver Operating Characteristic) analysis unit 520, and an optimal threshold calculation unit 530.

The global feature amount calculation unit 510 is configured to calculate, based on the result of the frame determination process, the ratio of the time during which an abnormal sound is occurring to the period over which the teacher voice signal was input. The information indicating this ratio of abnormal sound occurrence time is output to the ROC analysis unit 520.

The ROC analysis unit 520 is configured to execute ROC analysis which, based on the relationship between the information indicating the ratio of abnormal sound occurrence time and the overall teacher information, obtains as an ROC curve the relationship between a given threshold on that ratio and the discrimination performance achieved when that threshold is used. Since ROC analysis is an existing technique, a detailed description is omitted here. The analysis result of the ROC analysis unit 520 is output to the optimal threshold calculation unit 530.

The optimal threshold calculation unit 530 is configured to use the analysis result of the ROC analysis unit 520 to calculate the optimal threshold for determining, from the ratio of abnormal sound occurrence time, whether an abnormal sound is actually occurring. The optimal threshold is the one that gives the point on the ROC curve closest to the reference point (0, 1) (see FIG. 9). The threshold calculated by the optimal threshold calculation unit 530 is output to the threshold output unit 600.
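The selection of the threshold closest to the reference point (0, 1) can be sketched as follows. This is an illustrative reading rather than the embodiment's implementation: candidate thresholds are taken from the observed ratios, a recording is counted as positive when its ratio is at or above the candidate, and the function names and sample data are hypothetical.

```python
import math

def roc_points(ratios, has_abnormal):
    """For each candidate threshold, return (threshold, false positive rate, true positive rate).

    Requires at least one positive and one negative recording.
    """
    pos = sum(1 for h in has_abnormal if h)
    neg = len(has_abnormal) - pos
    points = []
    for thr in sorted(set(ratios)):
        tp = sum(1 for r, h in zip(ratios, has_abnormal) if r >= thr and h)
        fp = sum(1 for r, h in zip(ratios, has_abnormal) if r >= thr and not h)
        points.append((thr, fp / neg, tp / pos))
    return points

def optimal_threshold(points):
    """Pick the threshold whose ROC point is closest to (FPR, TPR) = (0, 1)."""
    return min(points, key=lambda p: math.hypot(p[1], 1.0 - p[2]))[0]

# Global feature amount per teacher recording, paired with overall teacher information.
ratios = [0.02, 0.05, 0.10, 0.30, 0.45, 0.60]
has_abnormal = [False, False, False, True, True, True]
best = optimal_threshold(roc_points(ratios, has_abnormal))
```

With this toy data the recordings are perfectly separable, so the selected threshold lies exactly at the ROC point (0, 1); with real data the nearest point generally trades off missed detections against false detections.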

The threshold output unit 600 is configured to output the threshold calculated by the optimal threshold calculation unit 530 in a form that can be used for the analysis of body sounds. For example, the threshold output unit 600 is configured to output the threshold calculated by the optimal threshold calculation unit 530 to a memory or the like of the body sound analysis device.

<Description of operation>
Next, the flow of the optimal threshold determination operation executed by the threshold determination unit described above is described with reference to FIG. 10. FIG. 10 is a flowchart showing the flow of the optimal threshold determination operation according to this embodiment.

As shown in FIG. 10, in the optimal threshold determination operation according to this embodiment, the frame determination result input unit 410 first acquires the frame determination result of the teacher voice signal (step S201). It is then determined whether the frame determination results for all frames have been acquired (step S202). If they have not (step S202: NO), the process of step S201 is executed again for the frames not yet acquired.

On the other hand, when the frame determination results for all frames have been acquired (step S202: YES), the acquired results are output to the global feature amount calculation unit 510, which calculates the ratio of the number of frames determined to contain an abnormal sound to the total number of frames in the teacher voice signal (that is, the global feature amount) (step S203). The global feature amount is a specific example of the "fourth information".
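The global feature amount of step S203 is a simple ratio, as the following sketch shows. The function name `global_feature` and the sample frame results are illustrative only.

```python
def global_feature(frame_results):
    """Fraction of frames judged abnormal among all frames of one recording."""
    if not frame_results:
        raise ValueError("at least one frame result is required")
    return sum(1 for r in frame_results if r) / len(frame_results)

# Frame determination results for one teacher voice signal (True = abnormal sound detected).
frame_results = [False, True, True, False, False, False, True, False]
ratio = global_feature(frame_results)
```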

Next, the overall teacher information input unit 420 acquires the overall teacher information included in the teacher data (step S204). The overall teacher information is a specific example of the "third information".

The overall teacher information is output to the ROC analysis unit 520 together with the global feature amount, and the two are set as paired data (step S205). That is, the ratio of frames determined to contain an abnormal sound is associated with the information indicating whether an abnormal sound has occurred. It is then determined whether setting has been completed for all teacher data (step S206). If it has not (step S206: NO), the processing from step S201 onward is executed again for the teacher data not yet set.

On the other hand, when setting has been completed for all teacher data (step S206: YES), the ROC analysis unit 520 executes the ROC analysis, and the relationship between the threshold and the discrimination performance is obtained as an ROC curve (step S207). When the ROC analysis is finished, the optimal threshold calculation unit 530 calculates the optimal threshold according to the ROC analysis result (step S208).

Because the ROC analysis and threshold calculation described above are executed for each type of abnormal sound, after they finish it is determined whether processing has been completed for all sound types (step S209). If it has not (step S209: NO), the processes of steps S207 and S208 are executed for the sound types not yet completed. If processing has been completed for all sound types (step S209: YES), the thresholds are output (step S210) and the series of processes ends.

<Effects of optimal threshold determination>
The threshold determined as described above is used when the body sound analysis device analyzes respiratory sounds. For example, during analysis, the same processing that was applied to the teacher voice signal in the frame determination learning operation is applied to the respiratory sound signal to be analyzed. Specifically, the ratio of the period during which an abnormal sound is occurring to the period over which the respiratory sound was acquired (that is, the global feature amount) is calculated from the frame determination result of the input respiratory sound signal. If this ratio is large, it can be determined that an abnormal sound is actually occurring. If the ratio is small, it can be determined that no abnormal sound is actually occurring, even though an abnormal sound was detected in some individual frames.

This determination of the presence or absence of an abnormal sound is made by comparison with the threshold determined by the optimal threshold determination operation described above. Specifically, when the ratio of the period during which an abnormal sound is occurring is greater than the threshold, it can be determined that an abnormal sound is occurring; when it is smaller than the threshold, it can be determined that no abnormal sound is occurring. In this embodiment in particular, the optimal threshold determination operation associates the global feature amount with the overall teacher information (that is, information indicating whether an abnormal sound has occurred), and the optimal threshold is calculated as a result. The presence or absence of an abnormal sound can therefore be discriminated very accurately based on the global feature amount calculated from the respiratory sound.
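The analysis-time decision described above can be sketched per sound type as follows. The sound type names, threshold values, and the function name `classify_recording` are hypothetical; only the comparison of the ratio against a per-type threshold follows the text.

```python
def classify_recording(frame_results_by_type, thresholds):
    """Decide, per sound type, whether an abnormal sound is actually occurring.

    The ratio of abnormal frames must exceed the threshold determined for that type.
    """
    decisions = {}
    for sound_type, frames in frame_results_by_type.items():
        ratio = sum(1 for f in frames if f) / len(frames)
        decisions[sound_type] = ratio > thresholds[sound_type]
    return decisions

# Hypothetical thresholds, as if output by the optimal threshold determination operation.
thresholds = {"wheeze": 0.2, "rhonchus": 0.3}
frame_results_by_type = {
    "wheeze": [False, True, True, True, False, True, False, False, False, False],        # 4 of 10
    "rhonchus": [False, False, True, False, False, False, False, False, False, False],   # 1 of 10
}
decisions = classify_recording(frame_results_by_type, thresholds)
```

Here the isolated single-frame detection for the second sound type is treated as a false detection, while the sustained detections for the first type are accepted.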

The present invention is not limited to the embodiment described above and may be modified as appropriate within a scope that does not depart from the gist or idea of the invention as understood from the claims and the specification as a whole. A body sound analysis method, program, storage medium, and body sound analysis device involving such modifications are also included in the technical scope of the present invention.

Reference Signs List
110 Teacher voice signal input unit
120 Teacher frame information input unit
200 Processing unit
210 Frame division unit
220 First local feature amount calculation unit
230 Frequency analysis unit
240 Second local feature amount calculation unit
250 Learning unit
300 Learning result output unit
410 Frame determination result input unit
420 Overall teacher information input unit
500 Determination unit
510 Global feature amount calculation unit
520 ROC analysis unit
530 Optimal threshold calculation unit
600 Threshold output unit

Claims

A body sound analysis method used in a body sound analysis device that analyzes body sounds, the method comprising:
a first acquisition step of acquiring first information indicating a temporal change of a body sound;
a second acquisition step of acquiring second information indicating a timing at which an abnormal sound occurs in the first information;
a learning step of learning a correspondence between the first information and the second information;
a discrimination step of discriminating an abnormal sound included in input body sound information based on a learning result of the learning step;
a third acquisition step of acquiring third information indicating whether the abnormal sound occurs in the first information;
a calculation step of calculating, based on the first information and the learning result of the learning step, fourth information indicating a ratio of a period during which the abnormal sound occurs to a period during which the first information was acquired; and
a determination step of determining, based on the third information and the fourth information, a threshold for determining whether the input body sound information includes an abnormal sound.

The body sound analysis method according to claim 1, further comprising a first generation step of generating, based on the first information, feature amount information indicating a feature amount in the first information,
wherein the learning step learns a correspondence between the feature amount information and the second information instead of the correspondence between the first information and the second information.

The body sound analysis method according to claim 1, further comprising a dividing step of dividing the first information and the second information into predetermined frame units,
wherein the learning step performs learning in units of the predetermined frames.

A program for causing a body sound analysis device to execute the body sound analysis method according to claim 1.

A storage medium storing the program according to claim 4.

A body sound analysis device comprising:
a first acquisition unit that acquires body sound information related to body sounds; and
a discrimination unit that discriminates an abnormal sound included in the body sound information based on a learning result,
wherein the learning result is a result of learning, based on first information indicating a temporal change of a body sound and second information indicating a timing at which an abnormal sound occurs in the first information, a correspondence between the first information and the second information,
the body sound analysis device further comprising:
a third acquisition unit that acquires third information indicating whether the abnormal sound occurs in the first information;
a calculation unit that calculates, based on the first information and the learning result, fourth information indicating a ratio of a period during which the abnormal sound occurs to a period during which the first information was acquired; and
a determination unit that determines, based on the third information and the fourth information, a threshold for determining whether the body sound information includes an abnormal sound.