WO2022195657A1 - Muscle sound extraction method, muscle sound extraction device, and program - Google Patents


Info

Publication number
WO2022195657A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
muscle
time width
extraction
abnormal
Prior art date
Application number
PCT/JP2021/010332
Other languages
French (fr)
Japanese (ja)
Inventor
有信 新島
Original Assignee
日本電信電話株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 日本電信電話株式会社 (Nippon Telegraph and Telephone Corporation)
Priority to PCT/JP2021/010332
Publication of WO2022195657A1

Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B7/00: Instruments for auscultation
    • A61B7/02: Stethoscopes
    • A61B7/04: Electric stethoscopes

Definitions

  • FIG. 5 shows the result of applying the method to a measurement of only forearm muscle sound, and FIG. 6 shows the result of applying it to a measurement of only microphone contact noise.
  • As shown in FIG. 5, the muscle sound signal is hardly attenuated, while, as shown in FIG. 6, the microphone contact noise is almost entirely attenuated. The muscle sound extraction device 10 according to this embodiment can therefore attenuate only the noise while leaving the muscle sound nearly intact, which makes it possible to extract muscle sound with a high SN ratio.

Abstract

In the muscle sound extraction method according to an embodiment, a computer executes: a measurement procedure for measuring a sound that contains muscle sounds; a feature extraction procedure for extracting a feature amount from the sound for each prescribed time width; a determination procedure for determining, for each time width, whether the sound in that time width is an abnormal sound, using an abnormal sound detection model learned in advance and the feature amount of that time width; and a correction procedure for applying, to sound determined to be abnormal, a correction that attenuates the sound.

Description

Muscle sound extraction method, muscle sound extraction device, and program
The present invention relates to a muscle sound extraction method, a muscle sound extraction device, and a program.
Muscle sounds, which are minute vibrations generated when muscles contract, can be measured easily with a condenser microphone or the like, as described in Non-Patent Document 1, for example. In this case, a signal with a high SN ratio (signal-to-noise ratio) can be obtained by measuring muscle sounds in an environment free of noise, such as a laboratory.
However, when muscle sounds are measured with a microphone in a general environment outside the laboratory, various noises may be mixed in, such as environmental sounds, conversational voices, and microphone contact noise. Because muscle sounds occupy a low frequency band, the signal can be extracted with a bandpass filter of about 1-250 Hz; however, since such noise also contains frequency components in the same band, applying a filter alone also extracts signals other than muscle sounds, which lowers the SN ratio.
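As a reference point, the conventional bandpass filtering described above can be sketched as follows. This is an illustrative SciPy implementation rather than code from the patent; the sampling rate, filter order, and function name are assumptions made for the example.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_muscle_sound(signal, fs, low=1.0, high=250.0, order=4):
    """Keep only the 1-250 Hz band in which muscle sounds are concentrated."""
    sos = butter(order, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, signal)  # zero-phase filtering

# Example: a 50 Hz muscle-like component passes through,
# while a 1 kHz noise component is strongly attenuated.
fs = 4000  # sampling rate in Hz (an assumption for illustration)
t = np.arange(0, 1.0, 1.0 / fs)
muscle_like = np.sin(2 * np.pi * 50 * t)
noisy = muscle_like + 0.5 * np.sin(2 * np.pi * 1000 * t)
filtered = bandpass_muscle_sound(noisy, fs)
```

As the text notes, such a filter cannot reject noise whose frequency content overlaps the 1-250 Hz band, which is what motivates the model-based approach below.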
An embodiment of the present invention has been made in view of the above points, and aims to enable muscle sound extraction with a high SN ratio.
To achieve this object, in a muscle sound extraction method according to an embodiment, a computer executes: a measurement procedure for measuring a sound that contains muscle sounds; a feature extraction procedure for extracting a feature amount from the sound for each predetermined time width; a determination procedure for determining, for each time width, whether the sound in that time width is an abnormal sound, using an abnormal sound detection model learned in advance and the feature amount of that time width; and a correction procedure for applying, to sound determined to be abnormal, a correction that attenuates the sound.
This makes it possible to extract muscle sounds with a high SN ratio.
FIG. 1 is a diagram showing an example of the hardware configuration of the muscle sound extraction device according to this embodiment. FIG. 2 is a diagram showing an example of its functional configuration. FIG. 3 is a flowchart showing an example of the model learning process according to this embodiment. FIG. 4 is a flowchart showing an example of the muscle sound extraction process according to this embodiment. FIG. 5 is a diagram showing an example of the application results for muscle sounds. FIG. 6 is a diagram showing an example of the application results for microphone contact noise.
An embodiment of the present invention will be described below. This embodiment describes a muscle sound extraction device 10 that enables muscle sound extraction with a high SN ratio even in a general environment where noise may be mixed in. The muscle sound extraction device 10 according to this embodiment applies a machine learning algorithm for abnormal sound detection to sound that contains muscle sounds, for each predetermined window width, and determines whether the sound in that window is muscle sound or noise. The amplitude of sound determined to be noise is then corrected and reduced, which makes it possible to extract muscle sounds with a high SN ratio.
The muscle sound extraction device 10 according to this embodiment operates in two phases: model learning, in which a model for detecting abnormal sounds (hereinafter, the "abnormal sound detection model") is learned by a machine learning algorithm, and muscle sound extraction, in which muscle sounds with a high SN ratio are extracted using this abnormal sound detection model. The following description assumes that the same muscle sound extraction device 10 performs both model learning and muscle sound extraction, but they may be performed by different devices; in that case, the device that performs model learning may be called a "learning device", a "model learning device", or the like.
<Hardware Configuration of Muscle Sound Extraction Device 10>
First, the hardware configuration of the muscle sound extraction device 10 according to this embodiment will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the hardware configuration of the muscle sound extraction device 10 according to this embodiment.
As shown in FIG. 1, the muscle sound extraction device 10 according to this embodiment includes an input device 101, a display device 102, an external I/F 103, a communication I/F 104, a processor 105, and a memory device 106. These pieces of hardware are communicably connected to one another via a bus 107.
The input device 101 is, for example, a keyboard, a mouse, or a touch panel. The display device 102 is, for example, a display. The muscle sound extraction device 10 need not have one or both of the input device 101 and the display device 102.
The external I/F 103 is an interface to various external devices such as a microphone 108. Via the external I/F 103, the muscle sound extraction device 10 can measure sound as a signal with the microphone 108. Examples of external devices other than the microphone 108 include various recording media, such as a CD (Compact Disc), a DVD (Digital Versatile Disk), an SD (Secure Digital) memory card, and a USB (Universal Serial Bus) memory card.
The communication I/F 104 is an interface for connecting the muscle sound extraction device 10 to a communication network. The processor 105 is, for example, an arithmetic unit such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit). The memory device 106 is, for example, a storage device such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), RAM (Random Access Memory), ROM (Read Only Memory), or flash memory.
With the hardware configuration shown in FIG. 1, the muscle sound extraction device 10 according to this embodiment can realize the various types of processing described later. The hardware configuration shown in FIG. 1 is merely an example, and the muscle sound extraction device 10 may have a different configuration; for example, it may have multiple processors 105 or multiple memory devices 106.
<Functional Configuration of Muscle Sound Extraction Device 10>
Next, the functional configuration of the muscle sound extraction device 10 according to this embodiment will be described with reference to FIG. 2. FIG. 2 is a diagram showing an example of the functional configuration of the muscle sound extraction device 10 according to this embodiment.
As shown in FIG. 2, the muscle sound extraction device 10 according to this embodiment has a measurement unit 201, a feature extraction unit 202, a model learning unit 203, an abnormal sound detection unit 204, a sound correction unit 205, and an output unit 206. These units are realized, for example, by processing that one or more programs installed in the muscle sound extraction device 10 cause the processor 105 to execute.
The muscle sound extraction device 10 according to this embodiment also has a model storage unit 207. The model storage unit 207 is realized by, for example, the memory device 106, but it may instead be realized by, for example, a database server connected to the muscle sound extraction device 10 via a communication network.
The measurement unit 201 measures sound as a signal using the microphone 108. The feature extraction unit 202 extracts feature amounts from the signal measured by the measurement unit 201. During model learning, the model learning unit 203 learns the parameters of the abnormal sound detection model using the feature amounts extracted by the feature extraction unit 202; the learned parameters (hereinafter, "learned model parameters") are stored in the model storage unit 207. During muscle sound extraction, the abnormal sound detection unit 204 uses the learned model parameters stored in the model storage unit 207 and the feature amounts extracted by the feature extraction unit 202 to determine, for each window, whether the sound in that window is muscle sound or noise (abnormal sound). The sound correction unit 205 corrects the amplitude of sound determined to be noise by the abnormal sound detection unit 204. The output unit 206 outputs the signal of the sound in each window (the corrected sound, if corrected by the sound correction unit 205) to a predetermined output destination. This output destination can be any predetermined destination: for example, the sound signal may be stored in the memory device 106, transmitted to another device connected via a communication network, displayed on the display device 102 as a waveform, or output as sound waves from a speaker.
<Model Learning Process>
Next, the model learning process for learning the abnormal sound detection model will be described with reference to FIG. 3. FIG. 3 is a flowchart showing an example of the model learning process according to this embodiment.
First, the measurement unit 201 measures muscle sound as a signal using the microphone 108 (step S101). In step S101, the muscle sound is measured in an environment free of noise, such as a laboratory.
Next, the feature extraction unit 202 extracts feature amounts from the signal measured in step S101, for each predetermined window width (step S102). Specifically, the feature extraction unit 202 applies a Hanning window for each predetermined window width (for example, a time width such as 500 ms) and then performs a fast Fourier transform (FFT) on each window. For each window, it then forms a feature vector whose elements are the powers of the respective frequency bands, each normalized to a value between 0 and 1. For example, let the number of windows be N, the frequency bands be f_1, ..., f_M, and the power of frequency band f_m (1 ≤ m ≤ M) in the n-th window (1 ≤ n ≤ N) be f_m^(n); then the feature amounts (f_1^(1), ..., f_M^(1)), ..., (f_1^(N), ..., f_M^(N)) are obtained for the N windows. Since these N feature amounts represent the characteristics of muscle sound, they can be regarded as positive training data.
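The windowing and feature extraction of step S102 can be sketched as follows, assuming a NumPy implementation. The 500 ms window width comes from the text; the per-window min-max scaling is an assumption, since the text only states that band powers are normalized to between 0 and 1, and the function name is illustrative.

```python
import numpy as np

def extract_features(signal, fs, window_ms=500):
    """Split the signal into fixed windows, apply a Hanning window,
    FFT each window, and normalize the band powers to [0, 1].

    Returns an (N, M) array: one M-dimensional feature vector per window.
    """
    win_len = int(fs * window_ms / 1000)
    n_windows = len(signal) // win_len
    hann = np.hanning(win_len)
    features = []
    for n in range(n_windows):
        frame = signal[n * win_len:(n + 1) * win_len] * hann
        power = np.abs(np.fft.rfft(frame)) ** 2
        # Normalization scheme is not specified in the text;
        # per-window min-max scaling to [0, 1] is assumed here.
        span = power.max() - power.min()
        features.append((power - power.min()) / span if span > 0 else power * 0)
    return np.array(features)
```

The same routine serves both phases: during model learning it is applied to clean muscle sound, and during extraction it is applied to sound that may contain noise.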
The model learning unit 203 then learns the parameters of the abnormal sound detection model by unsupervised learning, using the feature amounts extracted in step S102 (that is, the positive training data) (step S103). As the abnormal sound detection model, a machine learning model trained by an unsupervised learning method, such as a One-class SVM, may be used. The learned model parameters of the abnormal sound detection model are stored in the model storage unit 207.
As described above, during model learning the muscle sound extraction device 10 measures only muscle sound and then learns the abnormal sound detection model by an unsupervised learning method, using the feature amounts extracted from that muscle sound.
<Muscle Sound Extraction Process>
Next, the muscle sound extraction process, which extracts muscle sound using the abnormal sound detection model learned in the model learning process, will be described with reference to FIG. 4. FIG. 4 is a flowchart showing an example of the muscle sound extraction process according to this embodiment. It is assumed that the learned model parameters are stored in the model storage unit 207.
First, the measurement unit 201 measures sound that contains muscle sound as a signal, using the microphone 108 (step S201). Step S201 corresponds, for example, to measuring muscle sound in a general environment, including environments where noise may be mixed in.
Next, the feature extraction unit 202 extracts feature amounts from the signal measured in step S201, for each predetermined window width (step S202). It does this in the same way as in step S102 of FIG. 3: it applies a Hanning window for each predetermined window width (for example, a time width such as 500 ms), performs a fast Fourier transform on each window, and forms, for each window, a feature vector whose elements are the powers of the respective frequency bands normalized to values between 0 and 1. For example, let the number of windows be N', the frequency bands be f_1, ..., f_M, and the power of frequency band f_m (1 ≤ m ≤ M) in the n-th window (1 ≤ n ≤ N') be f_m^(n); then the feature amounts (f_1^(1), ..., f_M^(1)), ..., (f_1^(N'), ..., f_M^(N')) are obtained for the N' windows.
Steps S203 to S206 below are executed repeatedly for each window. The following describes steps S203 to S206 for a single window.
The abnormal sound detection unit 204 determines whether the sound in the window is a muscle sound or an abnormal sound (that is, noise), using the abnormal sound detection model configured with the learned model parameters stored in the model storage unit 207 together with the feature value of the window (step S203).
Then, if the sound in the window is determined to be an abnormal sound in step S203 (NO in step S204), the sound correction unit 205 corrects the amplitude of the sound in the window to 0 (step S205). On the other hand, if the sound in the window is determined to be a muscle sound in step S203 (YES in step S204), nothing is done (that is, no correction is applied to the sound in the window).
Then, the output unit 206 outputs the signal of the sound in the window (the corrected sound, if it was corrected in step S205) to a predetermined output destination (step S206).
As described above, during muscle sound extraction the muscle sound extraction device 10 uses the feature values extracted from the sound including muscle sound to determine, for each predetermined window width, whether the sound is muscle sound or noise with the abnormal sound detection model. If the sound is determined to be noise, the amplitude of the sound in that window is corrected to 0. This removes (attenuates) the noise from the sound including muscle sound, enabling extraction of muscle sound with a high SN ratio.
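The per-window judge-and-correct loop (steps S203 to S205) can be sketched as follows. The detector is left abstract here because the embodiment does not fix a particular model; any one-class model with a scikit-learn-style predict() returning +1 for normal and -1 for abnormal (for example, a one-class SVM trained on muscle-sound features) would fit, and that interface is an assumption of this sketch.

```python
import numpy as np

def suppress_abnormal_windows(signal, features, model, win_len):
    """Per-window judgment and correction: ask the abnormal sound detection
    model whether each window's feature vector looks like muscle sound (+1)
    or an abnormal sound (-1), and set the amplitude of windows judged
    abnormal to 0 (steps S203 to S205)."""
    out = signal.astype(float).copy()
    labels = model.predict(features)         # one label per window
    for n, label in enumerate(labels):
        if label == -1:                      # abnormal sound -> zero the window
            out[n * win_len:(n + 1) * win_len] = 0.0
    return out                               # corrected signal for output (step S206)
```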
<Effects>
As an example, the results of applying the muscle sound extraction device 10 to sound measured for 10 seconds at a 48 kHz sampling rate with a smartphone microphone will be described. FIG. 5 shows the result of measuring only the muscle sound of the forearm, and FIG. 6 shows the result of measuring only the contact noise of the microphone.
As shown in FIG. 5, the muscle sound signal is hardly attenuated at all. On the other hand, as shown in FIG. 6, the microphone contact noise is almost entirely attenuated. Therefore, the muscle sound extraction device 10 according to the present embodiment can attenuate only the noise while hardly attenuating the muscle sound, enabling extraction of muscle sound with a high SN ratio.
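The attenuation behavior in FIGS. 5 and 6 can be quantified with a simple retained-energy ratio; this metric is an illustration, not one used in the document. It is close to 1 when muscle sound passes through unattenuated and close to 0 when noise windows are zeroed out.

```python
import numpy as np

def retained_energy_ratio(original, corrected):
    """Fraction of the original signal energy that survives the correction:
    near 1 for muscle sound (FIG. 5), near 0 for pure contact noise (FIG. 6)."""
    denom = float(np.sum(original ** 2))
    return float(np.sum(corrected ** 2)) / denom if denom > 0 else 0.0
```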
The present invention is not limited to the specifically disclosed embodiments described above, and various modifications, alterations, combinations with known techniques, and the like are possible without departing from the scope of the claims.
10 muscle sound extraction device
101 input device
102 display device
103 external I/F
104 communication I/F
105 processor
106 memory device
107 bus
108 microphone
201 measurement unit
202 feature extraction unit
203 model learning unit
204 abnormal sound detection unit
205 sound correction unit
206 output unit
207 model storage unit

Claims (6)

  1.  A muscle sound extraction method in which a computer executes:
      a measurement procedure of measuring a sound containing muscle sound;
      a feature extraction procedure of extracting a feature value from the sound for each predetermined time width;
      a determination procedure of determining, for each time width, whether or not the sound in the time width is an abnormal sound, using an abnormal sound detection model learned in advance and the feature value of the time width; and
      a correction procedure of performing a correction for attenuating a sound determined to be the abnormal sound.
  2.  The muscle sound extraction method according to claim 1, wherein the abnormal sound detection model is a machine learning model trained by an unsupervised learning method using feature values extracted only from muscle sounds as positive examples.
  3.  The muscle sound extraction method according to claim 1 or 2, wherein the feature extraction procedure applies a fast Fourier transform to the sound for each time width and extracts the feature value as a vector whose elements are values obtained by normalizing the power of each frequency band.
  4.  The muscle sound extraction method according to any one of claims 1 to 3, wherein the correction procedure performs a correction that sets the amplitude of a sound determined to be the abnormal sound to 0.
  5.  A muscle sound extraction device comprising:
      a measurement unit that measures a sound containing muscle sound;
      a feature extraction unit that extracts a feature value from the sound for each predetermined time width;
      a determination unit that determines, for each time width, whether or not the sound in the time width is an abnormal sound, using an abnormal sound detection model learned in advance and the feature value of the time width; and
      a correction unit that performs a correction for attenuating a sound determined to be the abnormal sound.
  6.  A program that causes a computer to execute the muscle sound extraction method according to any one of claims 1 to 4.
PCT/JP2021/010332 2021-03-15 2021-03-15 Muscle sound extraction method, muscle sound extraction device, and program WO2022195657A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2021/010332 WO2022195657A1 (en) 2021-03-15 2021-03-15 Muscle sound extraction method, muscle sound extraction device, and program


Publications (1)

Publication Number Publication Date
WO2022195657A1 true WO2022195657A1 (en) 2022-09-22

Family

ID=83320078



Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002224070A (en) * 2001-01-30 2002-08-13 National Institute Of Advanced Industrial & Technology Myoelectric characteristic pattern discriminating method and device
WO2011007569A1 (en) * 2009-07-15 2011-01-20 国立大学法人筑波大学 Classification estimating system and classification estimating program
US20150178631A1 (en) * 2013-09-04 2015-06-25 Neural Id Llc Pattern recognition system
US20170049400A1 (en) * 2014-02-19 2017-02-23 Institut National De La Recherche Scientifique (Inrs) Method and system for evaluating a noise level of a biosignal
US20190156200A1 (en) * 2017-11-17 2019-05-23 Aivitae LLC System and method for anomaly detection via a multi-prediction-model architecture
JP2020058806A (en) * 2018-10-12 2020-04-16 デピュイ・シンセス・プロダクツ・インコーポレイテッド Neuromuscular sensing device having multi-sensing array
CN111110268A (en) * 2019-11-26 2020-05-08 中国科学院合肥物质科学研究院 Human body muscle sound signal prediction method based on random vector function connection network technology



Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21931413; Country of ref document: EP; Kind code of ref document: A1)
122 Ep: PCT application non-entry in European phase (Ref document number: 21931413; Country of ref document: EP; Kind code of ref document: A1)