WO2017135127A1 - Bioacoustic extraction device, bioacoustic analysis device, bioacoustic extraction program, and computer-readable storage medium and stored device - Google Patents
- Publication number
- WO2017135127A1 (PCT/JP2017/002592)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bioacoustic
- data
- auditory image
- unit
- auditory
- Prior art date
Classifications
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/08—Detecting, measuring or recording devices for evaluating the respiratory organs
Definitions
- the present invention relates to a bioacoustic extraction device, a bioacoustic analysis device, a bioacoustic extraction program, a computer-readable recording medium, and a recorded device.
- Bioacoustic analysis is performed to analyze bioacoustics, which are sounds produced by humans, and to detect and analyze disease cases.
- In performing such bioacoustic analysis, the acoustic data to be analyzed must contain only bioacoustic data.
- In other words, sounds other than bioacoustics, such as noise, must be excluded and the necessary acoustic data extracted. If noise is included, it affects the accuracy of case analysis, judgment, and diagnosis; conversely, if the original acoustic data are removed together with the noise, the reliability of the judgment results also suffers. Therefore, bioacoustic analysis requires that only bioacoustic data be accurately selected.
- Snoring sounds are taken up as an example of bioacoustics.
- SAS Sleep Apnea Syndrome
- OSAS Obstructive Sleep Apnea Syndrome
- cardiovascular diseases such as hypertension, stroke, angina pectoris, and myocardial infarction.
- the present invention has been made to solve such conventional problems.
- the main object of the present invention is to provide a bioacoustic extraction apparatus, a bioacoustic analysis apparatus, a bioacoustic extraction program, a computer-readable recording medium, and a recorded device that can accurately extract necessary bioacoustic data from acoustic data including bioacoustics.
- the bioacoustic extraction apparatus is a bioacoustic extraction apparatus for extracting necessary bioacoustic data from original acoustic data including bioacoustic data.
- An input unit for acquiring original acoustic data including bioacoustic data, a voiced section estimation unit for estimating a voiced section from the original acoustic data input from the input unit, an auditory image generation unit that generates an auditory image according to an auditory image model based on the voiced section estimated by the voiced section estimation unit, and an acoustic feature amount extraction unit that extracts an acoustic feature amount from the auditory image generated by the auditory image generation unit are provided,
- together with a classification unit that classifies the acoustic feature amount extracted by the acoustic feature amount extraction unit into a predetermined type, and a determination unit that determines, based on a predetermined threshold, whether the acoustic feature amount classified by the classification unit is bioacoustic data.
- the auditory image generation unit is configured to generate a stabilized auditory image using an auditory image model, and the acoustic feature amount extraction unit can extract the acoustic feature amount based on the stabilized auditory image generated by the auditory image generation unit.
- the auditory image generation unit is configured to further generate a generalized stabilized auditory image and an auditory spectrum from the stabilized auditory image.
- the acoustic feature amount extraction unit can extract an acoustic feature amount based on the overall stabilized auditory image generated by the auditory image generation unit and the auditory spectrum.
- the acoustic feature amount extraction unit can extract, as an acoustic feature amount, at least one of the kurtosis, skewness, spectral centroid, spectral bandwidth, spectral flatness, spectral roll-off, spectral entropy, and octave-based spectral contrast of the auditory spectrum and/or the generalized stabilized auditory image.
- the auditory image generation unit is configured to generate a neural activity pattern using an auditory image model, and the acoustic feature amount extraction unit can extract the acoustic feature amount based on the neural activity pattern generated by the auditory image generation unit.
- the acoustic feature quantity extraction unit can also extract at least one of the total number of peaks, the appearance position, the amplitude, the center of gravity, the inclination, the increase, and the decrease as the acoustic feature quantity obtained from the acoustic spectrum.
- the bioacoustic extraction device is a bioacoustic extraction device for extracting necessary bioacoustic data from original acoustic data including bioacoustic data.
- An input unit for acquiring original sound data including bioacoustic data, a voiced section estimation unit for estimating a voiced section from the original sound data input from the input unit, an auditory image generation unit that generates an auditory image according to an auditory image model based on the voiced section estimated by the voiced section estimation unit, and an auditory spectrum generation unit that generates an auditory spectrum for the auditory image generated by the auditory image generation unit are provided,
- together with a generalized stabilized auditory image generation unit that generates a generalized stabilized auditory image for the auditory image,
- an acoustic feature amount extraction unit that extracts an acoustic feature amount from the auditory spectrum generated by the auditory spectrum generation unit and the generalized stabilized auditory image generated by the generalized stabilized auditory image generation unit, a classification unit that classifies the extracted acoustic feature amount into a predetermined type, and a determination unit that determines, based on a predetermined threshold, whether it is bioacoustic data.
- in the bioacoustic extraction apparatus, it is possible to extract a section having a period from the original acoustic data.
- the voiced section estimation unit includes a preprocessor that performs preprocessing by differentiating or differencing the original sound data,
- a squarer for squaring the preprocessed data output by the preprocessor, a downsampler for downsampling the squared data output by the squarer, and a median filter for obtaining a median value from the downsampled data output by the downsampler.
- the input unit can be a non-contact microphone that is installed in a non-contact manner with the patient to be examined.
- the discrimination of the bioacoustic data by the discrimination unit can be performed as non-language processing.
- unlike language processing such as speaker identification and speech recognition, it can be applied widely, regardless of language.
- the original acoustic data are bioacoustics acquired while the patient sleeps, and the necessary bioacoustic data can be extracted from the bioacoustic data acquired during sleep.
- the original acoustic data is sleep-related sounds collected during sleep of the patient, and the bioacoustic data is snoring sound data.
- the predetermined type can be classified into a snoring sound and a non-snoring sound.
- bioacoustic analyzer for extracting and analyzing necessary bioacoustic data from original acoustic data including bioacoustic data.
- An auditory image generation unit that generates an auditory image according to an auditory image model based on the estimated voiced section, and an acoustic feature amount extraction unit that extracts an acoustic feature amount from the auditory image generated by the auditory image generation unit are provided,
- together with a classification unit that classifies the acoustic feature amount extracted by the acoustic feature amount extraction unit into a predetermined type, and a determination unit that determines, based on a predetermined threshold, whether the classified acoustic feature amount is bioacoustic data.
- A screening unit for performing screening on the true value data determined to be bioacoustic data by the determination unit may also be included.
- the bioacoustic analyzer for extracting and analyzing necessary bioacoustic data from the original acoustic data including the bioacoustic data is provided.
- An auditory image generation unit that generates an auditory image according to an auditory image model based on the estimated voiced section, and an auditory spectrum generation unit that generates an auditory spectrum for the auditory image generated by the auditory image generation unit;
- a generalized stabilized auditory image generation unit that generates a generalized stabilized auditory image for the auditory image;
- an acoustic feature amount extraction unit that extracts an acoustic feature amount from the auditory spectrum generated by the auditory spectrum generation unit and from the generalized stabilized auditory image, and a classification unit that classifies the extracted acoustic feature amount into a predetermined type; and
- a determination unit that determines, based on a predetermined threshold, whether the classified acoustic feature amount is bioacoustic data, and a screening unit that performs screening on the true value data determined to be bioacoustic data by the determination unit, are provided.
- the screening unit can be configured to perform disease screening on bioacoustic data extracted from the original acoustic data.
- the screening unit can be configured to perform screening for obstructive sleep apnea syndrome on the bioacoustic data extracted from the original acoustic data.
- a bioacoustic extraction method for extracting necessary bioacoustic data from original acoustic data including bioacoustic data.
- a step of acquiring original sound data including bioacoustic data, a step of estimating a voiced section from the acquired original sound data, a step of generating an auditory image according to an auditory image model based on the estimated voiced section, a step of extracting an acoustic feature amount from the generated auditory image, a step of classifying the extracted acoustic feature amount into a predetermined type, and a step of determining whether the classified acoustic feature amount is bioacoustic data based on a predetermined threshold value.
- a bioacoustic extraction method for extracting necessary bioacoustic data from original acoustic data including bioacoustic data.
- a step of generating a stabilized auditory image according to an auditory image model, a step of generating a generalized stabilized auditory image from the stabilized auditory image, a step of extracting a predetermined acoustic feature amount obtained from the generated generalized stabilized auditory image, and a step of determining whether the extracted acoustic feature amount is bioacoustic data based on a predetermined threshold value.
- in the bioacoustic extraction method according to the nineteenth aspect of the present invention, in the step of extracting the predetermined acoustic feature amount, an auditory spectrum is generated from the stabilized auditory image, and a predetermined acoustic feature amount obtained from the generated auditory spectrum can be extracted in addition to that obtained from the generalized stabilized auditory image.
- a step of selecting, from the extracted acoustic feature amounts, the acoustic feature amounts that contribute to discrimination can be included prior to the step of extracting the predetermined acoustic feature amount.
- the step of determining whether the data are bioacoustic data can perform classification into a snoring sound or a non-snoring sound using multinomial logistic regression analysis.
- the bioacoustic analysis method is a bioacoustic analysis method for extracting and analyzing necessary bioacoustic data from original acoustic data including bioacoustic data.
- a step of performing screening on the true value data may be included.
- the step of performing the screening can screen for obstructive sleep apnea syndrome or non-obstructive sleep apnea syndrome using multinomial logistic regression analysis.
- the bioacoustic extraction program is a bioacoustic extraction program for extracting necessary bioacoustic data from original acoustic data including bioacoustic data,
- An input function for acquiring original sound data including bioacoustic data, and a voiced section estimation function for estimating a voiced section from the original sound data input by the input function,
- an auditory image generation function for generating an auditory image according to an auditory image model based on the voiced section estimated by the voiced section estimation function,
- an acoustic feature amount extraction function for extracting an acoustic feature amount from the auditory image generated by the auditory image generation function,
- a classification function for classifying the acoustic feature amount extracted by the acoustic feature amount extraction function into a predetermined type, and
- a determination function for determining, based on a predetermined threshold, whether the acoustic feature amount classified by the classification function is bioacoustic data, can be realized on a computer.
- a bioacoustic extraction program for extracting necessary bioacoustic data from original acoustic data including bioacoustic data
- An input function for acquiring original sound data including bioacoustic data, and a voiced section estimation function for estimating a voiced section from the original sound data input by the input function,
- a stabilized auditory image generation function that generates a stabilized auditory image according to an auditory image model based on the voiced section estimated by the voiced section estimation function, and a function that generates a generalized stabilized auditory image from the stabilized auditory image,
- an acoustic feature amount extraction function for extracting a predetermined acoustic feature amount from the generated generalized stabilized auditory image, a classification function for classifying the extracted predetermined acoustic feature amount into a predetermined type, and a determination function for determining, based on a predetermined threshold, whether the classified acoustic feature amount is bioacoustic data, can be realized on a computer.
- the bioacoustic analysis program is a bioacoustic analysis program for extracting and analyzing necessary bioacoustic data from original acoustic data including bioacoustic data.
- An input function for acquiring original sound data including bioacoustic data, a voiced section estimation function for estimating a voiced section from the original sound data input by the input function, a stabilized auditory image generation function that generates a stabilized auditory image according to an auditory image model based on the estimated voiced section, and a function that generates a generalized stabilized auditory image from the stabilized auditory image,
- an acoustic feature amount extraction function for extracting a predetermined acoustic feature amount from the generated generalized stabilized auditory image, a classification function for classifying the extracted predetermined acoustic feature amount into a predetermined type, a determination function for determining whether the classified acoustic feature amount is bioacoustic data,
- and a function of screening the data determined to be bioacoustic data, can be realized by a computer.
- a computer-readable recording medium or recorded device stores the above program.
- the program includes a program distributed in a download manner through a network line such as the Internet, in addition to a program stored and distributed in the recording medium.
- the recording medium includes a device capable of recording the program, for example, a general purpose or dedicated device in which the program is implemented in a state where the program can be executed in the form of software, firmware, or the like.
- each process and function included in the program may be executed by program software that can be executed by a computer, or each part of the processing may be implemented by hardware such as a predetermined gate array (FPGA, ASIC), or in a form in which program software and a partial hardware module that implements some of the elements are mixed.
- FPGA field-programmable gate array
- ASIC application specific integrated circuit
- FIG. 9A is the original acoustic data
- FIG. 9B is the preprocessed data
- FIG. 9C is the square data
- FIG. 9D is the down-sampling data
- FIG. 9E is a graph showing the median waveform.
- FIG. 10A is the original acoustic data
- FIG. 10B is the ZCR processing result according to Comparative Example 1
- FIG. 10C is the STE processing result according to Comparative Example 2
- FIG. 10D is a graph showing the waveform of the voiced section estimation processing result according to Example 1.
- the embodiments described below exemplify a bioacoustic extraction device, a bioacoustic analysis device, a bioacoustic extraction program, a computer-readable recording medium, and a recorded device for embodying the technical idea of the present invention.
- the bioacoustic extraction device, the bioacoustic analysis device, the bioacoustic extraction program, the computer-readable recording medium, and the recorded device are not limited to those described below. Furthermore, the present specification by no means limits the members recited in the claims to the members of the embodiments.
- each element constituting the present invention may be configured such that a plurality of elements are constituted by the same member, with one member serving as the plurality of elements; conversely, the function of one member may be realized by being shared among a plurality of members.
- a bioacoustic extraction apparatus that automatically extracts snoring sounds, as the bioacoustic data to be extracted, from sleep-related sounds, as the original acoustic data, will be described.
- a bioacoustic extraction apparatus according to an embodiment of the present invention is shown in the block diagram of FIG.
- the bioacoustic extraction apparatus 100 shown in this figure includes an input unit 10, a sound section estimation unit 20, an auditory image generation unit 30, an acoustic feature amount extraction unit 40, a classification unit 50, and a determination unit 60.
- the input unit 10 is a member for acquiring original acoustic data including bioacoustic data.
- the input unit 10 includes a microphone unit and a preamplifier unit, and inputs the collected original sound data to a computer constituting the bioacoustic extraction device 100.
- a non-contact microphone that is preferably installed in a non-contact manner with the patient to be examined can be used for the microphone section.
- the voiced section estimation unit 20 is a member for estimating a voiced section from the original acoustic data input from the input unit 10. As shown in the block diagram of FIG. 2, the voiced section estimation unit 20 includes a preprocessor 21 that performs preprocessing by differentiating or differencing the original sound data, a squarer 22 for squaring the preprocessed data output by the preprocessor 21, a downsampler 23 for downsampling the squared data output by the squarer 22, and a median filter 24 for obtaining a median value from the downsampled data output by the downsampler 23.
- the auditory image generation unit 30 is a member for generating an auditory image according to the established auditory image model (AIM) based on the voiced section estimated by the voiced section estimation unit 20.
- AIM auditory image model
- the acoustic feature amount extraction unit 40 is a member for extracting feature amounts from the auditory image generated by the auditory image generation unit 30.
- the acoustic feature amount extraction unit 40 can extract feature amounts based on the auditory spectrum (AS), which is generated by synchronously adding the stabilized auditory image (Stabilized auditory image: SAI) in the horizontal axis direction, and
- the generalized stabilized auditory image (SSAI), which is generated by synchronously adding the SAI in the vertical axis direction.
- SSAI generalized stabilized auditory image
- the acoustic feature amount extraction unit 40 extracts, as feature amounts, at least one of the kurtosis, skewness, spectral centroid, spectral bandwidth, spectral flatness, spectral roll-off, spectral entropy, and OSC of the auditory spectrum.
- the acoustic feature quantity extraction unit 40 can also extract at least one of the total number of peaks, the appearance position, the amplitude, the center of gravity, the inclination, the increase, and the decrease as the feature quantity obtained from the acoustic spectrum.
- the classification unit 50 is a member for classifying the feature amount extracted by the acoustic feature amount extraction unit 40 into a predetermined type.
- the discriminating unit 60 is a member for discriminating whether or not the feature quantity classified by the classifying unit 50 is bioacoustic data based on a predetermined threshold value.
- In the bioacoustic extraction device 100 and the bioacoustic analyzer 110, it is possible to automatically extract snoring sounds with high accuracy by constructing the device so as to simulate the path from the human auditory pathway to the learning mechanism.
- a bioacoustic analysis device for analyzing bioacoustic data extracted by the bioacoustic extraction device can also be configured.
- the bioacoustic analysis apparatus 110 further includes a screening unit 70 that performs screening on true value data determined by the determination unit 60 as bioacoustic data.
- the bioacoustic extraction device and the bioacoustic analysis device described above can be implemented as software by a program in addition to being configured by dedicated hardware. For example, by installing a bioacoustic extraction program or a bioacoustic analysis program on a general-purpose or dedicated computer, loading or downloading and executing it, a virtual bioacoustic extraction device or bioacoustic analysis device is realized. You can also. (Acoustic analysis of conventional snoring sound)
- (I) a method using a network in which Mel-frequency cepstral coefficients (MFCC) and a hidden Markov model (HMM) are interconnected;
- (ii) a method using robust linear regression (RLR) or principal component analysis (PCA) of subband spectral energy;
- RLR robust linear regression
- PCA principal component analysis
- (iii) a method using unsupervised Fuzzy C-Means (FCM) clustering; and
- (iv) a method using 34 feature amounts that combine a plurality of acoustic analysis methods together with AdaBoost, have been proposed.
- the snoring sound and the non-snoring sound can be automatically classified with high accuracy.
- the performance evaluation of sound classification methods is based on manual classification, which is regarded as the gold standard technique, that is, classification performed manually by human ears. Therefore, the present inventors considered that a high-performance sound classifier could be constructed by imitating human hearing ability, and arrived at the present invention.
- a bioacoustic extraction device that can automatically classify snoring sounds / non-snoring sounds using an auditory image model (AIM) has been achieved.
- AIM auditory image model
- AIM is an auditory image model that imitates an “auditory image” that is considered to be an expression in the brain that humans use to perceive sound.
- AIM is a model of an auditory image that simulates the function from the peripheral system of the human auditory system including the cochlear basement membrane to the central system.
- Such AIM has been established mainly in research on hearing and spoken language perception and is used in the fields of speaker recognition and speech recognition; however, as far as the present inventors know, there are no reported examples of its use for discriminating bioacoustics such as snoring sounds and intestinal sounds. (Example)
- FIG. 3 shows a flowchart of the bioacoustic extraction method using the AIM according to the present embodiment. (Sound section estimation)
- the input unit 10 includes a non-contact type microphone that is a form of a microphone unit and a preamplifier unit, and collects the obtained audio data by a computer.
- the non-contact type microphone was installed at a position about 50 cm away from the patient's mouth.
- the microphone used for recording was a Model NT3 manufactured by RODE (Australia)
- the preamplifier was Mobile-Pre USB manufactured by M-AUDIO, USA
- the sampling frequency during recording was 44.1 kHz
- the digital resolution was 16 bits / sample.
- the voiced section estimation unit 20 uses a short-term energy method (STE) and a median filter 24.
- the STE method is a method of detecting signal energy equal to or higher than a certain threshold value as a sound section.
- the k-th short-term energy Ek of the sleep-related sound s (n) can be expressed by the following equation.
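- The original equation image is not reproduced in this text. A standard form of the k-th short-term energy that is consistent with the variable definitions given next (a hedged reconstruction, not necessarily the literal equation of the embodiment) is:

```latex
E_k = \sum_{n=1}^{N} \bigl[ s\bigl((k-1)N + n\bigr) \bigr]^{2}
```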
- n is the sample number and N is the segment length.
- a 10th-order median filter was used to smooth Ek.
- the AE is extracted by detecting a sound having an SNR of 5 dB or more in the segment.
- the background noise level is taken as the average value over all frames of the short-term energy obtained by applying the STE method to a one-second signal containing only background noise.
- Non-Patent Document 2 it is reported that in a listening experiment in which the relationship between singing voice and voice identification and sound duration is investigated, the signal length is 200 ms or more and the identification rate exceeds 70%. Accordingly, in this embodiment, a detection sound having a signal length of 200 ms or more is defined as AE. (Generation of auditory image model)
- an auditory image is generated using an auditory image model (Auditory Image Model: AIM).
- the auditory image generation unit 30 analyzes a voiced section (AE) using AIM.
- AE voiced section
- an AIM simulator is provided by the Patterson group. Although the simulator can operate in a C language environment, in this embodiment AIM2006 <http://www.pdn.cam.ac.uk/groups/cnbh/aim2006/>, which can be used with MATLAB (modules: gm2002, dcgc, hl, sf2003, ti2003), was used.
- PCP Pre-cochlea processing
- BMM Basilar membrane motion
- NAP Neural activity pattern
- STROBES Strobe identification
- SAI Stabilized auditory image
- AIM processing is shown in the block diagram of FIG.
- PCP pre-cochlea processing
- filter processing using a band pass filter is performed.
- filters are arranged at regular intervals, like an equivalent rectangular bandwidth (ERB), to represent the spectral analysis performed in the basement membrane of the cochlea.
- Auditory filter banks gamma-chirp filter bank, gamma tone filter bank
- the output from each filter in the filter bank can be obtained.
- in this embodiment, a gammachirp filter bank is used in which 50 filters having different center frequencies and bandwidths are arranged between 100 Hz and 6000 Hz. Note that the number of filters to be used may be adjusted as appropriate.
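- The exact center-frequency spacing used by the AIM2006 gammachirp filter bank is not stated here; a common choice for auditory filter banks is equal spacing on the ERB-rate (Cam) scale of Glasberg and Moore. The following Python sketch, offered only as an illustration under that assumption, generates 50 such center frequencies between 100 Hz and 6000 Hz:

```python
import numpy as np

def erb_rate(f_hz):
    """Glasberg & Moore ERB-rate (Cam) scale."""
    return 21.4 * np.log10(4.37e-3 * f_hz + 1.0)

def inverse_erb_rate(cams):
    """Convert ERB-rate values back to frequency in Hz."""
    return (10.0 ** (cams / 21.4) - 1.0) / 4.37e-3

def erb_spaced_center_frequencies(f_lo=100.0, f_hi=6000.0, n_filters=50):
    """Center frequencies equally spaced on the ERB-rate scale."""
    cams = np.linspace(erb_rate(f_lo), erb_rate(f_hi), n_filters)
    return inverse_erb_rate(cams)

if __name__ == "__main__":
    cfs = erb_spaced_center_frequencies()
    print(cfs[0], cfs[-1], len(cfs))  # ~100 Hz, ~6000 Hz, 50 filters
```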
- the output of each filter of the BMM is low-pass filtered and half-wave rectified to represent the neural signal conversion process performed by the inner hair cells.
- in generating the auditory image, when a local maximum point is detected in each frequency channel, a 35 ms frame is created with the local maximum point as the origin, and
- the auditory image is generated by integrating the information from a buffer storing past NAP representations, thereby converting the time axis into a time-interval axis.
- This series of processing is called STI (Strobed temporal integration), and the auditory image can be output as SAI for each frame.
- STI can generate a stable auditory image by time-integrating the NAP representation over time. Therefore, in the present embodiment, the auditory spectrum (AS) and SSAI of the 10th and subsequent frames of the auditory image obtained from one episode of AE are analyzed. (Auditory spectrum: AS)
- an auditory image is shown in FIG.
- the vertical axis represents the center frequency axis of the auditory filter
- the horizontal axis represents the time interval axis.
- AS is an expression corresponding to an excitation pattern of the auditory nerve, and is a spectrum in the frequency domain where the maximum point of the formant can be confirmed.
- the number of AS dimensions corresponds to the number of auditory filters.
- SSAI is a time-domain spectrum that has vertices only at specific intervals because the output of each channel includes only a limited time interval when the signal is stationary and periodic.
- the number of dimensions of SSAI is determined by the frame size and the sampling rate of the input signal.
- AS and SSAI are normalized with the maximum amplitude 1 in order to minimize the influence of the signal amplitude envelope between frames.
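- As an illustration only, the following numpy sketch shows how AS and SSAI can be summarized from a single SAI frame held as a 2-D array (rows: auditory-filter channels, columns: time intervals) and normalized to a maximum amplitude of 1; the actual AIM2006 implementation may differ in detail:

```python
import numpy as np

def as_and_ssai(sai):
    """sai: 2-D SAI frame; rows = auditory filter channels, columns = time intervals."""
    auditory_spectrum = sai.sum(axis=1)  # collapse the time-interval (horizontal) axis -> AS
    summary_sai = sai.sum(axis=0)        # collapse the frequency (vertical) axis -> SSAI
    # normalize each profile so that its maximum amplitude is 1
    return auditory_spectrum / auditory_spectrum.max(), summary_sai / summary_sai.max()
```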
- step S304 the acoustic feature amount obtained from the AIM is extracted.
- AS and SSAI in each SAI frame of AE can be calculated.
- a method for extracting feature amounts from AS and SSAI will be described. Since AS and SSAI have a shape similar to a spectrum, features are extracted using the following eight types of feature amounts.
- Kurtosis is a feature value that measures the peakedness of the spectrum about its mean value.
- the formula for kurtosis is shown below.
- Skewness is a feature amount that measures the asymmetry of the spectrum about its mean value.
- the equation for skewness is shown below.
- The spectral centroid is a feature quantity representing the center of gravity of the spectrum.
- the equation of the spectrum centroid is shown below.
- Spectral bandwidth is a feature quantity that quantifies the frequency bandwidth of a signal.
- the equation for the spectral bandwidth is shown below.
- Spectral flatness is a feature value that quantifies sound quality.
- the equation for spectral flatness is shown below.
- Spectral roll-off is a feature value for evaluating the frequency below which c × 100% of the total energy of the spectral distribution is contained.
- the equation for the spectrum roll-off is shown below.
- Spectral entropy is a feature that indicates the whiteness of the signal.
- the equation for spectral entropy is shown below.
- i is a spectrum sample point
- N is a total number of spectrum sample points
- k is a frame number
- X is a spectrum amplitude.
- where X > 0 and c = 0.95.
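- The equation images for the above feature amounts are not reproduced in this text. As a reference, the standard definitions that the above descriptions appear to follow can be written as below (a hedged reconstruction, with μ_k and σ_k the mean and standard deviation of the spectrum X_k):

```latex
\mu_k = \frac{1}{N}\sum_{i=1}^{N} X_k(i), \qquad
\sigma_k = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(X_k(i)-\mu_k\bigr)^2}

\text{Kurtosis: } K_k = \frac{\frac{1}{N}\sum_{i=1}^{N}\bigl(X_k(i)-\mu_k\bigr)^4}{\sigma_k^{4}}, \qquad
\text{Skewness: } S_k = \frac{\frac{1}{N}\sum_{i=1}^{N}\bigl(X_k(i)-\mu_k\bigr)^3}{\sigma_k^{3}}

\text{Spectral centroid: } SC_k = \frac{\sum_{i=1}^{N} i\,X_k(i)}{\sum_{i=1}^{N} X_k(i)}, \qquad
\text{Spectral bandwidth: } SB_k = \sqrt{\frac{\sum_{i=1}^{N}\bigl(i-SC_k\bigr)^2 X_k(i)}{\sum_{i=1}^{N} X_k(i)}}

\text{Spectral flatness: } SF_k = \frac{\Bigl(\prod_{i=1}^{N} X_k(i)\Bigr)^{1/N}}{\frac{1}{N}\sum_{i=1}^{N} X_k(i)}, \qquad
\text{Spectral entropy: } SE_k = -\sum_{i=1}^{N} p_k(i)\log_2 p_k(i),\quad p_k(i)=\frac{X_k(i)}{\sum_{j=1}^{N} X_k(j)}

\text{Spectral roll-off: } R_k = \min\Bigl\{\, r : \sum_{i=1}^{r} X_k(i) \ge c \sum_{i=1}^{N} X_k(i) \Bigr\},\quad c = 0.95
```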
- Octave-based spectral contrast (OSC) is a feature quantity that represents spectral contrast.
- the spectrum is divided into subbands by an octave filter bank.
- the number of subbands is set to 3 for AS and 5 for SSAI in consideration of the number of dimensions of the spectrum.
- the spectral peak Peak_k(b), spectral valley Valley_k(b), and spectral contrast OSC_k(b) of the b-th subband are respectively expressed by the following equations.
- X′ is a feature vector rearranged in descending order within the subband
- j is a sample point of the spectrum within the subband
- N_b is the total number of sample points within the subband
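- The OSC equation images are likewise not reproduced. A common formulation of octave-based spectral contrast consistent with the variable definitions above is sketched below; note that α is a neighborhood factor (a fraction of the subband samples) that is not stated in this text and is therefore an assumption:

```latex
\text{Peak}_k(b) = \log\!\left(\frac{1}{\alpha N_b}\sum_{j=1}^{\alpha N_b} X'_{k,b}(j)\right), \qquad
\text{Valley}_k(b) = \log\!\left(\frac{1}{\alpha N_b}\sum_{j=1}^{\alpha N_b} X'_{k,b}\bigl(N_b - j + 1\bigr)\right)

\text{OSC}_k(b) = \text{Peak}_k(b) - \text{Valley}_k(b)
```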
- spectral flatness was applied only to AS in this example, because when computed on SSAI the value of SF_k approached zero and could not be quantified.
- the average value and standard deviation of each feature value are defined as the feature values obtained from AE. That is, (i) a 20-dimensional AS feature vector, (ii) a 22-dimensional SSAI feature vector, and (iii) a 42-dimensional feature vector can be extracted from the AE. In addition to these, it is also possible to use feature quantities obtained from the spectrum, such as spectral asymmetry, band energy ratio, and the like.
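- As an illustration only, the following Python sketch computes the per-frame spectral features described above for an AS or SSAI vector and summarizes them over the frames of one AE by their mean and standard deviation; the function and variable names are hypothetical and the exact definitions used in the embodiment may differ.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def spectral_features(x, c=0.95, eps=1e-12):
    """Per-frame features of a non-negative spectrum-like vector x (AS or SSAI)."""
    x = np.asarray(x, dtype=float)
    i = np.arange(1, x.size + 1)
    p = x / (x.sum() + eps)
    centroid = np.sum(i * x) / (x.sum() + eps)
    bandwidth = np.sqrt(np.sum((i - centroid) ** 2 * x) / (x.sum() + eps))
    flatness = np.exp(np.mean(np.log(x + eps))) / (np.mean(x) + eps)
    rolloff = i[np.searchsorted(np.cumsum(x), c * x.sum())]
    entropy = -np.sum(p * np.log2(p + eps))
    return np.array([kurtosis(x, fisher=False), skew(x), centroid,
                     bandwidth, flatness, rolloff, entropy])

def ae_feature_vector(frames):
    """frames: list of AS (or SSAI) vectors, one per SAI frame of a single AE.
    Returns the concatenation of per-feature means and standard deviations."""
    per_frame = np.array([spectral_features(f) for f in frames])
    return np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0)])
```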
- each feature vector is referred to as (i) ASF: Auditory spectrum features, (ii) SSAIF: Summary SAI features, and (iii) AIMF: AIM features. (Classification of snoring / non-snoring using MLR)
- snoring sound / non-snoring sound classification using the MLR is performed in step S305, learning based on the MLR model using the feature vectors is performed in step S306, and discrimination based on the threshold is performed in step S307.
- For the classification, multinomial logistic regression (MLR) analysis using the feature vectors extracted from the AEs was used.
- MLR analysis is an excellent statistical analysis technique as a discriminator for binary identification that classifies a plurality of measurement values into one of two categories using a logistic curve.
- the equation of MLR is shown in the following equation.
- p indicates the probability that the sound to be classified is classified into the snoring sound category.
- a model with coefficients β_d, estimated by learning based on the maximum likelihood method, and a dependent variable Y is constructed.
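- The equation image for the MLR model is omitted from this text; for a two-category (snore / non-snore) problem it reduces to the standard logistic form below (a hedged reconstruction, with β_0 an intercept, β_d the learned coefficients, and x_d the elements of the feature vector):

```latex
p = P\bigl(Y = \text{snore} \mid \mathbf{x}\bigr)
  = \frac{\exp\!\Bigl(\beta_0 + \sum_{d=1}^{D}\beta_d x_d\Bigr)}{1 + \exp\!\Bigl(\beta_0 + \sum_{d=1}^{D}\beta_d x_d\Bigr)}
```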
- each AE can be classified into one of two categories (snoring sound or non-snoring sound), and the classification of the snoring sound and the non-snoring sound can be performed by the classifier.
- This simulation was performed using Statistics Toolbox Version 9.0 of MATLAB (R2014a, The MathWorks, Inc., Natick, MA, USA).
- Sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) were used as indicators of classification performance.
- “Sensitivity” here is the ability of the discrimination result to detect snoring. Specificity is the proportion of non-snoring that results in a determination result equal to or less than a threshold value.
- Positive predictive value (PPV) represents the probability of actually snoring when the determination result is equal to or greater than a threshold value.
- negative predictive value (NPV) represents the probability of non-snoring when the determination result is equal to or less than a threshold value.
- the ROC (Receiver Operating Characteristic) curve is plotted with the false positive rate (1-specificity) on the horizontal axis and the true positive rate (sensitivity) on the vertical axis.
- the ROC curve can be constructed by varying P_thre.
- the optimum threshold of the ROC curve, that is, the optimum P_thre, can be obtained using Youden's index.
- the ROC curve draws a large arc at the upper left in the case of an ideal classification unit. Due to this property, an area under the curve (AUC), which is the area of the lower region of the ROC curve, can be used as an index representing the performance of the classifier or the classification algorithm.
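- A minimal Python sketch of the evaluation just described, assuming scikit-learn and numpy arrays as inputs: it fits a logistic-regression classifier, sweeps the probability threshold P_thre to build the ROC curve, picks the optimum threshold by Youden's index, and reports sensitivity, specificity, accuracy, PPV, NPV, and AUC. It does not reproduce the embodiment's MATLAB implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

def evaluate(X_train, y_train, X_test, y_test):
    """y = 1 for snoring sound, 0 for non-snoring sound (numpy arrays)."""
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    p = clf.predict_proba(X_test)[:, 1]          # probability of the snore class

    fpr, tpr, thresholds = roc_curve(y_test, p)  # ROC curve over P_thre
    auc = roc_auc_score(y_test, p)
    p_thre = thresholds[np.argmax(tpr - fpr)]    # Youden's index: max(sensitivity + specificity - 1)

    pred = (p >= p_thre).astype(int)
    tp = np.sum((pred == 1) & (y_test == 1))
    fp = np.sum((pred == 1) & (y_test == 0))
    tn = np.sum((pred == 0) & (y_test == 0))
    fn = np.sum((pred == 0) & (y_test == 1))
    return {
        "AUC": auc, "optimum_threshold": p_thre,
        "sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / y_test.size,
        "PPV": tp / (tp + fp), "NPV": tn / (tn + fn),
    }
```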
- the AUC value takes a value in the range of 0.5 to 1, and is a classification accuracy evaluation index having a characteristic approaching 1 when the classification accuracy is good. (Learning dataset and test dataset)
- AEs extracted from 40 subjects were divided into a learning data set of 16,141 AEs (13,406 snoring sounds, 2,735 non-snoring sounds) and a test data set of 10,651 AEs (7,346 snoring sounds, 3,305 non-snoring sounds). (Labeling)
- the AE labeling work is performed based on the listening results.
- Three evaluators listened to the AEs through headphones (SHURE SRH840) and labeled each AE by consensus; a sound was not labeled as a snoring sound unless all evaluators agreed.
- Table 2 shows the AEs that were determined to be non-snore during such labeling work.
- FIG. 6 shows the relationship between the feature vector index and Accuracy. From this figure, the feature quantity effective for the classification of the snoring sound / non-snoring sound can be understood.
- Table 4 shows the feature vectors (ASF_opt′, SSAIF_opt) extracted from AS and SSAI that contribute an accuracy improvement of 1% or more. From this result, it was confirmed that the number of AS dimensions can be reduced to 4 and the number of SSAI dimensions can be significantly reduced to 5.
- AS auditory spectrum
- SSAI summary SAI
- OT optimum threshold
- TP true positive
- FP false positive
- TN true negative
- FN false negative
- S Sensitivity
- Spe. Specificity
- AUC area under the curve
- Acc. Accuracy
- PPV positive predictive value
- NPV negative predictive value
- Table 7 summarizes the classification performance and the accuracy Acc obtained by classifying the data of 40 subjects by the above-described method of this example. As is clear from this table, the subject data and conditions for classification are different, but the method of this example achieves classification accuracy superior to any reported example.
- Acc. Accuracy
- PPV positive predictive value
- MFCCs mel-frequency cepstrum coefficients
- OSAS obstructive sleep apnea syndrome
- SED subband energy distribution
- ACs autocorrelation coefficients
- LPCs linear predictive coding coefficients
- AIMF AIM feature (Duckitt et al.)
- Non-Patent Document 6 classified non-snoring sounds consisting only of breathing sounds and silent periods using normalized ACs, LPCs, and the like, and reported that speech sounds and noises can be avoided by excluding the 10 to 20 minutes before the patient falls asleep (Non-Patent Document 6). Therefore, in order to investigate the proportion of breathing sounds and other non-snoring sounds, the non-snoring sounds in the database used in this example were subjected to a listening evaluation test by three evaluators, and the number of episodes was investigated by classifying them into four classes: breathing sounds, coughs, voice (sleep talking, moaning, speaking), and noise (bed squeaks, metal, sirens, etc.).
- Azarbarzin et al. Reported that the data set of simple snore sound and OSAS snore sound was classified using SED of 500 Hz, and the accuracy of 93.1% was obtained (Non-patent Document 9). However, the classification target data is extracted from only 15 minutes. In contrast, the present embodiment achieves 97.3% for data as long as 2 hours. (Dafna et al.)
- The highest accuracy is obtained when both AS and SSAI are used. Furthermore, even when only SSAI is used, an accuracy of 94% is obtained, so the SSAI information is also used effectively.
- as feature amounts obtained from the acoustic spectrum, for example, feature amounts corresponding to the total number, positions, and amplitudes of the peaks, and to the center of gravity, inclination, increase, and
- decrease of the spectrum, can also be extracted from the AS or SSAI.
- screening can be performed only for a section having a pitch (period) in the extracted snoring sounds.
- a section having a pitch may be extracted when the snoring sound is extracted in advance.
- a sound having an SNR of 5 dB or more in the segment is used as the AE, but a sound with an SNR of less than 5 dB can also be used by employing the voiced section detection method.
- a non-contact microphone was used to record sleep-related sounds. This approach is often discussed in sleep-related sound classification studies in comparison with contact microphones.
- the non-contact microphone has an advantage that recording can be performed without imposing a load on the subject, while the magnitude of the SNR during recording is a problem.
- noise reduction processing such as a spectral subtraction method is used for preprocessing as an approach to improve the SNR of a signal.
- the spectral subtraction process generates an artificial sound called musical noise, and it becomes difficult to estimate the fundamental frequency at a low SNR.
- the gamma chirp filter bank used in the BMM of this embodiment can effectively extract voice from a noisy environment even with a low SNR such as -2 dB without causing musical noise. This is presumably because AIM has the characteristic of preserving the fine structure of periodic sounds rather than noise. AIM-based feature vectors are also reported to have higher noise suppression than MFCC. For the above reasons, it can be said that AIM has excellent noise resistance against recording in a real environment. (Sound section estimation)
- step S801 sleep related sounds are collected.
- the sleep related sound used as original sound data (FIG. 9A) is recorded from the patient during sleep using a non-contact type microphone.
- in step S802, the original sound data is differentiated or differenced.
- This process is performed by the pre-processor 21 shown in FIG.
- the original acoustic data of FIG. 9A is differentiated by a differentiator which is the pre-processor 21, and the signal waveform of the pre-processed data obtained as a result is shown in FIG. 9B.
- the differencing can be performed by a first-order FIR (Finite Impulse Response) filter, which is a type of digital filter.
- FIR Finite Impulse Response
- step S803 the preprocess data is squared. This process is performed by the squarer 22 shown in FIG. FIG. 9C shows a signal waveform of the square data obtained as a result of squaring the preprocessed data in FIG. 9B by the squarer 22.
- step S804 the square data is down-sampled.
- This processing is performed by the downsampling device 23 shown in FIG. FIG. 9D shows a signal waveform of the down-sampling data obtained as a result of down-sampling the square data of FIG. 9C by the down-sampler 23.
- step S805 the median value is acquired from the downsampling data.
- This processing is performed by the median filter 24 shown in FIG. FIG. 9E shows a signal waveform obtained as a result of obtaining the median value by the median filter 24 from the down-sampling data of FIG. 9D.
- it is possible to accurately extract a sound section (breathing sound, snoring sound, etc.).
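- A minimal Python sketch of the voiced-section estimation pipeline of steps S802 to S805 (differencing, squaring, downsampling, median filtering) is given below; the downsampling factor and median-filter window are illustrative assumptions, not values taken from the embodiment.

```python
import numpy as np
from scipy.signal import lfilter, medfilt, decimate

def voiced_section_envelope(x, decim=100, med_kernel=11):
    """Envelope used to estimate voiced sections from sleep-related sound x."""
    d = lfilter([1.0, -1.0], [1.0], x)           # step S802: first-order FIR difference
    sq = d ** 2                                  # step S803: squaring
    ds = decimate(sq, decim, ftype="fir")        # step S804: downsampling
    return medfilt(ds, kernel_size=med_kernel)   # step S805: median filtering

if __name__ == "__main__":
    fs = 44100
    t = np.arange(0, 2.0, 1.0 / fs)
    # toy signal: silence for 1 s, then a 200 Hz tone, plus weak noise
    x = np.sin(2 * np.pi * 200 * t) * (t > 1.0) + 0.01 * np.random.randn(t.size)
    env = voiced_section_envelope(x)
    print(env.shape, float(env.max()))
```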
- the detection of the voiced section may be realized using a learning machine such as a neural network, other time series analysis techniques, and signal analysis / modeling techniques in addition to using the difference, the square, and the like as described above.
- a conventionally known method as a method for detecting a speech section is applied as a comparative example.
- a method using a zero-crossing rate (ZCR) is referred to as Comparative Example 1
- an STE method based on the energy of an audio signal is referred to as Comparative Example 2
- the results of automatically extracting the voiced sections of the original acoustic data in FIG. 10A by these comparative methods are shown in FIGS. 10B and 10C, respectively.
- FIG. 10D shows an automatic extraction result obtained by the method according to Example 1 described above.
- These ZCR and STE methods are typical techniques as a voice section detection method for detecting only a voice section from a voice signal input from the outside. (ZCR)
- ZCR is defined by the following equation.
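- The equation image for the ZCR is omitted; a common definition of the zero-crossing rate for the k-th segment, which the comparative example presumably follows, is:

```latex
\mathrm{ZCR}_k = \frac{1}{2N} \sum_{n=2}^{N} \Bigl| \operatorname{sgn}\!\bigl(s_k(n)\bigr) - \operatorname{sgn}\!\bigl(s_k(n-1)\bigr) \Bigr|
```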
- the ZCR method is mainly used in situations where the ZCR of the voiced section to be detected is considerably smaller than the ZCR of the silent section. This method depends on the type of voice (strong harmonic structure). (STE method)
- the STE function of the sound is defined as shown in Equation 1 above.
- This STE method is mainly used in situations where the value of E_k in a voiced section is considerably larger than E_k in a silent section, the SNR is high, and E_k in a voiced section can be clearly distinguished from the background noise.
- sleep-related sound data were collected from 10 subjects as original sound data; the length of each sleep-related sound is 120 s, and the sounds had been classified in advance (e.g., as breathing sound (Breath)).
- screening based on the AIM that is, sieving is performed by the screening unit 70 of FIG. 1 on the data classified into the snoring sound and the non-snoring sound.
- the screening part 70 determines whether it is OSAS (obstructive sleep apnea syndrome) or non-OSAS from the snoring sound.
- Example 3 was performed in order to confirm whether the screening unit 70 can appropriately classify OSAS and non-OSAS.
- an AE data set extracted from 31 subjects was prepared as a data set for classifying snoring sounds and non-snoring sounds. Of these, 20 were used as learning data sets and 11 were used as test data sets.
- for each subject, 2 hours of sleep data were extracted.
- AE is pre-labeled into snoring sound or non-snoring sound by hand. Details of the dataset used to classify snoring sounds and non-snoring sounds are shown in the table below.
- the data set used for OSAS and non-OSAS classification is shown in the following table.
- 50 subjects were used, 35 of which were used as the learning data set and 15 were used as the test data set.
- the classification of snoring sounds and non-snoring sounds by the classification unit 50 and the determination unit 60 of FIG. 1 was evaluated using a 6-dimensional feature vector (kurtosis, skewness, spectral centroid, and spectral bandwidth extracted from AS, and kurtosis and skewness extracted from SSAI).
- based on an 8-dimensional feature vector of the snoring sounds discriminated by the classification unit 50 and the determination unit 60 of FIG. 1 (skewness, spectral centroid, and spectral roll-off extracted from AS, and kurtosis, skewness, spectral bandwidth, spectral roll-off, and spectral entropy extracted from SSAI), the screening unit 70 classified OSAS and non-OSAS.
- the above-described combination of feature amounts can be used in the classification of the snoring sound or the non-snoring sound, and the classification of the OSAS and the non-OSAS.
- spectral asymmetry, band energy ratio, and the like can also be used.
- the evaluation of the OSAS and non-OSAS classification was performed by a 10-fold cross-validation test. Here, 9 folds randomly selected from the data set were used for learning, and the remaining 1 fold was used for testing.
- the threshold value used as the judgment standard for the apnea-hypopnea index (AHI) was set to 15 events/h, and OSAS patients were screened by the screening unit 70; as shown in Table 14 above, excellent results were obtained with a sensitivity of 85.00% ± 26.87 and a specificity of 95.00% ± 15.81, confirming the usefulness of this example.
- the AHI is not limited to this value, and may be 5 events / h, 10 events / h, or the like. Further, in the analysis in the classification unit 50, the determination unit 60, and the screening unit 70, classification, determination, and sieving in consideration of the characteristics of each gender are possible.
- MLR multi-class classification problem
- a multi-class classification problem can be considered using the learning machine. For example, it can be classified directly into OSAS snoring (1), non-OSAS snoring (2), and non-snoring (3) based on the feature amount.
- automatic extraction can be classified into multiple classes such as snoring (1), breathing sound (2), and cough (3).
- Example 4 it was verified whether snoring / non-snoring discrimination and OSAS / non-OSAS discrimination were possible with the number of subjects increased, that is, with the subject database expanded.
- AE Audio event
- PSG polysomnography
- SAI Stabilized auditory image
- AS Auditory spectrum
- SSAI Summary stabilized auditory image
- Each frame was normalized so that the maximum amplitude of AS and SSAI was 1.
- AS Auditory spectrum
- SSAI Summary stabilized auditory image
- From AS, eight feature values were used: kurtosis, skewness, spectral centroid, spectral bandwidth, spectral roll-off, spectral entropy, spectral contrast, and spectral flatness.
- From SSAI, the seven feature values other than spectral flatness were used.
- the average value of each feature amount is used as the feature amount obtained from the AE.
- a stepwise method which is a feature selection algorithm, was used for each male, female, and male / female data set.
- step S307 discrimination based on the threshold value was performed.
- Table 16 shows the performance evaluation results (Leave-one-out cross-validation) of automatic snoring sound classification using AIM.
- Table 17 shows the performance evaluation results of the OSAS screening based on the snoring sound, which was automatically extracted using the AIM by the method described above.
- Table 18 shows the performance evaluation results of OSAS screening using only the snoring sound, which is extracted manually without labeling and automatically extracted.
- Table 17 suggests that OSAS screening can be performed with high accuracy on any data set based on snoring sounds that are automatically extracted using only AIM, which imitates human auditory ability. From Tables 17 and 18, it was found that the performance of OSAS screening based on snoring sounds manually extracted by labeling was higher in all subject sets than in the case of automatic extraction. This result suggests that the OSAS screening performance will improve as the performance of automatic extraction of snoring sounds using AIM is further improved. To improve the automatic snoring extraction performance, the normalization method for the AS and SSAI frames could be changed, for example normalizing within one episode instead of normalizing per frame. It is also possible to add feature amounts, for example pitch information, formant frequency information, and feature vectors used in speech recognition and speech signal processing, as described in Non-Patent Document 1.
- AIM processing was performed for each detected voiced section.
- the SAI is obtained for every 35 ms frame by the AIM processing.
- AS and SSAI are obtained from each SAI.
- feature values (kurtosis, skewness, spectral bandwidth, spectral centroid, spectral entropy, spectral roll-off, spectral flatness, etc.) are extracted from AS and SSAI, respectively. Since a feature amount is obtained for each frame, the feature amounts obtained from one voiced-section sound are summarized by their average value and standard deviation.
- the average value and standard deviation of feature values obtained from AS and SSAI were used.
- the snoring sound has been described as an example, but the subject of the present invention is not limited to snoring sounds; it can be used for other sounds generated by living bodies (biological sounds), and the detected bioacoustics can be applied to the discovery and diagnosis of various cases. For example, by detecting sleep sounds, OSAS screening as described above and discrimination of sleep disorders can be performed. Asthma, pneumonia, and the like can be diagnosed from lung sounds, respiratory sounds, coughs, and so on. Various heart diseases can be diagnosed from heart sounds, and various intestinal diseases such as functional gastrointestinal disorders can be screened by analyzing intestinal sounds. In addition, it can also be applied to the detection of fetal movement sounds, muscle sounds, and the like.
- this invention can be utilized not only for humans but also for other living organisms. For example, it can be suitably used for health examinations of pets and animals bred at zoos.
- the bioacoustic extraction device, the bioacoustic analysis device, the bioacoustic extraction program, the computer-readable recording medium, and the recorded device according to the present invention measure snoring sounds together with, or in place of, a patient's polysomnographic examination, and can be suitably used for diagnosis.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Heart & Thoracic Surgery (AREA)
- Pulmonology (AREA)
- Biophysics (AREA)
- Pathology (AREA)
- Engineering & Computer Science (AREA)
- Physiology (AREA)
- Physics & Mathematics (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Surgery (AREA)
- Animal Behavior & Ethology (AREA)
- General Health & Medical Sciences (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
Abstract
Provided is a bioacoustic extraction device (100) for precisely extracting necessary bioacoustic data from raw acoustic data that includes the bioacoustic data. The bioacoustic extraction device (100) comprises: an input unit (10) for acquiring raw acoustic data that include bioacoustic data; a sound section estimation unit (20) that estimates a sound section, from the raw acoustic data input from the input unit (10); an auditory image generation unit (30) that generates an auditory image in accordance with an auditory image model, on the basis of the sound section estimated by the sound section estimation unit (20); an acoustic feature amount extraction unit (40) that extracts an acoustic feature amount in respect to the auditory image generated by the auditory image generation unit (30); a classification unit (50) that classifies, into a prescribed type, the acoustic feature amount extracted by the acoustic feature amount extraction unit (40); and a determination unit (60) that determines, on the basis of a prescribed threshold value, whether the acoustic feature amount classified by the classification unit (50) is bioacoustic data.
Description
The present invention relates to a bioacoustic extraction device, a bioacoustic analysis device, a bioacoustic extraction program, a computer-readable recording medium, and a recorded device.
Bioacoustic analysis is performed to analyze bioacoustics, which are sounds produced by humans, and to detect and analyze cases of diseases and disorders. In performing such bioacoustic analysis, the acoustic data to be analyzed must contain only bioacoustic data; in other words, sounds other than bioacoustics, such as noise, must be excluded and the necessary acoustic data extracted. If noise is included, it affects the accuracy of case analysis, judgment, and diagnosis; conversely, if the original acoustic data are removed together with the noise, the reliability of the judgment results likewise suffers. Therefore, in bioacoustic analysis it is required that only bioacoustic data be accurately selected. Conventionally, such work has been performed manually by hand, which imposes a heavy burden, and there is a need for a system that can automatically and accurately extract only the necessary bioacoustic data from acoustic data. However, a method capable of automatic extraction with practical accuracy has not yet been established.
Snoring sounds are taken up as an example of bioacoustics. In recent years, Sleep Apnea Syndrome (SAS) has attracted attention; it is a disease accompanied by repeated apnea or hypopnea during sleep, causing symptoms such as excessive daytime sleepiness, a feeling of suffocation and gasping during sleep, and repeated awakenings. Obstructive Sleep Apnea Syndrome (OSAS), in particular, has been pointed out to carry a risk of complications with cardiovascular diseases such as hypertension, stroke, angina pectoris, and myocardial infarction, so early detection is required (for example, Patent Document 1, Non-Patent Documents 1 to 4).
Currently, examination using a polysomnography (PSG) is performed overnight for diagnosis of SAS. This is a large-scale examination requiring hospitalization of the patient, which is expensive and requires the electrodes to be attached to the body all night, which places a burden on the patient's body. Specifically, since it is necessary to record the patient's nasal airflow, tracheal sound, oxygen saturation, etc., it was necessary to attach a large number of measuring devices and sensors to the patient. In addition, since the SAS simple inspection is performed during sleep, the mounting state of these measurement sensors greatly affects the measurement results. For example, problems such as the measurement sensor coming off due to the patient turning over, the wearing position being displaced, and the clothes being rubbed are picked up. In addition, the situation where a measurement sensor or the like is attached to a patient during sleep is not desirable due to physical and mental burdens and pain. For this reason, realization of a simpler inspection method is expected. As one approach, attention has recently been focused on methods based on acoustic analysis of snoring sounds.
In previous studies of snoring sounds, however, the snoring episodes of interest have been extracted manually from overnight recordings of sleep-related sounds (Sleep Related Sound: SRS). The burden of data collection has therefore been large, and as a result only relatively small sets of snoring data could be analyzed.
To use the analysis of snoring sounds as a diagnostic technique, snoring recorded over the long hours of sleep must be examined. To that end, it is essential to automate the extraction of snoring episodes, which has so far been done by hand.
Meanwhile, as a way of collecting a patient's snoring sounds without burden, a non-contact microphone may be used instead of a contact microphone. However, because the distance between the patient and a non-contact microphone is larger than with a contact microphone, the volume of the snoring sound (the amplitude of the acoustic spectrum) is correspondingly smaller, while sounds other than snoring produced by the patient, such as sleep talking, coughing, and breathing, and sounds not originating from the patient, such as the creaking of the bed or metallic noises, become relatively stronger, so that the signal-to-noise ratio (SNR) deteriorates. Accordingly, when bioacoustic analysis is actually performed, noise components must be removed as pre-processing, yet it is extremely difficult to accurately extract only the necessary acoustic data, such as snoring sounds alone, from acoustic data whose SNR has deteriorated, and a practical method has been sought.
The present invention has been made to solve these conventional problems. Its main object is to provide a bioacoustic extraction device, a bioacoustic analysis device, a bioacoustic extraction program, a computer-readable recording medium, and a recorded device capable of accurately extracting the necessary bioacoustic data from acoustic data containing bioacoustics.
To achieve the above object, a bioacoustic extraction device according to a first aspect of the present invention is a bioacoustic extraction device for extracting necessary bioacoustic data from original acoustic data that contains bioacoustic data, and comprises: an input unit for acquiring the original acoustic data containing bioacoustic data; a voiced section estimation unit that estimates voiced sections from the original acoustic data input from the input unit; an auditory image generation unit that generates an auditory image in accordance with an auditory image model on the basis of the voiced sections estimated by the voiced section estimation unit; an acoustic feature amount extraction unit that extracts acoustic feature amounts from the auditory image generated by the auditory image generation unit; a classification unit that classifies the acoustic feature amounts extracted by the acoustic feature amount extraction unit into predetermined types; and a determination unit that determines, on the basis of a predetermined threshold, whether the acoustic feature amounts classified by the classification unit represent bioacoustic data. With this configuration, the original acoustic data is converted into an auditory image using the auditory image model and then classified on the basis of acoustic feature amounts, so that noise and the necessary bioacoustic data can be distinguished accurately.
According to a bioacoustic extraction device of a second aspect of the present invention, the auditory image generation unit is configured to generate a stabilized auditory image using the auditory image model, and the acoustic feature amount extraction unit can extract acoustic feature amounts on the basis of the stabilized auditory image generated by the auditory image generation unit.
Further, according to a bioacoustic extraction device of a third aspect of the present invention, the auditory image generation unit is configured to further generate a summary stabilized auditory image and an auditory spectrum from the stabilized auditory image, and the acoustic feature amount extraction unit can extract acoustic feature amounts on the basis of the summary stabilized auditory image and the auditory spectrum generated by the auditory image generation unit.
Furthermore, according to a bioacoustic extraction device of a fourth aspect of the present invention, the acoustic feature amount extraction unit can extract, as acoustic feature amounts, at least one of the kurtosis, skewness, spectral centroid, spectral bandwidth, spectral flatness, spectral roll-off, spectral entropy, and octave-based spectral contrast of the auditory spectrum and/or the summary stabilized auditory image.
Furthermore, according to a bioacoustic extraction device of a fifth aspect of the present invention, the auditory image generation unit is configured to generate a neural activity pattern using the auditory image model, and the acoustic feature amount extraction unit can extract acoustic feature amounts on the basis of the neural activity pattern generated by the auditory image generation unit. With this configuration, the processing load can be reduced and the processing speed improved compared with the case where a stabilized auditory image is used. The acoustic feature amount extraction unit may also extract, as feature amounts obtained from the acoustic spectrum, at least one of the total number of peaks, their positions, amplitude, centroid, slope, increase, and decrease.
Furthermore, a bioacoustic extraction device according to a sixth aspect of the present invention is a bioacoustic extraction device for extracting necessary bioacoustic data from original acoustic data that contains bioacoustic data, and comprises: an input unit for acquiring the original acoustic data containing bioacoustic data; a voiced section estimation unit that estimates voiced sections from the original acoustic data input from the input unit; an auditory image generation unit that generates an auditory image in accordance with an auditory image model on the basis of the voiced sections estimated by the voiced section estimation unit; an auditory spectrum generation unit that generates an auditory spectrum from the auditory image generated by the auditory image generation unit; a summary stabilized auditory image generation unit that generates a summary stabilized auditory image from the auditory image; an acoustic feature amount extraction unit that extracts acoustic feature amounts from the auditory spectrum generated by the auditory spectrum generation unit and the summary stabilized auditory image generated by the summary stabilized auditory image generation unit; a classification unit that classifies the acoustic feature amounts extracted by the acoustic feature amount extraction unit into predetermined types; and a determination unit that determines, on the basis of a predetermined threshold, whether the acoustic feature amounts classified by the classification unit represent bioacoustic data.
Furthermore, according to a bioacoustic extraction device of a seventh aspect of the present invention, sections having periodicity can be extracted from the original acoustic data.
Furthermore, according to a bioacoustic extraction device of an eighth aspect of the present invention, the voiced section estimation unit can comprise: a pre-processor for pre-processing the original acoustic data by differentiation or differencing; a squarer for squaring the pre-processed data output by the pre-processor; a down-sampler for down-sampling the squared data output by the squarer; and a median filter for obtaining the median of the down-sampled data output by the down-sampler.
Furthermore, according to a bioacoustic extraction device of a ninth aspect of the present invention, the input unit can be a non-contact microphone installed without contact with the patient to be examined.
Furthermore, according to a bioacoustic extraction device of a tenth aspect of the present invention, the determination of bioacoustic data by the determination unit can be non-linguistic processing. In this way, processing based on the auditory image model can be applied not only to conventional language-related processing of speech signals, such as speaker identification and speech recognition, but also to the processing of cases and diseases from bioacoustic data such as snoring or bowel sounds, and can therefore be applied widely regardless of language.
Furthermore, according to a bioacoustic extraction device of an eleventh aspect of the present invention, the original acoustic data is bioacoustic data acquired while the patient is asleep, and the necessary bioacoustic data can be extracted from the bioacoustic data acquired during sleep.
Furthermore, according to a bioacoustic extraction device of a twelfth aspect of the present invention, the original acoustic data can be sleep-related sounds collected while the patient is asleep, the bioacoustic data can be snoring data, and the predetermined types can be snoring sounds and non-snoring sounds.
Furthermore, a bioacoustic analysis device according to a thirteenth aspect of the present invention is a bioacoustic analysis device for extracting and analyzing necessary bioacoustic data from original acoustic data that contains bioacoustic data, and comprises: an input unit for acquiring the original acoustic data containing bioacoustic data; a voiced section estimation unit that estimates voiced sections from the original acoustic data input from the input unit; an auditory image generation unit that generates an auditory image in accordance with an auditory image model on the basis of the voiced sections estimated by the voiced section estimation unit; an acoustic feature amount extraction unit that extracts acoustic feature amounts from the auditory image generated by the auditory image generation unit; a classification unit that classifies the acoustic feature amounts extracted by the acoustic feature amount extraction unit into predetermined types; a determination unit that determines, on the basis of a predetermined threshold, whether the acoustic feature amounts classified by the classification unit represent bioacoustic data; and a screening unit that performs screening on the true-value data determined by the determination unit to be bioacoustic data. With this configuration, the original acoustic data is converted into an auditory image using the auditory image model and then classified on the basis of acoustic feature amounts, so that noise and the necessary bioacoustic data can be distinguished accurately.
Furthermore, a bioacoustic analysis device according to a fourteenth aspect of the present invention is a bioacoustic analysis device for extracting and analyzing necessary bioacoustic data from original acoustic data that contains bioacoustic data, and comprises: an input unit for acquiring the original acoustic data containing bioacoustic data; a voiced section estimation unit that estimates voiced sections from the original acoustic data input from the input unit; an auditory image generation unit that generates an auditory image in accordance with an auditory image model on the basis of the voiced sections estimated by the voiced section estimation unit; an auditory spectrum generation unit that generates an auditory spectrum from the auditory image generated by the auditory image generation unit; a summary stabilized auditory image generation unit that generates a summary stabilized auditory image from the auditory image; an acoustic feature amount extraction unit that extracts acoustic feature amounts from the auditory spectrum generated by the auditory spectrum generation unit and the summary stabilized auditory image generated by the summary stabilized auditory image generation unit; a classification unit that classifies the acoustic feature amounts extracted by the acoustic feature amount extraction unit into predetermined types; a determination unit that determines, on the basis of a predetermined threshold, whether the acoustic feature amounts classified by the classification unit represent bioacoustic data; and a screening unit that performs screening on the true-value data determined by the determination unit to be bioacoustic data.
Furthermore, according to a bioacoustic analysis device of a fifteenth aspect of the present invention, the screening unit can be configured to perform disease screening on the bioacoustic data extracted from the original acoustic data.
Furthermore, according to a bioacoustic analysis device of a sixteenth aspect of the present invention, the screening unit can be configured to perform screening for obstructive sleep apnea syndrome on the bioacoustic data extracted from the original acoustic data.
Furthermore, a bioacoustic extraction method according to a seventeenth aspect of the present invention is a bioacoustic extraction method for extracting necessary bioacoustic data from original acoustic data that contains bioacoustic data, and includes the steps of: acquiring the original acoustic data containing bioacoustic data; estimating voiced sections from the acquired original acoustic data; generating an auditory image in accordance with an auditory image model on the basis of the estimated voiced sections; extracting acoustic feature amounts from the generated auditory image; classifying the extracted acoustic feature amounts into predetermined types; and determining, on the basis of a predetermined threshold, whether the classified acoustic feature amounts represent bioacoustic data. In this way, the original acoustic data is converted into an auditory image using the auditory image model and then classified on the basis of acoustic feature amounts, so that noise and the necessary bioacoustic data can be distinguished accurately.
Furthermore, a bioacoustic extraction method according to an eighteenth aspect of the present invention is a bioacoustic extraction method for extracting necessary bioacoustic data from original acoustic data that contains bioacoustic data, and includes the steps of: acquiring the original acoustic data containing bioacoustic data; estimating voiced sections from the acquired original acoustic data; generating a stabilized auditory image in accordance with an auditory image model on the basis of the estimated voiced sections; generating a summary stabilized auditory image from the stabilized auditory image; extracting predetermined acoustic feature amounts obtained from the generated summary stabilized auditory image; and determining, on the basis of a predetermined threshold, whether the extracted acoustic feature amounts represent bioacoustic data.
Furthermore, according to a bioacoustic extraction method of a nineteenth aspect of the present invention, an auditory spectrum is also generated from the stabilized auditory image, and in the step of extracting the predetermined acoustic feature amounts, predetermined acoustic feature amounts obtained from the generated auditory spectrum can be extracted in addition to those obtained from the summary stabilized auditory image.
Furthermore, a bioacoustic extraction method according to a twentieth aspect of the present invention can further include, prior to the step of extracting the predetermined acoustic feature amounts, a step of selecting, from the extracted acoustic feature amounts, those acoustic feature amounts that contribute to discrimination.
Furthermore, according to a bioacoustic extraction method of a twenty-first aspect of the present invention, the step of determining whether the data is bioacoustic data can be classification into snoring or non-snoring sounds using multinomial logistic regression analysis.
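For illustration only, the following is a minimal sketch of snore/non-snore classification with logistic regression. It assumes scikit-learn and uses synthetic feature data; the feature matrix, labels, and threshold value are hypothetical placeholders and are not taken from the present disclosure.

```python
# Minimal sketch: snore / non-snore classification with (multinomial) logistic
# regression. Assumes scikit-learn; the synthetic data below is illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical feature matrix: one row per audio event (AE), one column per
# acoustic feature amount (e.g. kurtosis, spectral centroid, ...).
X = rng.normal(size=(200, 8))
y = rng.integers(0, 2, size=200)          # 1 = snore, 0 = non-snore (dummy labels)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000)    # softmax (multinomial) when >2 classes
clf.fit(X_train, y_train)

# Apply a decision threshold to the predicted snore probability.
threshold = 0.5                            # the "predetermined threshold" is a design choice
p_snore = clf.predict_proba(X_test)[:, 1]
is_snore = p_snore >= threshold
print("estimated snore rate:", is_snore.mean())
```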
Furthermore, a bioacoustic analysis method according to a twenty-second aspect of the present invention is a bioacoustic analysis method for extracting and analyzing necessary bioacoustic data from original acoustic data that contains bioacoustic data, and includes the steps of: acquiring the original acoustic data containing bioacoustic data; estimating voiced sections from the acquired original acoustic data; generating a stabilized auditory image in accordance with an auditory image model on the basis of the estimated voiced sections; generating an auditory spectrum and a summary stabilized auditory image from the stabilized auditory image; extracting predetermined acoustic feature amounts obtained from the generated auditory spectrum and summary stabilized auditory image; determining, on the basis of a predetermined threshold, whether the extracted acoustic feature amounts represent bioacoustic data; and performing screening on the true-value data determined in the determining step to be bioacoustic data.
Furthermore, according to a bioacoustic analysis method of a twenty-third aspect of the present invention, the screening step can be screening for obstructive or non-obstructive sleep apnea syndrome using multinomial logistic regression analysis.
Furthermore, a bioacoustic extraction program according to a twenty-fourth aspect of the present invention is a bioacoustic extraction program for extracting necessary bioacoustic data from original acoustic data that contains bioacoustic data, and causes a computer to realize: an input function for acquiring the original acoustic data containing bioacoustic data; a voiced section estimation function that estimates voiced sections from the original acoustic data input by the input function; an auditory image generation function that generates an auditory image in accordance with an auditory image model on the basis of the voiced sections estimated by the voiced section estimation function; an acoustic feature amount extraction function that extracts acoustic feature amounts from the auditory image generated by the auditory image generation function; a classification function that classifies the acoustic feature amounts extracted by the acoustic feature amount extraction function into predetermined types; and a determination function that determines, on the basis of a predetermined threshold, whether the acoustic feature amounts classified by the classification function represent bioacoustic data. With this configuration, the original acoustic data is converted into an auditory image using the auditory image model and then classified on the basis of acoustic feature amounts, so that noise and the necessary bioacoustic data can be distinguished accurately.
Furthermore, a bioacoustic extraction program according to a twenty-fifth aspect of the present invention is a bioacoustic extraction program for extracting necessary bioacoustic data from original acoustic data that contains bioacoustic data, and causes a computer to realize: an input function for acquiring the original acoustic data containing bioacoustic data; a voiced section estimation function that estimates voiced sections from the original acoustic data input by the input function; a stabilized auditory image generation function that generates a stabilized auditory image in accordance with an auditory image model on the basis of the voiced sections estimated by the voiced section estimation function; a function that generates a summary stabilized auditory image from the stabilized auditory image; an acoustic feature amount extraction function that extracts predetermined acoustic feature amounts from the generated summary stabilized auditory image; a classification function that classifies the predetermined acoustic feature amounts extracted by the acoustic feature amount extraction function into predetermined types; and a determination function that determines, on the basis of a predetermined threshold, whether the acoustic feature amounts classified by the classification function represent bioacoustic data.
Furthermore, a bioacoustic analysis program according to a twenty-sixth aspect of the present invention is a bioacoustic analysis program for extracting and analyzing necessary bioacoustic data from original acoustic data that contains bioacoustic data, and causes a computer to realize: an input function for acquiring the original acoustic data containing bioacoustic data; a voiced section estimation function that estimates voiced sections from the original acoustic data input by the input function; a stabilized auditory image generation function that generates a stabilized auditory image in accordance with an auditory image model on the basis of the voiced sections estimated by the voiced section estimation function; a function that generates a summary stabilized auditory image from the stabilized auditory image; an acoustic feature amount extraction function that extracts predetermined acoustic feature amounts from the generated summary stabilized auditory image; a classification function that classifies the predetermined acoustic feature amounts extracted by the acoustic feature amount extraction function into predetermined types; a determination function that determines, on the basis of a predetermined threshold, whether the acoustic feature amounts classified by the classification function represent bioacoustic data; and a function that performs screening on the true-value data determined by the determination function to be bioacoustic data.
Furthermore, a computer-readable recording medium or recorded device according to a twenty-seventh aspect of the present invention stores the above program. Recording media include magnetic disks, optical discs, magneto-optical discs, semiconductor memory, and other media capable of storing programs, such as CD-ROM, CD-R, CD-RW, flexible disks, magnetic tape, MO, DVD-ROM, DVD-RAM, DVD-R, DVD+R, DVD-RW, DVD+RW, and Blu-ray (registered trademark) discs. The program includes not only programs stored and distributed on such recording media but also programs distributed by download over a network such as the Internet. The recorded devices include equipment on which the program can be recorded, for example general-purpose or dedicated equipment in which the program is implemented in executable form as software, firmware, or the like. Each process or function included in the program may be executed by program software executable on a computer, or the processing of each part may be realized by hardware such as predetermined gate arrays (FPGA, ASIC), or in a mixed form of program software and partial hardware modules that implement some of the hardware elements.
Embodiments of the present invention are described below with reference to the drawings. The embodiments shown below, however, merely exemplify a bioacoustic extraction device, a bioacoustic analysis device, a bioacoustic extraction program, a computer-readable recording medium, and a recorded device for giving concrete form to the technical idea of the present invention, and the present invention does not limit the bioacoustic extraction device, bioacoustic analysis device, bioacoustic extraction program, computer-readable recording medium, or recorded device to those described below. This specification in no way restricts the members shown in the claims to the members of the embodiments. In particular, unless otherwise specified, the dimensions, materials, shapes, relative arrangements, and the like of the components described in the embodiments are merely illustrative examples and are not intended to limit the scope of the present invention. The sizes and positional relationships of members shown in the drawings may be exaggerated for clarity of explanation. In the following description, identical names and reference numerals denote identical or equivalent members, and detailed description of them is omitted as appropriate. Each element constituting the present invention may be configured so that a plurality of elements are formed by a single member, one member serving as the plurality of elements, or conversely the function of one member may be shared and realized by a plurality of members.
(Bioacoustic extraction apparatus 100)
In the following, as an example of a bioacoustic extraction device, a device that automatically extracts snoring sounds, as the bioacoustic data to be extracted, from sleep-related sounds serving as the original acoustic data is described. A bioacoustic extraction device according to one embodiment of the present invention is shown in the block diagram of FIG. 1. The bioacoustic extraction device 100 shown in this figure comprises an input unit 10, a voiced section estimation unit 20, an auditory image generation unit 30, an acoustic feature amount extraction unit 40, a classification unit 50, and a determination unit 60.
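To make the data flow between these units easier to follow, the sketch below arranges the processing as a simple Python pipeline. The function names mirror the units of FIG. 1, but the bodies are hypothetical placeholders under assumed interfaces, not the actual implementation.

```python
# Illustrative skeleton of the processing flow of bioacoustic extraction device 100.
# Each function corresponds to one unit in FIG. 1; the bodies are placeholders.
from typing import List
import numpy as np

def estimate_voiced_sections(x: np.ndarray, fs: int) -> List[np.ndarray]:
    """Voiced section estimation unit 20: return candidate audio events."""
    return [x]                                    # placeholder

def generate_auditory_image(segment: np.ndarray, fs: int) -> np.ndarray:
    """Auditory image generation unit 30: SAI computed with an AIM implementation."""
    return np.abs(np.fft.rfft(segment))[None, :]  # placeholder stand-in

def extract_features(sai: np.ndarray) -> np.ndarray:
    """Acoustic feature amount extraction unit 40."""
    return np.array([sai.mean(), sai.std()])      # placeholder features

def classify_and_judge(feat: np.ndarray, threshold: float = 0.5) -> bool:
    """Classification unit 50 + determination unit 60 (threshold on a score)."""
    return float(feat[0]) >= threshold            # placeholder rule

def extract_bioacoustics(x: np.ndarray, fs: int) -> List[np.ndarray]:
    events = estimate_voiced_sections(x, fs)
    return [e for e in events
            if classify_and_judge(extract_features(generate_auditory_image(e, fs)))]
```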
The input unit 10 is a member for acquiring original acoustic data containing bioacoustic data. The input unit 10 comprises a microphone section and a preamplifier section, and inputs the collected original acoustic data to the computer constituting the bioacoustic extraction device 100. For the microphone section, a non-contact microphone installed without contact with the patient to be examined is preferably used.
The voiced section estimation unit 20 is a member for estimating voiced sections from the original acoustic data input from the input unit 10. As shown in the block diagram of FIG. 2, the voiced section estimation unit 20 comprises a pre-processor 21 for pre-processing the original acoustic data by differentiation or differencing, a squarer 22 for squaring the pre-processed data output by the pre-processor 21, a down-sampler 23 for down-sampling the squared data output by the squarer 22, and a median filter 24 for obtaining the median of the down-sampled data output by the down-sampler 23.
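A minimal sketch of this chain (pre-processor 21 → squarer 22 → down-sampler 23 → median filter 24) might look as follows. It assumes NumPy/SciPy; the decimation factor, block-averaging style of down-sampling, and filter length are illustrative choices rather than values from this disclosure.

```python
# Sketch of the voiced section estimation chain (pre-processor 21 -> squarer 22
# -> down-sampler 23 -> median filter 24). Parameter values are illustrative.
import numpy as np
from scipy.ndimage import median_filter

def voiced_section_envelope(x: np.ndarray, decimation: int = 64, med_size: int = 11) -> np.ndarray:
    d = np.diff(x)                                  # pre-processor 21: first difference
    e = d ** 2                                      # squarer 22: instantaneous energy
    e = e[: len(e) // decimation * decimation]
    e = e.reshape(-1, decimation).mean(axis=1)      # down-sampler 23 (block average)
    return median_filter(e, size=med_size)          # median filter 24: smoothed envelope

# Voiced sections can then be taken as regions where this envelope exceeds a threshold.
```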
The auditory image generation unit 30 is a member for generating an auditory image in accordance with the established auditory image model (AIM) on the basis of the voiced sections estimated by the voiced section estimation unit 20.
The acoustic feature amount extraction unit 40 is a member for extracting feature amounts from the auditory image generated by the auditory image generation unit 30. The acoustic feature amount extraction unit 40 can extract feature amounts on the basis of the auditory spectrum (AS), generated by synchronously adding the stabilized auditory image (SAI) along the horizontal axis, and the summary stabilized auditory image (SSAI), generated by synchronously adding the SAI along the vertical axis. Specifically, the acoustic feature amount extraction unit 40 extracts, as feature amounts, at least one of the kurtosis, skewness, spectral centroid, spectral bandwidth, spectral flatness, spectral roll-off, spectral entropy, and octave-based spectral contrast (OSC) of the auditory spectrum. The acoustic feature amount extraction unit 40 may also extract, as feature amounts obtained from the acoustic spectrum, at least one of the total number of peaks, their positions, amplitude, centroid, slope, increase, and decrease.
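For reference, the following sketch computes several of the listed feature amounts from a single non-negative spectrum-like vector (such as an AS or SSAI frame). It assumes NumPy/SciPy; because the exact definitions (for example, the roll-off percentage) are not specified here, the conventional ones are used.

```python
# Sketch: spectral feature amounts from a non-negative spectrum vector s
# (e.g. an auditory spectrum). Conventional definitions are assumed.
import numpy as np
from scipy.stats import kurtosis, skew

def spectral_features(s: np.ndarray, freqs: np.ndarray, rolloff: float = 0.85) -> dict:
    p = s / s.sum()                                             # normalize to a distribution
    centroid = float(np.sum(freqs * p))
    bandwidth = float(np.sqrt(np.sum(((freqs - centroid) ** 2) * p)))
    flatness = float(np.exp(np.mean(np.log(s + 1e-12))) / (np.mean(s) + 1e-12))
    roll = float(freqs[np.searchsorted(np.cumsum(p), rolloff)])  # roll-off frequency
    entropy = float(-np.sum(p * np.log2(p + 1e-12)))             # spectral entropy
    return {
        "kurtosis": float(kurtosis(s)),
        "skewness": float(skew(s)),
        "centroid": centroid,
        "bandwidth": bandwidth,
        "flatness": flatness,
        "rolloff": roll,
        "entropy": entropy,
    }
```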
The classification unit 50 is a member for classifying the feature amounts extracted by the acoustic feature amount extraction unit 40 into predetermined types.
The determination unit 60 is a member for determining, on the basis of a predetermined threshold, whether the feature amounts classified by the classification unit 50 represent bioacoustic data.
By constructing in this way a bioacoustic extraction device 100 that simulates the processing from the human auditory pathway up to the learning mechanism, snoring sounds can be extracted automatically with high accuracy.
(Bioacoustic analyzer 110)
Furthermore, a bioacoustic analysis device for analyzing the bioacoustic data extracted by the bioacoustic extraction device can also be configured. The bioacoustic analysis device 110 further comprises a screening unit 70 that performs screening on the true-value data determined by the determination unit 60 to be bioacoustic data.
The bioacoustic extraction device and bioacoustic analysis device described above may be configured with dedicated hardware, or may be realized in software by a program. For example, a virtual bioacoustic extraction device or bioacoustic analysis device can be realized by installing, loading, or downloading a bioacoustic extraction program or bioacoustic analysis program onto a general-purpose or dedicated computer and executing it.
(Conventional acoustic analysis of snoring sounds)
In recent years, acoustic analysis of snoring sounds using non-contact microphones has been carried out. Because these studies need to extract only the snoring sounds from the sounds recorded during sleep (sleep-related sounds), various automatic snore extraction methods have been proposed, for example:
(i) a method using a network interconnecting Mel-frequency cepstral coefficients (MFCC) and hidden Markov models (HMM);
(ii) a method using sub-band spectral energy, robust linear regression (RLR), and principal component analysis (PCA);
(iii) a method using sub-band energy distributions, PCA, and unsupervised fuzzy c-means (FCM) clustering; and
(iv) a method using 34 features obtained by combining several acoustic analysis techniques together with AdaBoost.
Reports on these methods indicate that snoring and non-snoring sounds can be classified automatically with high accuracy. In these methods, however, feature amounts had to be extracted using several signal processing techniques. In general, the performance of a sound classification method is evaluated against manual classification, that is, classification by the human ear, which is regarded as the gold-standard approach. The present inventors therefore considered that a high-performance sound classifier could be constructed by imitating human auditory ability, and arrived at the present invention. Specifically, they arrived at a bioacoustic extraction device that can automatically classify snoring and non-snoring sounds using the auditory image model (AIM). AIM is an auditory image model that imitates the "auditory image", believed to be the representation in the brain that humans use when perceiving sound. More specifically, AIM models the auditory image produced by the human auditory system from the periphery, including the cochlear basilar membrane, up to the central system. AIM was established mainly in research on hearing and spoken-language perception and has been used in fields such as speaker recognition and speech recognition, but to the best of the inventors' knowledge there has been no report of its use for discriminating bioacoustics such as snoring or bowel sounds.
(Example)
To confirm the effectiveness of the present invention, verification was performed using a large database of sleep-related sounds obtained from 40 subjects. FIG. 3 shows a flowchart of the bioacoustic extraction method using AIM according to the present embodiment.
(Sound section estimation)
First, sleep-related sounds are collected in step S301, and voiced sections are then estimated in step S302. Here, with the cooperation of Anan Kyoei Hospital (Kuranohoke-36, Nakasho, Hanoura-cho, Anan-shi, Tokushima, Japan), sleep-related sounds were recorded from each patient for six hours during an overnight polysomnography (PSG) examination using the input unit 10. The input unit 10 comprises a non-contact microphone, which is one form of the microphone section, and a preamplifier section, and the obtained audio data were collected on a computer. The non-contact microphone was installed approximately 50 cm from the patient's mouth. The microphone used for recording was a RODE NT3 (Australia), the preamplifier was an M-AUDIO MobilePre USB (USA), the sampling frequency was 44.1 kHz, and the digital resolution was 16 bits/sample.
From the sleep-related sounds recorded in this way, audio events (AEs), that is, voiced sections, are detected in step S302 using the voiced section estimation unit 20. The voiced section estimation unit 20 uses the short-term energy (STE) method and the median filter 24. The STE method detects, as voiced sections, segments whose signal energy is equal to or greater than a certain threshold. Here, the k-th short-term energy Ek of the sleep-related sound s(n) can be expressed as the sum of the squared samples in the k-th segment:

  Ek = Σ s(n)^2  (the sum being taken over the N samples belonging to the k-th segment)
In the above equation, n is the sample index and N is the segment length. In this example, the sleep-related sound s(n) was divided into segments with N = 4096 and a shift width of 1024, and the signal energy of the k-th segment was calculated. A 10th-order median filter was used to smooth Ek.
Further, in this example, AEs are extracted by detecting sounds whose SNR in a segment is 5 dB or more. When calculating the SNR, the background noise level is taken as the all-frame average of the short-term energy obtained by applying the STE method to a one-second signal containing background noise only.
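A minimal sketch of this STE-based audio event detection, using the parameter values given above (N = 4096, shift 1024, 10th-order median smoothing, SNR ≥ 5 dB) and the 200 ms minimum duration described in the following paragraph, could look as follows. It assumes NumPy/SciPy and is not the actual implementation; segmentation and bookkeeping details are assumptions.

```python
# Sketch: audio event (AE) detection by the short-term energy (STE) method.
# Parameters follow the example in the text; implementation details are assumptions.
import numpy as np
from scipy.ndimage import median_filter

FS, N, HOP = 44100, 4096, 1024

def short_term_energy(s: np.ndarray) -> np.ndarray:
    n_frames = 1 + (len(s) - N) // HOP
    return np.array([np.sum(s[k * HOP:k * HOP + N] ** 2) for k in range(n_frames)])

def detect_audio_events(s: np.ndarray, noise: np.ndarray, snr_db: float = 5.0):
    e = median_filter(short_term_energy(s), size=10)         # 10th-order median smoothing
    e_noise = short_term_energy(noise).mean()                 # 1 s of background noise only
    active = 10.0 * np.log10((e + 1e-12) / (e_noise + 1e-12)) >= snr_db
    min_frames = int(np.ceil(0.2 * FS / HOP))                 # 200 ms minimum duration
    events, start = [], None
    for k, a in enumerate(np.append(active, False)):
        if a and start is None:
            start = k
        elif not a and start is not None:
            if k - start >= min_frames:
                events.append((start * HOP, k * HOP + N))     # sample range of the AE
            start = None
    return events
```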
According to Non-Patent Document 2, a listening experiment investigating the relationship between the discrimination of singing voice from speech and sound duration reported that the discrimination rate exceeds 70% when the signal length is 200 ms or more. Accordingly, in this example a detected sound with a signal length of 200 ms or more is defined as an AE.
(Generation of auditory image model)
Next, in step S303, an auditory image is generated using the auditory image model (AIM). Here, the auditory image generation unit 30 analyzes the voiced sections (AEs) using AIM. As described in Non-Patent Document 3, an AIM simulator is provided by the Patterson group. Although the simulator can also run in a C-language environment, this example uses AIM2006 <http://www.pdn.cam.ac.uk/groups/cnbh/aim2006/> (modules: gm2002, dcgc, hl, sf2003, ti2003), which runs on MATLAB. The five main stages available are pre-cochlear processing (PCP), basilar membrane motion (BMM), the neural activity pattern (NAP), strobe identification (STROBES), and the stabilized auditory image (SAI). Through these processes, an input sound can be output as an auditory image.
An example of the AIM processing is shown in the block diagram of FIG. 4. First, in the pre-cochlear processing (PCP) stage, band-pass filtering is applied to represent the response characteristics up to the oval window of the inner ear.
In the basilar membrane motion (BMM) stage, an auditory filter bank (a gammachirp or gammatone filter bank) whose filters are spaced at equal intervals on an equivalent rectangular bandwidth (ERB) scale is used to represent the spectral analysis performed on the basilar membrane of the cochlea. In the BMM stage, the output of each filter in the filter bank can be obtained. This example uses a gammachirp filter bank in which 50 filters, each with a center frequency and bandwidth that differ according to its position, are arranged between 100 Hz and 6000 Hz. The number of filters used may be adjusted as appropriate.
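As an illustration of how such ERB-spaced center frequencies can be obtained, the following sketch places 50 channels at equal intervals on the ERB-rate scale of Glasberg and Moore between 100 Hz and 6000 Hz; the specific gammachirp filter shapes used by AIM2006 are not reproduced here.

```python
# Sketch: 50 auditory-filter center frequencies equally spaced on the ERB-rate
# scale (Glasberg & Moore) between 100 Hz and 6000 Hz, as for the filter bank above.
import numpy as np

def hz_to_erb_rate(f_hz):
    return 21.4 * np.log10(1.0 + 0.00437 * f_hz)

def erb_rate_to_hz(erb):
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437

erb_lo, erb_hi = hz_to_erb_rate(100.0), hz_to_erb_rate(6000.0)
center_freqs = erb_rate_to_hz(np.linspace(erb_lo, erb_hi, 50))
bandwidths = 24.7 * (1.0 + 0.00437 * center_freqs)   # ERB bandwidth at each center frequency
print(center_freqs[0], center_freqs[-1])
```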
In the neural activity pattern (NAP) stage, the output of each BMM filter is low-pass filtered and half-wave rectified to represent the neural signal conversion performed by the inner hair cells.
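A minimal sketch of this NAP stage applied to one BMM channel is shown below; the filter order, cutoff frequency, and the rectify-then-smooth ordering are assumed illustrative choices rather than values specified here.

```python
# Sketch of the NAP stage for one BMM channel: half-wave rectification followed
# by low-pass filtering (parameters chosen here for illustration only).
import numpy as np
from scipy.signal import butter, lfilter

def neural_activity_pattern(bmm_channel: np.ndarray, fs: int, cutoff_hz: float = 1200.0) -> np.ndarray:
    rectified = np.maximum(bmm_channel, 0.0)          # half-wave rectification
    b, a = butter(2, cutoff_hz / (fs / 2.0))          # 2nd-order low-pass filter
    return lfilter(b, a, rectified)
```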
In the strobe identification (STROBES) stage, the local maxima in the output of each NAP channel are detected by adaptive thresholding.
(Stabilized auditory image: SAI)
Further, in the stabilized auditory image (SAI) stage, each time a local maximum is detected in a frequency channel, a 35 ms frame is created with that maximum as its origin, and this frame is temporally integrated with the information from a buffer storing past NAP representations, producing an auditory image whose time axis has been converted to a time-interval axis. This series of operations is called strobed temporal integration (STI), and the auditory image can be output frame by frame as an SAI. By temporally integrating the NAP representation over time, STI can generate a stable auditory image. In this example, therefore, the auditory spectrum (AS) and the SSAI of the 10th and subsequent frames of the auditory image obtained from one AE episode are taken as the objects of analysis.
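The following greatly simplified sketch illustrates the idea of strobed temporal integration for a single NAP channel: each detected strobe opens a 35 ms frame that is accumulated into a buffer. The adaptive thresholding and frame weighting used by AIM2006 are omitted, so this is only a conceptual approximation under assumed inputs.

```python
# Greatly simplified sketch of strobed temporal integration (STI) for one NAP
# channel: 35 ms frames anchored at strobe points are averaged into one SAI row.
import numpy as np

def sai_row(nap_channel: np.ndarray, strobes: np.ndarray, fs: int, frame_ms: float = 35.0) -> np.ndarray:
    frame_len = int(fs * frame_ms / 1000.0)
    acc = np.zeros(frame_len)
    count = 0
    for s in strobes:                                  # strobe sample indices for this channel
        frame = nap_channel[s:s + frame_len]
        if len(frame) == frame_len:
            acc += frame                               # time integration into the buffer
            count += 1
    return acc / max(count, 1)                         # one row of the stabilized auditory image
```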
(Auditory spectrum: AS)
An example of an auditory image is shown in FIG. 5. In the auditory image shown in this figure, the vertical axis is the center-frequency axis of the auditory filters and the horizontal axis is the time-interval axis. The spectrum generated by synchronously adding the auditory image along the horizontal axis is called the auditory spectrum (AS). The AS is a representation corresponding to the excitation pattern of the auditory nerve, and is a frequency-domain spectrum in which the formant maxima can be confirmed. The number of dimensions of the AS corresponds to the number of auditory filters.
(Summary SAI: SSAI)
Further, the spectrum generated by synchronous addition along the vertical axis is called the summary SAI (SSAI). When the signal is stationary and periodic, the output of each channel contains only limited time intervals, so the SSAI is a time-domain spectrum that has peaks only at specific intervals. The number of dimensions of the SSAI is determined by the frame size and the sampling rate of the input signal. In this embodiment, the AS and the SSAI are normalized to a maximum amplitude of 1 in order to minimize the influence of the amplitude envelope of the signal between frames.
(Acoustic features obtained from AIM)
Next, in step S304, acoustic features are extracted from the AIM output. Here, the AS and SSAI can be computed for each SAI frame of an AE. A method for extracting features from the AS and SSAI is now described. Because the AS and SSAI have spectrum-like shapes, the following eight types of features are used for feature extraction.
First, kurtosis is a feature that measures how sharply the spectrum is peaked about its mean. The formula for kurtosis is shown below.
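The kurtosis equation itself is not reproduced in this text; a commonly used frame-wise definition, which this embodiment is assumed to follow, is
$$\mathrm{Kurtosis}_k = \frac{\tfrac{1}{N}\sum_{i=1}^{N}\left(X_k(i)-\mu_k\right)^4}{\left(\tfrac{1}{N}\sum_{i=1}^{N}\left(X_k(i)-\mu_k\right)^2\right)^{2}},$$
where X_k(i) is the amplitude of the i-th sample point of the AS or SSAI in frame k, μ_k is its mean, and N is the total number of sample points.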
Next, skewness is a feature that measures the asymmetry of the spectrum about its mean. The formula for skewness is shown below.
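The skewness equation is likewise not reproduced here; a commonly used definition with the same symbols as above is
$$\mathrm{Skewness}_k = \frac{\tfrac{1}{N}\sum_{i=1}^{N}\left(X_k(i)-\mu_k\right)^3}{\left(\tfrac{1}{N}\sum_{i=1}^{N}\left(X_k(i)-\mu_k\right)^2\right)^{3/2}}.$$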
The spectral centroid is a feature that computes the center of gravity of the spectrum. The formula for the spectral centroid is shown below.
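The centroid equation is not reproduced here; a commonly used amplitude-weighted definition, assumed here, is
$$\mathrm{SC}_k = \frac{\sum_{i=1}^{N} i\,X_k(i)}{\sum_{i=1}^{N} X_k(i)}.$$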
The spectral bandwidth is a feature that quantifies the frequency bandwidth of the signal. The formula for the spectral bandwidth is shown below.
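The bandwidth equation is not reproduced here; one common definition, assumed here, is the amplitude-weighted spread about the centroid:
$$\mathrm{SB}_k = \sqrt{\frac{\sum_{i=1}^{N}\left(i-\mathrm{SC}_k\right)^2 X_k(i)}{\sum_{i=1}^{N} X_k(i)}}.$$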
Spectral flatness is a feature that quantifies sound quality. The formula for spectral flatness is shown below.
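The flatness equation is not reproduced here; the usual definition, assumed here, is the ratio of the geometric mean to the arithmetic mean of the spectrum:
$$\mathrm{SF}_k = \frac{\left(\prod_{i=1}^{N} X_k(i)\right)^{1/N}}{\tfrac{1}{N}\sum_{i=1}^{N} X_k(i)}.$$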
The spectral roll-off is a feature that evaluates the frequency below which c × 100% of the total spectral distribution is contained. The formula for the spectral roll-off is shown below.
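The roll-off equation is not reproduced here; a commonly used definition, assumed here, is the smallest index n at which the cumulative amplitude reaches the fraction c of the total:
$$\mathrm{SR}_k = \min\left\{ n \;\middle|\; \sum_{i=1}^{n} X_k(i) \ge c\sum_{i=1}^{N} X_k(i) \right\}.$$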
Here, X > 0 and c = 0.95.
Spectral entropy is a feature that indicates the whiteness (noise-likeness) of the signal. The formula for spectral entropy is shown below.
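The entropy equation is not reproduced here; a commonly used definition, assumed here, normalizes the spectrum to a probability-like distribution and takes its entropy:
$$\mathrm{SE}_k = -\sum_{i=1}^{N} P_k(i)\log P_k(i),\qquad P_k(i)=\frac{X_k(i)}{\sum_{j=1}^{N} X_k(j)}.$$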
Here, i is a spectrum sample point, N is the total number of spectrum sample points, k is the frame number, and X is the spectrum amplitude, where X > 0 and c = 0.95.
Octave-based spectral contrast (OSC) is a feature that expresses the contrast of the spectrum. In this method, the spectrum is divided into subbands by an octave filter bank. In this embodiment, taking the dimensionality of the spectrum into account, the number of subbands is set to 3 for the AS and 5 for the SSAI. The spectral peak Peak_k(b), spectral valley Valley_k(b), and spectral contrast OSC_k(b) of the b-th subband are expressed by the following equations.
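The equations are not reproduced here; the commonly used definitions of octave-based spectral contrast, assumed here, are
$$\mathrm{Peak}_k(b) = \log\left(\frac{1}{\alpha N_b}\sum_{j=1}^{\alpha N_b} X'_{k,b}(j)\right),\qquad \mathrm{Valley}_k(b) = \log\left(\frac{1}{\alpha N_b}\sum_{j=1}^{\alpha N_b} X'_{k,b}(N_b-j+1)\right),$$
$$\mathrm{OSC}_k(b) = \mathrm{Peak}_k(b) - \mathrm{Valley}_k(b),$$
where X'_{k,b} is the subband spectrum sorted in descending order, as defined in the following paragraph.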
Here, X′ is the feature vector sorted in descending order within the subband, j is a sample point of the spectrum within the subband, N_b is the total number of sample points within the subband, and α is a parameter for extracting stable peak and valley values; in this embodiment, α = 0.2. Note that spectral flatness was applied only to the AS in this embodiment, because when the SSAI is integrated the value of SF_k approaches zero indefinitely and cannot be quantified.
Since the features described above are extracted from every frame of an AE, the mean and standard deviation of each feature are defined as the features obtained from the AE. That is, from an AE it is possible to extract (i) a 20-dimensional AS feature vector, (ii) a 22-dimensional SSAI feature vector, and (iii) a 42-dimensional feature vector combining both. In addition to these, features obtained from the spectrum, for example spectral asymmetry and band energy ratio, can also be used.
In this embodiment, these feature vectors are referred to as (i) ASF: auditory spectrum features, (ii) SSAIF: summary SAI features, and (iii) AIMF: AIM features.
(Classification of snoring / non-snoring using MLR)
Further, learning based on an MLR model using the feature vectors is performed in step S306, snoring/non-snoring classification using MLR is performed in step S305, and discrimination based on a threshold is performed in step S307. Here, in order for the classification unit 50 to classify the acoustic features into predetermined types and for the discrimination unit 60 to discriminate between snoring and non-snoring sounds, multinomial logistic regression (MLR) analysis using the feature vectors extracted from the AEs was employed. MLR analysis is a statistical technique that performs well as a binary classifier, using a logistic curve to assign a set of measured values to one of two categories. The MLR equation is shown below.
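The MLR equation is not reproduced in this text; the standard binary logistic form, which this embodiment is assumed to use, is
$$p = \frac{1}{1+\exp\left(-\sum_{d=0}^{D}\beta_d f_d\right)},\qquad f_0 = 1,$$
where f_0 = 1 is the intercept term (an assumption) and the remaining symbols are defined in the following paragraph.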
Here, p denotes the probability that the sound to be classified belongs to the snoring category. β_d (d = 0, 1, ..., D) are parameters estimated by the maximum likelihood method. Further, f_d (d = 0, 1, ..., D) are the feature values used as independent variables, and D is the dimension of the feature vector.
In the MLR analysis, a model consisting of β_d, estimated by learning based on the maximum likelihood method, and the dependent variable Y is constructed. Here, Y approaches 1 (Y = 1) if the sound correlates with snoring and approaches 0 (Y = 0) if it correlates with non-snoring. It was confirmed that, by running tests with this model, the probability p of obtaining Y = 1 can be estimated when the independent variables f_d of the test set are given. Based on a threshold p_thre on p, each AE can be classified into one of the two categories (snoring or non-snoring), so that the classification of snoring and non-snoring sounds can be performed by the classifier. This simulation was carried out using the Statistics Toolbox Version 9.0 of MATLAB (R2014a, The MathWorks, Inc., Natick, MA, USA). In the case of OSAS screening, the same reasoning applies, with Y approaching 1 (Y = 1) if the sound correlates with OSAS and 0 (Y = 0) if it correlates with non-OSAS.
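As a minimal sketch only, the training, probability estimation, and threshold-based classification described above could be implemented with the MATLAB Statistics Toolbox as follows; the variable names Ftrain, ytrain, Ftest and the threshold value are illustrative assumptions, not taken from the embodiment.

% Ftrain: training feature matrix (one row per AE); ytrain: labels (1 = snore, 2 = non-snore)
% Ftest:  feature matrix of the AEs to be classified
B = mnrfit(Ftrain, ytrain);      % beta_0..beta_D estimated by maximum likelihood
p = mnrval(B, Ftest);            % p(:,1) is the estimated probability of the snore category
p_thre = 0.5;                    % threshold; in the embodiment it is chosen from the ROC curve
isSnore = p(:,1) >= p_thre;      % final snore / non-snore decision for each AE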
(Performance evaluation of classifier using AIM)
Next, the performance of the classifier using the auditory image model (AIM) was evaluated. Here, sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) were used as indices of classification performance.
Here, sensitivity is the ability of the discrimination results to detect snoring. Specificity is the proportion of non-snoring sounds for which the decision value falls at or below the threshold. The positive predictive value (PPV) is the probability that a sound is actually a snore when the decision value is at or above the threshold. The negative predictive value (NPV) is the probability that a sound is a non-snore when the decision value is at or below the threshold. Based on these, the relationships among TP, FP, FN, and TN are defined as follows.
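The defining table is not reproduced in this text; under the usual convention, assumed here: TP (true positive) is a snore correctly judged as a snore; FP (false positive) is a non-snore judged as a snore; FN (false negative) is a snore judged as a non-snore; and TN (true negative) is a non-snore correctly judged as a non-snore.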
Here, the expressions for sensitivity, specificity, accuracy, positive predictive value (PPV), and negative predictive value (NPV) are defined in terms of the above TP, FP, FN, and TN as follows.
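The equations themselves are not reproduced in this text; the standard definitions in terms of TP, FP, FN, and TN, which this embodiment is assumed to follow, are
$$\mathrm{Sensitivity}=\frac{TP}{TP+FN},\quad \mathrm{Specificity}=\frac{TN}{TN+FP},\quad \mathrm{Accuracy}=\frac{TP+TN}{TP+FP+FN+TN},$$
$$\mathrm{PPV}=\frac{TP}{TP+FP},\quad \mathrm{NPV}=\frac{TN}{TN+FN}.$$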
(ROC curve)
An ROC (receiver operating characteristic) curve is obtained by plotting the false positive rate (1 - specificity) on the horizontal axis against the true positive rate (sensitivity) on the vertical axis. In this embodiment, the ROC curve can be constructed by varying P_thre. The optimum threshold of the ROC curve, that is, the optimum P_thre, can be obtained using the Youden's index method. For an ideal classifier, the ROC curve bulges strongly toward the upper left. Because of this property, the area under the curve (AUC), the area of the region below the ROC curve, can be used as an index of how well the classifier or classification algorithm performs. The AUC takes values in the range from 0.5 to 1 and approaches 1 when the classification accuracy is good, making it an evaluation index of classification accuracy.
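Youden's index is not defined in this text; its standard form, assumed here, is
$$J = \mathrm{Sensitivity} + \mathrm{Specificity} - 1,$$
and the optimum P_thre is taken as the threshold that maximizes J along the ROC curve.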
(Learning dataset and test dataset)
Next, the performance of the bioacoustic extraction device according to this embodiment was evaluated using AEs extracted from 40 subjects. The results are shown in Table 1.
M: male; F: female; BMI: body mass index; AHI: apnea-hypopnea index
As shown in Table 1, the AEs extracted from the 40 subjects are divided into a training data set of 16,141 AEs (13,406 snoring, 2,735 non-snoring) and a test data set of 10,651 AEs (7,346 snoring, 3,305 non-snoring).
(labeling)
In this embodiment, the AEs were labeled on the basis of listening. Three evaluators listened to each AE played through headphones (SHURE SRH840) and labeled it by consensus, so that no sound was labeled as a snore without the agreement of all of them.
Table 2 shows the AEs that were judged to be non-snoring sounds (non-snore) during this labeling work.
(Classification of snoring and non-snoring sounds based on AIM)
The performance of this embodiment was evaluated using the feature vectors ASF, SSAIF, and AIMF obtained as described above. The results are shown in Table 3. As the table shows, snoring and non-snoring sounds can be classified with high accuracy with any of the feature vectors. Among them, AIMF showed the best performance, presumably because human hearing analyzes sounds using both frequency information and temporal information.
In general, the higher the dimensionality of a feature vector, the larger the amount of computation. We therefore examined whether the dimensionality of each of the three feature vectors used could be reduced while maintaining high classification accuracy. Here, in order to extract the features that contribute to improving the classification accuracy, the dimensionality of ASF and of SSAIF was increased one dimension at a time and the resulting change in accuracy was calculated.
A feature that increased the accuracy by 1% or more when the dimensionality was increased by one was extracted as a feature contributing to the improvement of classification accuracy. FIG. 6 shows the relationship between the feature-vector index and the accuracy, from which the features effective for snoring/non-snoring classification can be identified.
Table 4 shows the feature vectors extracted from the AS and SSAI that contributed an accuracy improvement of 1% or more (ASFopt. and SSAIFopt.). This result confirms that the dimensionality can be reduced substantially, to 4 dimensions for the AS and 5 dimensions for the SSAI.
Further, with the dimensionality reduction of AIMF in mind, a 9-dimensional feature vector AIMFopt., combining ASFopt. and SSAIFopt., was used. Table 5 shows the results of evaluating the system with the three feature vectors described above: ASFopt., SSAIFopt., and AIMFopt. The table shows that automatic classification and extraction of snoring sounds can be performed with fewer features at an accuracy comparable to that before dimensionality reduction. In particular, when AIMFopt. was used, the system achieved its highest accuracy of 96.9% (sensitivity: 97.2%, specificity: 96.3%). FIG. 7 shows the ROC analysis result obtained with AIMFopt. These results confirm the effectiveness of the bioacoustic extraction device according to this embodiment and the optimum features for snoring-sound extraction.
AS: auditory spectrum; SSAI: summary SAI; OT: optimum threshold; TP: true positive; FP: false positive; TN: true negative; FN: false negative; Sen.: sensitivity; Spe.: specificity; AUC: area under the curve; Acc.: accuracy; PPV: positive predictive value; NPV: negative predictive value
(Comparison with conventional methods)
As described above, the effectiveness of the AIM-based classifier for snoring and non-snoring sounds has been demonstrated. Next, previously proposed snoring/non-snoring classification methods are compared with this embodiment to examine whether it is superior. Here, as past reports, the classification performance of Duckitt et al. (MFCCs), Cavusoglu et al. (SED), Karunajeewa et al. (normalized ACs, LPCs, etc.), Azarbarzin et al. (SED at 500 Hz), and Dafna et al. (MFCCs, LPCs, SED, etc.), together with the accuracy Acc obtained by classifying the data of the 40 subjects with the above-described method of this embodiment, are summarized in Table 7 below. As is clear from this table, although the subject data and the conditions used for classification differ, the method of this embodiment achieves a classification accuracy superior to any of the reported examples.
Acc.: accuracy; PPV: positive predictive value; MFCCs: mel-frequency cepstrum coefficients; OSAS: obstructive sleep apnea syndrome; SED: subband energy distribution; ACs: autocorrelation coefficients; LPCs: linear predictive coding coefficients; AIMF: AIM feature
(Duckitt et al.)
Duckitt et al. use mel-frequency cepstrum coefficients (MFCCs) and train their system with non-snoring sounds broadly divided into breathing sounds, other noises, silent intervals, and miscellaneous noise (car sounds, dog barking, etc.), but they report that in the future the system needs to be extended so that sounds such as sleep talking can also be distinguished and learned (Non-Patent Document 5).
(Cavusoglu et al.)
Cavusoglu et al. report achieving 98.7% accuracy in snoring/non-snoring classification when training only on a data set of simple snoring sounds using the subband energy distribution (SED). However, the accuracy is reported to have fallen to 90.2% when training on a data set containing both simple snoring sounds and the snoring sounds of OSAS patients (Non-Patent Document 7). In contrast, the classification method according to the embodiment of the present invention achieves 97.3% accuracy even on a data set containing both simple snoring and OSAS snoring sounds.
(Karunajeewa et al.)
Karunajeewa et al., who use normalized ACs, LPCs, and so on, employ only breathing sounds and silent intervals when classifying non-snoring sounds, and report that speech and other noises can be avoided by excluding the 10 or 20 minutes immediately before the patient falls asleep (Non-Patent Document 6). To investigate the proportion of breathing sounds relative to other non-snoring sounds, the non-snoring sounds in the database used in this embodiment were therefore subjected to a listening evaluation by three raters and classified into four classes, namely breathing sounds, coughs, voice (sleep talking, moaning, speaking), and other noises (bed creaking, metallic sounds, sirens, etc.), and the number of episodes in each class was counted. As a result, even though the first hour after the start of recording was excluded, non-snoring sounds other than breathing sounds accounted for 24.4% of all non-snoring sounds, showing that these sounds can by no means be ignored. Even on such a data set, the classification method of this embodiment achieved a high classification accuracy of 96.9%, suggesting that it can cope with the various sounds expected to occur during sleep.
(Azarbarzin et al.)
Azarbarzin et al. report that they classified a data set of simple snoring and OSAS snoring sounds using an SED at 500 Hz and obtained 93.1% accuracy (Non-Patent Document 9). However, their classification data were extracted from only 15 minutes of recording, whereas the present embodiment achieves 97.3% on data as long as two hours.
(Dafna et al.)
Dafna et al. use a 34-dimensional feature vector that combines a plurality of acoustic analysis methods such as MFCCs, LPCs, and SED (Non-Patent Document 10). However, this approach requires a thorough understanding of the theory behind each feature in order to use it, and the system is highly complex. In contrast, this embodiment has the advantage that it can be built as a simple program that uses a relatively low-dimensional feature vector and an existing AIM simulator.
Tsuzaki et al., when using these feature vectors, compute for the AS quantities corresponding to the total number, positions, and levels of the peaks, the spectral centroid, the spectral tilt, and the spectral undulation, and for the SSAI they compute a pitch-equivalent value from the logarithm of the reciprocal of the peak time interval and use the peak height as an index of pitch clarity (Non-Patent Document 11). However, only a small part of the SSAI information is exploited, and a means of using it more effectively needed to be addressed; in addition, robustness of the detection accuracy is an issue in realizing automatic peak detection. In contrast, in this embodiment, when the effect of the snoring/non-snoring classification based on AIM feature extraction without a peak detection system was verified, the highest accuracy was obtained when both the AS and the SSAI were used. Moreover, an accuracy of 94% was obtained even when only the SSAI was used, so the SSAI information is also exploited effectively.
Further, according to the voiced-sound classification method using the AS and SSAI of this embodiment, features obtainable from an acoustic spectrum, for example features corresponding to the total number, positions, and amplitudes of the peaks and features corresponding to the centroid, slope, and rises and falls of the spectrum, can be extracted from the AS and SSAI.
In addition, screening can be performed only on those sections of the extracted snoring sounds that have a pitch (periodicity); alternatively, only sections having a pitch may be extracted when the snoring sounds are extracted in the first place. In the above example, sounds whose SNR within a segment is 5 dB or more are used as AEs, but sounds with SNR < 5 dB can also be used by employing the voiced-section detection method.
In the AIM processing, it is not necessary to perform all of the stages shown in the block diagram of FIG. 4; for example, the processing can be sped up by using the features obtained up to the NAP (neural activity pattern) stage.
(Example)
Using the AIM-based classification of snoring and non-snoring sounds with the non-contact microphone technique described above, the accuracy of the bioacoustic extraction method was evaluated with 40 subjects. These results are described below as working examples.
(System noise immunity)
In this embodiment, a non-contact microphone was used to record the sleep-related sounds. In studies of sleep-related sound classification, this approach is frequently discussed in comparison with contact microphones. A non-contact microphone has the advantage that recording places no burden on the subject, but the SNR of the recording is a problem. Previous reports using non-contact microphones have applied noise reduction processing such as spectral subtraction as preprocessing in order to improve the SNR of the signal. However, spectral subtraction generates artifacts known as musical noise, and at low SNR it becomes difficult to estimate the fundamental frequency. In contrast, the gammachirp filter bank used in the BMM of this embodiment can effectively extract sounds from a noisy environment, without producing musical noise, even at an SNR as low as -2 dB. This is thought to be because AIM has the property of preserving the fine structure of periodic sounds rather than that of noise. AIM-based feature vectors have also been reported to provide better noise suppression than MFCCs. For these reasons, AIM can be said to have excellent noise robustness for recordings made in real environments.
(Estimation of voiced sections)
An example of the method by which the voiced-section estimation unit 20 estimates voiced sections will now be described with reference to the flowchart of FIG. 8 and the graphs of FIGS. 9A to 9E. Here, as an example of the waveform of the original acoustic data (the time variation of the signal intensity), consider extracting snoring episodes as voiced sections from sleep-related sound data such as that shown in FIG. 9A.
First, in step S801, sleep-related sounds are collected. Here, sleep-related sounds serving as the original acoustic data (FIG. 9A) are recorded from a sleeping patient using a non-contact microphone.
Next, in step S802, the original acoustic data is differentiated or differenced. This processing is performed by the preprocessor 21 shown in FIG. 2. Here, the original acoustic data of FIG. 9A is differentiated by a differentiator serving as the preprocessor 21, and the signal waveform of the resulting preprocessed data is shown in FIG. 9B. The difference operation can be carried out by a first-order FIR (finite impulse response) filter, which is a type of digital filter.
Then, in step S803, the preprocessed data is squared. This processing is performed by the squarer 22 shown in FIG. 2. FIG. 9C shows the signal waveform of the squared data obtained by squaring the preprocessed data of FIG. 9B with the squarer 22.
Further, in step S804, the squared data is downsampled. This processing is performed by the downsampler 23 shown in FIG. 2. FIG. 9D shows the signal waveform of the downsampled data obtained by downsampling the squared data of FIG. 9C with the downsampler 23. Instead of downsampling, the squared data may be divided into segments of, for example, N = 400 samples with a shift width of 200, and the signal energy in the k-th segment may be computed.
In step S805, median values are obtained from the downsampled data. This processing is performed by the median filter 24 shown in FIG. 2. FIG. 9E shows the signal waveform obtained by taking the median of the downsampled data of FIG. 9D with the median filter 24. In this way, only the necessary bioacoustic data, as in FIG. 9E, can be extracted from original acoustic data buried in background noise, as in FIG. 9A, so that voiced sections (breathing sounds, snoring sounds, etc.) can be extracted accurately even from sleep-related sounds buried in background noise.
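A minimal sketch of steps S802 to S805, written for MATLAB (the environment named in this embodiment); the function name, the decimation factor M, and the median window W are illustrative assumptions and not values taken from the embodiment.

% x: original acoustic data (column vector); M: decimation factor; W: median window length (odd)
function env = voiced_envelope(x, M, W)
    d  = diff(x);                  % S802: first-order difference (differentiation)
    sq = d .^ 2;                   % S803: squaring to obtain instantaneous energy
    ds = sq(1:M:end);              % S804: downsampling by keeping every M-th sample
    env = zeros(size(ds));         % S805: sliding-window median filtering
    half = floor(W / 2);
    for k = 1:numel(ds)
        lo = max(1, k - half);
        hi = min(numel(ds), k + half);
        env(k) = median(ds(lo:hi));
    end
end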
The detection of voiced sections may also be realized, besides the difference and squaring operations described above, using a learning machine such as a neural network or other time-series analysis, signal analysis, and modeling techniques.
(Comparative Examples 1 and 2)
For comparison, conventionally known methods for detecting speech sections were applied as comparative examples. Here, a method using the zero-crossing rate (ZCR) is taken as Comparative Example 1 and the STE method based on the energy of the audio signal as Comparative Example 2, and the results of automatically extracting voiced sections from the original acoustic data of FIG. 10A with each method are shown in FIGS. 10B and 10C, respectively. For further comparison, FIG. 10D shows the automatic extraction result obtained with the method according to Example 1 described above. The ZCR and STE methods are representative techniques for detecting only the speech sections in an externally input audio signal.
(ZCR)
First, the ZCR is defined by the following equation.
Here, sgn[χ(k)] is expressed by the following equation.
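The equations are not reproduced in this text; a commonly used form of the ZCR and of the sign function, assumed here, is
$$\mathrm{ZCR}_k = \frac{1}{2N}\sum_{n=1}^{N-1}\bigl|\,\mathrm{sgn}[\chi(n)] - \mathrm{sgn}[\chi(n-1)]\,\bigr|,\qquad
\mathrm{sgn}[\chi(k)] = \begin{cases} 1, & \chi(k) \ge 0 \\ -1, & \chi(k) < 0, \end{cases}$$
where χ(n) is the n-th sample of the k-th segment and N is the segment length.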
The ZCR is mainly used in situations where the ZCR of the uttered voiced sections is considerably smaller than the ZCR of the silent sections. This method depends on the type of sound (a strong harmonic structure).
(STE method)
The STE method defines the STE function of the sound as in Equation 1 above.
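Equation 1 is not reproduced in this section; a standard short-time energy definition, assumed here and consistent with the segmentation mentioned in step S804 (for example N = 400 and a shift width S = 200), is
$$E_k = \sum_{n=(k-1)S+1}^{(k-1)S+N} \chi^2(n),$$
where χ(n) is the input signal and k is the segment (frame) number.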
The STE method is mainly used in situations where the value of E_k in the voiced sections is considerably larger than E_k in the silent sections, the SNR is high, and the E_k of the voiced sections can be clearly distinguished from the background noise.
As described above, methods such as ZCR and STE perform automatic speech-section detection on the assumption of an environment that is relatively unaffected by noise, so that in an environment with a low SNR such as that of FIG. 10A it is difficult to extract the voiced sections appropriately. The ZCR of Comparative Example 1 shown in FIG. 10B cannot be said to achieve sufficient extraction, and in the signal energy of Comparative Example 2 shown in FIG. 10C the peaks appear uniformly and separation is difficult. In contrast, in the automatic extraction result obtained with the method according to Example 1 described above, shown in FIG. 10D, the signal is clearly extracted; even for sleep-related sound data with an extremely low SNR such as FIG. 10A, the voiced sections (breathing sounds, snoring sounds) can be extracted and separated by using the voiced-section estimation unit according to the example, confirming its usefulness.
In this example, as shown in Table 6, sleep-related sound data were collected from 10 subjects as the original acoustic data; the length of each sleep-related sound recording was 120 s, and the data were classified in advance by hand into snoring sounds (Snore) and breathing sounds (Breath).
In accordance with the definitions of sensitivity, specificity, and area under the curve (AUC) given above, the AUC, sensitivity, and specificity were calculated for each subject for Example 1, Comparative Example 1, and Comparative Example 2. The results are shown in Tables 9 to 11 below. These results confirm that Example 1 achieves high values for sensitivity, specificity, and area under the curve.
(Example 3)
Further, AIM-based screening, that is, sieving, is performed by the screening unit 70 of FIG. 1 on the data that have been classified into snoring and non-snoring sounds. Here, the screening unit 70 judges from the snoring sounds whether the subject has OSAS (obstructive sleep apnea syndrome) or not. Example 3 was carried out in order to confirm whether the screening unit 70 can appropriately classify OSAS and non-OSAS, that is, to confirm its usefulness. First, a data set of AEs extracted from 31 subjects was prepared as the data set for classifying snoring and non-snoring sounds, of which 20 subjects were used as the training data set and 11 as the test data set. Here, two hours' worth of sleep data were extracted for each subject.
The AEs were labeled in advance by hand as snoring or non-snoring sounds. Details of the data set used for the classification of snoring and non-snoring sounds are shown in the table below.
M: male; F: female; BMI: body mass index; AHI: apnea-hypopnea index
The data set used for the classification of OSAS and non-OSAS, on the other hand, is shown in the table below. Here, 50 subjects were used, of whom 35 were used as the training data set and 15 as the test data set.
(Classification of OSAS and non-OSAS based on AIM)
First, the snoring/non-snoring classification performance of the classification unit 50 and the discrimination unit 60 of FIG. 1 was evaluated using a six-dimensional feature vector (kurtosis, skewness, spectral centroid, and spectral bandwidth extracted from the AS, and kurtosis and skewness extracted from the SSAI). As shown in the table, it was confirmed that, using these feature vectors, snoring and non-snoring sounds were classified with the extremely high performance of 98.4% sensitivity and 94.06% specificity.
Further, in this embodiment, OSAS and non-OSAS were classified by the screening unit 70 on the basis of an eight-dimensional feature vector of the snoring sounds discriminated by the classification unit 50 and the discrimination unit 60 of FIG. 1 (skewness, spectral centroid, and spectral roll-off extracted from the AS, and kurtosis, skewness, spectral bandwidth, spectral roll-off, and spectral entropy extracted from the SSAI). In the classification of snoring and non-snoring sounds and in the classification of OSAS and non-OSAS, combinations of the features described above can be used, and in addition to these features, spectral asymmetry, band energy ratio, and the like can also be used. The evaluation of the OSAS/non-OSAS classification was performed with a 10-fold cross-validation test, in which nine folds randomly selected from the data set were used for training and the remaining fold for testing.
As a result, when the threshold serving as the criterion for the apnea-hypopnea index (AHI) was set to 15 events/h and OSAS patients were screened by the screening unit 70, the excellent results of 85.00% ± 26.87 sensitivity and 95.00% ± 15.81 specificity shown in Table 14 above were obtained, confirming the usefulness of this example. The AHI threshold is not limited to this value and may be, for example, 5 events/h or 10 events/h. In the analyses performed by the classification unit 50, the discrimination unit 60, and the screening unit 70, classification, discrimination, and screening that take the characteristics of each sex into account are also possible. For the classification and identification of sleep sounds by the classification unit 50 and the discrimination unit 60, pattern recognition and identification techniques or learning machines, for example neural networks, deep neural networks, and support vector machines (SVMs), may be used instead of MLR. Although the above example uses two-class classification, multi-class classification problems can also be considered with such learning machines; for example, sounds can be classified directly into OSAS snoring (1), non-OSAS snoring (2), and non-snoring (3) on the basis of the features. Likewise, in automatic extraction, sounds can be classified into multiple classes such as snoring (1), breathing sounds (2), and coughing (3).
(Example 4)
Next, as Example 4, it was verified whether snoring/non-snoring discrimination and OSAS/non-OSAS discrimination are possible with an increased number of subjects, that is, with an expanded subject database. Here, audio events (AEs) obtained from two hours of sleep sounds are used. The sleep sounds were recorded during overnight polysomnography (PSG) examinations. For the performance evaluation, three listeners engaged in sleep research carefully labeled the AEs into the two classes of snoring and non-snoring sounds. The subject information obtained in the PSG examinations and the average number of snores obtained by the labeling are shown in the subject database of Table 15.
A stabilized auditory image (SAI) was formed for each AE, and the auditory spectrum (AS) and summary stabilized auditory image (SSAI) obtained from the 10th frame onward were taken as the objects of analysis. In each frame, the AS and SSAI were normalized so that the maximum amplitude was 1; when normalizing by the standard deviation, normalization can also be performed per episode. From the AS, the eight features kurtosis, skewness, spectral centroid, spectral bandwidth, spectral roll-off, spectral entropy, spectral contrast, and spectral flatness were used; for the SSAI, the seven features other than spectral flatness were used. Since these features are extracted from all frames of an AE, the mean of each feature was used as the feature obtained from the AE. To select from these the features that are most effective for discrimination, the stepwise method, a feature selection algorithm, was applied separately to the male, female, and combined male-female data sets.
上述のように選択した特徴ベクトルを用いて、MLRモデルに基づく学習を行う。ここでは、図3で示したAIMを用いた生体音響抽出方法を示すフローチャートのステップS305で行うMLRを用いたいびき音/非いびき音の分類(OSASスクリーニングの場合:OSAS/non-OSASの分類)を行った。さらにステップS307において閾値に基づく判別を行った。ここでは、分類部50で音響特徴量を所定の種別に分類し、判別部60でいびき音又は非いびき音の判別を行うために、AEから抽出された特徴ベクトルを用いた多項分布ロジスティック回帰(Multi-nomial logistic regression:MLR)分析を用いた。
Learning based on the MLR model is performed using the feature vectors selected as described above. Here, the snore/non-snore classification using MLR (or, in the case of OSAS screening, the OSAS/non-OSAS classification) performed in step S305 of the flowchart of FIG. 3 showing the bioacoustic extraction method using AIM was carried out, and in step S307 a discrimination based on the threshold value was performed. To classify the acoustic features into the predetermined types in the classification unit 50 and to discriminate snore from non-snore sounds in the discrimination unit 60, multinomial logistic regression (MLR) analysis using the feature vectors extracted from the AEs was employed.
(いびき音自動分類の性能評価)
(Performance evaluation of automatic snoring classification)
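To make the MLR classification described above and its leave-one-out evaluation (described in the following paragraphs) concrete, a minimal sketch using an off-the-shelf logistic regression implementation is shown below. The feature and label files are hypothetical placeholders, and this is only an illustrative sketch, not the implementation actually used in the examples.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

# X: (n_events, n_features) feature vectors selected by the stepwise method
# y: 1 = snore, 0 = non-snore (listener labels) -- hypothetical placeholders
X = np.load("ae_features.npy")
y = np.load("ae_labels.npy")

y_pred = np.empty_like(y)
for train_idx, test_idx in LeaveOneOut().split(X):
    # MLR on the selected features; with two classes this reduces to
    # ordinary binary logistic regression
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    y_pred[test_idx] = clf.predict(X[test_idx])

tp = np.sum((y_pred == 1) & (y == 1)); fn = np.sum((y_pred == 0) & (y == 1))
tn = np.sum((y_pred == 0) & (y == 0)); fp = np.sum((y_pred == 1) & (y == 0))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("PPV:", tp / (tp + fp), "NPV:", tn / (tn + fn))
print("accuracy:", (tp + tn) / len(y))
```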
AIMを用いた、いびき音自動分類の性能評価を行うために、Leave-one-out交差検証を行った。ここでは、全テストデータセットにおける感度(Sensitivity)、特異度(Specificity)、AUC(area under the curve)、精度(Accuracy)、陽性適中度(Positive pre-dictive value:PPV)、陰性適中度(Negative predictive value:NPV)を計算して、それらの平均値、標準偏差を求めることにより、システムの分類性能を評価した(自動分類、スクリーニング共に、厳しい基準で評価を行っている)。
To evaluate the performance of automatic snore classification using AIM, leave-one-out cross-validation was performed. The sensitivity, specificity, AUC (area under the curve), accuracy, positive predictive value (PPV), and negative predictive value (NPV) over all test data sets were calculated, and their mean values and standard deviations were obtained to evaluate the classification performance of the system (both the automatic classification and the screening are evaluated under strict criteria).
(OSASスクリーニングの性能評価)
(Performance evaluation of OSAS screening)
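The OSAS screening evaluation described in the next paragraphs repeats a random subject-level 70/30 split 50 times and reports the mean and standard deviation of each metric. One possible way to sketch such a protocol is shown below; the per-subject feature files, the decision threshold, and the use of a stratified splitter to keep OSAS and non-OSAS subjects balanced are illustrative assumptions, not values taken from the description.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedShuffleSplit

# X: one feature vector per subject, y: 1 = OSAS, 0 = non-OSAS (hypothetical)
X = np.load("subject_features.npy")
y = np.load("subject_labels.npy")

splits = StratifiedShuffleSplit(n_splits=50, test_size=0.3, random_state=0)
sens, spec, auc = [], [], []
for train_idx, test_idx in splits.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    prob = clf.predict_proba(X[test_idx])[:, 1]
    pred = (prob >= 0.5).astype(int)
    yt = y[test_idx]
    tp = np.sum(pred[yt == 1] == 1); fn = np.sum(pred[yt == 1] == 0)
    tn = np.sum(pred[yt == 0] == 0); fp = np.sum(pred[yt == 0] == 1)
    sens.append(tp / (tp + fn))
    spec.append(tn / (tn + fp))
    auc.append(roc_auc_score(yt, prob))

# Accuracy, PPV, and NPV can be accumulated in the same way.
for name, vals in (("sensitivity", sens), ("specificity", spec), ("AUC", auc)):
    print(f"{name}: {np.mean(vals):.3f} +/- {np.std(vals):.3f}")
```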
AIMを用いた、OSASスクリーニングの性能評価には、被験者データベースから無作為に選んだ70%を学習用データセット、残りの30%をテスト用データセットとして使用した。両データセット共に、OSAS被験者とNon-OSAS被験者が均等になるように、独立に分割し、ランダムに50パターン作成した学習・テスト用データセットを使用して、感度(Sensitivity)、特異度(Specificity)、AUC(area under the curve)、精度(Accuracy)、陽性適中度(Positive predictive value:PPV)、陰性適中度(Negative predictive value:NPV)を計算した後、平均値と標準偏差の計算を行い、OSASスクリーニングの性能を評価した。
For the performance evaluation of OSAS screening using AIM, 70% of the subjects randomly selected from the subject database were used as the training data set and the remaining 30% as the test data set. Both data sets were split independently so that OSAS and non-OSAS subjects were evenly represented, and 50 randomly generated training/test splits were used. The sensitivity, specificity, AUC (area under the curve), accuracy, positive predictive value (PPV), and negative predictive value (NPV) were calculated, followed by their mean values and standard deviations, to evaluate the OSAS screening performance.
(いびき音自動分類の性能評価結果)
(Performance evaluation results of automatic snoring classification)
AIMを用いた、いびき音自動分類の性能評価結果(Leave-one-out交差検証)を表16に示す。
Table 16 shows the performance evaluation results (Leave-one-out cross-validation) of automatic snoring sound classification using AIM.
以上の結果から、AIMから得られる特徴だけを使用して、男女ともに、高い精度で、いびきを自動分類できることが判明した。
From the above results, it was found that snoring can be automatically classified with high accuracy for both men and women using only the features obtained from AIM.
(AIMを用いた、OSASスクリーニングの性能評価結果)
(Results of OSAS screening performance evaluation using AIM)
上述した方法によりAIMを用いて自動抽出した、いびき音に基づくOSASスクリーニングの性能評価結果を表17に示す。また参考のため、自動抽出を行わず、ラベリングにより手動で抽出した、いびき音のみを利用したOSASスクリーニングの性能評価結果を表18に示す。
Table 17 shows the performance evaluation results of OSAS screening based on snoring sounds automatically extracted using AIM by the method described above. For reference, Table 18 shows the performance evaluation results of OSAS screening using only snoring sounds extracted manually by labeling, without automatic extraction.
表17より、ヒトの聴覚能力を模倣した、AIMだけを用いて自動抽出した、いびき音をもとに、どのデータセットにおいても高い精度でOSASスクリーニングを行えることが示唆された。表17と18から、自動抽出された場合に比べて、ラベリングにより手動で抽出したいびき音に基づくOSASスクリーニングの性能が全ての被験者セットにおいて高いことが判った。この結果より、AIMを用いた、いびき音の自動抽出の性能を更に向上させることによりOSASスクリーニング性能の向上が示唆される。いびき自動抽出性能の向上のために、ASとSSAIのフレームにおける正規化方法の変更、例えばフレームで正規化するのではなく、1エピソードで正規化することも可能である。また、特徴量の追加、例えば、非特許文献1で述べられるような、ピッチ情報、フォルマント周波数情報などの信号処理、音声認識、音声信号処理に使用される特徴ベクトルとの組み合わせ等も可能である。
Table 17 suggests that OSAS screening can be performed with high accuracy on any of the data sets on the basis of snoring sounds automatically extracted using only AIM, which imitates human auditory ability. Tables 17 and 18 show that the performance of OSAS screening based on snoring sounds manually extracted by labeling was higher for all subject sets than that based on automatic extraction. This result suggests that the OSAS screening performance can be improved by further improving the performance of automatic snore extraction using AIM. To improve the automatic snore extraction performance, the normalization method for the AS and SSAI frames can be changed, for example by normalizing over one episode instead of per frame. It is also possible to add feature quantities, for example pitch information, formant frequency information, and other feature vectors used in signal processing, speech recognition, and speech signal processing, as described in Non-Patent Document 1.
さらに本実施例では、エネルギの高い有音区間を対象にAIM処理を行った。その他、低SNRの音を含む有音区間を対象としてAIM処理を行って、咳、いびき、呼吸、発声、ベッドノイズ等のカテゴリに分類することも可能である。
Furthermore, in this example, AIM processing was performed on voiced sections with high energy. Alternatively, AIM processing can be applied to voiced sections containing low-SNR sounds, and the sounds can then be classified into categories such as cough, snoring, breathing, vocalization, and bed noise.
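A voiced-section detector of the kind referred to here, following the chain set out in the claims below (pre-processing by differencing, squaring, down-sampling, and median filtering, then selecting high-energy portions), might be sketched as follows. The sampling rate, window sizes, and threshold are illustrative assumptions only, not values taken from the description.

```python
import numpy as np
from scipy.signal import medfilt

def voiced_sections(x, fs, down=100, med_win=11, thresh_ratio=0.1):
    """Estimate high-energy voiced sections from a sleep-sound recording.

    x : raw audio samples, fs : sampling rate [Hz]
    Returns a list of (start_sec, end_sec) tuples.
    """
    d = np.diff(x)                                      # pre-processing by differencing
    e = d ** 2                                          # squaring -> instantaneous energy
    e = e[: len(e) // down * down].reshape(-1, down).mean(axis=1)  # down-sampling
    env = medfilt(e, kernel_size=med_win)               # median filtering -> smooth envelope
    active = env > thresh_ratio * np.max(env)           # keep high-energy portions

    sections, start = [], None
    step = down / fs                                     # seconds per envelope sample
    for i, a in enumerate(active):
        if a and start is None:
            start = i * step
        elif not a and start is not None:
            sections.append((start, i * step))
            start = None
    if start is not None:
        sections.append((start, len(active) * step))
    return sections
```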
以上の実施例に係る方法を纏めると、まずAIM処理により、35msのフレーム毎にSAIを求める。次に、各SAIからASとSSAIを求める。さらにAS、SSAIから、それぞれ特徴量(尖度、歪度、スペクトルバンド幅、スペクトル重心、スペクトルエントロピー、スペクトルロールオフ、スペクトルフラットネス等)を抽出する。フレームの数だけ特徴量が得られるので、有音区間の一の音から得られる各特徴量は、平均化することにより、平均値、標準偏差として利用できる。ここでは、いびき音自動抽出の場合は、AS、SSAIから得られる特徴量の平均値、標準偏差を使用した。一方、OSASスクリーニングでは、AS、SSAIから得られる特徴量の平均値のみを使用した。このように、OSASスクリーニングに比べ、いびき音自動抽出では、より多くの特徴量を使用、検討した。さらに多くの特徴量を用いることもできるし、また特徴量の基本統計量(標準偏差、尖度等)や、ピークの総数、出現位置、振幅、重心、傾斜、増加、減少等の特徴量等を使用して、生体音響自動抽出、OSASスクリーニングの性能を評価できる。また、常にSAIを用いる必要はなく、例えばSAI手前の処理であるNAPまでで特徴量を抽出することにより、計算速度を向上させることもできる。
To summarize the method of the above examples: first, an SAI is obtained for every 35 ms frame by AIM processing. Next, the AS and SSAI are obtained from each SAI. Feature values (kurtosis, skewness, spectral bandwidth, spectral centroid, spectral entropy, spectral roll-off, spectral flatness, etc.) are then extracted from the AS and SSAI, respectively. Since one set of feature values is obtained per frame, the features obtained from one sound in a voiced section can be summarized by averaging, yielding a mean value and a standard deviation. Here, for automatic snore extraction, both the mean and the standard deviation of the features obtained from the AS and SSAI were used, whereas for OSAS screening only the mean values were used. Thus, more feature values were used and examined in automatic snore extraction than in OSAS screening. Even more feature quantities can be used; basic statistics of the features (standard deviation, kurtosis, etc.) and features such as the total number of peaks, their positions, amplitude, centroid, slope, increase, and decrease can also be used to evaluate the performance of automatic bioacoustic extraction and OSAS screening. Further, it is not always necessary to use the SAI; for example, the calculation speed can be improved by extracting the features from the NAP, the processing stage before the SAI.
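Putting the above steps together, the per-event feature pipeline could look roughly like the sketch below. The function `sai_frames`, which stands in for an AIM front end returning one stabilized auditory image per 35 ms frame, and the way the SAI is collapsed into the AS and SSAI (averaging along one axis or the other) are assumptions made for illustration; `spectral_features` is the helper sketched earlier.

```python
import numpy as np

def event_feature_vector(audio_event, fs, sai_frames, freqs, use_std=True):
    """One feature vector per audio event (AE).

    sai_frames : callable returning SAI matrices (channels x time-interval),
                 one per 35 ms frame -- a placeholder for an AIM implementation
    """
    per_frame = []
    for sai in sai_frames(audio_event, fs):
        auditory_spectrum = sai.mean(axis=1)   # collapse the time-interval axis -> AS
        ssai = sai.mean(axis=0)                # collapse the frequency axis -> SSAI
        f_as = spectral_features(auditory_spectrum, freqs)
        f_ssai = spectral_features(ssai, np.arange(ssai.size))
        per_frame.append(list(f_as.values()) + list(f_ssai.values()))

    per_frame = np.asarray(per_frame)
    mean = per_frame.mean(axis=0)              # mean over frames (used for OSAS screening)
    if not use_std:
        return mean
    # snore extraction additionally uses the standard deviation over frames
    return np.concatenate([mean, per_frame.std(axis=0)])
```

In this sketch the feature set is identical for the AS and SSAI; the description above instead drops spectral flatness for the SSAI and restricts OSAS screening to the frame means.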
以上の例ではいびき音を例に挙げて説明したが、本発明の対象はいびき音に限らず、生体物の発する他の音響(生体音響)にも利用でき、さらに検出された生体音響から、種々の症例の発見や診断等に適用できる。例えば睡眠音の検出により、上述したOSASスクリーニングや、睡眠障害の鑑別が可能となる。また肺音、呼吸音、咳等から、ぜんそく、肺炎等の診断が可能となる。あるいは、心音から各種の心疾患が可能となり、さらに腸音の解析により機能性消化管障害のような各種の腸疾患のスクリーニングが可能となる。その他、胎動音、筋音等の検出にも適用できる。このような生体音響を適切に抽出することで、これらの生体音響から診断可能な症例の解析に対して好適に利用できる。また、本発明はヒトに限らず、他の生物に対しても利用できる。例えば愛玩動物や動物園で飼育される動物の健康診断等においても、好適に利用できる。
Although the above examples have described snoring sounds, the subject of the present invention is not limited to snoring sounds; it can also be applied to other sounds generated by living bodies (bioacoustics), and the detected bioacoustics can be used for the discovery and diagnosis of various conditions. For example, detection of sleep sounds enables the OSAS screening described above and the differentiation of sleep disorders. Asthma, pneumonia, and the like can be diagnosed from lung sounds, respiratory sounds, and coughs. Various heart diseases can likewise be diagnosed from heart sounds, and various intestinal diseases such as functional gastrointestinal disorders can be screened by analyzing intestinal sounds. The invention can also be applied to the detection of fetal movement sounds, muscle sounds, and the like. By appropriately extracting such bioacoustics, the invention can be suitably used for analyzing conditions that can be diagnosed from them. Moreover, the present invention can be applied not only to humans but also to other living organisms; for example, it can be suitably used for health examinations of pets and animals kept in zoos.
本発明の生体音響抽出装置、生体音響解析装置、生体音響抽出プログラム及びコンピュータで読み取り可能な記録媒体並びに記録した機器は、患者の睡眠ポリグラフ検査と共に、あるいはこれに代えていびき音を測定し、SASの診断を行う用途として好適に利用できる。
The bioacoustic extraction device, bioacoustic analysis device, bioacoustic extraction program, computer-readable recording medium, and recorded device according to the present invention can be suitably used to measure snoring sounds together with, or in place of, a patient's polysomnographic examination and to diagnose SAS.
100…生体音響抽出装置
110…生体音響解析装置
10…入力部
20…有音区間推定部
21…前処理器
22…二乗器
23…ダウンサンプリング器
24…メディアンフィルタ
30…聴覚像生成部
40…音響特徴量抽出部
50…分類部
60…判別部
70…スクリーニング部
DESCRIPTION OF SYMBOLS
100 ... Bioacoustic extraction device
110 ... Bioacoustic analysis device
10 ... Input unit
20 ... Voiced section estimation unit
21 ... Preprocessor
22 ... Squarer
23 ... Downsampler
24 ... Median filter
30 ... Auditory image generation unit
40 ... Acoustic feature amount extraction unit
50 ... Classification unit
60 ... Discrimination unit
70 ... Screening unit
Claims (27)
- 生体音響データを含む元音響データから、必要な生体音響データを抽出するための生体音響抽出装置であって、
生体音響データを含む元音響データを取得するための入力部と、
前記入力部から入力された元音響データから、有音区間を推定する有音区間推定部と、
前記有音区間推定部で推定された有音区間に基づいて、聴覚イメージモデルに従い聴覚像を生成する聴覚像生成部と、
前記聴覚像生成部で生成された聴覚像に対して、音響特徴量を抽出する音響特徴量抽出部と、
前記音響特徴量抽出部で抽出された音響特徴量を、所定の種別に分類する分類部と、
前記分類部で分類された音響特徴量に対して、所定の閾値に基づいて生体音響データか否かを判別する判別部と
を備える生体音響抽出装置。 A bioacoustic extraction device for extracting necessary bioacoustic data from original acoustic data including bioacoustic data,
An input unit for acquiring original acoustic data including bioacoustic data;
From the original sound data input from the input unit, a voiced section estimation unit that estimates a voiced section;
An auditory image generation unit that generates an auditory image according to an auditory image model based on the voiced segment estimated by the voiced segment estimation unit;
An acoustic feature amount extraction unit that extracts an acoustic feature amount from the auditory image generated by the auditory image generation unit;
A classification unit that classifies the acoustic feature amount extracted by the acoustic feature amount extraction unit into a predetermined type;
A bioacoustic extraction apparatus comprising: a discrimination unit that discriminates whether or not the acoustic feature quantity classified by the classification unit is bioacoustic data based on a predetermined threshold value. - 請求項1に記載の生体音響抽出装置であって、
前記聴覚像生成部が、聴覚イメージモデルを用いて安定化聴覚像を生成するよう構成しており、
前記音響特徴量抽出部が、前記聴覚像生成部で生成された安定化聴覚像に基づいて、音響特徴量を抽出してなる生体音響抽出装置。 The bioacoustic extraction device according to claim 1,
The auditory image generation unit is configured to generate a stabilized auditory image using an auditory image model,
A bioacoustic extraction apparatus in which the acoustic feature amount extraction unit extracts an acoustic feature amount based on the stabilized auditory image generated by the auditory image generation unit. - 請求項2に記載の生体音響抽出装置であって、
前記聴覚像生成部が、さらに安定化聴覚像から、総括安定化聴覚像と、聴覚スペクトルを生成するよう構成しており、
前記音響特徴量抽出部が、前記聴覚像生成部で生成された総括安定化聴覚像と、聴覚スペクトルに基づいて、音響特徴量を抽出してなる生体音響抽出装置。 The bioacoustic extraction device according to claim 2,
The auditory image generation unit is configured to generate an overall stabilized auditory image and an auditory spectrum from the stabilized auditory image,
A bioacoustic extraction apparatus in which the acoustic feature quantity extraction unit extracts an acoustic feature quantity based on the overall stabilized auditory image generated by the auditory image generation unit and an auditory spectrum. - 請求項3に記載の生体音響抽出装置であって、
前記音響特徴量抽出部が、聴覚スペクトル及び/又は総括安定化聴覚像の尖度、歪度、スペクトル重心、スペクトルバンド幅、スペクトル フラットネス、スペクトルロールオフ、スペクトルエントロピー、オクターブベースのスペクトルコントラストの少なくともいずれかを、音響特徴量として抽出してなる生体音響抽出装置。 The bioacoustic extraction device according to claim 3,
A bioacoustic extraction device in which the acoustic feature amount extraction unit extracts, as the acoustic feature amount, at least one of the kurtosis, skewness, spectral centroid, spectral bandwidth, spectral flatness, spectral roll-off, spectral entropy, and octave-based spectral contrast of the auditory spectrum and/or the overall stabilized auditory image. - 請求項1に記載の生体音響抽出装置であって、
前記聴覚像生成部が、聴覚イメージモデルを用いて神経活動パターンを生成するよう構成しており、
前記音響特徴量抽出部が、前記聴覚像生成部で生成された神経活動パターンに基づいて、音響特徴量を抽出してなる生体音響抽出装置。 The bioacoustic extraction device according to claim 1,
The auditory image generation unit is configured to generate a neural activity pattern using an auditory image model,
A bioacoustic extraction device in which the acoustic feature amount extraction unit extracts an acoustic feature amount based on the neural activity pattern generated by the auditory image generation unit. - 生体音響データを含む元音響データから、必要な生体音響データを抽出するための生体音響抽出装置であって、
生体音響データを含む元音響データを取得するための入力部と、
前記入力部から入力された元音響データから、有音区間を推定する有音区間推定部と、
前記有音区間推定部で推定された有音区間に基づいて、聴覚イメージモデルに従い聴覚像を生成する聴覚像生成部と、
前記聴覚像生成部で生成された聴覚像に対して、聴覚スペクトルを生成する聴覚スペクトル生成部と、
聴覚像に対して、総括安定化聴覚像を生成する総括安定化聴覚像を生成する総括安定化聴覚像生成部と、
前記聴覚スペクトル生成部で生成された聴覚スペクトルと、前記総括安定化聴覚像生成部で生成された総括安定化聴覚像から、音響特徴量を抽出する音響特徴量抽出部と、
前記音響特徴量抽出部で抽出された音響特徴量を、所定の種別に分類する分類部と、
前記分類部で分類された音響特徴量に対して、所定の閾値に基づいて生体音響データか否かを判別する判別部と
を備える生体音響抽出装置。 A bioacoustic extraction device for extracting necessary bioacoustic data from original acoustic data including bioacoustic data,
An input unit for acquiring original acoustic data including bioacoustic data;
From the original sound data input from the input unit, a voiced section estimation unit that estimates a voiced section;
An auditory image generation unit that generates an auditory image according to an auditory image model based on the voiced segment estimated by the voiced segment estimation unit;
An auditory spectrum generator that generates an auditory spectrum for the auditory image generated by the auditory image generator;
An overall stabilized auditory image generator for generating an overall stabilized auditory image for generating an overall stabilized auditory image for the auditory image;
An acoustic feature amount extraction unit that extracts an acoustic feature amount from the auditory spectrum generated by the auditory spectrum generation unit and the overall stabilized auditory image generated by the overall stabilization auditory image generation unit;
A classification unit that classifies the acoustic feature amount extracted by the acoustic feature amount extraction unit into a predetermined type;
A bioacoustic extraction apparatus comprising: a discrimination unit that discriminates whether or not the acoustic feature quantity classified by the classification unit is bioacoustic data based on a predetermined threshold value. - 請求項1~6のいずれか一項に記載の生体音響抽出装置であって、
元音響データの内、周期を有する区間を抽出するよう構成してなる生体音響抽出装置。 The bioacoustic extraction device according to any one of claims 1 to 6,
A bioacoustic extraction device configured to extract sections having a period from the original acoustic data. - 請求項1~7のいずれか一項に記載の生体音響抽出装置であって、
前記有音区間推定部が、
元音響データを微分又は差分して前処理するための前処理器と、
前記前処理器で前処理された前処理データを二乗するための二乗器と、
前記二乗器で二乗された二乗データをダウンサンプリングするためのダウンサンプリング器と、
前記ダウンサンプリング器でダウンサンプリングされたダウンサンプリングデータから中央値を取得するためのメディアンフィルタと、
を備える生体音響抽出装置。 The bioacoustic extraction device according to any one of claims 1 to 7,
The voiced section estimation unit,
A preprocessor for preprocessing the original acoustic data by differentiation or differencing;
A squarer for squaring the preprocessed data preprocessed by the preprocessor;
A downsampler for downsampling the squared data squared by the squarer;
A median filter for obtaining a median value from the downsampled data downsampled by the downsampler;
A bioacoustic extraction apparatus. - 請求項1~8のいずれか一項に記載の生体音響抽出装置であって、
前記入力部が、検査対象の患者と非接触に設置される非接触式マイクロフォンである生体音響抽出装置。 The bioacoustic extraction device according to any one of claims 1 to 8,
The bioacoustic extraction device in which the input unit is a non-contact microphone that is installed in a non-contact manner with a patient to be examined. - 請求項1~9のいずれか一項に記載の生体音響抽出装置であって、
前記判別部による生体音響データの判別が、非言語処理である生体音響抽出装置。 The bioacoustic extraction device according to any one of claims 1 to 9,
A bioacoustic extraction apparatus in which the discrimination of bioacoustic data by the discrimination unit is non-language processing. - 請求項1~10のいずれか一項に記載の生体音響抽出装置であって、
元音響データが、患者の睡眠時に取得される生体音響であり、
睡眠下に取得された生体音響データから、必要な生体音響データを抽出してなる生体音響抽出装置。 The bioacoustic extraction device according to any one of claims 1 to 10,
The original acoustic data is a bioacoustic acquired when the patient sleeps,
A bioacoustic extraction apparatus configured to extract necessary bioacoustic data from bioacoustic data acquired during sleep. - 請求項1~11のいずれか一項に記載の生体音響抽出装置であって、
元音響データが、患者の睡眠時に集音される睡眠関連音であり、
生体音響データが、いびき音のデータであり、
前記所定の種別が、いびき音と非いびき音の別である生体音響抽出装置。 The bioacoustic extraction device according to any one of claims 1 to 11,
The original sound data are sleep-related sounds collected during patient sleep,
The bioacoustic data is snoring sound data,
The bioacoustic extraction apparatus in which the predetermined type is different from a snoring sound and a non-snoring sound. - 生体音響データを含む元音響データから、必要な生体音響データを抽出し、解析するための生体音響解析装置であって、
生体音響データを含む元音響データを取得するための入力部と、
前記入力部から入力された元音響データから、有音区間を推定する有音区間推定部と、
前記有音区間推定部で推定された有音区間に基づいて、聴覚イメージモデルに従い聴覚像を生成する聴覚像生成部と、
前記聴覚像生成部で生成された聴覚像に対して、音響特徴量を抽出する音響特徴量抽出部と、
前記音響特徴量抽出部で抽出された音響特徴量を、所定の種別に分類する分類部と、
前記分類部で分類された音響特徴量に対して、所定の閾値に基づいて生体音響データか否かを判別する判別部と、
前記判別部で生体音響データと判別された真値データに対して、スクリーニングを行うスクリーニング部と、
を備える生体音響解析装置。 A bioacoustic analyzer for extracting and analyzing necessary bioacoustic data from original acoustic data including bioacoustic data,
An input unit for acquiring original acoustic data including bioacoustic data;
From the original sound data input from the input unit, a voiced section estimation unit that estimates a voiced section;
An auditory image generation unit that generates an auditory image according to an auditory image model based on the voiced segment estimated by the voiced segment estimation unit;
An acoustic feature amount extraction unit that extracts an acoustic feature amount from the auditory image generated by the auditory image generation unit;
A classification unit that classifies the acoustic feature amount extracted by the acoustic feature amount extraction unit into a predetermined type;
A discrimination unit that discriminates, based on a predetermined threshold, whether or not the acoustic feature amounts classified by the classification unit are bioacoustic data;
A screening unit that performs screening on true value data determined as bioacoustic data by the determination unit;
A bioacoustic analyzer. - 生体音響データを含む元音響データから、必要な生体音響データを抽出し、解析するための生体音響解析装置であって、
生体音響データを含む元音響データを取得するための入力部と、
前記入力部から入力された元音響データから、有音区間を推定する有音区間推定部と、
前記有音区間推定部で推定された有音区間に基づいて、聴覚イメージモデルに従い聴覚像を生成する聴覚像生成部と、
前記聴覚像生成部で生成された聴覚像に対して、聴覚スペクトルを生成する聴覚スペクトル生成部と、
聴覚像に対して、総括安定化聴覚像を生成する総括安定化聴覚像生成部と、
前記聴覚スペクトル生成部で生成された聴覚スペクトルと、前記総括安定化聴覚像生成部で生成された総括安定化聴覚像から、音響特徴量を抽出する音響特徴量抽出部と、
前記音響特徴量抽出部で抽出された音響特徴量を、所定の種別に分類する分類部と、
前記分類部で分類された音響特徴量に対して、所定の閾値に基づいて生体音響データか否かを判別する判別部と、
前記判別部で生体音響データと判別された真値データに対して、スクリーニングを行うスクリーニング部と、
を備える生体音響解析装置。 A bioacoustic analyzer for extracting and analyzing necessary bioacoustic data from original acoustic data including bioacoustic data,
An input unit for acquiring original acoustic data including bioacoustic data;
From the original sound data input from the input unit, a voiced section estimation unit that estimates a voiced section;
An auditory image generation unit that generates an auditory image according to an auditory image model based on the voiced segment estimated by the voiced segment estimation unit;
An auditory spectrum generator that generates an auditory spectrum for the auditory image generated by the auditory image generator;
For the auditory image, a generalized stabilized auditory image generation unit that generates a generalized stabilized auditory image;
An acoustic feature amount extraction unit that extracts an acoustic feature amount from the auditory spectrum generated by the auditory spectrum generation unit and the overall stabilized auditory image generated by the overall stabilization auditory image generation unit;
A classification unit that classifies the acoustic feature amount extracted by the acoustic feature amount extraction unit into a predetermined type;
A discrimination unit that discriminates, based on a predetermined threshold, whether or not the acoustic feature amounts classified by the classification unit are bioacoustic data;
A screening unit that performs screening on true value data determined as bioacoustic data by the determination unit;
A bioacoustic analyzer. - 請求項13又は14に記載の生体音響解析装置であって、
前記スクリーニング部は、元音響データから抽出される生体音響データに対して疾患スクリーニングを行うよう構成してなる生体音響解析装置。 The bioacoustic analyzer according to claim 13 or 14,
A bioacoustic analysis device in which the screening unit is configured to perform disease screening on the bioacoustic data extracted from the original acoustic data. - 請求項15に記載の生体音響解析装置であって、
前記スクリーニング部は、元音響データから抽出される生体音響データに対して閉塞型睡眠時無呼吸症候群のスクリーニングを行うよう構成してなる生体音響解析装置。 The bioacoustic analyzer according to claim 15,
The screening unit is a bioacoustic analysis apparatus configured to perform screening for obstructive sleep apnea syndrome on bioacoustic data extracted from original acoustic data. - 生体音響データを含む元音響データから、必要な生体音響データを抽出するための生体音響抽出方法であって、
生体音響データを含む元音響データを取得する工程と、
前記取得された元音響データから、有音区間を推定する工程と、
前記推定された有音区間に基づいて、聴覚イメージモデルに従い聴覚像を生成する工程と、
前記生成された聴覚像に対して、音響特徴量を抽出する工程と、
前記抽出された音響特徴量を、所定の種別に分類する工程と、
前記分類された音響特徴量に対して、所定の閾値に基づいて生体音響データか否かを判別する工程と
を含む生体音響抽出方法。 A bioacoustic extraction method for extracting necessary bioacoustic data from original acoustic data including bioacoustic data,
Obtaining original acoustic data including bioacoustic data;
Estimating the voiced section from the acquired original acoustic data;
Generating an auditory image according to an auditory image model based on the estimated voiced interval;
Extracting an acoustic feature from the generated auditory image;
Classifying the extracted acoustic features into a predetermined type;
And a step of determining whether or not the classified acoustic feature amount is bioacoustic data based on a predetermined threshold. - 生体音響データを含む元音響データから、必要な生体音響データを抽出するための生体音響抽出方法であって、
生体音響データを含む元音響データを取得する工程と、
前記取得された元音響データから、有音区間を推定する工程と、
前記推定された有音区間に基づいて、聴覚イメージモデルに従い安定化聴覚像を生成する工程と、
前記安定化聴覚像から、総括安定化聴覚像を生成する工程と、
前記生成された総括安定化聴覚像から得られる所定の音響特徴量を抽出する工程と、
前記抽出された音響特徴量に対して、所定の閾値に基づいて生体音響データか否かを判別する工程と
を含む生体音響抽出方法。 A bioacoustic extraction method for extracting necessary bioacoustic data from original acoustic data including bioacoustic data,
Obtaining original acoustic data including bioacoustic data;
Estimating the voiced section from the acquired original acoustic data;
Generating a stabilized auditory image according to an auditory image model based on the estimated voiced section;
Generating an overall stabilized auditory image from the stabilized auditory image;
Extracting a predetermined acoustic feature obtained from the generated generalized stabilized auditory image;
And a step of discriminating whether or not the extracted acoustic feature quantity is bioacoustic data based on a predetermined threshold value. - 請求項18に記載の生体音響抽出方法であって、
前記安定化聴覚像から、聴覚スペクトルを生成すると共に、
前記所定の音響特徴量を抽出する工程において、前記総括安定化聴覚像に加え、前記生成された聴覚スペクトルから得られる所定の音響特徴量を抽出する生体音響抽出方法。 The bioacoustic extraction method according to claim 18,
Generating an auditory spectrum from the stabilized auditory image;
In the step of extracting the predetermined acoustic feature amount, a bioacoustic extraction method of extracting a predetermined acoustic feature amount obtained from the generated auditory spectrum in addition to the overall stabilized auditory image. - 請求項18又は19に記載の生体音響抽出方法であって、さらに、
前記所定の音響特徴量を抽出する工程に先立ち、前記抽出された音響特徴量から、識別に寄与する音響特徴量を選択する工程を含む生体音響抽出方法。 The bioacoustic extraction method according to claim 18 or 19, further comprising:
Prior to the step of extracting the predetermined acoustic feature amount, a bioacoustic extraction method including a step of selecting an acoustic feature amount contributing to identification from the extracted acoustic feature amount. - 請求項18~20のいずれか一項に記載の生体音響抽出方法であって、
前記生体音響データか否かを判別する工程が、多項分布ロジスティック回帰分析を用いたいびき音又は非いびき音の分類である生体音響抽出方法。 The bioacoustic extraction method according to any one of claims 18 to 20,
A bioacoustic extraction method, wherein the step of determining whether or not the data is bioacoustic data is a classification into snoring sounds or non-snoring sounds using multinomial logistic regression analysis. - 生体音響データを含む元音響データから、必要な生体音響データを抽出し、解析するための生体音響解析方法であって、
生体音響データを含む元音響データを取得する工程と、
前記取得された元音響データから、有音区間を推定する工程と、
前記推定された有音区間に基づいて、聴覚イメージモデルに従い安定化聴覚像を生成する工程と、
前記安定化聴覚像から、聴覚スペクトル及び総括安定化聴覚像を生成する工程と、
前記生成された聴覚スペクトル及び総括安定化聴覚像から得られる所定の音響特徴量を抽出する工程と、
前記抽出された音響特徴量に対して、所定の閾値に基づいて生体音響データか否かを判別する工程と、
前記判別工程で生体音響データと判別された真値データに対して、スクリーニングを行う工程と、
を含む生体音響解析方法。 A bioacoustic analysis method for extracting and analyzing necessary bioacoustic data from original acoustic data including bioacoustic data,
Obtaining original acoustic data including bioacoustic data;
Estimating the voiced section from the acquired original acoustic data;
Generating a stabilized auditory image according to an auditory image model based on the estimated voiced section;
Generating an auditory spectrum and an overall stabilized auditory image from the stabilized auditory image;
Extracting a predetermined acoustic feature obtained from the generated auditory spectrum and the overall stabilized auditory image;
Determining whether or not the extracted acoustic feature amount is bioacoustic data based on a predetermined threshold;
Screening for true value data determined as bioacoustic data in the determination step;
A bioacoustic analysis method including: - 請求項22に記載の生体音響解析方法であって、
前記スクリーニングを行う工程が、多項分布ロジスティック回帰分析を用いた閉塞型睡眠時無呼吸症候群又は非閉塞型睡眠時無呼吸症候群のスクリーニングである生体音響抽出方法。 The bioacoustic analysis method according to claim 22,
The bioacoustic extraction method, wherein the screening is screening for obstructive sleep apnea syndrome or non-occlusive sleep apnea syndrome using multinomial logistic regression analysis. - 生体音響データを含む元音響データから、必要な生体音響データを抽出するための生体音響抽出プログラムであって、
生体音響データを含む元音響データを取得するための入力機能と、
前記入力機能で入力された元音響データから、有音区間を推定する有音区間推定機能と、
前記有音区間推定機能で推定された有音区間に基づいて、聴覚イメージモデルに従い聴覚像を生成する聴覚像生成機能と、
前記聴覚像生成機能で生成された聴覚像に対して、音響特徴量を抽出する音響特徴量抽出機能と、
前記音響特徴量抽出機能で抽出された音響特徴量を、所定の種別に分類する分類機能と、
前記分類機能で分類された音響特徴量に対して、所定の閾値に基づいて生体音響データか否かを判別する判別機能と
をコンピュータに実現させる生体音響抽出プログラム。 A bioacoustic extraction program for extracting necessary bioacoustic data from original acoustic data including bioacoustic data,
An input function for acquiring original acoustic data including bioacoustic data;
A voiced section estimation function for estimating a voiced section from the original sound data input by the input function,
Auditory image generation function for generating an auditory image according to an auditory image model based on the voiced segment estimated by the voiced segment estimation function;
An acoustic feature amount extraction function for extracting an acoustic feature amount from the auditory image generated by the auditory image generation function;
A classification function for classifying the acoustic feature quantity extracted by the acoustic feature quantity extraction function into a predetermined type;
A bioacoustic extraction program that causes a computer to implement a discrimination function of discriminating, based on a predetermined threshold, whether or not the acoustic feature amounts classified by the classification function are bioacoustic data. - 生体音響データを含む元音響データから、必要な生体音響データを抽出するための生体音響抽出プログラムであって、
生体音響データを含む元音響データを取得するための入力機能と、
前記入力機能で入力された元音響データから、有音区間を推定する有音区間推定機能と、
前記有音区間推定機能で推定された有音区間に基づいて、聴覚イメージモデルに従い安定化聴覚像を生成する安定化聴覚像生成機能と、
前記安定化聴覚像から、総括安定化聴覚像を生成する機能と、
前記生成された総括安定化聴覚像に対して、所定の音響特徴量を抽出する音響特徴量抽出機能と、
前記音響特徴量抽出機能で抽出された所定の音響特徴量を、所定の種別に分類する分類機能と、
前記分類機能で分類された音響特徴量に対して、所定の閾値に基づいて生体音響データか否かを判別する判別機能と
をコンピュータに実現させる生体音響抽出プログラム。 A bioacoustic extraction program for extracting necessary bioacoustic data from original acoustic data including bioacoustic data,
An input function for acquiring original acoustic data including bioacoustic data;
A voiced section estimation function for estimating a voiced section from the original sound data input by the input function,
A stabilized auditory image generation function for generating a stabilized auditory image according to an auditory image model based on the voiced segment estimated by the voiced segment estimation function;
A function of generating an overall stabilized auditory image from the stabilized auditory image;
An acoustic feature amount extraction function for extracting a predetermined acoustic feature amount for the generated overall stabilized auditory image;
A classification function for classifying the predetermined acoustic feature amount extracted by the acoustic feature amount extraction function into a predetermined type;
A bioacoustic extraction program that causes a computer to implement a discrimination function of discriminating, based on a predetermined threshold, whether or not the acoustic feature amounts classified by the classification function are bioacoustic data. - 生体音響データを含む元音響データから、必要な生体音響データを抽出し、解析するための生体音響解析プログラムであって、
生体音響データを含む元音響データを取得するための入力機能と、
前記入力機能で入力された元音響データから、有音区間を推定する有音区間推定機能と、
前記有音区間推定機能で推定された有音区間に基づいて、聴覚イメージモデルに従い安定化聴覚像を生成する安定化聴覚像生成機能と、
前記安定化聴覚像から、総括安定化聴覚像を生成する機能と、
前記生成された総括安定化聴覚像に対して、所定の音響特徴量を抽出する音響特徴量抽出機能と、
前記音響特徴量抽出機能で抽出された所定の音響特徴量を、所定の種別に分類する分類機能と、
前記分類機能で分類された音響特徴量に対して、所定の閾値に基づいて生体音響データか否かを判別する判別機能と、
前記判別機能で生体音響データと判別された真値データに対して、スクリーニングを行う機能と、
をコンピュータに実現させる生体音響解析プログラム。 A bioacoustic analysis program for extracting and analyzing necessary bioacoustic data from original acoustic data including bioacoustic data,
An input function for acquiring original acoustic data including bioacoustic data;
A voiced section estimation function for estimating a voiced section from the original sound data input by the input function,
A stabilized auditory image generation function for generating a stabilized auditory image according to an auditory image model based on the voiced segment estimated by the voiced segment estimation function;
A function of generating an overall stabilized auditory image from the stabilized auditory image;
An acoustic feature amount extraction function for extracting a predetermined acoustic feature amount for the generated overall stabilized auditory image;
A classification function for classifying the predetermined acoustic feature amount extracted by the acoustic feature amount extraction function into a predetermined type;
A discrimination function for discriminating, based on a predetermined threshold, whether or not the acoustic feature amounts classified by the classification function are bioacoustic data;
A function of performing screening on true value data determined as bioacoustic data by the determination function;
A bioacoustic analysis program that causes a computer to realize the above functions. - 請求項25又は26に記載されるプログラムを格納したコンピュータで読み取り可能な記録媒体又は記録した機器。 A computer-readable recording medium or a recorded device storing the program according to claim 25 or 26.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2017565502A JP6908243B2 (en) | 2016-02-01 | 2017-01-25 | Bioacoustic extractor, bioacoustic analyzer, bioacoustic extraction program, computer-readable recording medium and recording equipment |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2016017572 | 2016-02-01 | ||
JP2016-017572 | 2016-02-01 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017135127A1 true WO2017135127A1 (en) | 2017-08-10 |
Family
ID=59499570
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2017/002592 WO2017135127A1 (en) | 2016-02-01 | 2017-01-25 | Bioacoustic extraction device, bioacoustic analysis device, bioacoustic extraction program, and computer-readable storage medium and stored device |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6908243B2 (en) |
WO (1) | WO2017135127A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109243470A (en) * | 2018-08-16 | 2019-01-18 | 南京农业大学 | Broiler chicken cough monitoring method based on Audiotechnica |
WO2019216320A1 (en) * | 2018-05-08 | 2019-11-14 | 国立大学法人徳島大学 | Machine learning apparatus, analysis apparatus, machine learning method, and analysis method |
CN111105812A (en) * | 2019-12-31 | 2020-05-05 | 普联国际有限公司 | Audio feature extraction method and device, training method and electronic equipment |
CN111938649A (en) * | 2019-05-16 | 2020-11-17 | 医疗财团法人徐元智先生医药基金会亚东纪念医院 | Method for predicting sleep apnea from snore by using neural network |
JP2020537147A (en) * | 2017-10-11 | 2020-12-17 | ビーピー エクスプロレーション オペレーティング カンパニー リミテッドBp Exploration Operating Company Limited | Event detection using acoustic frequency domain features |
WO2024154917A1 (en) * | 2023-01-18 | 2024-07-25 | 주식회사 베러마인드 | Method for operating sleep curation service system for user |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008545170A (en) * | 2005-06-29 | 2008-12-11 | フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | Apparatus, method, and computer program for analyzing audio signal |
US20120004749A1 (en) * | 2008-12-10 | 2012-01-05 | The University Of Queensland | Multi-parametric analysis of snore sounds for the community screening of sleep apnea with non-gaussianity index |
JP2014008263A (en) * | 2012-06-29 | 2014-01-20 | Univ Of Yamanashi | Shunt constriction diagnostic support system and method, array shaped sound collection sensor device, and successive segmentation self organized map forming device, method and program |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4935931B2 (en) * | 2008-10-16 | 2012-05-23 | 富士通株式会社 | Apnea detection program and apnea detection device |
JP5827108B2 (en) * | 2011-11-18 | 2015-12-02 | 株式会社アニモ | Information processing method, apparatus and program |
JP6136394B2 (en) * | 2012-08-09 | 2017-05-31 | 株式会社Jvcケンウッド | Respiratory sound analyzer, respiratory sound analysis method and respiratory sound analysis program |
JP6412458B2 (en) * | 2015-03-31 | 2018-10-24 | セコム株式会社 | Ultrasonic sensor |
-
2017
- 2017-01-25 JP JP2017565502A patent/JP6908243B2/en active Active
- 2017-01-25 WO PCT/JP2017/002592 patent/WO2017135127A1/en active Application Filing
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008545170A (en) * | 2005-06-29 | 2008-12-11 | フラウンホッファー−ゲゼルシャフト ツァ フェルダールング デァ アンゲヴァンテン フォアシュンク エー.ファオ | Apparatus, method, and computer program for analyzing audio signal |
US20120004749A1 (en) * | 2008-12-10 | 2012-01-05 | The University Of Queensland | Multi-parametric analysis of snore sounds for the community screening of sleep apnea with non-gaussianity index |
JP2014008263A (en) * | 2012-06-29 | 2014-01-20 | Univ Of Yamanashi | Shunt constriction diagnostic support system and method, array shaped sound collection sensor device, and successive segmentation self organized map forming device, method and program |
Non-Patent Citations (1)
Title |
---|
TSUZAKI, MINORU: "Effectiveness of Auditory Parameterization for Unit Selection in Concatenative Speech Synthesis", IEICE TECHNICAL REPORT, vol. 101, no. 232, 2001, pages 23 - 30 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020537147A (en) * | 2017-10-11 | 2020-12-17 | ビーピー エクスプロレーション オペレーティング カンパニー リミテッドBp Exploration Operating Company Limited | Event detection using acoustic frequency domain features |
JP7277059B2 (en) | 2017-10-11 | 2023-05-18 | ビーピー エクスプロレーション オペレーティング カンパニー リミテッド | Event detection using acoustic frequency domain features |
WO2019216320A1 (en) * | 2018-05-08 | 2019-11-14 | 国立大学法人徳島大学 | Machine learning apparatus, analysis apparatus, machine learning method, and analysis method |
JPWO2019216320A1 (en) * | 2018-05-08 | 2021-06-17 | 国立大学法人徳島大学 | Machine learning equipment, analysis equipment, machine learning methods and analysis methods |
JP7197922B2 (en) | 2018-05-08 | 2022-12-28 | 国立大学法人徳島大学 | Machine learning device, analysis device, machine learning method and analysis method |
CN109243470A (en) * | 2018-08-16 | 2019-01-18 | 南京农业大学 | Broiler chicken cough monitoring method based on Audiotechnica |
CN109243470B (en) * | 2018-08-16 | 2020-05-05 | 南京农业大学 | Broiler cough monitoring method based on audio technology |
CN111938649A (en) * | 2019-05-16 | 2020-11-17 | 医疗财团法人徐元智先生医药基金会亚东纪念医院 | Method for predicting sleep apnea from snore by using neural network |
JP2020185390A (en) * | 2019-05-16 | 2020-11-19 | 醫療財團法人徐元智先生醫藥基金會亞東紀念醫院 | Method for predicting sleep apnea |
CN111105812A (en) * | 2019-12-31 | 2020-05-05 | 普联国际有限公司 | Audio feature extraction method and device, training method and electronic equipment |
WO2024154917A1 (en) * | 2023-01-18 | 2024-07-25 | 주식회사 베러마인드 | Method for operating sleep curation service system for user |
Also Published As
Publication number | Publication date |
---|---|
JP6908243B2 (en) | 2021-07-21 |
JPWO2017135127A1 (en) | 2019-01-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6908243B2 (en) | Bioacoustic extractor, bioacoustic analyzer, bioacoustic extraction program, computer-readable recording medium and recording equipment | |
Mekyska et al. | Robust and complex approach of pathological speech signal analysis | |
Amrulloh et al. | Automatic cough segmentation from non-contact sound recordings in pediatric wards | |
US10007480B2 (en) | Multi-parametric analysis of snore sounds for the community screening of sleep apnea with non-Gaussianity index | |
Abeyratne et al. | Pitch jump probability measures for the analysis of snoring sounds in apnea | |
Xie et al. | Audio-based snore detection using deep neural networks | |
Muhammad et al. | Multidirectional regression (MDR)-based features for automatic voice disorder detection | |
US20200093423A1 (en) | Estimation of sleep quality parameters from whole night audio analysis | |
Lin et al. | Automatic wheezing detection using speech recognition technique | |
Karunajeewa et al. | Silence–breathing–snore classification from snore-related sounds | |
US20220007964A1 (en) | Apparatus and method for detection of breathing abnormalities | |
Jiang et al. | Automatic snoring sounds detection from sleep sounds based on deep learning | |
Lei et al. | Content-based classification of breath sound with enhanced features | |
JP7197922B2 (en) | Machine learning device, analysis device, machine learning method and analysis method | |
Nonaka et al. | Automatic snore sound extraction from sleep sound recordings via auditory image modeling | |
El Emary et al. | Towards developing a voice pathologies detection system | |
Kang et al. | Snoring and apnea detection based on hybrid neural networks | |
Fezari et al. | Acoustic analysis for detection of voice disorders using adaptive features and classifiers | |
Fonseca et al. | Discrete wavelet transform and support vector machine applied to pathological voice signals identification | |
González-Martínez et al. | Improving snore detection under limited dataset through harmonic/percussive source separation and convolutional neural networks | |
Porieva et al. | Investigation of lung sounds features for detection of bronchitis and COPD using machine learning methods | |
Saudi et al. | Computer aided recognition of vocal folds disorders by means of RASTA-PLP | |
Corcoran et al. | Glottal Flow Analysis in Parkinsonian Speech. | |
Sengupta et al. | Optimization of cepstral features for robust lung sound classification | |
Dafna et al. | Automatic detection of snoring events using Gaussian mixture models |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17747286 Country of ref document: EP Kind code of ref document: A1 |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2017565502 Country of ref document: JP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 17747286 Country of ref document: EP Kind code of ref document: A1 |