CN114209302A - Cough detection method based on data uncertainty learning - Google Patents
Cough detection method based on data uncertainty learning
- Publication number: CN114209302A
- Application number: CN202111492741.2A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- A61B5/0823 — Detecting or evaluating cough events
- A61B5/4803 — Speech analysis specially adapted for diagnostic purposes
- A61B5/7203 — Signal processing for noise prevention, reduction or removal
- A61B5/7235 — Details of waveform analysis
- A61B5/725 — Waveform analysis using specific filters, e.g. Kalman or adaptive filters
- A61B5/7257 — Waveform analysis using Fourier transforms
- A61B5/7264 — Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267 — Classification involving training the classification device
Abstract
The invention discloses a cough detection method based on data uncertainty learning, which mainly solves the problem of low cough detection accuracy in real environments in the prior art. The implementation scheme is as follows: select voice data from different public data sets, preprocess it, and divide it into a training set and a test set; construct a detector network formed by sequentially cascading a noise generation module, a Mel spectrogram generation module, a feature prediction module, a mean and variance prediction module and a fully connected module; set an objective function for the detector network; set a learning rate and a maximum number of iterations, and update the network with stochastic gradient descent on the training set to obtain a trained detector network; input the test set into the trained detector network to obtain the cough detection result. The method achieves high accuracy in a noise-free environment, performs better under simulated real-noise conditions, and can be used to detect cough sounds intelligently and to collect cough samples.
Description
Technical Field
The invention belongs to the technical field of speech signal processing, and further relates to a cough detection method that can be used to detect cough sounds intelligently and to collect cough samples.
Background
Coughing is the human body's response mechanism to respiratory system abnormalities, serving to expel pathogens, mucus or foreign matter. When receptors in the respiratory mucosa are stimulated by foreign matter, irritant gases or respiratory secretions entering the respiratory tract, the stimulus is transmitted through afferent fibers to the respiratory center in the medulla oblongata and triggers the cough reflex. This is a protective reflex that helps clear hidden secretions and harmful substances from the respiratory tract and, under normal conditions, benefits the body. However, frequent, severe and persistent coughing is a pathological condition, and the frequency, intensity and timing of coughs can provide important information for doctors diagnosing clinical patients. Cough detection is the first stage of cough data collection: cough detection technology yields quantitative evaluation of cough frequency and intensity and qualitative evaluation of dry versus wet cough types, helping doctors make more accurate judgments about respiratory-tract and lung lesions. In addition, early signs of disease can be pre-diagnosed through cough detection analysis and therapy prescribed while basic treatment is still effective, reducing the human and financial costs of health services. Cough detection can also provide health authorities with timely monitoring information on the occurrence of high-burden respiratory diseases, support early outbreak recognition in specific geographic areas, and enable better public health decisions.
In conclusion, cough detection is important for preventing, evaluating and controlling epidemic diseases such as pulmonary tuberculosis and COVID-19 pneumonia. In recent years, with advances in computer hardware platforms and growth in data volume, machine learning has developed vigorously: it learns target mapping functions and features from large data sets and makes predictions on new data. Deep learning, as a branch of machine learning, is widely used for many tasks due to its superior learning ability. The rapid development of the internet and big data provides large data sets for deep learning, which learns rich mappings of a data set through feature-extraction networks and nonlinear layers and uses these mappings to predict unseen data well. Deep-learning-based cough detection has therefore become a popular research direction, and the core of many existing cough detection algorithms is a classifier design based on deep learning.
Prad Kadambi et al. proposed a neural-network-based cough detection scheme in the paper "Towards a Wearable Cough Detector Based on Neural Networks". It adopts a deep neural network structure; the training and test data sets are manually selected from the recordings of 9 patients; with end-to-end training, it achieves an accuracy of 0.923 on the test set.
Ali Imran et al., in the paper "AI4COVID-19: AI Enabled Preliminary Diagnosis for COVID-19 from Cough Samples via an App", proposed a cough classifier based on a convolutional neural network. It adopts a structure of cascaded convolutional and fully connected layers with end-to-end training, and classifies whether audio contains a cough on a data set built by the authors, achieving an accuracy of 0.9791.
Although these cough detection methods achieve high accuracy, they are trained and tested on the same data set without a strong noise background, and therefore have the following disadvantages:
firstly, when the test set and the training set come from different data sets, detection accuracy is low;
secondly, when the test set contains complex, high-intensity background noise, accuracy degrades severely;
thirdly, when detection is carried out in a real environment, the limitations of the training data make a good detection effect difficult to achieve.
Disclosure of Invention
The invention aims to provide a cough detection method based on data uncertainty learning that addresses these shortcomings, so as to improve cough detection accuracy across different noisy data sets in a simulated real noisy environment.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
(1) constructing a cough detection data set: select 15000 cough voice samples and 15000 non-cough voice samples from different public data sets, preprocess the data, and divide it into a training set and a test set at a ratio of 9:1, where each sample carries a label indicating whether it contains a cough;
(2) constructing a detector network formed by sequentially cascading a noise generation module, a Mel spectrogram generation module, a feature prediction module, a mean and variance prediction module and a fully connected module. The feature prediction module, the mean and variance prediction module and the fully connected module form a classifier. The mean and variance prediction module is formed by cascading an input layer, a I normalization layer, a Dropout layer, a Flatten layer, a I fully connected layer, a I activation function layer, a II fully connected layer, a II activation function layer, a II normalization layer, a prediction output layer and an uncertainty vector generator. The uncertainty vector generator takes the predicted mean and predicted variance output by the prediction output layer as parameters and adds an uncertain component to the feature vector to generate an uncertain feature vector; this gives the features entering the network randomness and uncertainty, enhances the stability of the network when classifying noisy data, and improves its detection accuracy on real noisy data;
(3) setting an objective function L_G of the detector network:

L_G = L_cross + λ·L_kl,

where λ < 1, L_cross is the cross-entropy loss function, L_kl is the divergence function, and μ_i, σ_i are the predicted mean and variance output by the mean and variance prediction module;
(4) training the detector network:
4a) setting a learning rate L and a maximum number of training iterations T;
4b) inputting the training data set into the detector network and obtaining a noisy data set through the noise generation module; the noisy data passes through the Mel spectrogram generation module to obtain two-dimensional Mel spectrograms containing time-frequency information; each two-dimensional Mel spectrogram passes through the classifier to obtain the probability vector for cough and non-cough;
4c) substituting the cough and non-cough probability vectors together with the training-set labels into the cross-entropy loss function L_cross to obtain the cross-entropy result, and substituting the predicted mean μ_i and variance σ_i output by the mean and variance prediction module into the L_kl function to obtain the loss value after one round of training; according to the change of the loss value in each round, iterating the network by stochastic gradient descent and updating the network parameters until the set number of training iterations T is reached, completing the training of the detector network;
(5) inputting the test set into the trained detector network: the test set passes through the noise generation module to generate a noisy test set, which passes through the Mel spectrogram generation module to obtain two-dimensional Mel spectrograms; these are input into the classifier, which outputs a probability vector [a, b] of length 2, where a represents the probability of a cough and b the probability of no cough.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention constructs a detector network in which the Mel spectrogram of the voice data, generated by the Mel spectrogram generation module, serves as the input to the classifier. Because the Mel spectrogram contains both the frequency-domain and the time-domain characteristics of the voice data, and the output spectrogram converts frequency in Hz to the Mel scale, the frequency axis changes from linear to a nonlinear scale that the network can perceive more easily. Compared with prior art that extracts only time-domain or only frequency-domain features to obtain feature vectors, the invention considers the feature information of voice data in both domains, making the features that participate in classification more comprehensive.
Secondly, the classifier in the detector network contains a mean and variance prediction module, so the classification method based on data uncertainty learning enhances the classifier's classification ability globally and locally by adaptively learning the classifier's parameters within the network, overcoming the need in traditional methods to tune model parameters manually. An uncertainty learning method is introduced to generate feature vectors with uncertainty, which serve as the input to the fully connected module; this gives the feature vectors entering the network randomness and uncertainty and enhances the robustness and generalization ability of the network.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is the Mel spectrogram generation module of the present invention;
FIG. 3 is a structural block diagram of the classifier in the detector network of the present invention.
Detailed Description
Embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the specific implementation steps of this example include the following:
step 1, a data set is obtained.
1.1) From ESC-50, COUGHVID, AUDIO and a public Chinese voice data set, select 15000 cough samples and 15000 non-cough samples whose sampling frequency is above 16000 Hz and whose duration is not shorter than 3 s;
1.2) preprocessing the selected data:
first, resample the voice data, setting the sampling rate to 16000 Hz;
then normalize the cough data, mapping it into the range -1 to 1;
then cut the cough data into voice segments of 0.5 s-1 s and pad with blank audio to expand each into 1 s of voice; non-cough data is directly truncated to 1 s of speech;
1.3) Divide the 30000 preprocessed voice samples into a training set and a test set at a ratio of 9:1: randomly select 13500 samples each from the 15000 cough and 15000 non-cough samples as the training set, and use the remaining 3000 samples as the test set.
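The preprocessing of step 1.2 can be sketched in NumPy as follows. This is a minimal sketch assuming the audio has already been resampled to 16000 Hz (the resampling step itself is omitted); the helper name `preprocess` and the random choice of segment length for cough clips are illustrative, not from the patent:

```python
import numpy as np

SR = 16000  # target sampling rate from step 1.2

def preprocess(wave, is_cough, rng=np.random.default_rng(0)):
    """Normalize a waveform into [-1, 1] and cut/pad it to exactly 1 s."""
    # Peak-normalize into the range [-1, 1].
    peak = np.max(np.abs(wave))
    if peak > 0:
        wave = wave / peak
    out = np.zeros(SR, dtype=wave.dtype)
    if is_cough:
        # Cut a 0.5-1 s segment, then pad with silence (zeros) up to 1 s.
        seg_len = int(rng.uniform(0.5, 1.0) * SR)
        seg = wave[:seg_len]
        out[:seg.size] = seg
    else:
        # Non-cough data is directly truncated/padded to 1 s.
        out[:min(wave.size, SR)] = wave[:SR]
    return out
```

Either branch returns exactly 16000 samples, so every training example has the same shape downstream.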
And 2, constructing a detector network.
2.1) establishing a noise generation module:
the noise generation module is a voice signal adder and is used for adding noise to voice data so as to simulate the voice data under a real condition, the input of the noise generation module is a voice signal, a noise type and a signal-to-noise ratio, wherein the noise type is white noise or common background noise, Gaussian white noise is added to the voice signal when the white noise is selected, and background noise is added to the voice signal when the common background noise is selected; the noise signal to noise ratio parameter is used to determine the strength of the added noise.
2.2) establishing a Mel spectrogram generating module:
the module is formed by sequentially cascading a framing processor, a windowing device, an FFT generator, a time domain stacker and a Mel filter bank, wherein:
a framing processor for dividing the input noisy speech signal in the time domain at a fixed time interval to generate a speech segment in units of frames;
the windowing device is used for adding a rectangular window to each frame of voice section after framing so as to facilitate the subsequent FFT;
the FFT generator is used for performing fast Fourier transform on each frame of voice signals after windowing to obtain one-dimensional frequency domain signals;
the time domain stacker is used for stacking the frequency domain signals of all the frames on a time domain to obtain a spectrogram;
and the Mel filter bank is used for converting the sound spectrogram into a Mel spectrogram and outputting the Mel spectrogram.
Referring to fig. 2, the workflow of the Mel spectrogram generation module is as follows:
the voice data is subjected to framing processing through a framing processor to obtain a multi-frame voice section with a fixed frame length;
the multiframe voice segment passes through a windowing device to obtain a multiframe voice segment which is truncated by a rectangular window;
fourier transform is carried out on each frame of voice section through an FFT generator, and a one-dimensional frequency domain signal of each frame of voice section is obtained;
stacking the one-dimensional frequency signals of each frame along a time domain through a time domain stacker to obtain a spectrogram;
the sound spectrogram passes through a Mel scale filter bank to obtain a Mel spectrogram.
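The five-stage pipeline above (framing, windowing, FFT, time-domain stacking, Mel filter bank) can be sketched in NumPy. The frame length of 400 samples (25 ms), the hop of 160 samples (10 ms) and the 64 Mel bands are assumed values not stated in the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters mapping an FFT spectrum onto the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(wave, sr=16000, frame_len=400, hop=160, n_mels=64):
    # 1) Framing: split the signal into fixed-length frames at a fixed interval.
    n_frames = 1 + (len(wave) - frame_len) // hop
    frames = np.stack([wave[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # 2) Windowing: the patent uses a rectangular window, i.e. no taper.
    window = np.ones(frame_len)
    # 3) FFT of each windowed frame -> one-dimensional frequency-domain signals.
    spec = np.abs(np.fft.rfft(frames * window, n=frame_len, axis=1)) ** 2
    # 4) Stacking the per-frame spectra along time gives the spectrogram;
    # 5) the Mel filter bank converts it into the Mel spectrogram.
    return mel_filterbank(n_mels, frame_len, sr) @ spec.T  # (n_mels, n_frames)
```

For a 1 s clip at 16000 Hz this yields a 64×98 two-dimensional Mel spectrogram containing time-frequency information.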
2.3) establishing a classifier:
referring to fig. 3, the specific implementation of this step is as follows:
2.3.1) building a feature prediction Module
The feature prediction module structure is as follows:
input layer → 1 st max pooling layer → 1 st convolution layer → 1 st activation function layer → 2 nd convolution layer → 2 nd activation function layer → 2 nd max pooling layer → 1 st residual block layer → 2 nd residual block layer → 3 rd convolution layer → 3 rd activation function → 4 th convolution layer → 4 th activation function → 3 rd max pooling layer → output layer;
the parameters of each layer are as follows:
the input layer inputs the Mel spectrogram generated by the Mel spectrogram generating module;
the number of input channels of the 1 st convolution layer is 1, and the number of output channels is 16;
the number of input channels of the 2 nd convolution layer is 16, and the number of output channels is 64;
the input channel number of the 3 rd convolution layer is 64, and the output channel number is 16;
the number of input channels of the 4 th convolution layer is 16, and the number of output channels is 16;
the convolution kernel size of all convolution layers is set to be 5 multiplied by 5, the convolution step length is set to be 1, and the filling is set to be 2;
relu is used for activation functions of all activation function layers;
the convolution kernels for all the largest pooling layers are 2 x 2, and the step size is 2.
The number of output channels of each residual block is 64.
Each residual block is formed by sequentially cascading a first convolution layer, a first activation function layer, a second convolution layer, a second activation function layer and an adder, wherein:
the numbers of input and output channels of the first and second convolution layers are both 64;
the convolution kernel size of both convolution layers is set to 3×3 and the convolution stride to 1;
the activation functions of the first and second activation function layers use Relu;
the inputs to the adder are the output of the second activation function layer and the input of the residual block.
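The I fully connected layer of the next module expects 3840 = 16 channels × flattened spatial size inputs, which constrains the feature-map shapes produced here. A small shape-tracing sketch checks this arithmetic; the 128×120 Mel-spectrogram input size and the padding of 1 inside the residual blocks (needed so the adder receives equally sized tensors, not stated in the text) are assumptions:

```python
def pool2d(h, w):
    # 2x2 max pooling with stride 2 halves each spatial dimension.
    return h // 2, w // 2

def conv2d(h, w, k, pad):
    # Stride-1 convolution output size: out = in + 2*pad - k + 1.
    return h + 2 * pad - k + 1, w + 2 * pad - k + 1

def flatten_size(h, w, channels=16):
    """Trace spatial sizes through the feature prediction module of step 2.3.1."""
    h, w = pool2d(h, w)        # 1st max pooling
    h, w = conv2d(h, w, 5, 2)  # 1st conv, 5x5, padding 2 (size-preserving)
    h, w = conv2d(h, w, 5, 2)  # 2nd conv
    h, w = pool2d(h, w)        # 2nd max pooling
    h, w = conv2d(h, w, 3, 1)  # residual-block convs: 3x3; padding 1 is an
    h, w = conv2d(h, w, 3, 1)  #   assumption so the skip connection adds
    h, w = conv2d(h, w, 3, 1)  #   tensors of equal size
    h, w = conv2d(h, w, 3, 1)
    h, w = conv2d(h, w, 5, 2)  # 3rd conv
    h, w = conv2d(h, w, 5, 2)  # 4th conv
    h, w = pool2d(h, w)        # 3rd max pooling
    return channels * h * w    # Flatten-layer input to the I fully connected layer
```

With a 128×120 input, the three poolings reduce the map to 16×15, and 16 × 16 × 15 = 3840, matching the I fully connected layer.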
2.3.2) establishing a mean and variance prediction module:
the module is formed by cascading an input layer, an I normalization layer, a Dropout layer, a Flatten layer, an I full connection layer, an I activation function layer, a II full connection layer, a II activation function layer, a II normalization layer, a prediction output layer and an uncertain vector generator, wherein the parameters of each layer are as follows:
the input layer receives the feature map with 16 channels output by the feature prediction module;
the first normalization layer uses a BN normalization function, and the number of channels is 16;
the second normalization layer uses a BN normalization function, and the number of channels is 128;
the number of input channels of the I full connection layer is 3840, and the number of output channels of the I full connection layer is 512;
the input channel number of the II full connection layer is 512, and the output channel number is 128;
the Dropout layer is used to randomly discard 15% of neurons;
the Flatten layer is used for flattening the two-dimensional characteristic diagram to a one-dimensional vector;
the activation functions of the I and II activation function layers both adopt Relu;
the prediction output layer outputs two characteristic vectors with the length of 128, namely a prediction mean value and a prediction variance;
the inputs to the uncertainty vector generator are the predicted mean and the predicted variance to generate different uncertainty feature vectors during the training process and the testing process, which are implemented as follows:
in the training process of the uncertain vector generator, the prediction mean value and the prediction variance output by the prediction output layer are used as parameters of normal distribution, and the uncertain characteristics with the length of 128 are randomly generatedVector, assuming mean result of prediction as μiVariance prediction variance result is sigmaiInput is xiThe uncertainty feature vector z generatediThe following relationships exist:
p(zi|xi)=N(zi;μi,σ2I)
in the formula, the average value prediction part outputs a result muiCan be regarded as a prediction of the speech data characteristics, and the variance prediction part outputs a result sigmaiCan be regarded as the prediction of muiI is an identity matrix of length 128, where each output sample is no longer a definite value but a normal distribution N (z)i;μi,σ2I) Random sampling of (2); since the gradient cannot be solved by random sampling, the backward propagation of the gradient during training is prevented, and the network iteration is blocked during training, a new parameterization is needed to be adopted to ensure that the network can still apply the gradient iteration, namely, the normal distribution is firstly used for carrying out the sigma iterationiSampling and regenerating siAs ziEquivalent of (d): si=μi+εσiε to N (0,1), siAs ziThe equivalent expression of (a) is the uncertainty feature vector output by this module in the training process;
in the test process, the uncertain vector generator sends the feature vector with the length of 128 into the uncertain vector generator, and the uncertain vector generator directly outputs the prediction mean value.
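The reparameterized sampling s_i = μ_i + ε·σ_i can be sketched directly; the function name and the NumPy implementation are illustrative:

```python
import numpy as np

def uncertain_vector(mu, sigma, training=True, rng=np.random.default_rng(0)):
    """Reparameterized sample from N(mu, sigma^2) as used by the uncertainty
    vector generator: s = mu + eps * sigma with eps ~ N(0, 1), so gradients can
    flow through mu and sigma. At test time the predicted mean is returned."""
    if not training:
        return mu
    eps = rng.standard_normal(np.shape(mu))
    return mu + eps * sigma
```

Averaged over many draws, the samples recover the predicted mean and standard deviation, while each individual draw carries the injected uncertainty.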
2.3.3) establishing a fully connected Module
The structure of the full-connection module is as follows: input layer → fully connected layer → output layer, whose output is a vector of length 2;
the parameters of each layer are as follows:
inputting a feature vector with the length of 128 generated by an uncertain vector generator into an input layer;
the number of input channels of the fully connected layer is 128 and the number of output channels is 2.
The input of the full-connection module is a feature vector with the length of 128 generated by the uncertain vector generator, and a probability vector [ a, b ] with the length of 2 is output, wherein a represents the probability of being cough, and b represents the probability of not being cough.
2.3.4) sequentially cascading a feature prediction module, a mean value and variance prediction module and a full-connection module to obtain a classifier;
and 2.4) sequentially cascading the established noise generation module, the built Mel map generation module and the built classifier to form a detector network.
Step 3, setting the objective function L_G of the detector network.
3.1) Set the cross-entropy loss function L_cross between the detector network output and the data-set labels:

L_cross = -1/2 · [y·log a + (1 - y)·log b]

where y is the label of the voice data (y = 1 means the voice contains a cough, y = 0 means it does not), a is the cough probability output by the detector network, and b is the non-cough probability output by the detector network;
3.2) Define the divergence function L_kl between the normal distribution parameterized by the predicted mean and variance and the standard normal distribution; that is, for the normal distribution N(μ_i, σ_i) with parameters μ_i and σ_i, compute its divergence from the standard normal distribution N(0, 1):

L_kl = KL( N(μ_i, σ_i) || N(0, 1) )

where KL denotes the Kullback-Leibler divergence of the two probability distributions in parentheses;
3.3) Add L_cross and L_kl to obtain the optimization function L_G of the classifier:
L_G = L_cross + λ·L_kl
where λ is a weighting parameter with λ < 1.
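The objective above can be sketched in code. The closed-form KL between N(μ, σ²) and N(0, 1) used below is an assumption (the patent reproduces only the KL notation, not an expanded formula), and all function and parameter names are illustrative:

```python
import torch

def kl_to_standard_normal(mu, sigma):
    # KL( N(mu, sigma^2) || N(0, 1) ) per dimension, summed over the
    # length-128 feature dimension and averaged over the batch.
    var = sigma.pow(2)
    return 0.5 * (var + mu.pow(2) - 1.0 - torch.log(var)).sum(dim=1).mean()

def detector_loss(probs, labels, mu, sigma, lam=0.1):
    # probs: [batch, 2] with columns [a, b]; labels: 1 = cough, 0 = non-cough.
    # The patent specifies only lam < 1; 0.1 is an assumed value.
    a, b = probs[:, 0], probs[:, 1]
    l_cross = -0.5 * (labels * torch.log(a) + (1 - labels) * torch.log(b)).mean()
    return l_cross + lam * kl_to_standard_normal(mu, sigma)
```

With μ = 0 and σ = 1 the KL term vanishes, so the loss reduces to the cross-entropy term alone.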
Step 5: train the detector network.
Set the learning rate to 0.0001 and the maximum number of training iterations T to 200;
the 27000 voice segments of the training data set are input to the detector network, and the noise generation module produces a noisy data set; the noisy data passes through the Mel spectrogram generation module to obtain a two-dimensional Mel spectrogram containing time-frequency information; the Mel spectrogram is then fed through the classifier to obtain cough/non-cough probability vectors;
substitute the cough/non-cough probability vectors, together with the training-set labels, into the cross-entropy loss function L_cross to obtain the cross-entropy term, and substitute the predicted mean μ_i and variance σ_i output by the detector network into the divergence function L_kl, giving the loss value of one training pass; iterate the network by stochastic gradient descent according to the change of the loss value at each pass, updating the network parameters until the set number of training iterations T is reached, at which point training of the detector network is complete;
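A minimal training-loop sketch under the stated settings (learning rate 0.0001, T = 200 iterations, stochastic gradient descent). The `model` interface returning `(probs, mu, sigma)` and the weight 0.1 on the KL term are assumptions; the patent does not publish code:

```python
import torch

def train_detector(model, loader, epochs=200, lr=1e-4):
    # Stochastic-gradient iteration as described: forward through the
    # detector, combine cross-entropy and KL terms, back-propagate.
    # `model` is assumed to return (probs, mu, sigma) per batch.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:                      # y: 1 = cough, 0 = non-cough
            probs, mu, sigma = model(x)
            a, b = probs[:, 0], probs[:, 1]
            l_cross = -0.5 * (y * torch.log(a) + (1 - y) * torch.log(b)).mean()
            l_kl = 0.5 * (sigma**2 + mu**2 - 1 - torch.log(sigma**2)).sum(1).mean()
            loss = l_cross + 0.1 * l_kl          # L_G = L_cross + lambda * L_kl
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```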
Step 6: test on the test set to obtain the cough detection result.
The test set of 3000 voice segments is input into the trained detector network: the noise generation module produces a noisy test set, the Mel spectrogram generation module produces a two-dimensional Mel spectrogram, and the classifier outputs a probability vector [a, b] of length 2, where a represents the probability of cough and b the probability of non-cough, completing the cough detection.
The effect of the present invention is further explained with a simulation experiment.
1. Simulation conditions are as follows:
the hardware environment of the simulation experiment was an NVIDIA GTX 1080Ti GPU with 128 GB of RAM;
the software environment of the simulation experiment was the deep-learning framework PyTorch 1.8.0.
In the simulation experiment, classification accuracy is adopted as the objective quantitative evaluation index. Let d be the total number of test-set voice segments in the experiment and f the number classified correctly after passing through the classifier; the accuracy is then:
P=f/d。
2. simulation content and result analysis
To verify the effectiveness of the present invention and demonstrate the benefit of introducing uncertainty learning, the proposed method was compared with the AI4 method.
The AI4 method is a cough detection method proposed by Ali Imran et al. in the paper "AI4COVID-19: AI Enabled Preliminary Diagnosis for COVID-19 from Cough Samples via an App"; its classifier consists of convolutional layers and fully connected layers and is likewise trained end-to-end. In the simulation experiment, the classifier of the proposed method was replaced with the AI4 classifier, and the same training and test sets were used for the comparison.
The experimental procedure was as follows:
First, the parameters of the noise generation module were varied identically in both methods over several comparison experiments: Gaussian noise and background noise were selected as the noise types, and the SNR was set to 10, 8, and 5, respectively;
then, following the steps of the detailed implementation, the simulation was run to obtain the detection results on the test set; these were compared with the test-set labels to determine whether each detection was correct, and the numbers of correct and incorrect detections were counted to compute the detection accuracy P. The results are shown in Table 1:
TABLE 1 cough detection accuracy results
As can be seen from Table 1, the invention obtains good detection results under different noise types and different noise intensities, indicating good detection performance under realistic environmental noise.
Under every noise type and noise intensity, the detection accuracy of the proposed method is higher than that of the AI4 method, confirming that introducing uncertainty learning yields a better detection effect.
Claims (10)
1. A method for cough detection based on data uncertainty learning, comprising:
(1) constructing a cough detection data set: selecting 15000 cough voice segments and 15000 non-cough voice segments from different public data sets, preprocessing the data, and dividing it into a training set and a test set at a ratio of 9:1, each segment carrying a label indicating whether it contains a cough;
(2) constructing a detector network formed by sequentially cascading a noise generation module, a Mel spectrogram generation module, a feature prediction module, a mean and variance prediction module, and a fully connected module; the feature prediction module, the mean and variance prediction module, and the fully connected module form a classifier; the mean and variance prediction module is formed by cascading an input layer, a first normalization layer, a Dropout layer, a Flatten layer, a first fully-connected layer, a first activation function layer, a second fully-connected layer, a second activation function layer, a second normalization layer, a prediction output layer, and an uncertainty vector generator; the uncertainty vector generator takes the predicted mean and predicted variance output by the prediction output layer as parameters and adds an uncertainty component to the feature vector to generate an uncertainty feature vector, so that the features entering the network carry randomness and uncertainty, which strengthens the stability of the network when classifying noisy data and improves its detection accuracy on real noisy data;
(3) setting the objective function L_G of the detector network:
L_G = L_cross + λ·L_kl,
where λ < 1 is a weighting parameter, L_cross is the cross-entropy loss function, and L_kl is the divergence function whose parameters μ_i and σ_i are the predicted mean and variance output by the mean and variance prediction module;
(4) training the detector network:
4a) setting the learning rate L and the maximum number of training iterations T;
4b) inputting the training data set into a detector network, and obtaining a noisy data set through a noise generation module; the noisy data passes through a Mel spectrogram generating module to obtain a two-dimensional Mel spectrogram containing time-frequency information; the two-dimensional Mel spectrogram is subjected to classifier to obtain probability vectors of cough and non-cough;
4c) substituting the cough/non-cough probability vectors, together with the training-set labels, into the cross-entropy loss function L_cross to obtain the cross-entropy term, and substituting the predicted mean μ_i and variance σ_i output by the mean and variance prediction module into the divergence function L_kl, giving the loss value of one training pass; iterating the network by stochastic gradient descent according to the change of the loss value at each pass, updating the network parameters until the set number of training iterations T is reached, at which point training of the detector network is complete;
(5) inputting the test set into the trained detector network: the noise generation module produces a noisy test set, the Mel spectrogram generation module produces a two-dimensional Mel spectrogram, and the classifier outputs a probability vector [a, b] of length 2, where a represents the probability of cough and b the probability of non-cough.
2. The method of claim 1, wherein the cough detection data set is preprocessed in (1) by:
2a) setting the sampling rate of all downloaded cough and non-cough voice data to 16000 Hz, normalizing the voice data, and mapping it into the range from -1 to 1;
2b) processing the normalized cough voice data and non-cough voice data in different ways:
processing of cough voice data: first, the cough voice data is cut into audio clips of length 0.5 s to 1 s; the cut clips are then extended by padding with silence into 1 s long audio;
processing of non-cough voice data: the non-cough data is cut into 1 s long audio.
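The preprocessing of claim 2 can be sketched as follows (NumPy sketch; function names are illustrative, and peak normalization is one assumed way of mapping the signal into [-1, 1]):

```python
import numpy as np

FS = 16000  # target sampling rate, 16 kHz

def normalize(signal):
    # Map the signal into [-1, 1] by its peak absolute amplitude.
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

def to_one_second(signal, is_cough):
    # Cough clips (0.5-1 s) are zero-padded (silence) up to 1 s;
    # non-cough clips are simply truncated to 1 s.
    target = FS  # 1 s of samples at 16 kHz
    if is_cough and len(signal) < target:
        signal = np.pad(signal, (0, target - len(signal)))
    return signal[:target]
```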
3. The method according to claim 1, wherein the noise generation module in (2) is a voice-signal adder whose inputs are the voice signal, the noise type, and the signal-to-noise ratio, and whose output is the noise-added voice signal; when the noise type is white noise, Gaussian white noise is added to the voice signal; when the noise type is background noise, background noise common in daily life is added to the voice signal; the signal-to-noise ratio determines the intensity of the added noise.
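A sketch of such an SNR-controlled signal adder. The scaling rule below is the standard decibel definition of SNR; the patent itself gives no formula:

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    # Scale `noise` so that 10*log10(P_speech / P_noise) equals `snr_db`,
    # then add it to the speech signal. `noise` may be Gaussian white
    # noise or a recorded background-noise clip of the same length.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```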
4. The method according to claim 1, wherein the Mel pattern generation module in (2) is structured as follows:
input → framing processor → windower → FFT generator → time domain stacker → Mel filter bank → output;
the framing processor is used for dividing the input voice signal after noise addition at a fixed time interval on a time domain to generate a voice section taking a frame as a unit;
the windowing device is used for adding a rectangular window to each frame of voice section after framing so as to facilitate the subsequent FFT;
the FFT generator is used for performing fast Fourier transform on each frame of voice signals after windowing to obtain one-dimensional frequency domain signals;
the time domain stacker is used for stacking the frequency domain signals of all frames on a time domain to obtain a spectrogram;
the Mel filter bank is used for converting the sound spectrogram into Mel spectrogram and outputting the Mel spectrogram.
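The pipeline of claim 4 can be sketched end to end. The frame length, hop size, and number of mel filters below are assumptions, since the patent does not fix them; the claim's rectangular window corresponds to taking the raw frames without a taper:

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, fs):
    # Triangular mel filters (illustrative parameterization).
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mels = np.linspace(0, hz_to_mel(fs / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mel_spectrogram(x, fs=16000, frame_len=400, hop=160, n_mels=40):
    # Framing -> rectangular window (as in the claim) -> FFT per frame
    # -> stack frames along time -> mel filter bank.
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), n=frame_len)) ** 2  # [time, freq]
    fb = mel_filterbank(n_mels, frame_len, fs)
    return spec @ fb.T  # [time, n_mels]
```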
5. The method of claim 1, wherein the feature prediction module structure and parameters in (2) are as follows:
the structure is: input layer → 1st max pooling layer → 1st convolution layer → 1st activation function layer → 2nd convolution layer → 2nd activation function layer → 2nd max pooling layer → 1st residual block layer → 2nd residual block layer → 3rd convolution layer → 3rd activation function layer → 4th convolution layer → 4th activation function layer → 3rd max pooling layer → output layer;
parameters of each layer are as follows:
the input layer inputs the Mel spectrogram generated by the Mel spectrogram generating module;
the number of input channels of the 1 st convolution layer is 1, and the number of output channels is 16;
the number of input channels of the 2 nd convolution layer is 16, and the number of output channels is 64;
the input channel number of the 3 rd convolution layer is 64, and the output channel number is 16;
the number of input channels of the 4 th convolution layer is 16, and the number of output channels is 16;
the convolution kernel size of all convolution layers is set to 5 × 5, the convolution stride to 1, and the padding to 2;
the activation functions of all activation function layers use ReLU;
the convolution kernels of all max pooling layers are 2 × 2 with stride 2;
the number of output channels of each residual block is 64.
6. The method of claim 5, wherein each residual block in the feature prediction module is composed of a first convolutional layer, a first activation function layer, a second convolutional layer, a second activation function layer, and an adder, sequentially cascaded, with the parameters of each layer as follows:
the number of input channels of the first convolution layer and the second convolution layer is 64, and the number of output channels of the first convolution layer and the second convolution layer is 64;
the convolution kernel sizes of all convolution layers are set to 3 × 3, and the convolution stride is set to 1;
the activation functions of the first and second activation function layers use ReLU;
the inputs to the adder are the second active function layer output and the input to the residual block.
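A sketch of the residual block of claim 6 (PyTorch; `padding=1` is an assumption needed so the adder's two inputs have matching shapes, since the claim specifies only kernel 3 × 3 and stride 1):

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    # conv(3x3, 64->64) -> ReLU -> conv(3x3, 64->64) -> ReLU, then the
    # adder sums the second activation's output with the block input.
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)

    def forward(self, x):
        y = torch.relu(self.conv1(x))
        y = torch.relu(self.conv2(y))
        return y + x  # the adder
```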
7. The method of claim 1, wherein the structure and parameters of the mean and variance prediction module in (2) are as follows:
the structure is: input layer → first normalization layer → Dropout layer → Flatten layer → first fully-connected layer → first activation function layer → second fully-connected layer → second activation function layer → second normalization layer → prediction output layer → uncertainty vector generator;
the parameters of each layer are as follows:
the input layer receives the 16-channel feature map output by the feature prediction module;
the first normalization layer uses a BN normalization function, and the number of channels is 16;
the second normalization layer uses a BN normalization function, and the number of channels is 128;
the number of input channels of the first fully-connected layer is 3840 and the number of output channels is 512;
the number of input channels of the second fully-connected layer is 512 and the number of output channels is 128;
the Dropout layer is used to randomly discard 15% of neurons;
the Flatten layer flattens the two-dimensional feature map into a one-dimensional vector;
the activation functions of the first and second activation function layers both use ReLU;
the prediction output layer outputs two feature vectors of length 128: the predicted mean and the predicted variance;
the input of the uncertainty vector generator is the predicted mean and predicted variance, from which it generates different uncertainty feature vectors during the training process and the testing process.
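A sketch of this module in PyTorch. How the prediction output layer splits into the two length-128 vectors is not specified in the claim, so the two linear heads and the softplus used to keep the variance positive are assumptions:

```python
import torch
import torch.nn.functional as F
from torch import nn

class MeanVarPredictor(nn.Module):
    # BN(16) -> Dropout(0.15) -> Flatten -> FC 3840->512 -> ReLU ->
    # FC 512->128 -> ReLU -> BN(128) -> two length-128 prediction heads.
    def __init__(self):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(16)
        self.drop = nn.Dropout(0.15)       # randomly discards 15% of neurons
        self.fc1 = nn.Linear(3840, 512)
        self.fc2 = nn.Linear(512, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.mu_head = nn.Linear(128, 128)     # assumed head for the mean
        self.sigma_head = nn.Linear(128, 128)  # assumed head for the variance

    def forward(self, x):  # x: [batch, 16, H, W] with 16*H*W == 3840
        h = self.drop(self.bn1(x)).flatten(1)
        h = torch.relu(self.fc1(h))
        h = self.bn2(torch.relu(self.fc2(h)))
        mu = self.mu_head(h)
        sigma = F.softplus(self.sigma_head(h))  # keep the variance positive
        return mu, sigma
```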
8. The method of claim 7, wherein the uncertainty vector generator generates different uncertainty feature vectors during the training process and the testing process, as follows:
in the training process, the predicted mean and predicted variance output by the prediction output layer are used as the parameters of a normal distribution from which an uncertainty feature vector of length 128 is randomly generated; let the predicted mean be μ_i, the predicted variance be σ_i, and the input be x_i; the generated uncertainty feature vector z_i then satisfies:
p(z_i | x_i) = N(z_i; μ_i, σ_i²·I)
where the mean output μ_i can be regarded as the prediction of the voice-data features, the variance output σ_i as the prediction of the uncertainty of μ_i, and I is an identity matrix of size 128; each output sample is thus no longer a fixed value but a random sample from the normal distribution N(z_i; μ_i, σ_i²·I); because direct random sampling blocks the back-propagation of gradients during training and stalls network iteration, reparameterization is adopted so that gradient iteration still applies: instead of sampling z_i directly from the normal distribution, s_i is generated as an equivalent of z_i:
s_i = μ_i + ε·σ_i,  ε ~ N(0, 1)
and s_i, as the equivalent representation of z_i, is the output of this module during training;
in the test process, the feature vector of length 128 enters the uncertainty vector generator, which directly outputs the predicted mean.
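The two behaviors can be sketched in a few lines: the reparameterized sample s_i = μ_i + ε·σ_i during training, and the mean directly during testing (function name illustrative):

```python
import torch

def uncertainty_vector(mu, sigma, training=True):
    # Training: reparameterized sample s_i = mu_i + eps * sigma_i with
    # eps ~ N(0, 1), so gradients flow through mu and sigma.
    # Testing: the predicted mean is returned directly.
    if training:
        eps = torch.randn_like(mu)
        return mu + eps * sigma
    return mu
```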
9. The method of claim 1, wherein the structure and parameters of the fully connected module are as follows:
the structure is as follows: input layer → fully connected layer → output layer, whose output is a vector of length 2;
the parameters of each layer are as follows:
inputting a feature vector with the length of 128 generated by an uncertain vector generator into an input layer;
the number of input channels of the fully connected layer is 128 and the number of output channels is 2.
The input of the fully connected module is the feature vector of length 128 generated by the uncertainty vector generator; the output is a probability vector [a, b] of length 2, where a represents the probability of cough and b the probability of non-cough.
10. The method of claim 1, wherein the cross-entropy loss function L_cross in the objective function L_G is expressed as follows:
L_cross = -1/2 · [y·log a + (1-y)·log b]
where y is the label of the voice data (y = 1 if the voice data contains a cough, y = 0 if it does not), a is the cough probability output by the fully connected module, and b is the non-cough probability output by the fully connected module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111492741.2A CN114209302B (en) | 2021-12-08 | 2021-12-08 | Cough detection method based on data uncertainty learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114209302A true CN114209302A (en) | 2022-03-22 |
CN114209302B CN114209302B (en) | 2024-03-26 |
Family
ID=80700590
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111492741.2A Active CN114209302B (en) | 2021-12-08 | 2021-12-08 | Cough detection method based on data uncertainty learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114209302B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117357754A (en) * | 2023-11-15 | 2024-01-09 | 江苏麦麦医疗科技有限公司 | Intelligent household oxygenerator and control system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120278022A1 (en) * | 2011-04-29 | 2012-11-01 | Pulsar Informatics, Inc. | Systems and methods for latency and measurement uncertainty management in stimulus-response tests |
WO2013036718A1 (en) * | 2011-09-08 | 2013-03-14 | Isis Innovation Ltd. | Determining acceptability of physiological signals |
CN109493874A (en) * | 2018-11-23 | 2019-03-19 | 东北农业大学 | A kind of live pig cough sound recognition methods based on convolutional neural networks |
US20190107888A1 (en) * | 2017-10-06 | 2019-04-11 | Holland Bloorview Kids Rehabilitation Hospital | Brain-computer interface platform and process for classification of covert speech |
CN110383375A (en) * | 2017-02-01 | 2019-10-25 | 瑞爱普健康有限公司 | Method and apparatus for the cough in detection noise background environment |
US20200060604A1 (en) * | 2017-02-24 | 2020-02-27 | Holland Bloorview Kids Rehabilitation Hospital | Systems and methods of automatic cough identification |
AU2020102516A4 (en) * | 2020-09-30 | 2020-11-19 | Du, Jiahui Mr | Health status monitoring system based on speech analysis |
CN112472065A (en) * | 2020-11-18 | 2021-03-12 | 天机医用机器人技术(清远)有限公司 | Disease detection method based on cough sound recognition and related equipment thereof |
KR20210134195A (en) * | 2020-04-30 | 2021-11-09 | 서울대학교산학협력단 | Method and apparatus for voice recognition using statistical uncertainty modeling |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111429938B (en) | Single-channel voice separation method and device and electronic equipment | |
CN105023573B | Speech syllable/vowel/phone boundary detection using auditory attention cues | |
CN112885372B (en) | Intelligent diagnosis method, system, terminal and medium for power equipment fault sound | |
CN108520753A (en) | Voice lie detection method based on the two-way length of convolution memory network in short-term | |
CN111951824A (en) | Detection method for distinguishing depression based on sound | |
CN111986679A (en) | Speaker confirmation method, system and storage medium for responding to complex acoustic environment | |
CN111724806B (en) | Double-visual-angle single-channel voice separation method based on deep neural network | |
Aibinu et al. | Artificial neural network based autoregressive modeling technique with application in voice activity detection | |
Whitehill et al. | Whosecough: In-the-wild cougher verification using multitask learning | |
CN114209302B (en) | Cough detection method based on data uncertainty learning | |
Looney et al. | Joint estimation of acoustic parameters from single-microphone speech observations | |
CN116842460A (en) | Cough-related disease identification method and system based on attention mechanism and residual neural network | |
CN116013276A (en) | Indoor environment sound automatic classification method based on lightweight ECAPA-TDNN neural network | |
Zheng et al. | MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios | |
CN113974607A (en) | Sleep snore detecting system based on impulse neural network | |
Hernandez-Espinosa et al. | Diagnosis of vocal and voice disorders by the speech signal | |
CN116965819A (en) | Depression recognition method and system based on voice characterization | |
Meng et al. | Noisy training for deep neural networks | |
Kong et al. | Dynamic multi-scale convolution for dialect identification | |
CN111462770A | LSTM-based late reverberation suppression method and system | |
CN113096691A (en) | Detection method, device, equipment and computer storage medium | |
CN113488069A (en) | Method and device for quickly extracting high-dimensional voice features based on generative countermeasure network | |
Iwok et al. | Evaluation of Machine Learning Algorithms using Combined Feature Extraction Techniques for Speaker Identification | |
CN113327633A (en) | Method and device for detecting noisy speech endpoint based on deep neural network model | |
Rosa et al. | Evaluation of neural classifiers using statistic methods for identification of laryngeal pathologies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||