CN114209302A - Cough detection method based on data uncertainty learning - Google Patents

Cough detection method based on data uncertainty learning

Info

Publication number
CN114209302A
CN114209302A (application CN202111492741.2A; granted publication CN114209302B)
Authority
CN
China
Prior art keywords
layer
cough
output
prediction
noise
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111492741.2A
Other languages
Chinese (zh)
Other versions
CN114209302B (en)
Inventor
赵永源 (Zhao Yongyuan)
谷成明 (Gu Chengming)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Jiezhixin Technology Co ltd
Original Assignee
Guangzhou Jiezhixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Jiezhixin Technology Co ltd filed Critical Guangzhou Jiezhixin Technology Co ltd
Priority to CN202111492741.2A
Publication of CN114209302A
Application granted
Publication of CN114209302B
Legal status: Active

Classifications

    • A61B5/0823 Detecting or evaluating cough events
    • A61B5/4803 Speech analysis specially adapted for diagnostic purposes
    • A61B5/7203 Signal processing specially adapted for physiological signals, for noise prevention, reduction or removal
    • A61B5/7235 Details of waveform analysis
    • A61B5/725 Details of waveform analysis using specific filters therefor, e.g. Kalman or adaptive filters
    • A61B5/7257 Details of waveform analysis characterised by using Fourier transforms
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data involving training the classification device


Abstract

The invention discloses a cough detection method based on data uncertainty learning, mainly solving the problem of low cough detection accuracy in real environments in the prior art. The implementation scheme is as follows: selecting voice data from different public data sets, preprocessing it, and dividing it into a training set and a test set; constructing a detector network formed by sequentially cascading a noise generation module, a Mel spectrogram generation module, a feature prediction module, a mean and variance prediction module and a fully connected module; setting the objective function of the detector network; setting a learning rate and a maximum number of iterations, and minimizing the objective function on the training set by stochastic gradient descent to obtain a trained detector network; and inputting the test set into the trained detector network to obtain the cough detection result. The method achieves high accuracy in a noise-free environment, performs well under simulated real-noise conditions, and can be used for intelligently detecting cough sounds and collecting cough samples.

Description

Cough detection method based on data uncertainty learning
Technical Field
The invention belongs to the technical field of voice signal processing, and further relates to a cough detection method which can be used for intelligently detecting cough sounds and collecting cough samples.
Background
Coughing is the human body's reaction to respiratory-system abnormality and serves to expel pathogens, mucus or foreign matter. When receptors in the respiratory mucosa are stimulated by foreign matter, irritant gases, respiratory secretions and the like entering the respiratory tract, and the stimulus is transmitted through afferent fibers to the respiratory center in the medulla oblongata, a cough reflex is triggered. This is a protective reflex that helps clear hidden secretions and harmful matter from the respiratory tract and is, under normal conditions, beneficial to the body. However, frequent, severe and persistent coughing is a pathological condition, and the frequency, intensity and timing of coughs can provide doctors with important information for clinical diagnosis. Cough detection is the first stage of cough data collection: cough detection technology yields quantitative evaluation of cough frequency or intensity and qualitative evaluation of dry versus wet cough types, helping doctors judge respiratory-tract and lung lesions more accurately. In addition, early signs of disease can be pre-diagnosed through cough detection and analysis, and therapy can be prescribed while basic treatment is still effective, reducing the labor and financial costs of health services. Cough detection and recognition can also provide health authorities with timely monitoring information about the occurrence of high-burden respiratory diseases, support early outbreak recognition in specific geographical areas, and inform better public-health decisions.
In conclusion, cough detection is of great significance for preventing, evaluating and controlling epidemic diseases such as pulmonary tuberculosis and COVID-19. In recent years, with advances in computer hardware and the growth of data volume, machine learning has developed vigorously; it learns target mapping functions and features from large data sets and makes predictions on new data. Deep learning, as a branch of machine learning, is widely used for many tasks owing to its superior learning ability. The rapid development of the internet and big data has provided a large data-set foundation for deep learning, which learns richer mappings of a data set through feature-extraction networks and nonlinear layers and uses these mappings to predict unknown data well. Cough detection based on deep learning has accordingly become a popular research direction, and the core of many existing cough detection algorithms is a classifier designed with deep learning.
Prad Kadambi et al., in the paper "Towards a Wearable Cough Detector Based on Neural Networks", propose a neural-network-based cough detection scheme. It adopts a deep neural network structure, with training and test data manually selected from the recordings of 9 patients, and is trained end to end, achieving an accuracy of 0.923 on the test set.
Ali Imran et al., in the paper "AI4COVID-19: AI Enabled Preliminary Diagnosis for COVID-19 from Cough Samples via an App", propose a cough classifier based on a convolutional neural network. It adopts a structure of cascaded convolutional and fully connected layers, is trained end to end, and classifies whether audio contains coughing on a data set built by the authors, achieving an accuracy of 0.9791.
Although these cough detection methods achieve high accuracy, they are trained and tested on the same data set and without strong background noise, and therefore have the following disadvantages:
first, when the test set and the training set do not come from the same data set, detection accuracy is low;
second, when the test set contains background noise of complex types and high intensity, accuracy degrades severely;
third, when detection is carried out in a real environment, the limitations of the training data make a good detection effect difficult to achieve.
Disclosure of Invention
The invention aims to provide a cough detection method based on data uncertainty learning that addresses the shortcomings of the above cough detection methods, so as to improve cough detection accuracy across different noise-containing data sets under a simulated real noisy environment.
To achieve this purpose, the technical scheme of the invention comprises the following steps:
(1) constructing a cough detection data set: selecting 15,000 cough voice recordings and 15,000 non-cough voice recordings from different public data sets, preprocessing them, and dividing them into a training set and a test set at a ratio of 9:1, each recording carrying a label indicating whether it contains a cough;
(2) constructing a detector network formed by sequentially cascading a noise generation module, a Mel spectrogram generation module, a feature prediction module, a mean and variance prediction module and a fully connected module; the feature prediction module, the mean and variance prediction module and the fully connected module form a classifier, wherein the mean and variance prediction module is formed by cascading an input layer, normalization layer I, a Dropout layer, a Flatten layer, fully connected layer I, activation function layer I, fully connected layer II, activation function layer II, normalization layer II, a prediction output layer and an uncertainty vector generator; the uncertainty vector generator takes the predicted mean and predicted variance output by the prediction output layer as parameters and adds an uncertain component to the feature vector to generate an uncertainty feature vector, so that the features entering the network have randomness and uncertainty, enhancing the stability of the network in classifying noisy data and improving its detection accuracy on real noisy data;
(3) setting the objective function L_G of the detector network:
L_G = L_cross + λ·L_kl
where λ < 1, L_cross is the cross-entropy loss function, and L_kl is the divergence function
L_kl = KL(N(μ_i, σ_i²) ‖ N(0, 1)) = −1/2·Σ_i(1 + log σ_i² − μ_i² − σ_i²)
with μ_i and σ_i the predicted mean and variance output by the mean and variance prediction module;
(4) training the detector network:
4a) setting a learning rate L and a maximum iterative training time T;
4b) inputting the training data set into a detector network, and obtaining a noisy data set through a noise generation module; the noisy data passes through a Mel spectrogram generating module to obtain a two-dimensional Mel spectrogram containing time-frequency information; the two-dimensional Mel spectrogram is subjected to classifier to obtain probability vectors of cough and non-cough;
4c) substituting the cough/non-cough probability vectors together with the training-set labels into the cross-entropy loss function L_cross to obtain the cross-entropy term, and substituting the predicted mean μ_i and variance σ_i output by the mean and variance prediction module of the detector network into the L_kl function, obtaining the loss value for one training pass; iterating the network by stochastic gradient descent according to the change of the loss value at each pass, and updating the network parameters until the set number of training iterations T is reached, completing the training of the detector network;
(5) inputting the test set into the trained detector network: the test set passes through the noise generation module to produce a noisy test set, the noisy test set passes through the Mel spectrogram generation module to obtain two-dimensional Mel spectrograms, and these are input to the classifier, which outputs a probability vector [a, b] of length 2, where a is the probability of cough and b is the probability of non-cough.
Compared with the prior art, the invention has the following advantages:
First, the invention constructs a detector network in which the Mel spectrogram of the voice data, produced by the Mel spectrogram generation module, serves as the input to the classifier. Because the Mel spectrogram contains the characteristics of the voice data in both the frequency domain and the time domain, and converts Hz frequencies to Mel frequencies, the output frequency scale changes from linear to a nonlinear scale that the network perceives more easily. Compared with prior art that extracts only time-domain or only frequency-domain features to obtain a feature vector, the invention considers feature information in both domains, making the features that participate in classification more comprehensive.
Second, the classifier in the detector network is provided with a mean and variance prediction module, so that a classification method based on data uncertainty learning enhances the classification ability of the classifier both globally and locally through parameters learned adaptively within the network, solving the problem that traditional methods require manual tuning of model parameters. An uncertainty learning method is introduced to generate feature vectors with uncertainty, which serve as the input of the fully connected module to achieve the effect of uncertainty learning; the feature vectors entering the network thus have randomness and uncertainty, enhancing the robustness and generalization ability of the network.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a Mel map generation module of the present invention;
FIG. 3 is a structural block diagram of the classifier in the detector network of the present invention.
Detailed Description
Embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the specific implementation steps of this example include the following:
step 1, a data set is obtained.
1.1) From ESC-50, COUGHVID, AUDIO and a public Chinese voice data set, select 15,000 cough recordings and 15,000 non-cough recordings whose voice sampling frequency is above 16,000 Hz and whose duration is not shorter than 3 s;
1.2) preprocess the selected data:
first, resample the voice data, setting the sampling rate to 16,000 Hz;
then, normalize the cough data, mapping it into the range −1 to 1;
then, cut the cough data into voice segments of 0.5 s-1 s in length and pad with silence to extend each to 1 s; cut non-cough data directly into 1 s segments;
1.3) divide the 30,000 preprocessed recordings into a training set and a test set at a ratio of 9:1; that is, randomly select 13,500 recordings from the 15,000 cough recordings and 13,500 from the 15,000 non-cough recordings as the training set (27,000 in total), leaving the remaining 3,000 as the test set.
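The preprocessing in step 1.2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the patent does not name a resampling algorithm, so naive linear interpolation is assumed here, and the function name is hypothetical.

```python
import numpy as np

def preprocess(speech, sr, target_sr=16000):
    """Sketch of step 1.2: resample to 16 kHz, peak-normalize to
    [-1, 1], then cut/pad to a 1 s (16000-sample) segment."""
    # 1) resample by linear interpolation (assumed method)
    n_out = int(len(speech) * target_sr / sr)
    x = np.interp(np.linspace(0, len(speech) - 1, n_out),
                  np.arange(len(speech)), np.asarray(speech, dtype=np.float64))
    # 2) peak-normalize into [-1, 1]
    peak = np.max(np.abs(x))
    if peak > 0:
        x = x / peak
    # 3) force a 1 s segment: short clips are zero-padded ("blank
    #    voice"), longer ones truncated
    if len(x) >= target_sr:
        x = x[:target_sr]
    else:
        x = np.pad(x, (0, target_sr - len(x)))
    return x
```

A 0.5 s cough clip at 16 kHz, for example, comes out as a 16,000-sample vector whose second half is silence.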
And 2, constructing a detector network.
2.1) establishing a noise generation module:
The noise generation module is a voice-signal adder used to add noise to the voice data so as to simulate voice data under real conditions. Its inputs are a voice signal, a noise type and a signal-to-noise ratio. The noise type is either white noise or common background noise: when white noise is selected, Gaussian white noise is added to the voice signal; when common background noise is selected, background noise is added. The signal-to-noise-ratio parameter determines the strength of the added noise.
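The noise generation module above can be sketched as a mixer that scales the noise to hit a target SNR in dB. The interface is illustrative (the patent specifies only the three inputs, not a concrete API); passing no noise clip corresponds to the white-noise branch.

```python
import numpy as np

def add_noise(speech, snr_db, noise=None, rng=None):
    """Sketch of the noise-generation module: mix white Gaussian noise
    (or a supplied background-noise clip) into the signal at the
    requested signal-to-noise ratio in dB."""
    rng = np.random.default_rng() if rng is None else rng
    if noise is None:                        # "white noise" branch
        noise = rng.standard_normal(len(speech))
    noise = np.asarray(noise, dtype=np.float64)[:len(speech)]
    p_sig = np.mean(np.asarray(speech, dtype=np.float64) ** 2)
    p_noise = np.mean(noise ** 2)
    # scale so that 10*log10(p_sig / p_scaled_noise) == snr_db
    scale = np.sqrt(p_sig / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

By construction the realized SNR of the returned mixture equals `snr_db` exactly.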
2.2) establishing a Mel spectrogram generating module:
the module is formed by sequentially cascading a framing processor, a windowing device, an FFT generator, a time domain stacker and a Mel filter bank, wherein:
a framing processor for dividing the input noisy speech signal in the time domain at a fixed time interval to generate a speech segment in units of frames;
the windowing device is used for adding a rectangular window to each frame of voice section after framing so as to facilitate the subsequent FFT;
the FFT generator is used for performing fast Fourier transform on each frame of voice signals after windowing to obtain one-dimensional frequency domain signals;
the time domain stacker is used for stacking the frequency domain signals of all the frames on a time domain to obtain a spectrogram;
and the Mel filter bank is used for converting the sound spectrogram into a Mel spectrogram and outputting the Mel spectrogram.
Referring to fig. 2, the operation flow of the mel-map generation module is as follows:
the voice data is subjected to framing processing through a framing processor to obtain a multi-frame voice section with a fixed frame length;
the multiframe voice segment passes through a windowing device to obtain a multiframe voice segment which is truncated by a rectangular window;
fourier transform is carried out on each frame of voice section through an FFT generator, and a one-dimensional frequency domain signal of each frame of voice section is obtained;
stacking the one-dimensional frequency signals of each frame along a time domain through a time domain stacker to obtain a spectrogram;
the sound spectrogram passes through a Mel scale filter bank to obtain a Mel spectrogram.
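The Mel spectrogram pipeline above (framing → windowing → FFT per frame → time-domain stacking → Mel filter bank) can be sketched in numpy. The frame length, hop and number of Mel bands are illustrative assumptions; the patent fixes none of them. The rectangular window of the patent is a numerical no-op, so frames are used as-is.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fb[i - 1, k] = (k - l) / max(c - l, 1)   # rising edge
        for k in range(c, r):
            fb[i - 1, k] = (r - k) / max(r - c, 1)   # falling edge
    return fb

def mel_spectrogram(x, sr=16000, frame_len=400, hop=160, n_mels=40):
    """Sketch of the Mel-map generation module: frame the signal,
    apply the (rectangular) window, FFT each frame, stack over time,
    then apply the Mel filter bank."""
    frames = [x[s:s + frame_len]
              for s in range(0, len(x) - frame_len + 1, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), n=frame_len, axis=1))
    return mel_filterbank(n_mels, frame_len, sr) @ spec.T  # (n_mels, n_frames)
```

On a 1 s clip at 16 kHz with these parameters the output is a 40 × 98 two-dimensional Mel map, which is what the classifier consumes.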
2.3) establishing a classifier:
referring to fig. 3, the specific implementation of this step is as follows:
2.3.1) building a feature prediction Module
The feature prediction module structure is as follows:
input layer → 1 st max pooling layer → 1 st convolution layer → 1 st activation function layer → 2 nd convolution layer → 2 nd activation function layer → 2 nd max pooling layer → 1 st residual block layer → 2 nd residual block layer → 3 rd convolution layer → 3 rd activation function → 4 th convolution layer → 4 th activation function → 3 rd max pooling layer → output layer;
the parameters of each layer are as follows:
the input layer inputs the Mel spectrogram generated by the Mel spectrogram generating module;
the number of input channels of the 1 st convolution layer is 1, and the number of output channels is 16;
the number of input channels of the 2 nd convolution layer is 16, and the number of output channels is 64;
the input channel number of the 3 rd convolution layer is 64, and the output channel number is 16;
the number of input channels of the 4 th convolution layer is 16, and the number of output channels is 16;
the convolution kernel size of all convolution layers is set to be 5 multiplied by 5, the convolution step length is set to be 1, and the filling is set to be 2;
relu is used for activation functions of all activation function layers;
the convolution kernels for all the largest pooling layers are 2 x 2, and the step size is 2.
And the output channel number of each residual block is 64.
Each residual block is formed by sequentially cascading a first convolution layer, a first activation function layer, a second convolution layer, a second activation function layer, and an adder, wherein:
the number of input channels of the first and second convolution layers is 64, and the number of output channels is 64;
the convolution kernel size of all convolution layers is set to 3 × 3, and the convolution stride is set to 1;
the activation functions of the first and second activation function layers use Relu;
the inputs to the adder are the output of the second activation function layer and the input of the residual block.
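The residual block above can be sketched in numpy as follows. Weights and input sizes are illustrative; biases and padding are assumptions (padding 1 keeps the "same" spatial size), and the convolution uses the deep-learning cross-correlation convention.

```python
import numpy as np

def conv2d_same(x, w):
    """3x3 'same' convolution (stride 1, zero-padding 1) for
    x: (C_in, H, W) and w: (C_out, C_in, 3, 3), computed as a sum
    of 9 shifted channel-mixing products."""
    c_out = w.shape[0]
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, wd))
    for i in range(3):
        for j in range(3):
            patch = xp[:, i:i + h, j:j + wd]            # (C_in, H, W)
            out += np.einsum('oc,chw->ohw', w[:, :, i, j], patch)
    return out

def residual_block(x, w1, w2):
    """Sketch of one residual block of the classifier:
    conv3x3 -> ReLU -> conv3x3 -> ReLU, then add the block input."""
    y = np.maximum(conv2d_same(x, w1), 0)
    y = np.maximum(conv2d_same(y, w2), 0)
    return y + x       # the adder: second activation output + block input
```

Because the branch output is added to the untouched input, the block preserves the 64-channel shape of its input.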
2.3.2) Build the mean and variance prediction module:
the module is formed by cascading an input layer, an I normalization layer, a Dropout layer, a Flatten layer, an I full connection layer, an I activation function layer, a II full connection layer, a II activation function layer, a II normalization layer, a prediction output layer and an uncertain vector generator, wherein the parameters of each layer are as follows:
the input layer receives the 16-channel feature map output by the feature prediction module;
the first normalization layer uses a BN normalization function, and the number of channels is 16;
the second normalization layer uses a BN normalization function, and the number of channels is 128;
the number of input channels of the I full connection layer is 3840, and the number of output channels of the I full connection layer is 512;
the input channel number of the II full connection layer is 512, and the output channel number is 128;
the Dropout layer is used to randomly discard 15% of neurons;
the Flatten layer is used for flattening the two-dimensional characteristic diagram to a one-dimensional vector;
the activation functions of the I and II activation function layers both adopt Relu;
the prediction output layer outputs two characteristic vectors with the length of 128, namely a prediction mean value and a prediction variance;
the inputs to the uncertainty vector generator are the predicted mean and the predicted variance to generate different uncertainty feature vectors during the training process and the testing process, which are implemented as follows:
during training, the uncertainty vector generator uses the predicted mean and predicted variance output by the prediction output layer as the parameters of a normal distribution and randomly generates an uncertainty feature vector of length 128. Let the predicted mean be μ_i, the predicted variance be σ_i, and the input be x_i; the generated uncertainty feature vector z_i satisfies:
p(z_i | x_i) = N(z_i; μ_i, σ_i²·I)
Here the mean-prediction output μ_i can be regarded as a prediction of the speech-data features, the variance-prediction output σ_i as a prediction of the uncertainty of μ_i, and I is an identity matrix of size 128. Each output sample is thus no longer a fixed value but a random sample from the normal distribution N(z_i; μ_i, σ_i²·I). Because random sampling is not differentiable, it would block the backward propagation of gradients during training and halt network iteration, so a reparameterization is adopted to keep gradient iteration applicable: sample ε ~ N(0, 1) from the standard normal distribution and generate s_i = μ_i + ε·σ_i as the equivalent of z_i; s_i is the uncertainty feature vector output by this module during training;
during testing, the feature vector of length 128 is fed into the uncertainty vector generator, which directly outputs the predicted mean.
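The train/test behavior of the uncertainty vector generator described above can be sketched directly; the function name is illustrative.

```python
import numpy as np

def uncertain_vector(mu, sigma, training=True, rng=None):
    """Sketch of the uncertainty vector generator: during training,
    output s_i = mu + eps * sigma with eps ~ N(0, 1) (the
    reparameterization trick, so gradients can flow through mu and
    sigma); at test time, output the predicted mean directly."""
    if not training:
        return mu
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(np.shape(mu))
    return mu + eps * sigma
```

Sampling ε separately is what makes the operation differentiable with respect to μ_i and σ_i, which is the whole point of the reparameterization.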
2.3.3) establishing a fully connected Module
The structure of the full-connection module is as follows: input layer → fully connected layer → output layer, whose output is a vector of length 2;
the parameters of each layer are as follows:
inputting a feature vector with the length of 128 generated by an uncertain vector generator into an input layer;
the number of input channels of the fully connected layer is 128 and the number of output channels is 2.
The input of the full-connection module is a feature vector with the length of 128 generated by the uncertain vector generator, and a probability vector [ a, b ] with the length of 2 is output, wherein a represents the probability of being cough, and b represents the probability of not being cough.
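The full-connection module above is a single 128 → 2 linear layer. A sketch follows; note the softmax is an assumption — the patent calls the output a probability vector but does not name the squashing function.

```python
import numpy as np

def fully_connected_module(z, W, b):
    """Sketch of the full-connection module: one linear layer
    mapping the length-128 uncertainty feature vector to a
    length-2 probability vector [a, b]."""
    logits = W @ z + b                      # W: (2, 128), b: (2,)
    e = np.exp(logits - np.max(logits))     # numerically stable softmax
    return e / e.sum()                      # [P(cough), P(not cough)]
```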
2.3.4) sequentially cascading a feature prediction module, a mean value and variance prediction module and a full-connection module to obtain a classifier;
and 2.4) sequentially cascading the established noise generation module, the built Mel map generation module and the built classifier to form a detector network.
Step 3, setting an objective function L of the detector networkG
3.1) Set the cross-entropy loss function L_cross between the output of the detector network and the data-set label:
L_cross = −1/2·[y·log a + (1 − y)·log b]
where y is the label of the voice data in the data set (y = 1 means the voice data contains a cough, y = 0 means it does not), a is the cough probability output by the detector network, and b is the non-cough probability output by the detector network;
3.2) Define the divergence function L_kl between the normal distribution parameterized by the predicted mean and variance and the standard normal distribution; that is, for the normal distribution N(μ_i, σ_i) with parameters μ_i and σ_i, obtain the divergence between N(μ_i, σ_i) and the standard normal distribution N(0, 1):
L_kl = KL(N(μ_i, σ_i²) ‖ N(0, 1)) = −1/2·Σ_i(1 + log σ_i² − μ_i² − σ_i²)
where KL(·‖·) denotes the KL divergence of the two probability distributions;
3.3) Add L_cross and λ·L_kl to obtain the objective function L_G of the classifier:
L_G = L_cross + λ·L_kl
where λ is a weighting parameter, λ < 1.
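The objective L_G of steps 3.1-3.3 can be computed as below. The cross-entropy term follows the patent's formula (including its 1/2 factor), and the KL term is the closed form for a diagonal Gaussian against N(0, I); λ = 0.1 is illustrative, since the patent only requires λ < 1.

```python
import numpy as np

def detector_loss(a, b, y, mu, sigma, lam=0.1):
    """Sketch of L_G = L_cross + lambda * L_kl for one sample:
    a, b  -- predicted probabilities of cough / non-cough,
    y     -- label (1 = cough, 0 = non-cough),
    mu, sigma -- predicted mean and variance vectors (length 128)."""
    l_cross = -0.5 * (y * np.log(a) + (1 - y) * np.log(b))
    # KL( N(mu, sigma^2 I) || N(0, I) )
    l_kl = -0.5 * np.sum(1 + np.log(sigma ** 2) - mu ** 2 - sigma ** 2)
    return l_cross + lam * l_kl
```

When μ_i = 0 and σ_i = 1 the KL term vanishes, so the loss reduces to the cross-entropy term alone; any drift of the predicted distribution away from N(0, I) is penalized in proportion to λ.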
Step 4, train the detector network.
Set the learning rate L to 0.0001 and the maximum number of training iterations T to 200;
input the 27,000 voice segments of the training set into the detector network, obtaining a noisy data set through the noise generation module; pass the noisy data through the Mel spectrogram generation module to obtain two-dimensional Mel spectrograms containing time-frequency information; pass the two-dimensional Mel spectrograms through the classifier to obtain probability vectors for cough and non-cough;
substitute the cough/non-cough probability vectors together with the training-set labels into the cross-entropy loss function L_cross to obtain the cross-entropy term, and substitute the predicted mean μ_i and variance σ_i output by the detector network into the divergence function L_kl, obtaining the loss value for one training pass; iterate the network by stochastic gradient descent according to the change of the loss value at each pass, updating the network parameters until the set number of training iterations T is reached, completing the training of the detector network.
and 6, testing the test set to obtain a cough detection result.
Inputting a test set consisting of 3000 voice data into a trained detector network, generating a noise-containing test set by the test set through a noise generation module, obtaining a two-dimensional Mel spectrogram through a Mel spectrogram generation module, inputting the two-dimensional Mel spectrogram into a classifier, and outputting a probability vector [ a, b ] with the length of 2, wherein a represents the probability of cough, and b represents the probability of not cough, so that the detection of cough is completed.
The effect of the present invention is further explained with a simulation experiment.
1. Simulation conditions:
Hardware environment: an NVIDIA GTX 1080Ti GPU and 128 GB of RAM;
Software environment: the deep learning framework PyTorch 1.8.0.
The simulation experiment adopts classifier accuracy as the objective quantitative evaluation index. Let the total number of test-set voice segments in the experiment be d and the number classified correctly by the classifier be f; the accuracy is then
P = f/d.
2. Simulation content and result analysis
To verify the effectiveness of the invention and demonstrate the superiority of introducing uncertainty learning, comparison experiments were performed with the method of the invention and the AI4 method.
The AI4 method is the cough detection method proposed by Ali Imran et al. in the paper "AI4COVID-19: AI Enabled Preliminary Diagnosis for COVID-19 from Cough Samples via an App"; its classifier consists of convolutional layers and fully connected layers and is likewise trained end to end. In the simulation experiment, the classifier of the present method is replaced by the classifier proposed in AI4, and the comparison experiment uses the same training and test sets.
The experimental procedure is as follows:
First, the parameters of the noise generation module are set identically in both methods for several comparison experiments: Gaussian noise and background noise are selected as the noise types, and SNR values of 10, 8, and 5 are used;
then, following the steps of the specific implementation, the detection results on the test set are obtained and compared with the test-set labels to determine whether each detection is correct. Counting the correct and incorrect detections gives the detection accuracy P. The results are shown in Table 1:
TABLE 1 cough detection accuracy results
[Table 1 is provided as an image in the original document.]
As can be seen from Table 1, the present invention obtains good detection results under different noise types and different noise intensities, indicating good detection performance under realistic environmental noise.
The detection accuracy of the present method exceeds that of the AI4 method for every noise type and noise intensity, demonstrating that introducing uncertainty learning yields a better detection effect.

Claims (10)

1. A cough detection method based on data uncertainty learning, comprising:
(1) constructing a cough detection data set: selecting 15000 cough voice clips and 15000 non-cough voice clips from different public data sets, preprocessing the data, and dividing it into a training set and a test set at a ratio of 9:1, each clip carrying a label indicating whether it contains a cough;
(2) constructing a detector network formed by sequentially cascading a noise generation module, a Mel spectrogram generation module, a feature prediction module, a mean and variance prediction module, and a fully connected module, where the feature prediction module, the mean and variance prediction module, and the fully connected module form a classifier; the mean and variance prediction module is formed by cascading an input layer, a first normalization layer, a Dropout layer, a Flatten layer, a first fully connected layer, a first activation function layer, a second fully connected layer, a second activation function layer, a second normalization layer, a prediction output layer, and an uncertainty vector generator; the uncertainty vector generator takes the predicted mean and predicted variance output by the prediction output layer as parameters and adds an uncertainty component to the feature vector to generate an uncertain feature vector, so that the features entering the network have randomness and uncertainty, enhancing the stability of the network when classifying noisy data and improving its detection accuracy on real noisy data;
(3) setting the objective function L_G of the detector network:
L_G = L_cross + λ·L_kl
where λ < 1, L_cross is the cross-entropy loss function, and L_kl is the divergence loss term (given as a formula image in the original document) computed from μ_i and σ_i, the predicted mean and variance output by the mean and variance prediction module;
(4) training the detector network:
4a) setting a learning rate L and a maximum number of training iterations T;
4b) inputting the training set into the detector network: the noise generation module produces a noisy data set; the Mel spectrogram generation module converts the noisy data into two-dimensional Mel spectrograms containing time-frequency information; and the classifier maps the Mel spectrograms to probability vectors for cough and non-cough;
4c) substituting the cough and non-cough probability vectors together with the training-set labels into the cross-entropy loss function L_cross to obtain the cross-entropy result, and substituting the predicted mean μ_i and variance σ_i output by the mean and variance prediction module into the divergence function L_kl, to obtain the loss value of one training pass; iterating the network by stochastic gradient descent according to the change of the loss value after each pass and updating the network parameters until the set number of training iterations T is reached, completing the training of the detector network;
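The training loop of step (4) can be sketched in PyTorch as follows; the tiny stand-in classifier and all hyperparameter values here are illustrative, not taken from the patent, and the L_kl term is only indicated by a comment:

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Illustrative stand-in for the detector's classifier; it maps a
    length-128 feature vector to the [cough, non-cough] probabilities."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 2)

    def forward(self, x):
        return torch.softmax(self.fc(x), dim=-1)

def train(model, data, labels, lr=0.01, T=5):
    """Step (4): iterate with stochastic gradient descent for T passes."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    nll = nn.NLLLoss()
    loss = None
    for _ in range(T):
        probs = model(data)
        # cross-entropy on the [cough, non-cough] probability vectors;
        # the full objective would add lambda * L_kl computed from the
        # predicted mean and variance (see claim 8)
        loss = nll(torch.log(probs + 1e-8), labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return loss.item()

x = torch.randn(16, 128)           # a batch of 16 dummy feature vectors
y = torch.randint(0, 2, (16,))     # dummy cough / non-cough labels
final_loss = train(TinyClassifier(), x, y)
```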
(5) inputting the test set into the trained detector network: the noise generation module produces a noisy test set, the Mel spectrogram generation module converts the noisy test set into two-dimensional Mel spectrograms, and the classifier outputs a probability vector [a, b] of length 2, where a is the probability of cough and b is the probability of non-cough.
2. The method of claim 1, wherein the cough detection data set in (1) is preprocessed as follows:
2a) setting the frequency of all downloaded cough and non-cough voice data to 16000 Hz and normalizing the waveforms into the range -1 to 1;
2b) processing the normalized cough and non-cough voice data in different ways:
cough voice data: first intercepting a segment 0.5 s to 1 s in length, then extending the intercepted segment by padding it with blank audio to a length of 1 s;
non-cough voice data: intercepting a segment 1 s in length.
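Steps 2a) and 2b) can be sketched in Python as below; the resampling itself is omitted, and the waveform is assumed to be a NumPy array already at 16000 Hz:

```python
import numpy as np

SR = 16000  # all audio is set to 16000 Hz

def normalize(x: np.ndarray) -> np.ndarray:
    """Map the waveform into the range -1 to 1 by peak normalization
    (one common way to realize step 2a)."""
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x

def to_one_second(x: np.ndarray) -> np.ndarray:
    """Step 2b): clips longer than 1 s are truncated; shorter clips
    (e.g. 0.5-1 s cough segments) are padded with blank (zero) audio."""
    target = SR  # 1 second of samples
    if len(x) >= target:
        return x[:target]
    return np.pad(x, (0, target - len(x)))

clip = normalize(np.random.randn(int(0.7 * SR)))  # a 0.7 s cough clip
fixed = to_one_second(clip)                        # padded to exactly 1 s
```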
3. The method of claim 1, wherein the noise generation module in (2) is a voice signal adder whose inputs are a voice signal, a noise type, and a signal-to-noise ratio, and whose output is the noise-added voice signal; when the noise type is white noise, Gaussian white noise is added to the voice signal; when the noise type is background noise, background noise common in daily life is added to the voice signal; the signal-to-noise ratio determines the intensity of the added noise.
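One common way to realize such a voice signal adder is to scale the noise to hit a target SNR in dB before summing; the mixing formula below is an assumption, since the claim does not fix it:

```python
import numpy as np

def add_noise(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Mix `noise` into `signal` at the requested signal-to-noise ratio;
    a lower SNR means stronger added noise."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return signal + scale * noise

rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
white = rng.standard_normal(16000)           # Gaussian white noise
noisy = add_noise(speech, white, snr_db=10)  # the experiments use SNR 10, 8, 5
```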
4. The method of claim 1, wherein the Mel spectrogram generation module in (2) is structured as follows:
input → framing processor → windower → FFT generator → time domain stacker → Mel filter bank → output;
the framing processor divides the input noise-added voice signal at a fixed time interval in the time domain, generating voice segments in units of frames;
the windower applies a rectangular window to each framed voice segment to facilitate the subsequent FFT;
the FFT generator performs a fast Fourier transform on each windowed frame to obtain a one-dimensional frequency domain signal;
the time domain stacker stacks the frequency domain signals of all frames along the time axis to obtain a spectrogram;
the Mel filter bank converts the spectrogram into a Mel spectrogram and outputs it.
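The pipeline of claim 4 (framing → windowing → FFT → time-domain stacking → Mel filter bank) can be sketched in NumPy; the frame length, hop, FFT size, and number of Mel filters below are illustrative choices not specified in the claim:

```python
import numpy as np

def mel_spectrogram(x, sr=16000, frame_len=400, hop=160, n_fft=512, n_mels=40):
    """Framing, rectangular windowing, per-frame FFT, stacking over
    time, and a triangular Mel filter bank, in that order."""
    # framing: split the waveform at a fixed interval
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    frames = np.stack(frames)
    # windowing: the claim uses a rectangular window (no tapering)
    window = np.ones(frame_len)
    # FFT per frame, magnitudes only; stacking the frames gives the spectrogram
    spec = np.abs(np.fft.rfft(frames * window, n=n_fft, axis=1))
    # triangular Mel filter bank maps the spectrogram onto the Mel scale
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        if c > l:
            fbank[m - 1, l:c] = (np.arange(l, c) - l) / (c - l)
        if r > c:
            fbank[m - 1, c:r] = (r - np.arange(c, r)) / (r - c)
    return spec @ fbank.T  # shape: (num_frames, n_mels)

mel = mel_spectrogram(np.random.randn(16000))  # 1 s of audio at 16 kHz
```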
5. The method of claim 1, wherein the structure and parameters of the feature prediction module in (2) are as follows:
the structure: input layer → 1st max pooling layer → 1st convolutional layer → 1st activation function layer → 2nd convolutional layer → 2nd activation function layer → 2nd max pooling layer → 1st residual block → 2nd residual block → 3rd convolutional layer → 3rd activation function layer → 4th convolutional layer → 4th activation function layer → 3rd max pooling layer → output layer;
the parameters of each layer:
the input layer receives the Mel spectrogram generated by the Mel spectrogram generation module;
the 1st convolutional layer has 1 input channel and 16 output channels;
the 2nd convolutional layer has 16 input channels and 64 output channels;
the 3rd convolutional layer has 64 input channels and 16 output channels;
the 4th convolutional layer has 16 input channels and 16 output channels;
the convolution kernel size of all convolutional layers is 5 × 5, with a convolution stride of 1 and padding of 2;
all activation function layers use ReLU;
the convolution kernels of all max pooling layers are 2 × 2 with a stride of 2;
the number of output channels of each residual block is 64.
6. The method of claim 5, wherein each residual block in the feature prediction module is formed by sequentially cascading a first convolutional layer, a first activation function layer, a second convolutional layer, a second activation function layer, and an adder, with parameters as follows:
the first and second convolutional layers each have 64 input channels and 64 output channels;
the convolution kernel size of all convolutional layers is 3 × 3 with a convolution stride of 1;
the first and second activation function layers use ReLU;
the inputs of the adder are the output of the second activation function layer and the input of the residual block.
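A PyTorch sketch of the feature prediction module of claims 5 and 6; the layer order, channel counts, kernel sizes, strides, and padding follow the claims, while the residual-block padding of 1 and the input spectrogram size in the demo are assumptions:

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """64-channel residual block of claim 6: two 3x3 convolutions with
    a skip connection (padding=1 assumed to preserve the spatial size)."""
    def __init__(self, ch=64):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, stride=1, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv2(self.relu(self.conv1(x))))
        return out + x  # adder: second activation output plus block input

class FeaturePrediction(nn.Module):
    """Layer order and parameters per claim 5."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.MaxPool2d(2, 2),
            nn.Conv2d(1, 16, 5, stride=1, padding=2), nn.ReLU(),
            nn.Conv2d(16, 64, 5, stride=1, padding=2), nn.ReLU(),
            nn.MaxPool2d(2, 2),
            ResBlock(), ResBlock(),
            nn.Conv2d(64, 16, 5, stride=1, padding=2), nn.ReLU(),
            nn.Conv2d(16, 16, 5, stride=1, padding=2), nn.ReLU(),
            nn.MaxPool2d(2, 2),
        )

    def forward(self, x):
        return self.net(x)

# a hypothetical 40 x 96 Mel spectrogram with 1 channel
feat = FeaturePrediction()(torch.randn(1, 1, 40, 96))
```

Each of the three max pooling layers halves both spatial dimensions, so a 40 × 96 input yields a 16-channel 5 × 12 feature map.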
7. The method of claim 1, wherein the structure and parameters of the mean and variance prediction module in (2) are as follows:
the structure: input layer → first normalization layer → Dropout layer → Flatten layer → first fully connected layer → first activation function layer → second fully connected layer → second activation function layer → second normalization layer → prediction output layer → uncertainty vector generator;
the parameters of each layer:
the input layer receives the 16-channel feature map output by the feature prediction module;
the first normalization layer uses BN normalization with 16 channels;
the second normalization layer uses BN normalization with 128 channels;
the first fully connected layer has 3840 input channels and 512 output channels;
the second fully connected layer has 512 input channels and 128 output channels;
the Dropout layer randomly discards 15% of the neurons;
the Flatten layer flattens the two-dimensional feature map into a one-dimensional vector;
the first and second activation function layers both use ReLU;
the prediction output layer outputs two feature vectors of length 128: the predicted mean and the predicted variance;
the inputs of the uncertainty vector generator are the predicted mean and the predicted variance, which are used to generate different uncertain feature vectors during training and testing.
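A PyTorch sketch of the mean and variance prediction module of claim 7; the two parallel length-128 linear heads are one plausible realization of the prediction output layer, and the 16 × 15 × 16 input shape (flattening to 3840) is an assumption:

```python
import torch
import torch.nn as nn

class MeanVarPrediction(nn.Module):
    """Layer sizes follow claim 7: BN(16) -> Dropout(0.15) -> Flatten ->
    Linear(3840, 512) -> ReLU -> Linear(512, 128) -> ReLU -> BN(128) ->
    two length-128 prediction heads (mean and variance)."""
    def __init__(self):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(16)
        self.drop = nn.Dropout(0.15)  # randomly discards 15% of neurons
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(3840, 512)
        self.fc2 = nn.Linear(512, 128)
        self.relu = nn.ReLU()
        self.bn2 = nn.BatchNorm1d(128)
        self.mu_head = nn.Linear(128, 128)     # predicted mean
        self.sigma_head = nn.Linear(128, 128)  # predicted variance

    def forward(self, x):
        h = self.flatten(self.drop(self.bn1(x)))
        h = self.relu(self.fc2(self.relu(self.fc1(h))))
        h = self.bn2(h)
        return self.mu_head(h), self.sigma_head(h)

# hypothetical 16-channel 15 x 16 feature maps (16 * 15 * 16 = 3840)
mu, sigma = MeanVarPrediction()(torch.randn(2, 16, 15, 16))
```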
8. The method of claim 7, wherein the uncertainty vector generator generates different uncertain feature vectors during training and testing as follows:
during training, the predicted mean and predicted variance output by the prediction output layer are used as the parameters of a normal distribution from which an uncertain feature vector of length 128 is randomly generated; letting the predicted mean be μ_i, the predicted variance be σ_i, and the input be x_i, the generated uncertain feature vector z_i satisfies:
p(z_i | x_i) = N(z_i; μ_i, σ_i^2 I)
where the output μ_i of the mean prediction part can be regarded as the prediction of the voice data features, the output σ_i of the variance prediction part can be regarded as the prediction of the uncertainty of μ_i, and I is an identity matrix of size 128; each output sample is then no longer a fixed value but a random sample from the normal distribution N(z_i; μ_i, σ_i^2 I). Because direct random sampling blocks the back-propagation of gradients during training and thus prevents network iteration, reparameterization is adopted so that the network can still apply gradient iteration: a sample ε is first drawn from the standard normal distribution, and s_i is generated as the equivalent of z_i:
s_i = μ_i + ε·σ_i, ε ~ N(0, 1)
this s_i, the equivalent representation of z_i, is the output of the module during training;
during testing, the feature vector of length 128 enters the uncertainty vector generator, which directly outputs the predicted mean.
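The reparameterization of claim 8 in a few lines of PyTorch (function name illustrative):

```python
import torch

def uncertainty_vector(mu, sigma, training=True):
    """Claim 8: during training, sample s_i = mu_i + eps * sigma_i with
    eps ~ N(0, 1) so gradients can flow through the sampling step; at
    test time, output the predicted mean directly."""
    if not training:
        return mu
    eps = torch.randn_like(mu)
    return mu + eps * sigma

mu = torch.zeros(128)
sigma = torch.ones(128)
z_train = uncertainty_vector(mu, sigma, training=True)   # random around mu
z_test = uncertainty_vector(mu, sigma, training=False)   # exactly mu
```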
9. The method of claim 1, wherein the structure and parameters of the fully connected module are as follows:
the structure: input layer → fully connected layer → output layer, whose output is a vector of length 2;
the parameters of each layer:
the input layer receives the feature vector of length 128 generated by the uncertainty vector generator;
the fully connected layer has 128 input channels and 2 output channels.
The fully connected module thus maps the length-128 feature vector from the uncertainty vector generator to a probability vector [a, b] of length 2, where a is the probability of cough and b is the probability of non-cough.
10. The method of claim 1, wherein the cross-entropy loss function L_cross in the objective function L_G is expressed as follows:
L_cross = -1/2 [y·log a + (1 - y)·log b]
where y is the label of the voice data (y = 1 if the data contains a cough, y = 0 if not), a is the cough probability output by the fully connected module, and b is the non-cough probability output by the fully connected module.
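Claim 10's cross-entropy and the overall objective of step (3) in plain Python; λ = 0.1 is only an illustrative value satisfying λ < 1, and l_kl is passed in as a precomputed number:

```python
import math

def cross_entropy_cough(a: float, b: float, y: int) -> float:
    """L_cross = -1/2 [ y*log a + (1-y)*log b ], with a the predicted
    cough probability, b the non-cough probability, y the binary label."""
    return -0.5 * (y * math.log(a) + (1 - y) * math.log(b))

def objective(a: float, b: float, y: int, l_kl: float, lam: float = 0.1) -> float:
    """L_G = L_cross + lambda * L_kl, with lambda < 1."""
    return cross_entropy_cough(a, b, y) + lam * l_kl

# a confident, correct cough prediction plus a small divergence term
loss = objective(a=0.9, b=0.1, y=1, l_kl=0.05)
```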
CN202111492741.2A 2021-12-08 2021-12-08 Cough detection method based on data uncertainty learning Active CN114209302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111492741.2A CN114209302B (en) 2021-12-08 2021-12-08 Cough detection method based on data uncertainty learning

Publications (2)

Publication Number Publication Date
CN114209302A true CN114209302A (en) 2022-03-22
CN114209302B CN114209302B (en) 2024-03-26

Family

ID=80700590

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111492741.2A Active CN114209302B (en) 2021-12-08 2021-12-08 Cough detection method based on data uncertainty learning

Country Status (1)

Country Link
CN (1) CN114209302B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117357754A (en) * 2023-11-15 2024-01-09 江苏麦麦医疗科技有限公司 Intelligent household oxygenerator and control system

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120278022A1 (en) * 2011-04-29 2012-11-01 Pulsar Informatics, Inc. Systems and methods for latency and measurement uncertainty management in stimulus-response tests
WO2013036718A1 (en) * 2011-09-08 2013-03-14 Isis Innovation Ltd. Determining acceptability of physiological signals
CN109493874A (en) * 2018-11-23 2019-03-19 东北农业大学 A kind of live pig cough sound recognition methods based on convolutional neural networks
US20190107888A1 (en) * 2017-10-06 2019-04-11 Holland Bloorview Kids Rehabilitation Hospital Brain-computer interface platform and process for classification of covert speech
CN110383375A (en) * 2017-02-01 2019-10-25 瑞爱普健康有限公司 Method and apparatus for the cough in detection noise background environment
US20200060604A1 (en) * 2017-02-24 2020-02-27 Holland Bloorview Kids Rehabilitation Hospital Systems and methods of automatic cough identification
AU2020102516A4 (en) * 2020-09-30 2020-11-19 Du, Jiahui Mr Health status monitoring system based on speech analysis
CN112472065A (en) * 2020-11-18 2021-03-12 天机医用机器人技术(清远)有限公司 Disease detection method based on cough sound recognition and related equipment thereof
KR20210134195A (en) * 2020-04-30 2021-11-09 서울대학교산학협력단 Method and apparatus for voice recognition using statistical uncertainty modeling

Similar Documents

Publication Publication Date Title
CN111429938B (en) Single-channel voice separation method and device and electronic equipment
CN105023573B (en) It is detected using speech syllable/vowel/phone boundary of auditory attention clue
CN112885372B (en) Intelligent diagnosis method, system, terminal and medium for power equipment fault sound
CN108520753A (en) Voice lie detection method based on the two-way length of convolution memory network in short-term
CN111951824A (en) Detection method for distinguishing depression based on sound
CN111986679A (en) Speaker confirmation method, system and storage medium for responding to complex acoustic environment
CN111724806B (en) Double-visual-angle single-channel voice separation method based on deep neural network
Aibinu et al. Artificial neural network based autoregressive modeling technique with application in voice activity detection
Whitehill et al. Whosecough: In-the-wild cougher verification using multitask learning
CN114209302B (en) Cough detection method based on data uncertainty learning
Looney et al. Joint estimation of acoustic parameters from single-microphone speech observations
CN116842460A (en) Cough-related disease identification method and system based on attention mechanism and residual neural network
CN116013276A (en) Indoor environment sound automatic classification method based on lightweight ECAPA-TDNN neural network
Zheng et al. MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios
CN113974607A (en) Sleep snore detecting system based on impulse neural network
Hernandez-Espinosa et al. Diagnosis of vocal and voice disorders by the speech signal
CN116965819A (en) Depression recognition method and system based on voice characterization
Meng et al. Noisy training for deep neural networks
Kong et al. Dynamic multi-scale convolution for dialect identification
CN111462770A (en) L STM-based late reverberation suppression method and system
CN113096691A (en) Detection method, device, equipment and computer storage medium
CN113488069A (en) Method and device for quickly extracting high-dimensional voice features based on generative countermeasure network
Iwok et al. Evaluation of Machine Learning Algorithms using Combined Feature Extraction Techniques for Speaker Identification
CN113327633A (en) Method and device for detecting noisy speech endpoint based on deep neural network model
Rosa et al. Evaluation of neural classifiers using statistic methods for identification of laryngeal pathologies

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant