CN114209302A - Cough detection method based on data uncertainty learning - Google Patents
Cough detection method based on data uncertainty learning
- Publication number: CN114209302A
- Application number: CN202111492741.2A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- A61B5/0823 — Detecting or evaluating cough events
- A61B5/4803 — Speech analysis specially adapted for diagnostic purposes
- A61B5/7203 — Signal processing for noise prevention, reduction or removal
- A61B5/7235 — Details of waveform analysis
- A61B5/725 — Waveform analysis using specific filters, e.g. Kalman or adaptive filters
- A61B5/7257 — Waveform analysis using Fourier transforms
- A61B5/7264 — Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
- A61B5/7267 — Classification involving training the classification device
Abstract
The invention discloses a cough detection method based on data uncertainty learning, which mainly solves the problem of low cough detection accuracy in real environments in the prior art. The implementation scheme is as follows: select voice data from different public data sets, preprocess it, and divide it into a training set and a test set; construct a detector network formed by sequentially cascading a noise generation module, a Mel spectrogram generation module, a feature prediction module, a mean and variance prediction module and a fully connected module; set an objective function for the detector network; set a learning rate and a maximum number of iterations, and update the network with stochastic gradient descent on the training set to obtain a trained detector network; input the test set into the trained detector network to obtain the cough detection result. The method achieves high accuracy in a noise-free environment, performs better under simulated real-noise conditions, and can be used to detect cough sounds intelligently and to collect cough samples.
Description
Technical Field
The invention belongs to the technical field of speech signal processing, and further relates to a cough detection method that can be used to detect cough sounds intelligently and to collect cough samples.
Background
Coughing is the human body's response mechanism to respiratory system abnormalities, serving to expel pathogens, mucus or foreign matter. When receptors in the respiratory mucosa are stimulated by foreign matter, irritant gases or respiratory secretions entering the respiratory tract, the stimulus is transmitted through afferent fibers to the respiratory center in the medulla oblongata and triggers the cough reflex. This is a protective reflex that helps clear hidden secretions and harmful substances from the respiratory tract and, under normal conditions, benefits the body. However, frequent, severe and persistent coughing is a pathological condition, and the frequency, intensity and timing of coughs can provide important information for doctors diagnosing clinical patients. Cough detection is the first stage of cough data collection: cough detection technology yields quantitative evaluation of cough frequency and intensity and qualitative evaluation of dry versus wet cough types, helping doctors make more accurate judgments about respiratory-tract and lung lesions. In addition, early signs of disease can be pre-diagnosed through cough detection analysis and therapy prescribed while basic treatment is still effective, reducing the human and financial costs of health services. Cough detection can also provide health authorities with timely monitoring information on the occurrence of high-burden respiratory diseases, support early outbreak recognition in specific geographic areas, and enable better public health decisions.
In conclusion, cough detection is important for preventing, evaluating and controlling epidemic diseases such as pulmonary tuberculosis and COVID-19 pneumonia. In recent years, with advances in computer hardware platforms and growth in data volume, machine learning has developed vigorously: it learns target mapping functions and features from large data sets and makes predictions on new data. Deep learning, as a branch of machine learning, is widely used for many tasks due to its superior learning ability. The rapid development of the internet and big data provides large data sets for deep learning, which learns rich mappings of a data set through feature-extraction networks and nonlinear layers and uses these mappings to predict unseen data well. Deep-learning-based cough detection has therefore become a popular research direction, and the core of many existing cough detection algorithms is a classifier design based on deep learning.
Prad Kadambi et al. proposed a neural-network-based cough detection scheme in the paper "Towards a Wearable Cough Detector Based on Neural Networks". It adopts a deep neural network structure; the training and test data sets are manually selected from the recordings of 9 patients; with end-to-end training, it achieves an accuracy of 0.923 on the test set.
Ali Imran et al., in the paper "AI4COVID-19: AI Enabled Preliminary Diagnosis for COVID-19 from Cough Samples via an App", proposed a cough classifier based on a convolutional neural network. It adopts a structure of cascaded convolutional and fully connected layers with end-to-end training, and classifies whether audio contains a cough on a data set built by the authors, achieving an accuracy of 0.9791.
Although these cough detection methods achieve high accuracy, they are trained and tested on the same data set without a strong noise background, and therefore have the following disadvantages:
firstly, when the test set and the training set come from different data sets, detection accuracy is low;
secondly, when the test set contains complex, high-intensity background noise, accuracy degrades severely;
thirdly, when detection is carried out in a real environment, the limitations of the training data make a good detection effect difficult to achieve.
Disclosure of Invention
The invention aims to provide a cough detection method based on data uncertainty learning that addresses these shortcomings, so as to improve cough detection accuracy across different noisy data sets in a simulated real noisy environment.
In order to achieve the purpose, the technical scheme of the invention comprises the following steps:
(1) constructing a cough detection data set: select 15000 cough voice samples and 15000 non-cough voice samples from different public data sets, preprocess the data, and divide it into a training set and a test set at a ratio of 9:1, where each sample carries a label indicating whether it contains a cough;
(2) constructing a detector network formed by sequentially cascading a noise generation module, a Mel spectrogram generation module, a feature prediction module, a mean and variance prediction module and a fully connected module. The feature prediction module, the mean and variance prediction module and the fully connected module form a classifier. The mean and variance prediction module is formed by cascading an input layer, a I normalization layer, a Dropout layer, a Flatten layer, a I fully connected layer, a I activation function layer, a II fully connected layer, a II activation function layer, a II normalization layer, a prediction output layer and an uncertainty vector generator. The uncertainty vector generator takes the predicted mean and predicted variance output by the prediction output layer as parameters and adds an uncertain component to the feature vector to generate an uncertain feature vector; this gives the features entering the network randomness and uncertainty, enhances the stability of the network when classifying noisy data, and improves its detection accuracy on real noisy data;
(3) setting an objective function L_G of the detector network:

L_G = L_cross + λ·L_kl,

where λ < 1, L_cross is the cross-entropy loss function, L_kl is the divergence function, and μ_i, σ_i are the predicted mean and variance output by the mean and variance prediction module;
(4) training the detector network:
4a) setting a learning rate L and a maximum number of training iterations T;
4b) inputting the training data set into the detector network and obtaining a noisy data set through the noise generation module; the noisy data passes through the Mel spectrogram generation module to obtain two-dimensional Mel spectrograms containing time-frequency information; each two-dimensional Mel spectrogram passes through the classifier to obtain the probability vector for cough and non-cough;
4c) substituting the cough and non-cough probability vectors together with the training-set labels into the cross-entropy loss function L_cross to obtain the cross-entropy result, and substituting the predicted mean μ_i and variance σ_i output by the mean and variance prediction module into the L_kl function to obtain the loss value after one round of training; according to the change of the loss value in each round, iterating the network by stochastic gradient descent and updating the network parameters until the set number of training iterations T is reached, completing the training of the detector network;
(5) inputting the test set into the trained detector network: the test set passes through the noise generation module to generate a noisy test set, which passes through the Mel spectrogram generation module to obtain two-dimensional Mel spectrograms; these are input into the classifier, which outputs a probability vector [a, b] of length 2, where a represents the probability of a cough and b the probability of no cough.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention constructs a detector network in which the Mel spectrogram of the voice data, generated by the Mel spectrogram generation module, serves as the input to the classifier. Because the Mel spectrogram contains both the frequency-domain and the time-domain characteristics of the voice data, and the output spectrogram converts frequency in Hz to the Mel scale, the frequency axis changes from linear to a nonlinear scale that the network can perceive more easily. Compared with prior art that extracts only time-domain or only frequency-domain features to obtain feature vectors, the invention considers the feature information of voice data in both domains, making the features that participate in classification more comprehensive.
Secondly, the classifier in the detector network contains a mean and variance prediction module, so the classification method based on data uncertainty learning enhances the classifier's classification ability globally and locally by adaptively learning the classifier's parameters within the network, overcoming the need in traditional methods to tune model parameters manually. An uncertainty learning method is introduced to generate feature vectors with uncertainty, which serve as the input to the fully connected module; this gives the feature vectors entering the network randomness and uncertainty and enhances the robustness and generalization ability of the network.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is the Mel spectrogram generation module of the present invention;
FIG. 3 is a structural block diagram of the classifier in the detector network of the present invention.
Detailed Description
Embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the specific implementation steps of this example include the following:
step 1, a data set is obtained.
1.1) From ESC-50, COUGHVID, AUDIO and a public Chinese voice data set, select 15000 cough samples and 15000 non-cough samples whose sampling frequency is above 16000 Hz and whose duration is not shorter than 3 s;
1.2) preprocessing the selected data:
first, resample the voice data, setting the sampling rate to 16000 Hz;
then normalize the cough data, mapping it into the range -1 to 1;
then cut the cough data into voice segments of 0.5 s-1 s and pad with blank audio to expand each into 1 s of voice; non-cough data is directly truncated to 1 s of speech;
1.3) Divide the 30000 preprocessed voice samples into a training set and a test set at a ratio of 9:1: randomly select 13500 samples each from the 15000 cough and 15000 non-cough samples as the training set, and use the remaining 3000 samples as the test set.
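The preprocessing of step 1.2 can be sketched in NumPy as follows. This is a minimal sketch assuming the audio has already been resampled to 16000 Hz (the resampling step itself is omitted); the helper name `preprocess` and the random choice of segment length for cough clips are illustrative, not from the patent:

```python
import numpy as np

SR = 16000  # target sampling rate from step 1.2

def preprocess(wave, is_cough, rng=np.random.default_rng(0)):
    """Normalize a waveform into [-1, 1] and cut/pad it to exactly 1 s."""
    # Peak-normalize into the range [-1, 1].
    peak = np.max(np.abs(wave))
    if peak > 0:
        wave = wave / peak
    out = np.zeros(SR, dtype=wave.dtype)
    if is_cough:
        # Cut a 0.5-1 s segment, then pad with silence (zeros) up to 1 s.
        seg_len = int(rng.uniform(0.5, 1.0) * SR)
        seg = wave[:seg_len]
        out[:seg.size] = seg
    else:
        # Non-cough data is directly truncated/padded to 1 s.
        out[:min(wave.size, SR)] = wave[:SR]
    return out
```

Either branch returns exactly 16000 samples, so every training example has the same shape downstream.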
And 2, constructing a detector network.
2.1) establishing a noise generation module:
the noise generation module is a voice signal adder and is used for adding noise to voice data so as to simulate the voice data under a real condition, the input of the noise generation module is a voice signal, a noise type and a signal-to-noise ratio, wherein the noise type is white noise or common background noise, Gaussian white noise is added to the voice signal when the white noise is selected, and background noise is added to the voice signal when the common background noise is selected; the noise signal to noise ratio parameter is used to determine the strength of the added noise.
2.2) establishing a Mel spectrogram generating module:
the module is formed by sequentially cascading a framing processor, a windowing device, an FFT generator, a time domain stacker and a Mel filter bank, wherein:
a framing processor for dividing the input noisy speech signal in the time domain at a fixed time interval to generate a speech segment in units of frames;
the windowing device is used for adding a rectangular window to each frame of voice section after framing so as to facilitate the subsequent FFT;
the FFT generator is used for performing fast Fourier transform on each frame of voice signals after windowing to obtain one-dimensional frequency domain signals;
the time domain stacker is used for stacking the frequency domain signals of all the frames on a time domain to obtain a spectrogram;
and the Mel filter bank is used for converting the sound spectrogram into a Mel spectrogram and outputting the Mel spectrogram.
Referring to fig. 2, the workflow of the Mel spectrogram generation module is as follows:
the voice data is subjected to framing processing through a framing processor to obtain a multi-frame voice section with a fixed frame length;
the multiframe voice segment passes through a windowing device to obtain a multiframe voice segment which is truncated by a rectangular window;
fourier transform is carried out on each frame of voice section through an FFT generator, and a one-dimensional frequency domain signal of each frame of voice section is obtained;
stacking the one-dimensional frequency signals of each frame along a time domain through a time domain stacker to obtain a spectrogram;
the sound spectrogram passes through a Mel scale filter bank to obtain a Mel spectrogram.
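The five-stage pipeline above (framing, windowing, FFT, time-domain stacking, Mel filter bank) can be sketched in NumPy. The frame length of 400 samples (25 ms), the hop of 160 samples (10 ms) and the 64 Mel bands are assumed values not stated in the patent:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters mapping an FFT spectrum onto the Mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)
    return fb

def mel_spectrogram(wave, sr=16000, frame_len=400, hop=160, n_mels=64):
    # 1) Framing: split the signal into fixed-length frames at a fixed interval.
    n_frames = 1 + (len(wave) - frame_len) // hop
    frames = np.stack([wave[i * hop:i * hop + frame_len] for i in range(n_frames)])
    # 2) Windowing: the patent uses a rectangular window, i.e. no taper.
    window = np.ones(frame_len)
    # 3) FFT of each windowed frame -> one-dimensional frequency-domain signals.
    spec = np.abs(np.fft.rfft(frames * window, n=frame_len, axis=1)) ** 2
    # 4) Stacking the per-frame spectra along time gives the spectrogram;
    # 5) the Mel filter bank converts it into the Mel spectrogram.
    return mel_filterbank(n_mels, frame_len, sr) @ spec.T  # (n_mels, n_frames)
```

For a 1 s clip at 16000 Hz this yields a 64×98 two-dimensional Mel spectrogram containing time-frequency information.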
2.3) establishing a classifier:
referring to fig. 3, the specific implementation of this step is as follows:
2.3.1) building a feature prediction Module
The feature prediction module structure is as follows:
input layer → 1 st max pooling layer → 1 st convolution layer → 1 st activation function layer → 2 nd convolution layer → 2 nd activation function layer → 2 nd max pooling layer → 1 st residual block layer → 2 nd residual block layer → 3 rd convolution layer → 3 rd activation function → 4 th convolution layer → 4 th activation function → 3 rd max pooling layer → output layer;
the parameters of each layer are as follows:
the input layer inputs the Mel spectrogram generated by the Mel spectrogram generating module;
the number of input channels of the 1 st convolution layer is 1, and the number of output channels is 16;
the number of input channels of the 2 nd convolution layer is 16, and the number of output channels is 64;
the input channel number of the 3 rd convolution layer is 64, and the output channel number is 16;
the number of input channels of the 4 th convolution layer is 16, and the number of output channels is 16;
the convolution kernel size of all convolution layers is set to be 5 multiplied by 5, the convolution step length is set to be 1, and the filling is set to be 2;
relu is used for activation functions of all activation function layers;
the convolution kernels for all the largest pooling layers are 2 x 2, and the step size is 2.
The number of output channels of each residual block is 64.
Each residual block is formed by sequentially cascading a first convolution layer, a first activation function layer, a second convolution layer, a second activation function layer and an adder, wherein:
the numbers of input and output channels of the first and second convolution layers are both 64;
the convolution kernel size of both convolution layers is set to 3×3 and the convolution stride to 1;
the activation functions of the first and second activation function layers use Relu;
the inputs to the adder are the output of the second activation function layer and the input of the residual block.
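The I fully connected layer of the next module expects 3840 = 16 channels × flattened spatial size inputs, which constrains the feature-map shapes produced here. A small shape-tracing sketch checks this arithmetic; the 128×120 Mel-spectrogram input size and the padding of 1 inside the residual blocks (needed so the adder receives equally sized tensors, not stated in the text) are assumptions:

```python
def pool2d(h, w):
    # 2x2 max pooling with stride 2 halves each spatial dimension.
    return h // 2, w // 2

def conv2d(h, w, k, pad):
    # Stride-1 convolution output size: out = in + 2*pad - k + 1.
    return h + 2 * pad - k + 1, w + 2 * pad - k + 1

def flatten_size(h, w, channels=16):
    """Trace spatial sizes through the feature prediction module of step 2.3.1."""
    h, w = pool2d(h, w)        # 1st max pooling
    h, w = conv2d(h, w, 5, 2)  # 1st conv, 5x5, padding 2 (size-preserving)
    h, w = conv2d(h, w, 5, 2)  # 2nd conv
    h, w = pool2d(h, w)        # 2nd max pooling
    h, w = conv2d(h, w, 3, 1)  # residual-block convs: 3x3; padding 1 is an
    h, w = conv2d(h, w, 3, 1)  #   assumption so the skip connection adds
    h, w = conv2d(h, w, 3, 1)  #   tensors of equal size
    h, w = conv2d(h, w, 3, 1)
    h, w = conv2d(h, w, 5, 2)  # 3rd conv
    h, w = conv2d(h, w, 5, 2)  # 4th conv
    h, w = pool2d(h, w)        # 3rd max pooling
    return channels * h * w    # Flatten-layer input to the I fully connected layer
```

With a 128×120 input, the three poolings reduce the map to 16×15, and 16 × 16 × 15 = 3840, matching the I fully connected layer.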
2.3.2) establishing a mean and variance prediction module:
the module is formed by cascading an input layer, an I normalization layer, a Dropout layer, a Flatten layer, an I full connection layer, an I activation function layer, a II full connection layer, a II activation function layer, a II normalization layer, a prediction output layer and an uncertain vector generator, wherein the parameters of each layer are as follows:
the input layer receives the feature map with 16 channels output by the feature prediction module;
the first normalization layer uses a BN normalization function, and the number of channels is 16;
the second normalization layer uses a BN normalization function, and the number of channels is 128;
the number of input channels of the I full connection layer is 3840, and the number of output channels of the I full connection layer is 512;
the input channel number of the II full connection layer is 512, and the output channel number is 128;
the Dropout layer is used to randomly discard 15% of neurons;
the Flatten layer is used for flattening the two-dimensional characteristic diagram to a one-dimensional vector;
the activation functions of the I and II activation function layers both adopt Relu;
the prediction output layer outputs two characteristic vectors with the length of 128, namely a prediction mean value and a prediction variance;
the inputs to the uncertainty vector generator are the predicted mean and the predicted variance to generate different uncertainty feature vectors during the training process and the testing process, which are implemented as follows:
in the training process of the uncertain vector generator, the prediction mean value and the prediction variance output by the prediction output layer are used as parameters of normal distribution, and the uncertain characteristics with the length of 128 are randomly generatedVector, assuming mean result of prediction as μiVariance prediction variance result is sigmaiInput is xiThe uncertainty feature vector z generatediThe following relationships exist:
p(zi|xi)=N(zi;μi,σ2I)
in the formula, the average value prediction part outputs a result muiCan be regarded as a prediction of the speech data characteristics, and the variance prediction part outputs a result sigmaiCan be regarded as the prediction of muiI is an identity matrix of length 128, where each output sample is no longer a definite value but a normal distribution N (z)i;μi,σ2I) Random sampling of (2); since the gradient cannot be solved by random sampling, the backward propagation of the gradient during training is prevented, and the network iteration is blocked during training, a new parameterization is needed to be adopted to ensure that the network can still apply the gradient iteration, namely, the normal distribution is firstly used for carrying out the sigma iterationiSampling and regenerating siAs ziEquivalent of (d): si=μi+εσiε to N (0,1), siAs ziThe equivalent expression of (a) is the uncertainty feature vector output by this module in the training process;
in the test process, the uncertain vector generator sends the feature vector with the length of 128 into the uncertain vector generator, and the uncertain vector generator directly outputs the prediction mean value.
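The reparameterized sampling s_i = μ_i + ε·σ_i can be sketched directly; the function name and the NumPy implementation are illustrative:

```python
import numpy as np

def uncertain_vector(mu, sigma, training=True, rng=np.random.default_rng(0)):
    """Reparameterized sample from N(mu, sigma^2) as used by the uncertainty
    vector generator: s = mu + eps * sigma with eps ~ N(0, 1), so gradients can
    flow through mu and sigma. At test time the predicted mean is returned."""
    if not training:
        return mu
    eps = rng.standard_normal(np.shape(mu))
    return mu + eps * sigma
```

Averaged over many draws, the samples recover the predicted mean and standard deviation, while each individual draw carries the injected uncertainty.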
2.3.3) establishing a fully connected Module
The structure of the full-connection module is as follows: input layer → fully connected layer → output layer, whose output is a vector of length 2;
the parameters of each layer are as follows:
inputting a feature vector with the length of 128 generated by an uncertain vector generator into an input layer;
the number of input channels of the fully connected layer is 128 and the number of output channels is 2.
The input of the full-connection module is a feature vector with the length of 128 generated by the uncertain vector generator, and a probability vector [ a, b ] with the length of 2 is output, wherein a represents the probability of being cough, and b represents the probability of not being cough.
2.3.4) sequentially cascading a feature prediction module, a mean value and variance prediction module and a full-connection module to obtain a classifier;
and 2.4) sequentially cascading the established noise generation module, the built Mel map generation module and the built classifier to form a detector network.
Step 3, setting the objective function L_G of the detector network.
3.1) Set the cross-entropy loss function L_cross between the detector network output and the data-set labels:

L_cross = -1/2 · [y·log a + (1 - y)·log b]

where y is the label of the voice data (y = 1 means the voice contains a cough, y = 0 means it does not), a is the cough probability output by the detector network, and b is the non-cough probability output by the detector network;
3.2) Define the divergence function L_kl between the normal distribution parameterized by the predicted mean and variance and the standard normal distribution; that is, for the normal distribution N(μ_i, σ_i) with parameters μ_i and σ_i, compute its divergence from the standard normal distribution N(0, 1):

L_kl = KL( N(μ_i, σ_i) || N(0, 1) )

where KL denotes the Kullback-Leibler divergence of the two probability distributions in parentheses;
3.3) Add L_cross and L_kl to obtain the optimization function L_G of the classifier:
L_G = L_cross + λ·L_kl
where λ is a weighting parameter with λ < 1.
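The objective above can be sketched in code. The closed-form KL between N(μ, σ²) and N(0, 1) used below is an assumption (the patent reproduces only the KL notation, not an expanded formula), and all function and parameter names are illustrative:

```python
import torch

def kl_to_standard_normal(mu, sigma):
    # KL( N(mu, sigma^2) || N(0, 1) ) per dimension, summed over the
    # length-128 feature dimension and averaged over the batch.
    var = sigma.pow(2)
    return 0.5 * (var + mu.pow(2) - 1.0 - torch.log(var)).sum(dim=1).mean()

def detector_loss(probs, labels, mu, sigma, lam=0.1):
    # probs: [batch, 2] with columns [a, b]; labels: 1 = cough, 0 = non-cough.
    # The patent specifies only lam < 1; 0.1 is an assumed value.
    a, b = probs[:, 0], probs[:, 1]
    l_cross = -0.5 * (labels * torch.log(a) + (1 - labels) * torch.log(b)).mean()
    return l_cross + lam * kl_to_standard_normal(mu, sigma)
```

With μ = 0 and σ = 1 the KL term vanishes, so the loss reduces to the cross-entropy term alone.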
Step 5: train the detector network.
Set the learning rate to 0.0001 and the maximum number of training iterations T to 200;
the 27000 voice segments of the training data set are input to the detector network, and the noise generation module produces a noisy data set; the noisy data passes through the Mel spectrogram generation module to obtain a two-dimensional Mel spectrogram containing time-frequency information; the Mel spectrogram is then fed through the classifier to obtain cough/non-cough probability vectors;
substitute the cough/non-cough probability vectors, together with the training-set labels, into the cross-entropy loss function L_cross to obtain the cross-entropy term, and substitute the predicted mean μ_i and variance σ_i output by the detector network into the divergence function L_kl, giving the loss value of one training pass; iterate the network by stochastic gradient descent according to the change of the loss value at each pass, updating the network parameters until the set number of training iterations T is reached, at which point training of the detector network is complete;
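A minimal training-loop sketch under the stated settings (learning rate 0.0001, T = 200 iterations, stochastic gradient descent). The `model` interface returning `(probs, mu, sigma)` and the weight 0.1 on the KL term are assumptions; the patent does not publish code:

```python
import torch

def train_detector(model, loader, epochs=200, lr=1e-4):
    # Stochastic-gradient iteration as described: forward through the
    # detector, combine cross-entropy and KL terms, back-propagate.
    # `model` is assumed to return (probs, mu, sigma) per batch.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:                      # y: 1 = cough, 0 = non-cough
            probs, mu, sigma = model(x)
            a, b = probs[:, 0], probs[:, 1]
            l_cross = -0.5 * (y * torch.log(a) + (1 - y) * torch.log(b)).mean()
            l_kl = 0.5 * (sigma**2 + mu**2 - 1 - torch.log(sigma**2)).sum(1).mean()
            loss = l_cross + 0.1 * l_kl          # L_G = L_cross + lambda * L_kl
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```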
Step 6: test on the test set to obtain the cough detection result.
The test set of 3000 voice segments is input into the trained detector network: the noise generation module produces a noisy test set, the Mel spectrogram generation module produces a two-dimensional Mel spectrogram, and the classifier outputs a probability vector [a, b] of length 2, where a represents the probability of cough and b the probability of non-cough, completing the cough detection.
The effect of the present invention is further explained with a simulation experiment.
1. Simulation conditions are as follows:
the hardware environment of the simulation experiment was an NVIDIA GTX 1080Ti GPU with 128 GB of RAM;
the software environment of the simulation experiment was the deep-learning framework PyTorch 1.8.0.
In the simulation experiment, classification accuracy is adopted as the objective quantitative evaluation index. Let d be the total number of test-set voice segments in the experiment and f the number classified correctly after passing through the classifier; the accuracy is then:
P=f/d。
2. simulation content and result analysis
To verify the effectiveness of the present invention and demonstrate the benefit of introducing uncertainty learning, the proposed method was compared with the AI4 method.
The AI4 method is a cough detection method proposed by Ali Imran et al. in the paper "AI4COVID-19: AI Enabled Preliminary Diagnosis for COVID-19 from Cough Samples via an App"; its classifier consists of convolutional layers and fully connected layers and is likewise trained end-to-end. In the simulation experiment, the classifier of the proposed method was replaced with the AI4 classifier, and the same training and test sets were used for the comparison.
The experimental procedure was as follows:
First, the parameters of the noise generation module were varied identically in both methods over several comparison experiments: Gaussian noise and background noise were selected as the noise types, and the SNR was set to 10, 8, and 5, respectively;
then, following the steps of the detailed implementation, the simulation was run to obtain the detection results on the test set; these were compared with the test-set labels to determine whether each detection was correct, and the numbers of correct and incorrect detections were counted to compute the detection accuracy P. The results are shown in Table 1:
TABLE 1 cough detection accuracy results
As can be seen from Table 1, the invention obtains good detection results under different noise types and different noise intensities, indicating good detection performance under realistic environmental noise.
Under every noise type and noise intensity, the detection accuracy of the proposed method is higher than that of the AI4 method, confirming that introducing uncertainty learning yields a better detection effect.
Claims (10)
1. A method for cough detection based on data uncertainty learning, comprising:
(1) constructing a cough detection data set: selecting 15000 cough voice segments and 15000 non-cough voice segments from different public data sets, preprocessing the data, and dividing it into a training set and a test set at a ratio of 9:1, each segment carrying a label indicating whether it contains a cough;
(2) constructing a detector network formed by sequentially cascading a noise generation module, a Mel spectrogram generation module, a feature prediction module, a mean and variance prediction module, and a fully connected module; the feature prediction module, the mean and variance prediction module, and the fully connected module form a classifier; the mean and variance prediction module is formed by cascading an input layer, a first normalization layer, a Dropout layer, a Flatten layer, a first fully-connected layer, a first activation function layer, a second fully-connected layer, a second activation function layer, a second normalization layer, a prediction output layer, and an uncertainty vector generator; the uncertainty vector generator takes the predicted mean and predicted variance output by the prediction output layer as parameters and adds an uncertainty component to the feature vector to generate an uncertainty feature vector, so that the features entering the network carry randomness and uncertainty, which strengthens the stability of the network when classifying noisy data and improves its detection accuracy on real noisy data;
(3) setting the objective function L_G of the detector network:
L_G = L_cross + λ·L_kl,
where λ < 1 is a weighting parameter, L_cross is the cross-entropy loss function, and L_kl is the divergence function whose parameters μ_i and σ_i are the predicted mean and variance output by the mean and variance prediction module;
(4) training the detector network:
4a) setting the learning rate L and the maximum number of training iterations T;
4b) inputting the training data set into a detector network, and obtaining a noisy data set through a noise generation module; the noisy data passes through a Mel spectrogram generating module to obtain a two-dimensional Mel spectrogram containing time-frequency information; the two-dimensional Mel spectrogram is subjected to classifier to obtain probability vectors of cough and non-cough;
4c) substituting the cough/non-cough probability vectors, together with the training-set labels, into the cross-entropy loss function L_cross to obtain the cross-entropy term, and substituting the predicted mean μ_i and variance σ_i output by the mean and variance prediction module into the divergence function L_kl, giving the loss value of one training pass; iterating the network by stochastic gradient descent according to the change of the loss value at each pass, updating the network parameters until the set number of training iterations T is reached, at which point training of the detector network is complete;
(5) inputting the test set into the trained detector network: the noise generation module produces a noisy test set, the Mel spectrogram generation module produces a two-dimensional Mel spectrogram, and the classifier outputs a probability vector [a, b] of length 2, where a represents the probability of cough and b the probability of non-cough.
2. The method of claim 1, wherein the cough detection data set is preprocessed in (1) by:
2a) setting the sampling rate of all downloaded cough and non-cough voice data to 16000 Hz, normalizing the voice data, and mapping it into the range from -1 to 1;
2b) processing the normalized cough voice data and non-cough voice data in different ways:
processing of cough voice data: first, the cough voice data is cut into audio clips of length 0.5 s to 1 s; the cut clips are then extended by padding with silence into 1 s long audio;
processing of non-cough voice data: the non-cough data is cut into 1 s long audio.
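The preprocessing of claim 2 can be sketched as follows (NumPy sketch; function names are illustrative, and peak normalization is one assumed way of mapping the signal into [-1, 1]):

```python
import numpy as np

FS = 16000  # target sampling rate, 16 kHz

def normalize(signal):
    # Map the signal into [-1, 1] by its peak absolute amplitude.
    peak = np.max(np.abs(signal))
    return signal / peak if peak > 0 else signal

def to_one_second(signal, is_cough):
    # Cough clips (0.5-1 s) are zero-padded (silence) up to 1 s;
    # non-cough clips are simply truncated to 1 s.
    target = FS  # 1 s of samples at 16 kHz
    if is_cough and len(signal) < target:
        signal = np.pad(signal, (0, target - len(signal)))
    return signal[:target]
```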
3. The method according to claim 1, wherein the noise generation module in (2) is a voice-signal adder whose inputs are the voice signal, the noise type, and the signal-to-noise ratio, and whose output is the noise-added voice signal; when the noise type is white noise, Gaussian white noise is added to the voice signal; when the noise type is background noise, background noise common in daily life is added to the voice signal; the signal-to-noise ratio determines the intensity of the added noise.
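A sketch of such an SNR-controlled signal adder. The scaling rule below is the standard decibel definition of SNR; the patent itself gives no formula:

```python
import numpy as np

def add_noise(speech, noise, snr_db):
    # Scale `noise` so that 10*log10(P_speech / P_noise) equals `snr_db`,
    # then add it to the speech signal. `noise` may be Gaussian white
    # noise or a recorded background-noise clip of the same length.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise
```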
4. The method according to claim 1, wherein the Mel pattern generation module in (2) is structured as follows:
input → framing processor → windower → FFT generator → time domain stacker → Mel filter bank → output;
the framing processor is used for dividing the input voice signal after noise addition at a fixed time interval on a time domain to generate a voice section taking a frame as a unit;
the windowing device is used for adding a rectangular window to each frame of voice section after framing so as to facilitate the subsequent FFT;
the FFT generator is used for performing fast Fourier transform on each frame of voice signals after windowing to obtain one-dimensional frequency domain signals;
the time domain stacker is used for stacking the frequency domain signals of all frames on a time domain to obtain a spectrogram;
the Mel filter bank is used for converting the sound spectrogram into Mel spectrogram and outputting the Mel spectrogram.
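The pipeline of claim 4 can be sketched end to end. The frame length, hop size, and number of mel filters below are assumptions, since the patent does not fix them; the claim's rectangular window corresponds to taking the raw frames without a taper:

```python
import numpy as np

def mel_filterbank(n_mels, n_fft, fs):
    # Triangular mel filters (illustrative parameterization).
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mels = np.linspace(0, hz_to_mel(fs / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / fs).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        for k in range(l, c):
            fb[i, k] = (k - l) / max(c - l, 1)   # rising slope
        for k in range(c, r):
            fb[i, k] = (r - k) / max(r - c, 1)   # falling slope
    return fb

def mel_spectrogram(x, fs=16000, frame_len=400, hop=160, n_mels=40):
    # Framing -> rectangular window (as in the claim) -> FFT per frame
    # -> stack frames along time -> mel filter bank.
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    spec = np.abs(np.fft.rfft(np.array(frames), n=frame_len)) ** 2  # [time, freq]
    fb = mel_filterbank(n_mels, frame_len, fs)
    return spec @ fb.T  # [time, n_mels]
```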
5. The method of claim 1, wherein the feature prediction module structure and parameters in (2) are as follows:
the structure is: input layer → 1st max pooling layer → 1st convolution layer → 1st activation function layer → 2nd convolution layer → 2nd activation function layer → 2nd max pooling layer → 1st residual block layer → 2nd residual block layer → 3rd convolution layer → 3rd activation function layer → 4th convolution layer → 4th activation function layer → 3rd max pooling layer → output layer;
parameters of each layer are as follows:
the input layer inputs the Mel spectrogram generated by the Mel spectrogram generating module;
the number of input channels of the 1 st convolution layer is 1, and the number of output channels is 16;
the number of input channels of the 2 nd convolution layer is 16, and the number of output channels is 64;
the input channel number of the 3 rd convolution layer is 64, and the output channel number is 16;
the number of input channels of the 4 th convolution layer is 16, and the number of output channels is 16;
the convolution kernel size of all convolution layers is set to 5 × 5, the convolution stride to 1, and the padding to 2;
the activation functions of all activation function layers use ReLU;
the convolution kernels of all max pooling layers are 2 × 2 with stride 2;
the number of output channels of each residual block is 64.
6. The method of claim 5, wherein each residual block in the feature prediction module is composed of a first convolutional layer, a first activation function layer, a second convolutional layer, a second activation function layer, and an adder, sequentially cascaded, with the parameters of each layer as follows:
the number of input channels of the first convolution layer and the second convolution layer is 64, and the number of output channels of the first convolution layer and the second convolution layer is 64;
the convolution kernel sizes of all convolution layers are set to 3 × 3, and the convolution stride is set to 1;
the activation functions of the first and second activation function layers use ReLU;
the inputs to the adder are the second active function layer output and the input to the residual block.
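A sketch of the residual block of claim 6 (PyTorch; `padding=1` is an assumption needed so the adder's two inputs have matching shapes, since the claim specifies only kernel 3 × 3 and stride 1):

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    # conv(3x3, 64->64) -> ReLU -> conv(3x3, 64->64) -> ReLU, then the
    # adder sums the second activation's output with the block input.
    def __init__(self, channels=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, stride=1, padding=1)

    def forward(self, x):
        y = torch.relu(self.conv1(x))
        y = torch.relu(self.conv2(y))
        return y + x  # the adder
```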
7. The method of claim 1, wherein the structure and parameters of the mean and variance prediction module in (2) are as follows:
the structure is: input layer → first normalization layer → Dropout layer → Flatten layer → first fully-connected layer → first activation function layer → second fully-connected layer → second activation function layer → second normalization layer → prediction output layer → uncertainty vector generator;
the parameters of each layer are as follows:
the input layer receives the 16-channel feature map output by the feature prediction module;
the first normalization layer uses a BN normalization function, and the number of channels is 16;
the second normalization layer uses a BN normalization function, and the number of channels is 128;
the number of input channels of the first fully-connected layer is 3840 and the number of output channels is 512;
the number of input channels of the second fully-connected layer is 512 and the number of output channels is 128;
the Dropout layer is used to randomly discard 15% of neurons;
the Flatten layer flattens the two-dimensional feature map into a one-dimensional vector;
the activation functions of the first and second activation function layers both use ReLU;
the prediction output layer outputs two feature vectors of length 128: the predicted mean and the predicted variance;
the input of the uncertainty vector generator is the predicted mean and predicted variance, from which it generates different uncertainty feature vectors during the training process and the testing process.
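A sketch of this module in PyTorch. How the prediction output layer splits into the two length-128 vectors is not specified in the claim, so the two linear heads and the softplus used to keep the variance positive are assumptions:

```python
import torch
import torch.nn.functional as F
from torch import nn

class MeanVarPredictor(nn.Module):
    # BN(16) -> Dropout(0.15) -> Flatten -> FC 3840->512 -> ReLU ->
    # FC 512->128 -> ReLU -> BN(128) -> two length-128 prediction heads.
    def __init__(self):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(16)
        self.drop = nn.Dropout(0.15)       # randomly discards 15% of neurons
        self.fc1 = nn.Linear(3840, 512)
        self.fc2 = nn.Linear(512, 128)
        self.bn2 = nn.BatchNorm1d(128)
        self.mu_head = nn.Linear(128, 128)     # assumed head for the mean
        self.sigma_head = nn.Linear(128, 128)  # assumed head for the variance

    def forward(self, x):  # x: [batch, 16, H, W] with 16*H*W == 3840
        h = self.drop(self.bn1(x)).flatten(1)
        h = torch.relu(self.fc1(h))
        h = self.bn2(torch.relu(self.fc2(h)))
        mu = self.mu_head(h)
        sigma = F.softplus(self.sigma_head(h))  # keep the variance positive
        return mu, sigma
```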
8. The method of claim 7, wherein the uncertainty vector generator generates different uncertainty feature vectors during the training process and the testing process, as follows:
in the training process, the predicted mean and predicted variance output by the prediction output layer are used as the parameters of a normal distribution from which an uncertainty feature vector of length 128 is randomly generated; let the predicted mean be μ_i, the predicted variance be σ_i, and the input be x_i; the generated uncertainty feature vector z_i then satisfies:
p(z_i | x_i) = N(z_i; μ_i, σ_i²·I)
where the mean output μ_i can be regarded as the prediction of the voice-data features, the variance output σ_i as the prediction of the uncertainty of μ_i, and I is an identity matrix of size 128; each output sample is thus no longer a fixed value but a random sample from the normal distribution N(z_i; μ_i, σ_i²·I); because direct random sampling blocks the back-propagation of gradients during training and stalls network iteration, reparameterization is adopted so that gradient iteration still applies: instead of sampling z_i directly from the normal distribution, s_i is generated as an equivalent of z_i:
s_i = μ_i + ε·σ_i,  ε ~ N(0, 1)
and s_i, as the equivalent representation of z_i, is the output of this module during training;
in the test process, the feature vector of length 128 enters the uncertainty vector generator, which directly outputs the predicted mean.
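The two behaviors can be sketched in a few lines: the reparameterized sample s_i = μ_i + ε·σ_i during training, and the mean directly during testing (function name illustrative):

```python
import torch

def uncertainty_vector(mu, sigma, training=True):
    # Training: reparameterized sample s_i = mu_i + eps * sigma_i with
    # eps ~ N(0, 1), so gradients flow through mu and sigma.
    # Testing: the predicted mean is returned directly.
    if training:
        eps = torch.randn_like(mu)
        return mu + eps * sigma
    return mu
```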
9. The method of claim 1, wherein the structure and parameters of the fully connected module are as follows:
the structure is as follows: input layer → fully connected layer → output layer, whose output is a vector of length 2;
the parameters of each layer are as follows:
inputting a feature vector with the length of 128 generated by an uncertain vector generator into an input layer;
the number of input channels of the fully connected layer is 128 and the number of output channels is 2.
The input of the fully connected module is the feature vector of length 128 generated by the uncertainty vector generator; the output is a probability vector [a, b] of length 2, where a represents the probability of cough and b the probability of non-cough.
10. The method of claim 1, wherein the cross-entropy loss function L_cross in the objective function L_G is expressed as follows:
L_cross = -1/2 · [y·log a + (1-y)·log b]
where y is the label of the voice data (y = 1 if the voice data contains a cough, y = 0 if it does not), a is the cough probability output by the fully connected module, and b is the non-cough probability output by the fully connected module.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111492741.2A CN114209302B (en) | 2021-12-08 | 2021-12-08 | Cough detection method based on data uncertainty learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114209302A true CN114209302A (en) | 2022-03-22 |
CN114209302B CN114209302B (en) | 2024-03-26 |
Family
ID=80700590
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111492741.2A Active CN114209302B (en) | 2021-12-08 | 2021-12-08 | Cough detection method based on data uncertainty learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114209302B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117357754A (en) * | 2023-11-15 | 2024-01-09 | 江苏麦麦医疗科技有限公司 | Intelligent household oxygenerator and control system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120278022A1 (en) * | 2011-04-29 | 2012-11-01 | Pulsar Informatics, Inc. | Systems and methods for latency and measurement uncertainty management in stimulus-response tests |
WO2013036718A1 (en) * | 2011-09-08 | 2013-03-14 | Isis Innovation Ltd. | Determining acceptability of physiological signals |
CN109493874A (en) * | 2018-11-23 | 2019-03-19 | 东北农业大学 | A kind of live pig cough sound recognition methods based on convolutional neural networks |
US20190107888A1 (en) * | 2017-10-06 | 2019-04-11 | Holland Bloorview Kids Rehabilitation Hospital | Brain-computer interface platform and process for classification of covert speech |
CN110383375A (en) * | 2017-02-01 | 2019-10-25 | 瑞爱普健康有限公司 | Method and apparatus for the cough in detection noise background environment |
US20200060604A1 (en) * | 2017-02-24 | 2020-02-27 | Holland Bloorview Kids Rehabilitation Hospital | Systems and methods of automatic cough identification |
AU2020102516A4 (en) * | 2020-09-30 | 2020-11-19 | Du, Jiahui Mr | Health status monitoring system based on speech analysis |
CN112472065A (en) * | 2020-11-18 | 2021-03-12 | 天机医用机器人技术(清远)有限公司 | Disease detection method based on cough sound recognition and related equipment thereof |
KR20210134195A (en) * | 2020-04-30 | 2021-11-09 | 서울대학교산학협력단 | Method and apparatus for voice recognition using statistical uncertainty modeling |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111429938B (en) | Single-channel voice separation method and device and electronic equipment | |
CN105023573B | Speech syllable/vowel/phone boundary detection using auditory attention cues | |
CN112885372B (en) | Intelligent diagnosis method, system, terminal and medium for power equipment fault sound | |
CN108520753A (en) | Voice lie detection method based on the two-way length of convolution memory network in short-term | |
CN111951824A (en) | Detection method for distinguishing depression based on sound | |
CN111986679A (en) | Speaker confirmation method, system and storage medium for responding to complex acoustic environment | |
CN111724806B (en) | Double-visual-angle single-channel voice separation method based on deep neural network | |
Aibinu et al. | Artificial neural network based autoregressive modeling technique with application in voice activity detection | |
Whitehill et al. | Whosecough: In-the-wild cougher verification using multitask learning | |
CN114209302B (en) | Cough detection method based on data uncertainty learning | |
Looney et al. | Joint estimation of acoustic parameters from single-microphone speech observations | |
CN116842460A (en) | Cough-related disease identification method and system based on attention mechanism and residual neural network | |
CN116013276A (en) | Indoor environment sound automatic classification method based on lightweight ECAPA-TDNN neural network | |
Zheng et al. | MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios | |
CN113974607A (en) | Sleep snore detecting system based on impulse neural network | |
Hernandez-Espinosa et al. | Diagnosis of vocal and voice disorders by the speech signal | |
CN116965819A (en) | Depression recognition method and system based on voice characterization | |
Meng et al. | Noisy training for deep neural networks | |
Kong et al. | Dynamic multi-scale convolution for dialect identification | |
CN111462770A | LSTM-based late reverberation suppression method and system | |
CN113096691A (en) | Detection method, device, equipment and computer storage medium | |
CN113488069A (en) | Method and device for quickly extracting high-dimensional voice features based on generative countermeasure network | |
Iwok et al. | Evaluation of Machine Learning Algorithms using Combined Feature Extraction Techniques for Speaker Identification | |
CN113327633A (en) | Method and device for detecting noisy speech endpoint based on deep neural network model | |
Rosa et al. | Evaluation of neural classifiers using statistic methods for identification of laryngeal pathologies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||