CN113158916A - Cough sound automatic identification algorithm, device, medium and equipment - Google Patents

Cough sound automatic identification algorithm, device, medium and equipment

Info

Publication number
CN113158916A
Authority
CN
China
Prior art keywords
sample
cough sound
cough
svm classifier
signal sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110450820.0A
Other languages
Chinese (zh)
Other versions
CN113158916B (en)
Inventor
莫鸿强
曾键沣
周樊
章臻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110450820.0A priority Critical patent/CN113158916B/en
Publication of CN113158916A publication Critical patent/CN113158916A/en
Application granted granted Critical
Publication of CN113158916B publication Critical patent/CN113158916B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention provides an automatic cough sound identification algorithm, device, medium and equipment. The algorithm comprises the following steps: obtaining a sample to be identified; carrying out endpoint detection on the sample, intercepting a fixed-length sequence of duration t from the starting point, setting it as the signal sequence and normalizing it; obtaining a predicted signal sequence and a residual signal sequence by linear predictive coding; determining the MFCC parameters MFCC_r of the predicted signal sequence and the short-time energy en_r of the residual signal sequence and constructing a feature vector from them; and inputting the feature vector into a linear SVM classifier and judging whether the sample to be recognized is a cough sound sample or a non-cough sound sample according to the output of the linear SVM classifier. The algorithm overcomes the problem that the time-domain characteristics of the excitation source are easily influenced by vocal tract resonance, and has good identification capability and high identification accuracy.

Description

Cough sound automatic identification algorithm, device, medium and equipment
Technical Field
The invention relates to the technical field of medical instruments and medical signal processing, in particular to an automatic cough sound identification algorithm, device, medium and equipment.
Background
Cough is a natural reflex action of the human body and a common symptom of respiratory diseases such as asthma, pneumonia, laryngitis and chronic obstructive pulmonary disease. In the clinic, doctors often use information such as a patient's cough frequency and intensity as an important basis for diagnosis, but this information comes mainly from the patient's subjective description and its reliability is poor. Recording and automatically identifying cough sounds, unlike relying on the patient's subjective description, helps the doctor obtain the patient's cough information more objectively and accurately.
One of the major difficulties in automatic recognition of cough sounds is the small number of samples available for training and testing recognition models. At present no large-scale cough sound database has been made public, and the number of cough sound samples collected by each research team is limited. In addition, non-cough sound samples are of many kinds and far outnumber the cough sound samples, so there is also a sample-imbalance problem. The shortage of cough sound samples and the imbalance of the training samples make it very difficult to establish an automatic cough sound recognition model.
Selecting targeted features so as to minimize the number of undetermined parameters of the recognition model is an effective measure for overcoming the above difficulties. Much of the current literature employs Mel-Frequency Cepstrum Coefficients (MFCC) as cough sound features; however, judged from the mechanism by which cough sounds are formed, MFCC alone does not characterize the cough sound comprehensively enough.
During a cough, the contraction of the abdominal muscles and diaphragm causes the lungs to exhale a large amount of gas instantly, and the impact of this airflow makes the vocal tract vibrate and produce the cough sound. According to this generation mechanism, the cough sound can be viewed as the response produced by an excitation source acting on a vocal tract model, where the excitation source reflects the pulmonary airflow and the vocal tract model reflects the shape of the vocal tract. To identify cough sounds accurately, feature extraction from the cough sound signal must take both excitation source and vocal tract information into account; however, the automatic cough sound identification algorithms in common use mostly rely on features related to vocal tract information, such as MFCC, and leave the excitation source information unused.
For cough sound recognition, the primary information carried by the excitation source is its intensity and how that intensity varies over time. However, the vocal tract model and the excitation source are related by a convolution in the time domain, so the intensity of the excitation source signal cannot be read directly from the cough sound; if the short-time energy or intensity of the cough sound itself is simply taken as the intensity of the excitation source, the result is affected by vocal tract resonance and carries a large error. Extracting the excitation source features is therefore non-trivial.
Disclosure of Invention
To overcome the disadvantages and shortcomings of the prior art, the present invention provides an automatic cough sound recognition algorithm, apparatus, medium and device. The algorithm uses Linear Predictive Coding (LPC) to separate the vocal tract model from the excitation source, overcoming the difficulty that extraction of excitation source features is disturbed by vocal tract resonance; it uses the MFCC and the short-time energy of the excitation source as the features of the automatic cough sound identification algorithm, so that the features carry information about both the vocal tract model and the excitation source and the common algorithms' neglect of excitation source features is remedied; it uses a Support Vector Machine (SVM) as the classifier to cope with the difficulty of training a model with insufficient and imbalanced samples; in addition, the algorithm extracts features only from the first stage of the cough sound, so the model is simple, the amount of computation is small, and the method is convenient to implement in wearable equipment.
In order to achieve the above purpose, the invention is realized by the following technical scheme. An automatic cough sound recognition algorithm comprises the following steps:
Step S1, obtaining a sample to be identified, wherein the sampling frequency of the sample to be identified is f_s;
Step S2, carrying out endpoint detection on the sample, intercepting a fixed-length sequence of duration t from the starting point, and setting it as the signal sequence r(k), where k = 1, 2, …, N; N = t × f_s, and N is a positive integer;
Step S3, normalizing the signal sequence r(k) to obtain the normalized signal sequence r̂(k);
Step S4, using linear predictive coding to decompose the normalized signal sequence r̂(k) into a predicted signal sequence s_r(k) and a residual signal sequence e_r(k), where the order of the linear predictive coding model is p;
Step S5, obtaining the Q-dimensional MFCC parameters of the predicted signal sequence s_r(k): MFCC_r = [m_r1, m_r2, …, m_rQ]^T;
Step S6, obtaining the short-time energy en_r of the residual signal sequence e_r(k), where en_r is a scalar;
Step S7, constructing the obtained MFCC parameters MFCC_r and short-time energy en_r into a feature vector c_r = [m_r1, m_r2, …, m_rQ, en_r]^T;
Step S8, inputting the feature vector c_r into a linear SVM classifier and judging the type of the sample to be identified according to the classifier output f(c_r): if f(c_r) = 1, the sample is a cough sound sample; if f(c_r) = -1, the sample is a non-cough sound sample. The linear SVM classifier is obtained by training and testing an initial linear SVM classifier.
Preferably, the training and testing of the initial linear SVM classifier comprises the following steps:
Step Y1, collecting a plurality of cough sound samples and non-cough sound samples at sampling frequency f_s; setting the cough sound sample labels to 1 and the non-cough sound sample labels to -1; dividing the cough sound samples and the non-cough sound samples into a training set and a test set;
Step Y2, processing each sample of the training set according to steps S2 to S7 to obtain the feature vector of each sample;
Step Y3, establishing a linear SVM classifier with decision function f(c) = sgn[w^T c + b], where c is the feature vector of the signal to be decided; f(c) is the decision value of the feature vector c, f(c) = 1 indicating that the signal to be decided is a cough sound and f(c) = -1 indicating a non-cough sound; w is the weight vector; b is the hyperplane offset;
Step Y4, taking the feature vector of each sample as input and the sample label corresponding to that sample as the expected output, training the established linear SVM classifier to obtain the decision function f(c) = sgn[w*^T c + b*], where w* and b* are the optimal weight vector and hyperplane offset obtained by training;
Step Y5, processing each sample of the test set according to step Y2 to obtain the feature vector of each sample;
Step Y6, inputting the feature vector of each test sample into the trained linear SVM classifier, comparing the classifier output f(c) with the sample label corresponding to that sample, and calculating the sensitivity and specificity from the comparison results;
judging whether the sensitivity and the specificity meet the preset requirements:
if so, ending the training and testing;
otherwise, jumping back to step Y4 and retraining to obtain a new decision function f(c) = sgn[w*^T c + b*] for the linear SVM classifier.
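A minimal sketch of the training of steps Y3–Y4 with scikit-learn's linear SVM follows, assuming the feature matrix X (one row per feature vector from step Y2) and the label vector y (+1 cough, -1 non-cough) are already prepared; LinearSVC is one possible linear-SVM implementation, not the one prescribed by the patent.

```python
# Minimal training sketch for steps Y3-Y4 (X and y are assumed to exist).
import numpy as np
from sklearn.svm import LinearSVC

clf = LinearSVC()              # linear decision function f(c) = sgn(w^T c + b)
clf.fit(X, y)

w_opt = clf.coef_.ravel()      # trained weight vector w*
b_opt = clf.intercept_[0]      # trained hyperplane offset b*

def decide(c):
    """f(c) = sgn(w*^T c + b*): 1 -> cough sound, -1 -> non-cough sound."""
    return 1 if np.dot(w_opt, c) + b_opt >= 0 else -1
```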
Preferably, in the step Y1, the divided training set includes m cough sound samples and n non-cough sound samples;
Step Y2 is as follows: processing each sample of the training set according to step S2 to obtain cough sound signal sequences x_i(k) and non-cough sound signal sequences y_j(k), where i = 1, 2, …, m and j = 1, 2, …, n;
processing the cough sound signal sequences x_i(k) and the non-cough sound signal sequences y_j(k) according to steps S3 to S7 to obtain the feature vectors of the cough sound samples c_xi and the feature vectors of the non-cough sound samples c_yj;
the feature set of the training samples is Φ = {c_x1, …, c_xm, c_y1, …, c_yn}, and its corresponding sample label set is T = {t_x1, …, t_xm, t_y1, …, t_yn}, where all cough sound sample labels are t_xi = 1 and all non-cough sound sample labels are t_yj = -1; the feature vectors in the feature set Φ and the sample labels in the sample label set T correspond one-to-one through the indices x_i or y_j;
in step Y4, taking each feature vector of the feature set Φ as input and the sample label corresponding to that feature vector in the sample label set T as the expected output, training the established linear SVM classifier to obtain the decision function f(c) = sgn[w*^T c + b*];
in step Y5, processing each sample of the test set according to step Y2 to obtain the feature vector of each sample, giving the test-sample feature set Φ′ and the corresponding sample label set T′;
in step Y6, taking each feature vector in the test-sample feature set Φ′ as the input of the linear SVM classifier, and comparing the classifier output f(c) with the sample label corresponding to that feature vector in the sample label set T′.
Preferably, in step S2, the duration t lies in the range 30 ms ≤ t ≤ 50 ms.
Preferably, in step S3, the normalization formula is
r̂(k) = r(k) / max{|r(1)|, |r(2)|, …, |r(N)|},
where max{·} represents the maximum of all elements in the set.
Preferably, in step S4, the order p of the linear predictive coding model satisfies 8 ≤ p ≤ 12;
in step S5, the MFCC dimension Q is 12.
Preferably, in step S6, the short-time energy is calculated as
en_r = Σ_{k=1}^{N} e_r(k)².
In step Y6, in order to calculate the sensitivity and specificity, the following conventions are made:
1) appointing that the cough sound sample is a positive sample, and the non-cough sound sample is a negative sample;
2) if the sample label of a certain sample is 1 and the output of the linear SVM classifier taking the feature vector of the sample as input is 1, the sample is a true positive sample; if the sample label of a certain sample is-1 and the output of the linear SVM classifier taking the feature vector of the sample as input is-1, the sample is a true negative sample; if the sample label of a certain sample is-1 and the output of the linear SVM classifier taking the feature vector of the sample as input is 1, the sample is a false positive sample; if the sample label of a certain sample is 1 and the output of the linear SVM classifier taking the feature vector of the sample as input is-1, the sample is a false negative sample;
based on this convention, the sensitivity and specificity are calculated as follows: count the numbers of true positive, true negative, false positive and false negative samples in the test set, denoted TP, TN, FP and FN respectively; then the sensitivity is
Sn = TP / (TP + FN)
and the specificity is
Sp = TN / (TN + FP).
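As a small illustration of this convention, the sensitivity and specificity can be computed from arrays of true labels and classifier outputs; the array names y_true and y_pred are assumptions, not notation used by the patent.

```python
# Sensitivity and specificity from test-set results; y_true and y_pred are
# assumed arrays of +1 / -1 labels and classifier outputs f(c).
import numpy as np

tp = int(np.sum((y_true == 1) & (y_pred == 1)))    # true positives
tn = int(np.sum((y_true == -1) & (y_pred == -1)))  # true negatives
fp = int(np.sum((y_true == -1) & (y_pred == 1)))   # false positives
fn = int(np.sum((y_true == 1) & (y_pred == -1)))   # false negatives

Sn = tp / (tp + fn)   # sensitivity
Sp = tn / (tn + fp)   # specificity
```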
An automatic cough sound recognition device, comprising:
a sample obtaining module for obtaining a sample to be identified, wherein the sampling frequency of the sample to be identified is f_s;
a preprocessing module for carrying out endpoint detection on the sample, intercepting a fixed-length sequence of duration t from the starting point and setting it as the signal sequence r(k), where k = 1, 2, …, N, N = t × f_s, and N is a positive integer, and for normalizing the signal sequence r(k) to obtain r̂(k);
a feature extraction module for decomposing the normalized signal sequence r̂(k) into a predicted signal sequence s_r(k) and a residual signal sequence e_r(k) using linear predictive coding, where the order of the linear predictive coding model is p; for determining the Q-dimensional MFCC parameters MFCC_r = [m_r1, m_r2, …, m_rQ]^T of the predicted signal sequence s_r(k); for obtaining the short-time energy en_r of the residual signal sequence e_r(k), where en_r is a scalar; and for constructing the obtained MFCC parameters MFCC_r and short-time energy en_r into a feature vector c_r = [m_r1, m_r2, …, m_rQ, en_r]^T;
an identification module for inputting the obtained feature vector c_r into a linear SVM classifier and judging the type of the sample to be identified according to the classifier output f(c_r): if f(c_r) = 1, the sample is a cough sound sample; if f(c_r) = -1, the sample is a non-cough sound sample; the linear SVM classifier is obtained by training and testing an initial linear SVM classifier.
A storage medium, wherein the storage medium stores a computer program which, when executed by a processor, causes the processor to execute the above cough sound automatic recognition algorithm.
A computing device comprising a processor and a memory for storing a program executable by the processor, wherein the processor implements the cough sound automatic identification algorithm when executing the program stored in the memory.
The principle of the automatic identification algorithm of the invention is as follows:
Linear predictive coding makes it possible to deconvolve the vocal tract model and the excitation source of the cough sound signal in the time domain. The cough sound signal can be viewed as the response of an all-pole system (characterizing the shape of the vocal tract) excited by an input sequence (characterizing the pulmonary airflow); this all-pole system acts as the vocal tract model, with transfer function
H(z) = R(z)/U(z) = 1 / (1 − Σ_{i=1}^{p} a_i z^{−i})    (1)
wherein R(z) and U(z) are the z-transforms of the signal sequence r(k) and the excitation source u(k), respectively. The relationship between r(k) and u(k) can be expressed as the difference equation
r(k) = Σ_{i=1}^{p} a_i r(k−i) + u(k).    (2)
Herein, the system
s_r(k) = Σ_{i=1}^{p} a_i r(k−i)    (3)
is called the linear predictor, i.e. the current value is predicted by a linear combination of the p most recent historical values, and the a_i are called the linear prediction coefficients. The linear prediction coefficients describe the vocal tract model, and from them the time-domain sequence of the vocal tract model, i.e. the predicted signal sequence s_r(k), can be derived. The linear prediction residual
e_r(k) = r(k) − s_r(k) = r(k) − Σ_{i=1}^{p} a_i r(k−i),    (4)
i.e. the remainder after subtracting the predicted signal sequence from the original signal sequence, describes the excitation source of the cough sound signal.
To find the optimal linear prediction coefficients, the sum of squared prediction errors is defined as
E = Σ_k [r(k) − Σ_{i=1}^{p} a_i r(k−i)]².    (5)
The optimal linear prediction coefficients are the parameters a_i that minimize E. Setting the partial derivative of E with respect to a_j to 0 for each j = 1, 2, …, p gives
Σ_k r(k−j) r(k) = Σ_{i=1}^{p} a_i Σ_k r(k−i) r(k−j),  j = 1, 2, …, p,    (6)
a total of p equations. Letting
φ(j, i) = Σ_k r(k−j) r(k−i),
the system of equations can be simplified to
φ(j, 0) = Σ_{i=1}^{p} a_i φ(j, i),  j = 1, 2, …, p.    (7)
This system of linear equations can be solved by the autocorrelation method, as follows. Define the windowed autocorrelation function of r(k) as
R_n(j) = Σ_{k=1}^{N−j} r(k) r(k+j);    (8)
then φ(j, i) can be expressed as
φ(j, i) = R_n(|j − i|),
so formula (7) can be written as
R_n(j) = Σ_{i=1}^{p} a_i R_n(|j − i|),  j = 1, 2, …, p.    (9)
Equation (9) is called the Yule-Walker equation, and the system of equations it defines can be solved efficiently by the Levinson-Durbin algorithm. Once the system is solved, the predicted signal sequence s_r(k) and the residual signal sequence e_r(k) are obtained recursively from the linear prediction coefficients. In this way the vocal tract model and the excitation source are deconvolved in the time domain.
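A compact numerical sketch of this LPC decomposition (step S4) follows. It uses scipy.linalg.solve_toeplitz to solve the Yule-Walker system (Levinson-type recursion internally); this is one possible implementation, not the patent's own code, and an explicit Levinson-Durbin loop would work equally well.

```python
# Sketch of the LPC decomposition of step S4 by the autocorrelation method.
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_decompose(r, order=12):
    """Split r(k) into the predicted sequence s_r(k) and residual e_r(k)."""
    r = np.asarray(r, dtype=float)
    N = len(r)
    # Windowed autocorrelation R_n(j), j = 0, ..., order
    R = np.array([np.dot(r[:N - j], r[j:]) for j in range(order + 1)])
    # Yule-Walker system: sum_i a_i R_n(|j - i|) = R_n(j), j = 1, ..., order
    a = solve_toeplitz(R[:order], R[1:order + 1])
    # Predicted sequence s_r(k) = sum_i a_i r(k - i), with r(k) = 0 for k < 1
    s_r = np.zeros(N)
    for k in range(N):
        for i in range(1, order + 1):
            if k - i >= 0:
                s_r[k] += a[i - 1] * r[k - i]
    e_r = r - s_r   # residual sequence (estimate of the excitation source)
    return s_r, e_r
```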
The initial stage of the cough sound contains its most important information, and if only this initial stage is analysed, a time-varying system can be approximated by a time-invariant one, which reduces model complexity. Practice shows that the duration t of the first stage of a cough sound is roughly 30 ms ≤ t ≤ 50 ms, so the segment intercepted in step S2 of the present invention is preferably the roughly 30 ms to 50 ms of the original signal following the starting point found by endpoint detection.
During signal acquisition, the distance between the sound source and the device and the gain of the device affect the amplitude of the acquired digital signal, which would introduce unnecessary random error into the short-time energy computed in step S6. Step S3 of the present invention therefore normalizes the signal to reduce this error.
Linear predictive coding treats the vocal tract as a resonant cavity whose resonant frequencies are the formants. The vocal tract model is an all-pole model with p poles, and each pair of poles corresponds to one formant. Practice shows that for most signals an LPC order of 8 ≤ p ≤ 12 is sufficient to model the vocal tract accurately. Therefore, the order p of the linear predictive coding in step S4 of the present invention preferably satisfies 8 ≤ p ≤ 12.
Cepstrum-like parameters reflecting the shape of the vocal tract are usually chosen as vocal tract model features, but ordinary cepstrum parameters are particularly sensitive to noise. The human ear can pick speech signals out of noisy background sound because the basilar membrane of the inner ear modulates incoming signals: for each frequency, signals within the corresponding critical bandwidth excite vibrations at different locations along the basilar membrane. MFCC imitates this behaviour of human hearing with a band-pass filter bank, which reduces the influence of noise on speech and gives the feature stronger robustness to noise. Therefore, in step S5 the present invention selects MFCC as the vocal tract model feature.
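One possible way to compute the Q = 12 MFCC parameters of step S5 is shown below, using librosa's Mel-filterbank cepstral implementation; librosa, the frame sizes and the per-frame averaging are illustrative assumptions, not values fixed by the patent.

```python
# One possible MFCC computation for step S5 (assumed implementation choice).
import numpy as np
import librosa

def mfcc_features(s_r, fs, n_mfcc=12):
    """Return a Q-dimensional MFCC vector for the predicted signal sequence."""
    m = librosa.feature.mfcc(y=np.asarray(s_r, dtype=np.float32), sr=fs,
                             n_mfcc=n_mfcc, n_fft=256, hop_length=128)
    # librosa yields one coefficient vector per frame; average across frames
    # to obtain a single Q-dimensional descriptor of the 30-50 ms segment.
    return m.mean(axis=1)
```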
Separating the vocal tract model from the excitation source frees the time-domain characterization of the excitation source from the influence of vocal tract resonance, which is what makes determining the excitation source features possible. The short-time energy is defined as
en_r = Σ_{k=1}^{N} e_r(k)²
and represents the excitation source intensity of the intercepted segment. The excitation source reflects the airflow impacting the vocal tract during a cough; since a cough is paroxysmal and intense, its signal energy is strong and concentrated, and the short-time energy therefore reflects the time-domain characteristics of the cough sound effectively. For this reason, step S6 of the present invention selects the short-time energy of the residual signal as the excitation source feature.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. the invention separates the vocal tract model from the excitation source through linear predictive coding, overcoming the problem that the time-domain characteristics of the excitation source are easily influenced by vocal tract resonance;
2. the invention adds the short-time energy of the excitation source as a feature for cough sound identification, overcoming the problem that conventional automatic cough sound identification algorithms use only features related to vocal tract information and leave the excitation source information unused, so the description of the cough sound is more comprehensive;
3. the invention overcomes the difficulty of training a model with insufficient and imbalanced samples;
4. the method has good identification capability for cough sounds and high identification accuracy;
5. the method analyses only the initial stage of the cough sound, which simplifies the model, reduces the amount of computation, lowers the hardware requirements and makes the method easy to implement in wearable equipment.
Drawings
FIG. 1 is a flow chart of an embodiment of an automatic cough sound identification algorithm;
fig. 2(a) to 2(c) are respectively a time-domain waveform diagram of a cough sound, a short-time energy diagram directly calculated by a cough sound sequence, and a short-time energy diagram calculated by a cough sound residual signal sequence;
FIGS. 3(a) to 3(i) show, for a cough sound, an impact sound and a fricative sound respectively, the time-domain waveform, the time-domain waveform of the residual signal sequence and the short-time energy of the residual signal sequence;
FIG. 4 is a classification hyperplane schematic of a linear SVM classifier.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
Example one
In the cough sound automatic identification algorithm, a sample segment is intercepted and normalized for feature extraction; the sample is processed with linear predictive coding to obtain a predicted signal and a residual signal, where, for a cough sound signal, the predicted signal reflects the characteristics of the vocal tract model and the residual signal reflects the characteristics of the excitation source; the MFCC and the short-time energy are computed from the predicted signal and the residual signal respectively and combined into a joint feature; and this feature is input into a linear SVM classifier to judge whether the sample is a cough sound or a non-cough sound.
The working flow is shown in fig. 1, and comprises the following steps:
Step S1, obtaining a sample to be identified, wherein the sampling frequency of the sample to be identified is f_s.
Step S2, carrying out endpoint detection on the sample, intercepting a fixed-length sequence of duration t from the starting point, and setting it as the signal sequence r(k), where k = 1, 2, …, N; N = t × f_s, and N is a positive integer. The duration t preferably satisfies 30 ms ≤ t ≤ 50 ms.
Step S3, normalizing the signal sequence r(k) to obtain r̂(k); the normalization formula is
r̂(k) = r(k) / max{|r(1)|, |r(2)|, …, |r(N)|},
where max{·} represents the maximum of all elements in the set.
Step S4, using linear predictive coding to decompose the normalized signal sequence r̂(k) into a predicted signal sequence s_r(k) and a residual signal sequence e_r(k), where the order of the linear predictive coding model is p; the order p preferably satisfies 8 ≤ p ≤ 12.
Step S5, obtaining the Q-dimensional MFCC parameters of the predicted signal sequence s_r(k): MFCC_r = [m_r1, m_r2, …, m_rQ]^T; the MFCC dimension Q is preferably 12.
Step S6, obtaining the short-time energy en_r of the residual signal sequence e_r(k), where en_r is a scalar; the short-time energy is calculated as
en_r = Σ_{k=1}^{N} e_r(k)².
Step S7, constructing the obtained MFCC parameters MFCC_r and short-time energy en_r into a feature vector c_r = [m_r1, m_r2, …, m_rQ, en_r]^T.
Step S8, inputting the feature vector c_r into a linear SVM classifier and judging the type of the sample to be identified according to the classifier output f(c_r): if f(c_r) = 1, the sample is a cough sound sample; if f(c_r) = -1, the sample is a non-cough sound sample. The linear SVM classifier is obtained by training and testing an initial linear SVM classifier.
The training and testing of the initial linear SVM classifier comprises the following steps:
Step Y1, collecting a plurality of cough sound samples and non-cough sound samples at sampling frequency f_s; setting the cough sound sample labels to 1 and the non-cough sound sample labels to -1; dividing the cough sound samples and non-cough sound samples into a training set and a test set. For example, 2/3 of the samples of each class are selected to construct the training set and the remaining 1/3 of each class form the test set; the divided training set then contains m cough sound samples and n non-cough sound samples, and the divided test set contains m′ cough sound samples and n′ non-cough sound samples.
Step Y2, processing each sample of the training set according to step S2 to obtain cough sound signal sequences x_i(k) and non-cough sound signal sequences y_j(k), where i = 1, 2, …, m and j = 1, 2, …, n.
The cough sound signal sequences x_i(k) and the non-cough sound signal sequences y_j(k) are then processed according to steps S3 to S7 to obtain the feature vectors of the cough sound samples c_xi and the feature vectors of the non-cough sound samples c_yj. The feature set of the training samples is Φ = {c_x1, …, c_xm, c_y1, …, c_yn}, and its corresponding sample label set is T = {t_x1, …, t_xm, t_y1, …, t_yn}, where all cough sound sample labels are t_xi = 1 and all non-cough sound sample labels are t_yj = -1; the feature vectors in Φ and the sample labels in T correspond one-to-one through the indices x_i or y_j.
Step Y3, establishing a linear SVM classifier with decision function f(c) = sgn[w^T c + b], where c is the feature vector of the signal to be decided; f(c) is the decision value of the feature vector c, f(c) = 1 indicating that the signal to be decided is a cough sound and f(c) = -1 indicating a non-cough sound; w is the weight vector; b is the hyperplane offset.
Step Y4, taking each feature vector of the feature set Φ as input and the sample label corresponding to that feature vector in the sample label set T as the expected output, training the established linear SVM classifier to obtain the decision function f(c) = sgn[w*^T c + b*], where w* and b* are the optimal weight vector and hyperplane offset obtained by training.
Step Y5, processing each sample of the test set according to step Y2 to obtain the feature vector of each sample, giving the test-sample feature set Φ′ and the corresponding sample label set T′;
Step Y6, taking each feature vector in the test-sample feature set Φ′ as input to the linear SVM classifier, comparing the classifier output f(c) with the sample label corresponding to that feature vector in the sample label set T′, and calculating the sensitivity and specificity from the comparison results;
judging whether the sensitivity and the specificity meet the preset requirements:
if so, ending the training and testing;
otherwise, jumping back to step Y4 and retraining to obtain a new decision function f(c) = sgn[w*^T c + b*] for the linear SVM classifier.
To calculate sensitivity and specificity, the following convention is made:
1) appointing that the cough sound sample is a positive sample, and the non-cough sound sample is a negative sample;
2) if the sample label of a certain sample is 1 and the output of the linear SVM classifier taking the feature vector of the sample as input is 1, the sample is a true positive sample; if the sample label of a certain sample is-1 and the output of the linear SVM classifier taking the feature vector of the sample as input is-1, the sample is a true negative sample; if the sample label of a certain sample is-1 and the output of the linear SVM classifier taking the feature vector of the sample as input is 1, the sample is a false positive sample; if the sample label of a certain sample is 1 and the output of the linear SVM classifier taking the feature vector of the sample as input is-1, the sample is a false negative sample;
based on this convention, the sensitivity and specificity are calculated as follows: count the numbers of true positive, true negative, false positive and false negative samples in the test set, denoted TP, TN, FP and FN respectively; then the sensitivity is
Sn = TP / (TP + FN)
and the specificity is
Sp = TN / (TN + FP).
If the sensitivity and specificity of the obtained test result reach the preset requirements, the model can be applied to cough sound recognition.
Fig. 2(a) is a time-domain waveform diagram of a cough sound, fig. 2(b) is a short-time energy diagram obtained by directly calculating the cough sound sequence of fig. 2(a), and fig. 2(c) is a short-time energy diagram obtained by decomposing the cough sound sequence into a prediction signal sequence and a residual signal sequence by linear predictive coding and then calculating the residual signal sequence.
Fig. 3(a) is a time domain waveform diagram of a cough sound, fig. 3(b) is a time domain waveform diagram of a cough sound residual signal sequence obtained by a linear predictive coding process, and fig. 3(c) is a short-term energy diagram corresponding to the residual signal sequence of fig. 3 (b); fig. 3(d) is a time domain waveform diagram of the impact sound, fig. 3(e) is a time domain waveform diagram of the impact sound residual signal sequence obtained by the linear predictive coding process, and fig. 3(f) is a short-time energy diagram corresponding to the residual signal sequence of fig. 3 (e); fig. 3(g) is a time domain waveform diagram of the fricatives, fig. 3(h) is a time domain waveform diagram of the fricatives residual signal sequence obtained by the linear predictive coding process, and fig. 3(i) is a short-time energy diagram corresponding to the residual signal sequence of fig. 3 (h).
The principle of the computational algorithm of the invention is:
As shown in FIGS. 2(a) to 2(c), during the first stage of the cough the contraction of the abdominal and diaphragm muscles makes the lungs exhale a large amount of gas instantaneously, and the short-time energy of the residual signal shows a peak, as in the front part of FIG. 2(c); afterwards the lung gas continues to be expelled slowly and the short-time energy of the residual signal flattens out; during the third stage, the periodic vibration of the vocal cords caused by the closing of the glottis produces small fluctuations again. FIG. 2(c) therefore reflects the characteristics of the impulsive airflow during a cough very well. FIG. 2(b), by contrast, is affected by vocal tract resonance: a high, large-amplitude peak appears after the first stage, which is clearly inconsistent with the impulsive airflow during a cough, so the excitation source of the cough sound signal cannot be described accurately. Consequently, the vocal tract and the excitation source should be deconvolved before computing the time-domain features of the excitation source; using the short-time energy of the original signal directly as the feature carries a large error.
Comparing the cough sound residual signal of FIG. 3(b) with the impact sound residual signal of FIG. 3(e), their amplitudes are close in the initial burst phase, but the impact sound residual then decays rapidly whereas the cough sound residual lasts longer, because the airflow impact continues throughout the cough while the impact sound is excited only at the instant of impact. Comparing the short-time energy of the cough residual signal in FIG. 3(c) with that of the impact residual signal in FIG. 3(f), both show a peak, but the former's peak lasts longer and settles at a certain level after decaying, while the latter shows only one short-lived impulse. In general, because the energy burst of percussive sounds such as claps and impacts is short, the short-time energy of their residual signals is usually smaller than that of a cough sound, so the short-time energy of the residual signal can effectively distinguish cough sounds from various percussive sounds.
Comparing the cough sound waveform of FIG. 3(a) with the fricative waveform of FIG. 3(g), the two are fairly similar and hard to tell apart. Comparing the cough sound residual signal of FIG. 3(b) with the fricative residual signal of FIG. 3(h), both have sustained amplitudes, but the former is clearly higher than the latter because the airflow impact during a cough is much stronger than the excitation of an ordinary fricative sound. Comparing the short-time energy of the cough residual signal in FIG. 3(c) with that of the fricative residual signal in FIG. 3(i), the latter has no distinct peak and is lower overall. In general, although friction sounds such as a stool scraping or fingernails rubbing provide a sustained excitation, their burst intensity is low, so the short-time energy of their residual signals is usually smaller than that of a cough sound; the short-time energy of the residual signal therefore also distinguishes cough sounds from various friction sounds effectively. In summary, it is appropriate that step S6 of the present invention selects the short-time energy of the residual signal as the excitation source feature.
The SVM is a data mining method based on statistical learning theory. Following the principle of structural risk minimization, it constructs an optimal decision hyperplane, as shown in FIG. 4, that guarantees classification accuracy while maximizing the margin between the hyperplane and the convex hulls of the two classes of samples (represented by the support vectors). In other words, the SVM classification hyperplane is determined by the support vectors, and most training samples other than the support vectors need not be retained after model training is complete. This property allows a large number of samples to participate in model training while the absence of some samples has little influence on the final model, so the SVM has inherent advantages for small-sample statistics, offers good generalization ability, and is insensitive to sample imbalance. In theory, the SVM achieves optimal classification of linearly separable problems, and for linearly inseparable problems an inner-product kernel function can map the input space to a high-dimensional space in which it becomes linearly separable. Therefore, step S8 selects a linear SVM classifier as the classifier.
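The support-vector property described above can be seen in a small synthetic illustration (random 2-D data and scikit-learn's SVC with a linear kernel; none of this is the patent's cough data):

```python
# Synthetic illustration: after training, the hyperplane is determined by the
# support vectors alone, a small subset of the training samples.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (50, 2)), rng.normal(2.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear").fit(X, y)
print(len(clf.support_vectors_), "support vectors out of", len(X), "training samples")
```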
The following describes the training process of the linear SVM classifier in the automatic recognition algorithm of the present invention with specific examples:
Step Y1: a plurality of cough sound samples and non-cough sound samples are collected at f_s = 8000 Hz, and 2/3 of the samples of each class are selected to construct the training set, which contains 104 cough sound samples and 636 non-cough sound samples; the remaining 1/3 of each class forms the test set, which contains 51 cough sound samples and 320 non-cough sound samples. The cough sound sample labels are set to 1 and the non-cough sound sample labels to -1.
Step Y2: endpoint detection is carried out on each sample of the training set, and a fixed-length sequence of length N = 400 (duration t = 50 ms at f_s = 8000 Hz) is intercepted from the starting point, giving the cough sound signal sequences x_i(k) and the non-cough sound signal sequences y_j(k), where i = 1, 2, …, 104, j = 1, 2, …, 636 and k = 1, 2, …, 400.
The cough sound signal sequences x_i(k) are normalized to obtain x̂_i(k) = x_i(k) / max{|x_i(1)|, |x_i(2)|, …, |x_i(400)|}, and the non-cough sound signal sequences y_j(k) are normalized to obtain ŷ_j(k) = y_j(k) / max{|y_j(1)|, |y_j(2)|, …, |y_j(400)|}, where max{·} represents the maximum of all elements in the set.
method for coding a cough signal using linear predictive coding
Figure BDA0003038602390000163
Decomposition into predicted signal sequences
Figure BDA0003038602390000164
And residual signal sequence
Figure BDA0003038602390000165
Sequencing non-cough tone signals
Figure BDA0003038602390000166
Decomposition into predicted signal sequences
Figure BDA0003038602390000167
And residual signal sequence
Figure BDA0003038602390000168
Wherein, the order of the linear predictive coding is p-12;
predicting signal sequences for cough sounds
Figure BDA0003038602390000169
Solving 12-dimensional MFCC parameters
Figure BDA00030386023900001610
Predicting signal sequences for non-cough sounds
Figure BDA00030386023900001611
Solving 12-dimensional MFCC parameters
Figure BDA00030386023900001612
As a cough residual signal sequence
Figure BDA00030386023900001613
Finding short-time energy
Figure BDA00030386023900001614
As non-coughing tone residual signal sequences
Figure BDA00030386023900001615
Finding short-time energy
Figure BDA00030386023900001616
The MFCC parameters and short-time energy obtained from the cough sounds are combined into the feature vectors c_xi = [MFCC_xi^T, en_xi]^T, and those obtained from the non-cough sounds into the feature vectors c_yj = [MFCC_yj^T, en_yj]^T. This yields the training-sample feature set Φ = {c_x1, …, c_x104, c_y1, …, c_y636} and the corresponding sample label set T = {t_x1, …, t_x104, t_y1, …, t_y636}, where all cough sound sample labels are t_xi = 1 and all non-cough sound sample labels are t_yj = -1; the feature vectors in Φ and the sample labels in T correspond one-to-one through the indices x_i or y_j.
Step Y3: a linear SVM classifier is established with decision function f(c) = sgn[w^T c + b], where c is the feature vector of the signal to be decided; f(c) is the decision value of the feature vector c, f(c) = 1 indicating a cough sound and f(c) = -1 indicating a non-cough sound; w is the weight vector; b is the hyperplane offset.
Step Y4: taking each feature vector of the feature set Φ obtained in step Y2 as input and the sample label corresponding to that feature vector in the sample label set T as the expected output, the linear SVM classifier established in step Y3 is trained to obtain the decision function f(c) = sgn[w*^T c + b*], where the optimal weight vector is w* = [-0.0901, -0.1392, -0.1756, -0.0798, -0.0487, 0.0179, -0.0777, -0.2485, 0.0359, -0.2898, 0.0889, -0.2116, 0.1081]^T and the hyperplane offset is b* = -6.9896.
Step Y5: the feature vectors of the test samples in the test set are computed by the method of step Y2, giving the feature set Φ′ and the corresponding test sample label set T′.
Step Y6: each feature vector in the test-sample feature set Φ′ is taken as input to the linear SVM classifier trained in step Y4, the classifier output f(c) is compared with the sample label corresponding to that feature vector in the test sample label set T′, and the sensitivity and specificity are calculated from the comparison results. The obtained sensitivity is Sn = 0.961 and the specificity is Sp = 0.981, which meet the predetermined requirements, so the model can be applied to cough sound recognition.
When the linear SVM classifier obtained from this training and testing is applied in the cough sound automatic recognition algorithm, a sample to be recognized is obtained and processed according to steps S2 to S7 to obtain the feature vector c_r = [-10.3715, -14.0365, -19.3825, -6.7138, -0.1099, -4.5139, -1.5521, -8.1192, -1.9553, -1.6719, -0.5847, 0.0388, 25.6368]^T; substituting it into the trained linear SVM classifier gives the classifier output f(c_r) = 1, so the sample is judged to be a cough sound sample.
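This decision can be checked numerically with the parameters reported above (the vectors are copied from this example; the sign computation is simply the linear-SVM decision rule):

```python
# Numerical check of f(c_r) = sgn(w*^T c_r + b*) for the values given above.
import numpy as np

w_opt = np.array([-0.0901, -0.1392, -0.1756, -0.0798, -0.0487, 0.0179, -0.0777,
                  -0.2485, 0.0359, -0.2898, 0.0889, -0.2116, 0.1081])
b_opt = -6.9896
c_r = np.array([-10.3715, -14.0365, -19.3825, -6.7138, -0.1099, -4.5139, -1.5521,
                -8.1192, -1.9553, -1.6719, -0.5847, 0.0388, 25.6368])

print(np.sign(np.dot(w_opt, c_r) + b_opt))   # prints 1.0 -> cough sound sample
```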
In order to implement the cough sound automatic identification algorithm, the embodiment further provides a cough sound automatic identification device, including:
a sample obtaining module for obtaining a sample to be identified, wherein the sampling frequency of the sample to be identified is f_s;
a preprocessing module for carrying out endpoint detection on the sample, intercepting a fixed-length sequence of duration t from the starting point and setting it as the signal sequence r(k), where k = 1, 2, …, N, N = t × f_s, and N is a positive integer, and for normalizing the signal sequence r(k) to obtain r̂(k);
a feature extraction module for decomposing the normalized signal sequence r̂(k) into a predicted signal sequence s_r(k) and a residual signal sequence e_r(k) using linear predictive coding, where the order of the linear predictive coding model is p; for determining the Q-dimensional MFCC parameters MFCC_r = [m_r1, m_r2, …, m_rQ]^T of the predicted signal sequence s_r(k); for obtaining the short-time energy en_r of the residual signal sequence e_r(k), where en_r is a scalar; and for constructing the obtained MFCC parameters MFCC_r and short-time energy en_r into a feature vector c_r = [m_r1, m_r2, …, m_rQ, en_r]^T; and
an identification module for inputting the obtained feature vector c_r into a linear SVM classifier and judging the type of the sample to be identified according to the classifier output f(c_r): if f(c_r) = 1, the sample is a cough sound sample; if f(c_r) = -1, the sample is a non-cough sound sample; the linear SVM classifier is obtained by training and testing an initial linear SVM classifier.
Example two
The storage medium of this embodiment stores a computer program which, when executed by a processor, causes the processor to execute the cough sound automatic recognition algorithm of the first embodiment.
EXAMPLE III
The computing device of the embodiment includes a processor and a memory for storing a processor executable program, and when the processor executes the program stored in the memory, the cough sound automatic identification algorithm described in the first embodiment is implemented.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (10)

1. An automatic cough sound recognition algorithm, comprising the following steps:
step S1, obtaining a sample to be identified, wherein the sampling frequency of the sample to be identified is f_s;
step S2, carrying out endpoint detection on the sample, intercepting a fixed-length sequence of duration t from the starting point, and setting it as the signal sequence r(k), where k = 1, 2, …, N; N = t × f_s, and N is a positive integer;
step S3, normalizing the signal sequence r(k) to obtain the normalized signal sequence r̂(k);
step S4, using linear predictive coding to decompose the normalized signal sequence r̂(k) into a predicted signal sequence s_r(k) and a residual signal sequence e_r(k), where the order of the linear predictive coding model is p;
step S5, obtaining the Q-dimensional MFCC parameters of the predicted signal sequence s_r(k): MFCC_r = [m_r1, m_r2, …, m_rQ]^T;
step S6, obtaining the short-time energy en_r of the residual signal sequence e_r(k), where en_r is a scalar;
step S7, constructing the obtained MFCC parameters MFCC_r and short-time energy en_r into a feature vector c_r = [m_r1, m_r2, …, m_rQ, en_r]^T;
step S8, inputting the feature vector c_r into a linear SVM classifier and judging the type of the sample to be identified according to the classifier output f(c_r): if f(c_r) = 1, the sample is a cough sound sample; if f(c_r) = -1, the sample is a non-cough sound sample; the linear SVM classifier being obtained by training and testing an initial linear SVM classifier.
2. The cough sound automatic recognition algorithm of claim 1, wherein the training and testing of the initial linear SVM classifier comprises the following steps:
step Y1, collecting a plurality of cough sound samples and non-cough sound samples at sampling frequency f_s; setting the cough sound sample labels to 1 and the non-cough sound sample labels to -1; dividing the cough sound samples and non-cough sound samples into a training set and a test set;
step Y2, processing each sample of the training set according to steps S2 to S7 to obtain the feature vector of each sample;
step Y3, establishing a linear SVM classifier with decision function f(c) = sgn[w^T c + b], where c is the feature vector of the signal to be decided; f(c) is the decision value of the feature vector c, f(c) = 1 indicating that the signal to be decided is a cough sound and f(c) = -1 indicating a non-cough sound; w is the weight vector; b is the hyperplane offset;
step Y4, taking the feature vector of each sample as input and the sample label corresponding to that sample as the expected output, training the established linear SVM classifier to obtain the decision function f(c) = sgn[w*^T c + b*], where w* and b* are the optimal weight vector and hyperplane offset obtained by training;
step Y5, processing each sample of the test set according to step Y2 to obtain the feature vector of each sample;
step Y6, inputting the feature vector of each test sample into the trained linear SVM classifier, comparing the classifier output f(c) with the sample label corresponding to that sample, and calculating the sensitivity and specificity from the comparison results;
judging whether the sensitivity and specificity meet the preset requirements:
if so, ending the training and testing;
otherwise, jumping back to step Y4 and retraining to obtain a new decision function f(c) = sgn[w*^T c + b*] for the linear SVM classifier.
3. The cough sound automatic recognition algorithm of claim 2, wherein in step Y1, the divided training set comprises m cough sound samples and n non-cough sound samples;
step Y2 is as follows: processing each sample of the training set according to step S2 to obtain cough sound signal sequences x_i(k) and non-cough sound signal sequences y_j(k), where i = 1, 2, …, m and j = 1, 2, …, n;
processing the cough sound signal sequences x_i(k) and the non-cough sound signal sequences y_j(k) according to steps S3 to S7 to obtain the feature vectors of the cough sound samples c_xi and the feature vectors of the non-cough sound samples c_yj;
the feature set of the training samples is Φ = {c_x1, …, c_xm, c_y1, …, c_yn}, and its corresponding sample label set is T = {t_x1, …, t_xm, t_y1, …, t_yn}, where all cough sound sample labels are t_xi = 1 and all non-cough sound sample labels are t_yj = -1; the feature vectors in the feature set Φ and the sample labels in the sample label set T correspond one-to-one through the indices x_i or y_j;
in step Y4, taking each feature vector of the feature set Φ as input and the sample label corresponding to that feature vector in the sample label set T as the expected output, training the established linear SVM classifier to obtain the decision function f(c) = sgn[w*^T c + b*];
in step Y5, processing each sample of the test set according to step Y2 to obtain the feature vector of each sample, giving the test-sample feature set Φ′ and the corresponding sample label set T′;
in step Y6, taking each feature vector in the test-sample feature set Φ′ as input to the linear SVM classifier, and comparing the classifier output f(c) with the sample label corresponding to that feature vector in the sample label set T′.
4. The cough sound automatic recognition algorithm of claim 1, wherein in step S2, the duration t lies in the range 30 ms ≤ t ≤ 50 ms.
5. The cough sound automatic recognition algorithm of claim 1, wherein in step S3, the normalization formula is
r̂(k) = r(k) / max{|r(1)|, |r(2)|, …, |r(N)|},
where max{·} represents the maximum of all elements in the set.
6. The cough sound automatic recognition algorithm of claim 1, wherein in step S4, the order p of the linear predictive coding model satisfies 8 ≤ p ≤ 12;
and in step S5, the MFCC dimension Q is 12.
7. The cough sound automatic recognition algorithm of claim 1, wherein in step S6, the short-time energy is calculated as
en_r = Σ_{k=1}^{N} e_r(k)².
8. An automatic cough sound recognition device, comprising:
a sample obtaining module for obtaining a sample to be identified, wherein the sampling frequency of the sample to be identified is f_s;
a preprocessing module for carrying out endpoint detection on the sample, intercepting a fixed-length sequence of duration t from the starting point and setting it as the signal sequence r(k), where k = 1, 2, …, N, N = t × f_s, and N is a positive integer, and for normalizing the signal sequence r(k) to obtain r̂(k);
a feature extraction module for decomposing the normalized signal sequence r̂(k) into a predicted signal sequence s_r(k) and a residual signal sequence e_r(k) using linear predictive coding, where the order of the linear predictive coding model is p; for determining the Q-dimensional MFCC parameters MFCC_r = [m_r1, m_r2, …, m_rQ]^T of the predicted signal sequence s_r(k); for obtaining the short-time energy en_r of the residual signal sequence e_r(k), where en_r is a scalar; and for constructing the obtained MFCC parameters MFCC_r and short-time energy en_r into a feature vector c_r = [m_r1, m_r2, …, m_rQ, en_r]^T; and
an identification module for inputting the obtained feature vector c_r into a linear SVM classifier and judging the type of the sample to be identified according to the classifier output f(c_r): if f(c_r) = 1, the sample is a cough sound sample; if f(c_r) = -1, the sample is a non-cough sound sample; the linear SVM classifier being obtained by training and testing an initial linear SVM classifier.
9. A storage medium having stored thereon a computer program which, when executed by a processor, causes the processor to execute the cough sound automatic recognition algorithm of any one of claims 1-7.
10. A computing device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the cough sound automatic recognition algorithm of any one of claims 1-7.
CN202110450820.0A 2021-04-26 2021-04-26 Cough sound automatic identification algorithm, device, medium and equipment Active CN113158916B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110450820.0A CN113158916B (en) 2021-04-26 2021-04-26 Cough sound automatic identification algorithm, device, medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110450820.0A CN113158916B (en) 2021-04-26 2021-04-26 Cough sound automatic identification algorithm, device, medium and equipment

Publications (2)

Publication Number Publication Date
CN113158916A true CN113158916A (en) 2021-07-23
CN113158916B CN113158916B (en) 2022-04-22

Family

ID=76870648

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110450820.0A Active CN113158916B (en) 2021-04-26 2021-04-26 Cough sound automatic identification algorithm, device, medium and equipment

Country Status (1)

Country Link
CN (1) CN113158916B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894551A (en) * 2010-07-02 2010-11-24 华南理工大学 Method and device for automatically identifying cough
US20150073306A1 (en) * 2012-03-29 2015-03-12 The University Of Queensland Method and apparatus for processing patient sounds
US20190096389A1 (en) * 2017-09-28 2019-03-28 International Business Machines Corporation Feature extraction of acoustic signals
CN111179967A (en) * 2019-12-17 2020-05-19 华南理工大学 Linear classification algorithm, medium and equipment for true and false cough sounds of cervical spinal cord injury patients

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894551A (en) * 2010-07-02 2010-11-24 华南理工大学 Method and device for automatically identifying cough
US20150073306A1 (en) * 2012-03-29 2015-03-12 The University Of Queensland Method and apparatus for processing patient sounds
US20190096389A1 (en) * 2017-09-28 2019-03-28 International Business Machines Corporation Feature extraction of acoustic signals
CN111179967A (en) * 2019-12-17 2020-05-19 华南理工大学 Linear classification algorithm, medium and equipment for true and false cough sounds of cervical spinal cord injury patients

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHU Chunmei et al.: "Automatic classification of dry and wet coughs based on improved inverse Mel-frequency cepstral coefficients", Journal of Biomedical Engineering *

Also Published As

Publication number Publication date
CN113158916B (en) 2022-04-22

Similar Documents

Publication Publication Date Title
CN107657964B (en) Depression auxiliary detection method and classifier based on acoustic features and sparse mathematics
Dibazar et al. Feature analysis for automatic detection of pathological speech
JP4959727B2 (en) Speech recognition using speaker adaptation and registration by pitch
Panek et al. Acoustic analysis assessment in speech pathology detection
US11672472B2 (en) Methods and systems for estimation of obstructive sleep apnea severity in wake subjects by multiple speech analyses
CN109192221A (en) It is a kind of that phonetic decision Parkinson severity detection method is used based on cluster
El Emary et al. Towards developing a voice pathologies detection system
CN112190253A (en) Classification method for severity of obstructive sleep apnea
CN113782032B (en) Voiceprint recognition method and related device
US20210090734A1 (en) System, device and method for detection of valvular heart disorders
CN113158916B (en) Cough sound automatic identification algorithm, device, medium and equipment
Perez Carrillo et al. Learning and extraction of violin instrumental controls from audio signal
Porieva et al. Investigation of lung sounds features for detection of bronchitis and COPD using machine learning methods
Costa et al. Pathological voice discrimination using cepstral analysis, vector quantization and hidden Markov models
Liu et al. Classifying respiratory sounds using electronic stethoscope
Neto et al. Feature estimation for vocal fold edema detection using short-term cepstral analysis
JP2020513908A (en) How to characterize sleep-disordered breathing
Paul et al. Speech recognition of throat microphone using MFCC approach
Costa et al. Parametric cepstral analysis for pathological voice assessment
Mostaani et al. On Breathing Pattern Information in Synthetic Speech.
Tucker An ecological approach to the classification of transient underwater acoustic events: Perceptual experiments and auditory models
Chudasama et al. Voice Based Pathology Detection from Respiratory Sounds using Optimized Classifiers
Ghoraani et al. Pathological speech signal analysis using time-frequency approaches
Atmaja et al. Cross-dataset COVID-19 Transfer Learning with Cough Detection, Cough Segmentation, and Data Augmentation
Mamedov et al. Exploring Deep Learning Approaches to Cleft Lip and Palate Speech

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant