CN108701469B

CN108701469B - Cough sound recognition method, device, and storage medium

Info

Publication number: CN108701469B
Application number: CN201780008985.4A
Authority: CN
Inventors: Liu Hongtao (刘洪涛); Feng Shuting (冯澍婷); Meng Yabin (孟亚彬)
Original assignee: Shenzhen H&T Intelligent Control Co Ltd
Current assignee: Shenzhen H&T Intelligent Control Co Ltd
Priority date: 2017-07-31
Filing date: 2017-07-31
Publication date: 2023-06-20
Anticipated expiration: 2037-07-31
Also published as: WO2019023879A1; CN108701469A

Abstract

A cough sound recognition method, apparatus, and storage medium. The method comprises: sampling a sound signal and obtaining a mel-frequency cepstral coefficient (MFCC) feature parameter matrix of the sound signal (201); extracting signal features from the MFCC feature parameter matrix of the sound signal (202); determining, based on a support vector data description (SVDD) algorithm, whether the signal features match a pre-acquired cough signal feature model (203); and, if so, confirming the sound signal as a cough sound (204). Because the method and device can recognize cough sounds, coughing can be monitored through the sounds a user makes, without the user wearing any detection component. Because the recognition algorithm is based on MFCC feature parameters and an SVDD model, it has low complexity and a small computational load, places low demands on hardware, and reduces the manufacturing cost of the product.

Description

Cough sound recognition method, device, and storage medium

Technical Field

Embodiments of the present application relate to sound processing technology, and in particular, to a method, apparatus, and storage medium for recognizing cough sounds.

Background

Coughing is an indicator of treatment efficacy or disease progression for certain diseases (e.g., asthma). Detailed and accurate information about a patient's coughing (such as the number of coughs per hour and when they occur) has important clinical value for diagnosis. Studies have shown that intelligent cough monitoring devices are more accurate than manual cough counting. However, current intelligent cough monitoring equipment is used mainly in medical settings and requires the patient to wear complex equipment, which is inconvenient for the user.

Some existing studies combine the characteristics of cough sounds with speech recognition techniques: they build a cough model and recognize the isolated cough sounds of a specific person by model matching based on the dynamic time warping (DTW) algorithm. With this approach, coughing can be monitored through the sounds the user makes, without the user wearing any detection device.

In the course of making the present application, the inventors found at least the following problem in the related art: DTW-based cough sound recognition has high algorithmic complexity and a large computational load, and therefore places high demands on hardware.

Disclosure of Invention

The object of the present invention is to provide a cough sound recognition method, device, and storage medium that can recognize cough sounds with a simple algorithm, a small computational load, and low hardware requirements.

To achieve the above object, in a first aspect, an embodiment of the present application provides a cough sound recognition method for a recognition device, the method including:

sampling a sound signal and obtaining a mel-frequency cepstral coefficient feature parameter matrix of the sound signal;

extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal;

determining whether the signal features match a pre-acquired cough signal feature model based on a support vector data description algorithm;

if so, confirming the sound signal as a cough sound.

Optionally, the method further comprises:

pre-acquiring the cough signal feature model based on the support vector data description algorithm.

Optionally, the pre-acquiring the cough signal feature model based on the support vector data description algorithm includes:

collecting a preset number of cough sound sample signals and obtaining a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signals;

extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signals;

taking the signal features of the cough sound sample signals as input, training a support vector data description algorithm model to obtain the cough signal feature model based on the support vector data description algorithm.

Optionally, the signal features include one or more sub-signal features of an energy feature, a local feature, and an overall trend feature.

Optionally, if the signal features include energy features, extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:

selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal, the energy coefficients of a run of consecutive frames that covers a preset proportion of the signal and has the largest sum of energy coefficients;

warping the energy coefficients of these consecutive frames to a preset length based on a dynamic time warping algorithm to obtain the energy feature of the cough sound sample signal;

and extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal includes:

selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal, the energy coefficients of a run of consecutive frames that covers the preset proportion of the signal and has the largest sum of energy coefficients;

warping the energy coefficients of these consecutive frames to the preset length based on the dynamic time warping algorithm to obtain the energy feature of the sound signal.

Optionally, if the signal features include a local feature, extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:

selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal, the mel-frequency cepstral coefficients of the S2 consecutive frames with the largest sum of energy coefficients, where S2 is a positive integer;

determining weights for the mel-frequency cepstral coefficients of these S2 frames based on their energy coefficients, and computing the weighted sum of the mel-frequency cepstral coefficients of the S2 frames with these weights to obtain the local feature of the cough sound sample signal, where the weight of each frame is positively correlated with its energy coefficient;

and extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal includes:

selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal, the mel-frequency cepstral coefficients of the S2 consecutive frames with the largest sum of energy coefficients;

determining weights for the mel-frequency cepstral coefficients of these S2 frames based on their energy coefficients, and computing the weighted sum of the mel-frequency cepstral coefficients of the S2 frames with these weights to obtain the local feature of the sound signal, where the weight of each frame is positively correlated with its energy coefficient.

Optionally, if the signal features include an overall trend feature, extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:

performing dimension reduction on the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal using a linear discriminant analysis algorithm to obtain the overall trend feature of the cough sound sample signal;

and extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal includes:

performing dimension reduction on the mel-frequency cepstral coefficient feature parameter matrix of the sound signal using the linear discriminant analysis algorithm to obtain the overall trend feature of the sound signal.

Optionally, the cough signal feature model based on the support vector data description algorithm includes one or more sub-signal feature models selected from an energy feature model, a local feature model, and an overall trend feature model, each based on the support vector data description algorithm;

if the cough signal feature model based on the support vector data description algorithm includes multiple sub-signal feature models, determining whether the signal features match the pre-acquired cough signal feature model includes:

determining, for each sub-signal feature in the signal features, whether it matches the corresponding pre-acquired sub-signal feature model based on the support vector data description algorithm.

In a second aspect, embodiments of the present application further provide a cough sound recognition device, including:

a sound input unit for receiving a sound signal;

a signal processing unit for performing analog signal processing on the sound signal;

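The signal processing unit is connected to an operation processing unit that is internal or external to the cough sound recognition device, the operation processing unit comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform:

sampling a sound signal and obtaining a mel-frequency cepstral coefficient feature parameter matrix of the sound signal;

extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal;

determining whether the signal features match a pre-acquired cough signal feature model based on a support vector data description algorithm;

if so, confirming the sound signal as a cough sound.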

Optionally, the at least one processor is further capable of performing:

determining, for each sub-signal feature in the signal features, whether it matches the corresponding pre-acquired sub-signal feature model based on the support vector data description algorithm.

In a third aspect, embodiments of the present application further provide a storage medium storing executable instructions that, when executed by a cough sound recognition device, cause the cough sound recognition device to perform the above-described method.

In a fourth aspect, embodiments of the present application also provide a program product comprising a program stored on a storage medium, the program comprising program instructions which, when executed by a cough sound recognition device, cause the cough sound recognition device to perform the above-described method.

The cough sound recognition method, device, and storage medium of the present application can recognize cough sounds, so coughing can be monitored through the sounds the user makes without the user wearing any detection component. Because the recognition algorithm is based on MFCC feature parameters and an SVDD model, it has low complexity and a small computational load, places low demands on hardware, and reduces the manufacturing cost of the product.

Drawings

One or more embodiments are illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements, and in which the figures of the drawings are not to be taken in a limiting sense, unless otherwise indicated.

FIG. 1 is a schematic structural diagram of an application environment of embodiments of the present application;

FIG. 2 is a time-amplitude plot of a cough sound signal;

FIG. 3 is a time-frequency plot of a cough sound signal;

FIG. 4 is a schematic diagram of mel-frequency filtering in the MFCC calculation process;

FIG. 5 is a schematic flow chart of pre-acquiring the feature model based on the support vector data description algorithm in the cough sound recognition method according to an embodiment of the present application;

FIG. 6 is a flow chart of a cough sound recognition method according to an embodiment of the present application;

FIG. 7 is a schematic structural diagram of a cough sound recognition device according to an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a cough sound recognition apparatus according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of the present application.

The embodiments of the present application provide a cough sound recognition scheme based on mel-frequency cepstral coefficient (MFCC) feature parameters and a support vector data description (SVDD) model, suitable for the application environment shown in FIG. 1. The application environment includes a user 10 and a cough sound recognition device 20; the device 20 receives the sounds made by the user 10 and recognizes each sound to determine whether it is a cough sound.

Further, after recognizing a sound as a cough sound, the cough sound recognition device 20 may record and process it to output cough condition information for the user 10, such as the number of coughs, the duration of each cough, and the loudness of each cough in decibels. For example, the device may include a counter that increments each time a cough sound is detected, a timer that measures the duration of a detected cough sound, and a decibel detection component that measures the sound level of a detected cough sound.

Cough sound recognition works on a principle similar to speech recognition: the input sound is processed and then compared with a sound model to obtain a recognition result. It has two phases: a cough sound model training phase and a cough sound recognition phase. In the training phase, a certain number of cough sound samples are collected, the MFCC feature parameters of each cough sound signal are calculated, signal features are extracted from those parameters, and a model is trained on the signal features with the SVDD algorithm to obtain a reference feature model of cough sounds. In the recognition phase, the MFCC feature parameters of the sound to be classified are calculated, the signal features corresponding to the feature models are extracted, and the device judges whether these features match the feature models: if they do, the sound is judged to be a cough sound; otherwise, it is judged to be a non-cough sound. The recognition process mainly comprises preprocessing, feature extraction, model training, pattern matching, and decision.

The preprocessing step comprises sampling the cough sound signal and calculating its MFCC coefficients. In the feature extraction step, the energy feature, overall trend feature, and local feature of the cough sound signal are extracted from the MFCC coefficient matrix as inputs for obtaining the SVDD models. In the model training step, three SVDD models (an SVDD energy feature model, an SVDD local feature model, and an SVDD overall trend feature model) are trained on the three types of features extracted from the MFCC coefficient matrices of the cough sound signals. In the pattern matching and decision step, the three SVDD models are used to decide whether a new sound signal is a cough sound signal: the MFCC coefficient matrix of the sound signal is calculated; its energy, local, and overall trend features are extracted; and each feature is checked against the SVDD energy feature model, the SVDD local feature model, and the SVDD overall trend feature model, respectively. If all three match, the sound signal is judged to be a cough sound signal; otherwise, it is judged not to be.

Combining MFCC features with SVDD models to recognize cough sounds reduces algorithmic complexity and computational load and significantly improves recognition accuracy.

An embodiment of the present application provides a cough sound recognition method that may be used in the cough sound recognition device 20 described above. The method requires a feature model based on the support vector data description algorithm (that is, an SVDD-based feature model) to be obtained in advance. This model may be preconfigured, or it may be trained using steps 101 to 103. Once the SVDD-based feature model has been obtained, cough sounds can be recognized with it. If its recognition accuracy becomes unacceptable because of a change of scene or for other reasons, the SVDD-based feature model may be reconfigured or retrained.

As shown in FIG. 5, obtaining the feature model based on the support vector data description algorithm in advance includes:

Step 101: collecting a preset number of cough sound sample signals and obtaining a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signals;

The cough sound sample signal s(n) is obtained by sampling, and its mel-frequency cepstral coefficient feature parameter matrix is computed from it. Mel-frequency cepstral coefficients are used mainly to extract the characteristics of sound data and to reduce the dimensionality of the computation. For example, from one frame of 512-dimensional data (512 sampling points), MFCC processing can extract the 40 most important dimensions, achieving dimension reduction. MFCC calculation generally includes pre-emphasis, framing, windowing, fast Fourier transform, mel filter bank filtering, and discrete cosine transform.

Obtaining the MFCC feature parameter matrix of the cough sound sample signal specifically comprises the following steps:

(1) Pre-emphasis

Pre-emphasis boosts the high-frequency part of the signal and flattens its spectrum, so that the spectrum can be computed with the same signal-to-noise ratio over the whole band from low to high frequencies. It also compensates for the high-frequency components suppressed by the sound production system, removing the effects of the vocal cords and lips during sound production and emphasizing the high-frequency formants. In practice, the sampled cough sound sample signal s(n) is passed through a first-order finite impulse response (FIR) high-pass digital filter, whose transfer function is:

$$H(z) = 1 - a z^{-1} \tag{1}$$

where z is the z-transform variable (the corresponding time-domain input is the cough sound sample signal s(n)) and a is the pre-emphasis coefficient, usually a constant between 0.9 and 1.0.
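A minimal NumPy sketch of the pre-emphasis filter, assuming a = 0.97 (an illustrative value inside the 0.9 to 1.0 range) and leaving the first sample unchanged. In the time domain, H(z) = 1 - a·z^-1 computes y(n) = s(n) - a·s(n-1). The function name `pre_emphasis` is hypothetical.

```python
import numpy as np


def pre_emphasis(s, a=0.97):
    """First-order FIR pre-emphasis, H(z) = 1 - a*z^-1.

    Time domain: y(n) = s(n) - a * s(n - 1); the first sample is kept as-is.
    a = 0.97 is an assumed value inside the 0.9-1.0 range.
    """
    s = np.asarray(s, dtype=float)
    return np.append(s[0], s[1:] - a * s[:-1])
```

With a = 0.5, the ramp [1, 2, 3] becomes [1, 1.5, 2]: each sample loses half of its predecessor, which attenuates slowly varying (low-frequency) content relative to fast changes.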

(2) Framing

Every P sampling points of the cough sound sample signal s(n) are grouped into one observation unit called a frame. P may be 256 or 512, covering roughly 20 to 30 ms. To avoid excessive change between adjacent frames, adjacent frames may overlap by M samples, where M is about 1/2 or 1/3 of P. The sampling frequency of a sound signal is typically 8 kHz or 16 kHz; at 8 kHz, a frame of 256 sampling points lasts 256/8000 × 1000 = 32 ms.
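A sketch of framing with overlap, assuming P = 256 samples per frame and an overlap of P/2. Signals shorter than one frame are zero-padded, and trailing samples that do not fill a whole frame are dropped; both are assumed conventions, not taken from the text. `frame_signal` and `frame_duration_ms` are hypothetical names.

```python
import numpy as np


def frame_signal(s, frame_len=256, overlap=128):
    """Split s(n) into frames of frame_len samples (P), adjacent frames sharing `overlap` samples."""
    s = np.asarray(s, dtype=float)
    step = frame_len - overlap
    if len(s) < frame_len:
        s = np.pad(s, (0, frame_len - len(s)))
    n_frames = 1 + (len(s) - frame_len) // step
    # Row i holds samples [i*step, i*step + frame_len)
    idx = step * np.arange(n_frames)[:, None] + np.arange(frame_len)[None, :]
    return s[idx]


def frame_duration_ms(frame_len, fs):
    """Frame length in milliseconds, e.g. 256 samples at 8000 Hz -> 32 ms."""
    return frame_len * 1000.0 / fs
```

For a 10-sample signal with 4-sample frames and 2 samples of overlap, this yields 4 frames starting at samples 0, 2, 4, and 6.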

(3) Windowing

Each frame is multiplied by a Hamming window to improve continuity at its left and right ends. Let the framed signal be S(n), n = 0, 1, …, P-1, where P is the frame size; after windowing, S'(n) = S(n) × W(n), where

$$W(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{l-1}\right), \quad 0 \le n \le l-1 \tag{2}$$

and l is the window length (here l = P).
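A sketch of the windowing step using the standard Hamming window W(n) = 0.54 - 0.46·cos(2πn/(l-1)), applied to every frame (one frame per row). `hamming` and `apply_window` are hypothetical names; the window itself matches NumPy's built-in `np.hamming`.

```python
import numpy as np


def hamming(l):
    """Hamming window W(n) = 0.54 - 0.46*cos(2*pi*n/(l-1)), n = 0..l-1 (requires l > 1)."""
    n = np.arange(l)
    return 0.54 - 0.46 * np.cos(2.0 * np.pi * n / (l - 1))


def apply_window(frames):
    """Multiply every frame (row) by a Hamming window whose length equals the frame length."""
    frames = np.asarray(frames, dtype=float)
    return frames * hamming(frames.shape[1])
```

The window tapers each frame to 0.08 at both ends, which reduces the spectral leakage caused by cutting the signal into frames.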

(4) Fast Fourier transform (FFT)

The characteristics of a signal are often hard to see from its variation in the time domain, so the signal is usually converted into an energy distribution in the frequency domain, where different energy distributions represent the characteristics of different sounds. After windowing, each frame therefore undergoes a fast Fourier transform to obtain its spectrum, and the squared magnitude of the spectrum gives the power spectrum of the sound signal.
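A sketch of the power-spectrum step, assuming a 512-point FFT and keeping only the non-negative-frequency bins returned by `np.fft.rfft` (for a real signal the other half is redundant). No normalization by the FFT length is applied, matching the plain squared magnitude described above. `power_spectrum` is a hypothetical name.

```python
import numpy as np


def power_spectrum(frames, n_fft=512):
    """|FFT|^2 of each windowed frame (one frame per row), n_fft//2 + 1 bins per frame."""
    spectrum = np.fft.rfft(np.asarray(frames, dtype=float), n=n_fft, axis=1)
    return np.abs(spectrum) ** 2
```

A constant frame puts all of its energy in bin 0: for four samples of value 1 and a 4-point FFT the result is [16, 0, 0].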

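(5) Triangular band-pass filtering

The power spectrum is filtered by a bank of mel-scale triangular filters. The bank contains M filters (roughly as many as there are critical bands); each is a triangular filter with center frequency f(m), m = 1, 2, …, M, and M may be 22 to 26. The spacing between adjacent center frequencies f(m) narrows as m decreases and widens as m increases; see FIG. 4.

The frequency response of the m-th triangular filter is defined as:

$$H_m(k)=\begin{cases}0, & k<f(m-1)\\ \dfrac{k-f(m-1)}{f(m)-f(m-1)}, & f(m-1)\le k\le f(m)\\ \dfrac{f(m+1)-k}{f(m+1)-f(m)}, & f(m)< k\le f(m+1)\\ 0, & k>f(m+1)\end{cases} \tag{3}$$

where k is the FFT frequency-bin index, the center frequencies f(m) are expressed in bins, and f(0) and f(M+1) are the lower and upper edges of the filter bank.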
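A sketch of a mel-scale triangular filter bank. The text does not fix these details, so the sketch assumes the common mel mapping mel(f) = 2595·log10(1 + f/700), M = 24 filters, a 512-point FFT at 8 kHz, and center frequencies rounded down to FFT bins. Row m-1 is the triangle H_m(k): it rises from bin f(m-1) to 1 at f(m) and falls back to 0 at f(m+1). All function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (np.asarray(mel) / 2595.0) - 1.0)


def mel_filterbank(n_filters=24, n_fft=512, fs=8000):
    """Return an (n_filters, n_fft//2 + 1) matrix whose row m-1 is H_m(k)."""
    # n_filters + 2 points equally spaced on the mel scale: f(0), f(1), ..., f(M+1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / fs).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):        # rising edge
            fb[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right + 1):   # peak (value 1) and falling edge
            fb[m - 1, k] = (right - k) / (right - center) if right > center else 1.0
    return fb
```

Because the points are evenly spaced in mel rather than in Hz, the triangles get wider toward high frequencies, which is the spacing behavior shown in FIG. 4.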

(6) Discrete cosine transform

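The logarithmic energy output by each filter is calculated as:

$$s(m)=\ln\left(\sum_{k=0}^{K-1}\lvert X(k)\rvert^{2}H_m(k)\right),\quad m=1,2,\ldots,M \tag{4}$$

where X(k) is the FFT of the frame and K is the number of FFT points. Applying the discrete cosine transform (DCT) to the logarithmic energies s(m) yields the MFCC coefficients:

$$C(n)=\sum_{m=1}^{M}s(m)\cos\left(\frac{\pi n(m-0.5)}{M}\right),\quad n=0,1,\ldots,L-1 \tag{5}$$

where L is the number of MFCC coefficients per frame.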
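A sketch of the logarithm and DCT steps, assuming the DCT form C(n) = Σ_{m=1}^{M} s(m)·cos(πn(m-0.5)/M) with n = 0, …, L-1. Many implementations treat C(0), the sum of the log filter-bank energies, as the frame's energy coefficient; that is an assumption here, not something the text states. A small floor avoids log(0). `mfcc_from_power` is a hypothetical name.

```python
import numpy as np


def mfcc_from_power(power, fb, n_ceps=13):
    """Log filter-bank energies s(m) followed by a DCT.

    power: (N, bins) power spectrum, one row per frame
    fb:    (M, bins) triangular filter bank
    Returns an (N, n_ceps) MFCC matrix holding C(n), n = 0..n_ceps-1.
    """
    energies = np.asarray(power, float) @ np.asarray(fb, float).T  # (N, M) filter outputs
    s = np.log(np.maximum(energies, np.finfo(float).eps))          # floor avoids log(0)
    M = s.shape[1]
    m = np.arange(1, M + 1)
    n = np.arange(n_ceps)
    basis = np.cos(np.pi * np.outer(n, m - 0.5) / M)               # (n_ceps, M) cosine basis
    return s @ basis.T
```

Stacking one row per frame gives the N×L coefficient matrix. For example, if every filter output equals e, then s(m) = 1, C(0) = M, and all higher coefficients are 0, because a flat log spectrum has no cepstral variation.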

Step 102: extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal;

As equation (5) shows, the MFCC coefficients form an N×L matrix, where N is the number of frames of the audio signal and L is the number of MFCC coefficients per frame. Because this matrix is high-dimensional and sound signals differ in length (so N differs from signal to signal), the MFCC coefficient matrix cannot be used directly as input for obtaining the SVDD model. Effective features must therefore be further extracted from the MFCC coefficient matrix and used as direct inputs to the SVDD model.

Extracting these features requires reducing the dimensionality of the MFCC coefficient matrix, but reducing it directly may discard effective features of the cough sound signal. The effective features are therefore extracted by taking into account both the time-domain and the frequency-domain characteristics of the cough sound signal.

FIG. 2 is a time-amplitude (time-domain) plot of a cough sound signal. As FIG. 2 shows, a cough sound is short and clearly bursty: a single cough usually lasts less than 550 ms, and even for patients with severe throat or bronchial disease it usually lasts about 1000 ms. In terms of energy, a cough sound signal's energy is concentrated mainly in the first half of the signal. The energy coefficients of the segment in which energy is most concentrated can therefore serve as the energy feature characterizing the cough sound sample signal. For example, the energy coefficients of the first half of the cough sound sample signal are selected as the energy feature, and this feature is used as input to build an SVDD model for recognizing sound signals.

Because cough sound sample signals differ in length, their parameter matrices have different numbers of rows N, so the energy coefficient sequences also differ in length. The energy coefficients must therefore be unified to the same length.

Specifically, extracting the energy feature from the mel-frequency cepstral coefficient feature parameter matrix of a cough sound sample signal includes:

selecting, from the matrix, the energy coefficients of the consecutive frames that cover a preset proportion of the signal and have the largest sum of energy coefficients; and warping these energy coefficients to a preset length with the DTW algorithm to obtain the energy feature of the cough sound sample signal.

In a specific application, taking into account the energy distribution of cough sound signals, the consecutive frames covering the preset proportion with the largest energy-coefficient sum may be, for example, the first 1/2, the first 4/7, or the first 5/9 of the cough sound sample signal. The preset length can be set according to the actual application.
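A sketch of the energy-feature extraction under several assumptions the text does not spell out: the per-frame energy coefficient is taken from column 0 of the MFCC matrix, and "warping to a preset length with DTW" is interpreted as aligning the selected segment to a fixed-length reference template (for example, a mean training cough; the template here is hypothetical) and averaging the samples that DTW maps onto each template position. `max_energy_run`, `dtw_path`, `warp_to_template`, and `energy_feature` are hypothetical names.

```python
import numpy as np


def max_energy_run(energy, length):
    """Start index of the `length` consecutive frames with the largest energy sum."""
    sums = np.convolve(energy, np.ones(length), mode="valid")  # sum of every window
    return int(np.argmax(sums))


def dtw_path(x, y):
    """Classic DTW between 1-D sequences x and y with |x_i - y_j| cost; returns the optimal path."""
    nx, ny = len(x), len(y)
    D = np.full((nx + 1, ny + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, nx + 1):
        for j in range(1, ny + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end, following the cheapest predecessor each time
    i, j, path = nx, ny, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def warp_to_template(seq, template):
    """Map seq onto len(template) points by averaging the seq values DTW aligns to each index."""
    out = np.zeros(len(template))
    counts = np.zeros(len(template))
    for i, j in dtw_path(seq, template):
        out[j] += seq[i]
        counts[j] += 1
    return out / counts  # a DTW path visits every template index, so counts > 0


def energy_feature(mfcc, proportion, template):
    """Max-energy run covering `proportion` of the frames, DTW-warped to len(template)."""
    energy = np.asarray(mfcc, float)[:, 0]  # assumption: column 0 is the energy coefficient
    length = max(1, int(round(proportion * len(energy))))
    start = max_energy_run(energy, length)
    return warp_to_template(energy[start:start + length], template)
```

Whatever the cough's length, the output always has len(template) values, which is what allows the energy feature to be fed to an SVDD model. Plain linear resampling would be a simpler alternative; DTW instead lets the alignment stretch non-uniformly.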

As FIG. 2 also shows, most cough sound signals (about 90%) evolve in a broadly similar way: after the cough pulse, the signal energy drops rapidly, falling faster for a dry cough and more slowly for a wet cough. The trend of a cough sound signal therefore characterizes it well. An overall trend feature, which reflects how the signal changes over time, can therefore be extracted from the MFCC coefficient matrix of the cough sound signal and used as input to build an SVDD model for recognizing sound signals.

Specifically, the overall trend feature of a cough sound sample signal can be obtained by reducing the dimensionality of its mel-frequency cepstral coefficient feature parameter matrix with the linear discriminant analysis (LDA) algorithm.
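A hedged sketch of a two-class Fisher LDA projection. The text does not say which classes the LDA uses. This sketch assumes frame-level labels from training data (cough frames versus non-cough frames), learns one discriminant direction w = S_w^{-1}(μ1 - μ0), and projects each row of an N×L MFCC matrix onto it, giving one value per frame as a trend curve. That curve would still need to be brought to a fixed length before SVDD training. The function names are hypothetical.

```python
import numpy as np


def fisher_lda_direction(X0, X1, reg=1e-6):
    """Two-class Fisher LDA: w proportional to S_w^{-1} (mu1 - mu0), returned with unit norm.

    X0, X1: (n_samples, L) MFCC frames of class 0 and class 1 (assumed labels).
    reg: small ridge term that keeps the within-class scatter invertible.
    """
    X0, X1 = np.asarray(X0, float), np.asarray(X1, float)
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of outer products of the centered samples of each class
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw += reg * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)


def trend_curve(mfcc, w):
    """Project each frame (row) of an N x L MFCC matrix onto w: one value per frame."""
    return np.asarray(mfcc, float) @ w
```

Compared with PCA, which keeps the directions of largest variance regardless of class, LDA keeps the direction that best separates the labeled classes. This is consistent with the text's observation that PCA-reduced features separated coughs from noise poorly.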

FIG. 3 is a time-frequency plot (spectrogram) of a cough sound signal. As FIG. 3 shows, the spectral energy is also concentrated at the beginning of the signal, and the frequency distribution is wide (typically concentrated in 200 to 6000 Hz). The MFCC coefficients of the few frames in which spectral energy is concentrated can therefore serve as a local feature characterizing the cough sound signal, and this local feature is used as input to build an SVDD model for recognizing sound signals. Specifically, the local feature may be obtained by selecting the frames with the most concentrated energy from the cough sound sample signal, assigning a weight to the MFCC coefficients of each frame, and summing the weighted coefficients. Because the weights are positively correlated with the energy coefficients, they can be determined from the energy coefficients of the cough sound sample signal. That is: from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal, select the mel-frequency cepstral coefficients of the S2 consecutive frames with the largest sum of energy coefficients, where S2 is a positive integer; then determine the weights of these S2 frames from their energy coefficients and compute the weighted sum of their mel-frequency cepstral coefficients to obtain the local feature of the cough sound sample signal.
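A sketch of the local feature. It assumes the per-frame energy coefficients are supplied separately (for example, column 0 of the MFCC matrix) and uses softmax weights, which are positive, sum to 1, and increase with energy. The text only requires the weights to be positively correlated with energy, so this particular weighting is an illustrative assumption. `local_feature` is a hypothetical name.

```python
import numpy as np


def local_feature(mfcc, energy, s2):
    """Energy-weighted sum of the MFCC rows of the S2 consecutive frames with the largest energy sum."""
    mfcc, energy = np.asarray(mfcc, float), np.asarray(energy, float)
    sums = np.convolve(energy, np.ones(s2), mode="valid")  # energy sum of every S2-frame window
    start = int(np.argmax(sums))
    e = energy[start:start + s2]
    w = np.exp(e - e.max())   # softmax weights: higher energy -> larger weight
    w /= w.sum()
    return w @ mfcc[start:start + s2]  # one length-L local feature vector
```

The result has a fixed length L regardless of the signal's duration. When the selected frames have equal energy, it reduces to their plain average.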

The above analysis shows that the energy feature, the local feature, and the overall trend feature each reflect characteristics of cough sound signals. One or more of these sub-signal features are extracted from the MFCC coefficient matrix of the cough sound sample signal and used as input to build SVDD models for recognizing sound signals, which greatly improves the accuracy of cough sound recognition and reduces the false recognition rate. To improve accuracy further, the energy feature, the local feature, and the overall trend feature may all be extracted from the MFCC coefficient matrix. When all three are extracted from the MFCC coefficient matrices of the cough sound sample signals and used as inputs for training the SVDD models, the cough sound recognition rate can exceed 95%.

Other dimensionality reduction methods, such as DTW, principal component analysis (PCA), or other algorithms, may also be used to reduce the MFCC coefficients of the cough sound sample signal. However, when the MFCC coefficients are reduced with PCA and the SVDD model is trained on the reduced parameters, the resulting model distinguishes poorly between cough sounds and noise: the cough sound recognition rate is about 85%, and the rate at which noise is misrecognized as coughing is as high as 65%.

Step 103: and taking the signal characteristics of the cough sound sample signal as input, training a support vector data description algorithm model to obtain the cough signal characteristic model based on the support vector data description algorithm.

Where the signal features include energy features, local features, and global trend features, the energy features, local features, and global trend features are taken as inputs, respectively, and SVDD models are trained, i.e., an SVDD model for training energy features (energy feature model), an SVDD model for local features (local feature model), and an SVDD model for global trend features (global trend feature model). Thus, a cough signal characteristic model based on a support vector data description algorithm, which consists of an energy characteristic model, a local characteristic model and a whole trend characteristic model, is obtained.

The basic principle of SVDD is to compute a spherical decision boundary around the input samples that divides the whole space into two parts: the space inside the boundary, which is treated as the accepted region, and the space outside the boundary, which is treated as the rejected region. This makes SVDD a one-class classifier for the samples.

Specifically, SVDD finds the minimum sphere, with center a and radius R, by solving the following optimization problem:
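Assuming the standard SVDD formulation of Tax and Duin, which uses the same symbols a, R and ξ_i that appear in this derivation, the objective minimizes the squared radius plus a penalty on slack variables ξ_i. Here C > 0 (the penalty weight) and n (the number of training samples) are assumed symbols:

```latex
\[
\min_{R,\,a,\,\xi}\; F(R, a, \xi) = R^2 + C \sum_{i=1}^{n} \xi_i
\]
```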

subject to the following constraint (for data x_i of more than three dimensions, the sphere is a hypersphere, i.e., the generalization of a sphere to a space of more than three dimensions; in two dimensions the boundary is a closed curve, and in three dimensions it is an ordinary sphere):
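In the standard SVDD formulation (an assumption here, consistent with the slack variables ξ_i used in the derivation), each sample x_i may lie outside the sphere only by its slack ξ_i:

```latex
\[
\lVert x_i - a \rVert^2 \le R^2 + \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n
\]
```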

Satisfying this condition means that the data points of the training dataset are contained in the sphere, where x_i denotes the input sample data, i.e., the signal features of the cough sound sample signals.

With the objective and the constraints defined, the problem can be solved with the method of Lagrange multipliers, using the Lagrangian:
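Assuming the standard SVDD objective and constraint (Tax and Duin), the Lagrangian, which is the expression the derivation refers to as formula (6), is:

```latex
\[
L(R, a, \xi, \alpha, \gamma) = R^2 + C \sum_{i} \xi_i
  - \sum_{i} \alpha_i \left( R^2 + \xi_i - \lVert x_i - a \rVert^2 \right)
  - \sum_{i} \gamma_i \xi_i \tag{6}
\]
```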

where α_i ≥ 0 and γ_i ≥ 0 are the Lagrange multipliers. Taking the partial derivatives with respect to R, a and ξ_i and setting them equal to 0 gives:
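Assuming the Lagrangian has the standard SVDD form, the three stationarity conditions, taken with respect to R, a and ξ_i in that order, are as follows; (8) uses (7) to cancel the factor Σα_i:

```latex
\begin{align}
\frac{\partial L}{\partial R} = 0 &\;\Rightarrow\; \sum_{i} \alpha_i = 1 \tag{7}\\
\frac{\partial L}{\partial a} = 0 &\;\Rightarrow\; a = \sum_{i} \alpha_i x_i \tag{8}\\
\frac{\partial L}{\partial \xi_i} = 0 &\;\Rightarrow\; C - \alpha_i - \gamma_i = 0 \tag{9}
\end{align}
```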

Substituting (7), (8) and (9) into formula (6) gives the dual problem:
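Under the standard-SVDD assumption, the substitution cancels every R² and ξ_i term, and Σα_i‖x_i − a‖² reduces to Σα_i(x_i·x_i) − ‖a‖². The dual objective, maximized over the α_i, is therefore:

```latex
\[
\max_{\alpha}\; \sum_{i} \alpha_i \,(x_i \cdot x_i) \;-\; \sum_{i}\sum_{j} \alpha_i \alpha_j \,(x_i \cdot x_j)
\]
```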

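wherein 0 ≤ α_i ≤ C and Σ_i α_i = 1, which follow from (7) and (9) together with γ_i ≥ 0, and C is the penalty parameter that weights the slack variables ξ_i.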

The inner products in the dual problem can be replaced by a kernel function K, namely:
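A sketch of the kernelized forms, assuming a Mercer kernel K(x, y) = φ(x)·φ(y). The Gaussian kernel with bandwidth σ is shown only as a common choice; the source does not name a kernel. The last line gives the squared distance from a point z to the center and the squared radius, which is one way to evaluate distances such as D1, D2 and D3 in step 203 when the center is defined only implicitly:

```latex
\[
K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j), \qquad \text{e.g. } K(x, y) = \exp\!\left(-\lVert x - y \rVert^2 / \sigma^2\right)
\]
\[
\max_{\alpha}\; \sum_{i} \alpha_i K(x_i, x_i) - \sum_{i}\sum_{j} \alpha_i \alpha_j K(x_i, x_j)
\]
\[
D^2(z) = K(z, z) - 2 \sum_{i} \alpha_i K(x_i, z) + \sum_{i}\sum_{j} \alpha_i \alpha_j K(x_i, x_j),
\qquad R^2 = D^2(x_k) \text{ for any } x_k \text{ with } 0 < \alpha_k < C
\]
```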

The center a and the radius R obtained from this computation determine the SVDD model. Training three SVDD models in this way yields centers a1, a2 and a3 and radii R1, R2 and R3, corresponding to the energy feature model, the local feature model and the overall trend feature model respectively, which completes the training process.
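A minimal, self-contained sketch of training one SVDD model. The source does not say which solver is used; this sketch swaps in the Frank-Wolfe method applied to the standard SVDD dual (maximize Σα_iK(x_i,x_i) − ΣΣα_iα_jK(x_i,x_j) subject to Σα_i = 1 and 0 ≤ α_i ≤ C), written in pure Python for readability rather than speed. The class name `SVDD`, its methods, the default kernel and the default C are hypothetical choices. Because a nonlinear kernel leaves the center implicit, the center is stored as the coefficients α_i and distances are computed through kernel evaluations.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Kernel = Callable[[Vector, Vector], float]


def linear_kernel(x: Vector, y: Vector) -> float:
    """Plain inner product x . y (the un-kernelized SVDD)."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(sigma: float) -> Kernel:
    """Gaussian kernel exp(-||x - y||^2 / sigma^2); the kernel choice is an assumption."""
    def k(x: Vector, y: Vector) -> float:
        return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / sigma ** 2)
    return k


class SVDD:
    """Hypothetical minimal SVDD that solves the dual with Frank-Wolfe iterations."""

    def __init__(self, kernel: Kernel = linear_kernel, C: float = 1.0, iters: int = 1000):
        self.kernel, self.C, self.iters = kernel, C, iters

    def _lmo(self, grad: List[float]) -> List[float]:
        # Vertex of {sum(alpha) = 1, 0 <= alpha_i <= C} minimizing grad . s:
        # put mass C on the smallest-gradient coordinates until the total is 1.
        s, remaining = [0.0] * len(grad), 1.0
        for i in sorted(range(len(grad)), key=grad.__getitem__):
            s[i] = min(self.C, remaining)
            remaining -= s[i]
            if remaining <= 0.0:
                break
        return s

    def fit(self, X: List[Vector]) -> "SVDD":
        n = len(X)
        if self.C * n < 1.0:
            raise ValueError("need C * n >= 1 so that sum(alpha) = 1 is feasible")
        self.X = X
        K = [[self.kernel(a, b) for b in X] for a in X]
        alpha = [1.0 / n] * n  # feasible starting point
        for _ in range(self.iters):
            # Minimize f(alpha) = alpha^T K alpha - sum_i alpha_i K_ii (the negated dual).
            Ka = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
            grad = [2.0 * Ka[i] - K[i][i] for i in range(n)]
            d = [si - ai for si, ai in zip(self._lmo(grad), alpha)]
            gd = sum(g * di for g, di in zip(grad, d))
            if gd > -1e-12:  # Frank-Wolfe gap is ~0: converged
                break
            dKd = sum(d[i] * K[i][j] * d[j] for i in range(n) for j in range(n))
            # Exact line search for the quadratic objective, clipped to [0, 1].
            t = 1.0 if dKd <= 0.0 else min(1.0, -gd / (2.0 * dKd))
            alpha = [a + t * di for a, di in zip(alpha, d)]
        self.alpha = alpha
        self.aKa = sum(alpha[i] * alpha[j] * K[i][j] for i in range(n) for j in range(n))
        # Boundary support vectors have 0 < alpha_i < C; fall back to all alpha_i > 0.
        tol = 1e-6
        sv = [i for i in range(n) if tol < alpha[i] < self.C - tol]
        sv = sv or [i for i in range(n) if alpha[i] > tol]
        # The max keeps R^2 conservative when the iterative solution is approximate.
        self.R2 = max(self.dist2(X[i]) for i in sv)
        return self

    def dist2(self, z: Vector) -> float:
        """Squared feature-space distance from z to the center sum_i alpha_i phi(x_i)."""
        cross = sum(a * self.kernel(x, z) for a, x in zip(self.alpha, self.X))
        return self.kernel(z, z) - 2.0 * cross + self.aKa

    def contains(self, z: Vector) -> bool:
        """True if z lies inside or on the hypersphere boundary."""
        return self.dist2(z) <= self.R2
```

For example, `SVDD().fit([[1, 1], [1, -1], [-1, 1], [-1, -1]])` gives a squared radius of 2 around the origin. In the scheme described here, one instance would be trained on each of the energy, local and overall trend features; C trades off enclosing more training samples against a smaller radius.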

During training, the size and extent of the hypersphere are controlled so that it contains as many sample points as possible while its radius is kept as small as possible, which gives the best classification effect.

Specifically, in the embodiments of the present application, each model corresponds to one hypersphere. On the premise that all cough sound signals are contained, the hypersphere boundary is optimized to minimize the radius, which yields the most suitable cough signal feature model based on the support vector data description algorithm. As a result, recognizing the signal features extracted from a sound signal with this model is highly accurate.

As shown in fig. 6, the cough sound recognition method includes:

Step 201: sample a sound signal and obtain the Mel-frequency cepstral coefficient feature parameter matrix of the sound signal;

In practical applications, a sound input unit (for example, a microphone) may be provided on the cough sound recognition device 20 to collect the sound signal, which is converted into a digital signal after amplification, filtering and similar processing. The digital signal may be sampled and processed by a computing unit local to the cough sound recognition device 20, or uploaded over a network to a cloud server, an intelligent terminal or another server for processing.

For the technical details of obtaining the Mel-frequency cepstral coefficient feature parameter matrix of the sound signal, see step 101; they are not repeated here.

Step 202: extract signal features from the Mel-frequency cepstral coefficient feature parameter matrix of the sound signal;

Where the pre-acquired feature model based on the support vector data description algorithm includes an energy feature model, a local feature model and an overall trend feature model, one or more of the energy feature, the local feature and the overall trend feature may be extracted from the feature parameter matrix of the sound signal. To improve recognition accuracy, all three features can be extracted. For the specific calculation of the energy, local and overall trend features of the sound signal, see step 102; it is not repeated here.
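Claim 1 describes the local feature as a weighted sum of the MFCC vectors of S2 frames, where each frame's weight is positively correlated with its energy coefficient. A minimal sketch, assuming weights directly proportional to non-negative energy coefficients (one of many positively correlated choices) and leaving the selection of the S2 frames to the caller; the function name `local_feature` is hypothetical:

```python
from typing import List, Sequence


def local_feature(mfcc_frames: Sequence[Sequence[float]],
                  energy: Sequence[float]) -> List[float]:
    """Energy-weighted sum of per-frame MFCC vectors (weight = energy / total energy)."""
    if not mfcc_frames or len(mfcc_frames) != len(energy):
        raise ValueError("need one energy coefficient per MFCC frame")
    total = sum(energy)
    if total <= 0:
        raise ValueError("energy coefficients must sum to a positive value")
    # Higher-energy frames receive proportionally larger weights.
    weights = [e / total for e in energy]
    dim = len(mfcc_frames[0])
    return [sum(w * frame[k] for w, frame in zip(weights, mfcc_frames)) for k in range(dim)]
```

If the energy coefficients are log energies (which can be negative), they would first need mapping to positive values, for example by exponentiating, before this proportional weighting applies.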

Step 203: determine whether the signal features match the pre-acquired cough signal feature model based on the support vector data description algorithm;

Where the pre-acquired feature model based on the support vector data description algorithm includes an energy feature model, a local feature model and an overall trend feature model, each feature obtained in step 202 is checked separately against its model: whether the energy feature conforms to the energy feature model, whether the local feature conforms to the local feature model, and whether the overall trend feature conforms to the overall trend feature model. As discussed in step 103, these three models are hyperspheres with centers a1, a2 and a3 and radii R1, R2 and R3, respectively. To judge conformity, the distances D1, D2 and D3 from the energy feature, the local feature and the overall trend feature to the centers a1, a2 and a3 are calculated. The sound signal is judged to be a cough sound only when all three features lie within the boundaries of their SVDD models, i.e., D1 < R1, D2 < R2 and D3 < R3.
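The decision rule can be sketched as follows, assuming a linear kernel so that each center a_k is an explicit vector and D_k is the Euclidean distance; with a nonlinear kernel, D_k would instead be computed from kernel evaluations. The function name `is_cough` is hypothetical:

```python
import math
from typing import Sequence


def is_cough(features: Sequence[Sequence[float]],
             centers: Sequence[Sequence[float]],
             radii: Sequence[float]) -> bool:
    """Accept only if every feature lies strictly inside its model's hypersphere (D_k < R_k)."""
    if not (len(features) == len(centers) == len(radii)):
        raise ValueError("need one center and one radius per feature")
    for f, a, r in zip(features, centers, radii):
        if math.dist(f, a) >= r:  # D_k >= R_k: on or outside the boundary, so reject
            return False
    return True
```

The features may have different dimensions (for example, an energy contour of the preset length and an MFCC-sized local feature); each is compared only with its own model's center.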

Step 204: if they match, confirm that the sound signal is a cough sound.

The cough sound recognition method can recognize cough sounds, so the cough condition can be monitored simply by monitoring the sounds the user makes, without the user wearing any detection component. Moreover, because the recognition algorithm is based on MFCC feature parameters and an SVDD model, its complexity and computational load are low, it places modest demands on hardware, and it reduces the manufacturing cost of the product.

Accordingly, an embodiment of the present application further provides a cough sound recognition apparatus, applied to the cough sound recognition device 20. The apparatus includes:

the sampling and characteristic parameter obtaining module 301 is configured to sample a sound signal and obtain a mel frequency cepstrum coefficient characteristic parameter matrix of the sound signal;

a signal feature extraction module 302, configured to extract a signal feature from a mel-frequency cepstrum coefficient feature parameter matrix of the sound signal;

a feature matching module 303, configured to confirm whether the signal feature matches a pre-acquired cough signal feature model based on a support vector data description algorithm;

and a confirmation module 304, configured to confirm the sound signal as a cough sound if the signal feature matches a pre-acquired cough signal feature model based on a support vector data description algorithm.

The cough sound recognition apparatus provided by the embodiments of the present application can recognize cough sounds, so the cough condition can be monitored simply by monitoring the sounds the user makes, without the user wearing any detection component. Moreover, because the recognition algorithm is based on MFCC feature parameters and an SVDD model, its complexity and computational load are low, it places modest demands on hardware, and it reduces the manufacturing cost of the product.

Optionally, in other embodiments of the apparatus, the apparatus further comprises:

a feature model presetting module, configured to acquire, in advance, the cough signal feature model based on the support vector data description algorithm;

the feature model presetting module is specifically configured to:

take the signal features of the cough sound sample signal as input and train a support vector data description model, so as to obtain the cough signal feature model based on the support vector data description algorithm.

Optionally, in some embodiments of the apparatus, the signal features include one or more sub-signal features among the energy feature, the local feature and the overall trend feature.

Optionally, in some embodiments of the apparatus, if the signal features include the energy feature, extracting the signal features from the Mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:

normalizing the energy coefficients of consecutive frames of the cough sound sample signal to a preset length using a dynamic time warping (DTW) algorithm;

and extracting the signal features from the Mel-frequency cepstral coefficient feature parameter matrix of the sound signal includes: normalizing the energy coefficients of consecutive frames of the sound signal to the preset length using the dynamic time warping algorithm.
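The source does not spell out how DTW yields a fixed length. A minimal sketch under one possible reading: align the variable-length energy contour to a reference template of the preset length (the template is a hypothetical input, for example an averaged cough energy contour), then average the input values that DTW maps onto each template position. The names `dtw_path` and `normalize_length` are hypothetical.

```python
from typing import List, Sequence, Tuple


def dtw_path(x: Sequence[float], y: Sequence[float]) -> List[Tuple[int, int]]:
    """Classic O(len(x) * len(y)) DTW with absolute-difference cost; returns the warping path."""
    n, m = len(x), len(y)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(D[i - 1][j - 1], D[i - 1][j], D[i][j - 1])
    # Backtrack from (n, m) to (1, 1), preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, move = min((D[i - 1][j - 1], 0), (D[i - 1][j], 1), (D[i][j - 1], 2))
        if move == 0:
            i, j = i - 1, j - 1
        elif move == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return path


def normalize_length(energy: Sequence[float], template: Sequence[float]) -> List[float]:
    """Warp `energy` onto the template's time axis; the result has len(template) values."""
    if not energy or not template:
        raise ValueError("energy and template must be non-empty")
    sums = [0.0] * len(template)
    counts = [0] * len(template)
    for i, j in dtw_path(energy, template):
        sums[j] += energy[i]
        counts[j] += 1
    # A DTW path visits every template index at least once, so counts[j] >= 1.
    return [s / c for s, c in zip(sums, counts)]
```

For example, `normalize_length([0, 1, 1, 2, 2, 3], [0, 1, 2, 3])` maps the six-frame contour onto four positions, giving `[0.0, 1.0, 2.0, 3.0]`. Averaging is one design choice; taking the maximum per template position would preserve energy peaks instead.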

Optionally, in some embodiments of the apparatus, if the signal features include the local feature, extracting the signal features from the Mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:

Optionally, in some embodiments of the apparatus, if the signal features include the overall trend feature, extracting the signal features from the Mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:

Optionally, in some embodiments of the apparatus, the cough signal feature model based on the support vector data description algorithm includes one or more sub-signal feature models among an energy feature model, a local feature model and an overall trend feature model, each based on the support vector data description algorithm;

if the cough signal feature model based on the support vector data description algorithm includes multiple sub-signal feature models, determining whether the signal features match the pre-acquired cough signal feature model based on the support vector data description algorithm includes:

It should be noted that, the above device may execute the method provided by the embodiment of the present application, and has the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in this embodiment may be found in the methods provided in the embodiments of the present application.

An embodiment of the present application also provides a cough sound recognition device. As shown in fig. 8, the cough sound recognition device 20 includes a sound input unit 21, a signal processing unit 22 and an arithmetic processing unit 23. The sound input unit 21, for example a microphone, receives a sound signal. The signal processing unit 22 processes the sound signal: it may perform analog signal processing such as amplification, filtering and analog-to-digital conversion, and sends the resulting digital signal to the arithmetic processing unit 23.

The signal processing unit 22 is connected to the arithmetic processing unit 23, which may be built into the cough sound recognition device 20 (fig. 8 shows the built-in case) or located outside it. For example, the arithmetic processing unit 23 may be a remote server, such as a cloud server, an intelligent terminal or another server communicatively connected to the cough sound recognition device 20 through a network.

The arithmetic processing unit 23 includes:

at least one processor 232 (fig. 8 shows one processor) and a memory 231. The processor 232 and the memory 231 may be connected by a bus or in another way; fig. 8 takes a bus connection as an example.

The memory 231 stores nonvolatile software programs, nonvolatile computer-executable programs and modules, such as the program instructions/modules corresponding to the cough sound recognition method in the embodiments of the present application (e.g., the sampling and feature parameter obtaining module 301 shown in fig. 7). By running the nonvolatile software programs, instructions and modules stored in the memory 231, the processor 232 executes its functional applications and data processing, i.e., implements the cough sound recognition method of the above method embodiments.

The memory 231 may include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required for the functions; the data storage area may store data created through use of the cough sound recognition device, and the like. In addition, the memory 231 may include high-speed random access memory, and may also include nonvolatile memory, such as at least one magnetic disk storage device, flash memory device or other nonvolatile solid-state storage device. In some embodiments, the memory 231 optionally includes memory located remotely from the processor 232, which may be connected to the cough sound recognition device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks and combinations thereof.

The one or more modules are stored in the memory 231, which when executed by the one or more processors 232, perform the cough sound recognition method of any of the method embodiments described above, e.g., perform method steps 101-103 of fig. 5 described above, and method steps 201-204 of fig. 6; the functions of modules 301-304 in fig. 7 are implemented.

The cough sound recognition device can execute the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in this embodiment, see the methods provided in the embodiments of the present application.

Embodiments of the present application provide a storage medium storing computer-executable instructions that are executable by one or more processors (e.g., one processor 232 in fig. 8) to cause the one or more processors to perform the method of cough sound recognition in any of the method embodiments described above, e.g., perform method steps 101-103 in fig. 5, and method steps 201-204 in fig. 6 described above; the functions of modules 301-304 in fig. 7 are implemented.

The embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a general purpose hardware platform, but may also be implemented by means of hardware. Those skilled in the art will appreciate that all or part of the processes implementing the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and where the program may include processes implementing the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Under the idea of the present application, the technical features of the above embodiments or of different embodiments may also be combined, the steps may be implemented in any order, and there are many other variations of the different aspects of the present application as described above, which are not provided in detail for the sake of brevity. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims

1. A method of cough voice recognition, the method comprising:

wherein the signal features include energy features, local features, and global trend features;

taking the signal characteristics of the cough sound sample signal as input, training a support vector data description algorithm model to obtain a cough signal characteristic model based on the support vector data description algorithm;

the cough signal feature model based on the support vector data description algorithm comprises an energy feature model based on the support vector data description algorithm, a local feature model based on the support vector data description algorithm and an overall trend feature model based on the support vector data description algorithm;

if the energy features match the energy feature model, the local features match the local feature model, and the overall trend features match the overall trend feature model, then the sound signal is confirmed to be cough sound;

wherein if the signal features include energy features, the extracting signal features from the mel-frequency cepstrum coefficient feature parameter matrix of the sound signal includes:

normalizing the energy coefficients of consecutive frames of the sound signal to a preset length using a dynamic time warping algorithm, to obtain the energy feature of the sound signal;

the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal comprises:

wherein if the signal features include local features, the extracting signal features from the mel-frequency cepstrum coefficient feature parameter matrix of the sound signal includes:

Determining the weight of the mel-frequency cepstrum coefficient of the S2 frame sound signal based on the energy coefficient of the S2 frame sound signal, and carrying out weighted summation on the mel-frequency cepstrum coefficient of the S2 frame sound signal according to the weight of the mel-frequency cepstrum coefficient of the S2 frame sound signal to obtain the local characteristic of the sound signal, wherein the weight of the mel-frequency cepstrum coefficient of the S2 frame sound signal positively correlates with the energy coefficient of the S2 frame sound signal;

If the signal features include global trend features, the extracting signal features from the mel-frequency cepstrum coefficient feature parameter matrix of the sound signal includes:

performing dimension reduction processing on the Mel frequency cepstrum coefficient characteristic parameter matrix of the sound signal by adopting a linear discriminant analysis algorithm to obtain the overall trend characteristic of the sound signal;

and performing dimensionality reduction on the Mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal using a linear discriminant analysis algorithm to obtain the overall trend feature of the cough sound sample signal.

2. A cough sound recognition device, characterized in that the cough sound recognition device comprises:

a sound input unit for receiving a sound signal;

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

3. A storage medium storing executable instructions that, when executed by a cough sound recognition device, cause the cough sound recognition device to perform the method of claim 1.