CN108701469B - Cough sound recognition method, device, and storage medium - Google Patents

Info

Publication number
CN108701469B
Authority
CN
China
Prior art keywords
signal
cough
sound
mel
coefficient
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201780008985.4A
Other languages
Chinese (zh)
Other versions
CN108701469A (en
Inventor
刘洪涛
冯澍婷
孟亚彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen H&T Intelligent Control Co Ltd
Original Assignee
Shenzhen H&T Intelligent Control Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen H&T Intelligent Control Co Ltd filed Critical Shenzhen H&T Intelligent Control Co Ltd
Publication of CN108701469A publication Critical patent/CN108701469A/en
Application granted granted Critical
Publication of CN108701469B publication Critical patent/CN108701469B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

A cough sound recognition method, device, and storage medium. The method comprises: sampling a sound signal and obtaining a mel-frequency cepstral coefficient (MFCC) feature parameter matrix of the sound signal (201); extracting signal features from the MFCC feature parameter matrix of the sound signal (202); determining whether the signal features match a pre-acquired cough signal feature model based on a support vector data description (SVDD) algorithm (203); and, if so, confirming the sound signal as a cough sound (204). The method and device can identify cough sounds, so a user's cough condition can be monitored from the sounds the user makes, without the user wearing any detection component. Because the recognition algorithm is based on MFCC feature parameters and an SVDD model, its complexity is low, its computational load is small, and its hardware requirements are modest, which reduces the manufacturing cost of the product.

Description

Cough sound recognition method, device, and storage medium
Technical Field
Embodiments of the present application relate to sound processing technology, and in particular, to a method, apparatus, and storage medium for recognizing cough sounds.
Background
Cough is an indicator of treatment efficacy or disease progression for certain diseases (e.g., asthma). Detailed and accurate cough information (such as the number of coughs per hour and the time of coughing) provides important clinical guidance for diagnosis. Studies have shown that intelligent cough monitoring devices are more accurate than manual cough discrimination. Current intelligent cough monitoring equipment is used mainly for medical monitoring and requires the patient to wear complex equipment, which is inconvenient for the user.
Some studies recognize the isolated cough sounds of a specific person by combining the characteristics of cough sounds with speech recognition technology: a cough model is created, and model matching is performed with a dynamic time warping (DTW) algorithm. With this approach, coughing can be monitored from the sounds the user makes, without the user wearing any detection component.
In the process of implementing the present application, the inventors found at least the following problem in the related art: the DTW-based cough sound recognition method has high algorithmic complexity and a large computational load, and places high demands on hardware.
Disclosure of Invention
The present application aims to provide a cough sound recognition method, device, and storage medium that can recognize cough sounds with a simple algorithm, a small computational load, and low hardware requirements.
To achieve the above object, in a first aspect, an embodiment of the present application provides a cough sound recognition method for a recognition device, the method including:
sampling a sound signal and obtaining a Mel frequency cepstrum coefficient characteristic parameter matrix of the sound signal;
extracting signal features from a mel-frequency cepstrum coefficient feature parameter matrix of the sound signal;
confirming whether the signal characteristics are matched with a pre-acquired cough signal characteristic model based on a support vector data description algorithm or not;
if so, the sound signal is confirmed as a cough sound.
Optionally, the method further comprises:
and pre-acquiring the cough signal characteristic model based on the support vector data description algorithm.
Optionally, the pre-acquiring the cough signal feature model based on the support vector data description algorithm includes:
collecting a preset number of cough sound sample signals and obtaining a mel frequency cepstrum coefficient characteristic parameter matrix of the cough sound sample signals;
extracting the signal features from a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal;
and taking the signal characteristics of the cough sound sample signal as input, training a support vector data description algorithm model to obtain the cough signal characteristic model based on the support vector data description algorithm.
Optionally, the signal features include one or more sub-signal features of an energy feature, a local feature, and an overall trend feature.
Optionally, if the signal features include energy features, the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal, the energy coefficients of the consecutive frames that make up a preset proportion of the cough sound sample signal and have the largest sum of energy coefficients;
warping the energy coefficients of the selected consecutive frames of the cough sound sample signal to a preset length based on a dynamic time warping algorithm, to obtain the energy feature of the cough sound sample signal;
the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal, the energy coefficients of the consecutive frames that make up the preset proportion of the sound signal and have the largest sum of energy coefficients;
and warping the energy coefficients of the selected consecutive frames of the sound signal to the preset length based on the dynamic time warping algorithm, to obtain the energy feature of the sound signal.
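The energy-feature extraction described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the input is assumed to be a non-empty list of per-frame energy coefficients, and plain linear interpolation is swapped in as a stand-in for the dynamic-time-warping-based length normalization, whose details are not given in this passage.

```python
def select_max_energy_run(energies, proportion):
    """Return the run of consecutive energy coefficients, covering
    `proportion` of all frames, whose sum is largest (first run on ties)."""
    n = len(energies)
    width = max(1, round(n * proportion))
    window_sum = sum(energies[:width])
    best_sum, best_start = window_sum, 0
    # Slide the window one frame at a time, updating the sum incrementally.
    for start in range(1, n - width + 1):
        window_sum += energies[start + width - 1] - energies[start - 1]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, start
    return energies[best_start:best_start + width]


def resample_linear(values, target_len):
    """Bring `values` to `target_len` points by linear interpolation.
    ASSUMPTION: stand-in for the DTW-based length normalization."""
    if target_len == 1 or len(values) == 1:
        return [values[0]] * target_len
    step = (len(values) - 1) / (target_len - 1)
    out = []
    for i in range(target_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


# Example: keep the loudest half of the frames, then stretch to 7 points.
run = select_max_energy_run([1, 2, 9, 8, 7, 1, 0, 1], 0.5)
energy_feature = resample_linear(run, 7)
```

Whatever normalization is used, the point of the fixed length is that every sound, long or short, yields an energy feature of the same dimension for the SVDD model.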
Optionally, if the signal feature includes a local feature, the extracting the signal feature from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal, the mel-frequency cepstral coefficients of the S2 consecutive frames of the cough sound sample signal with the largest sum of energy coefficients, wherein S2 is a positive integer;
determining the weights of the mel-frequency cepstral coefficients of the S2 frames of the cough sound sample signal based on the energy coefficients of those frames, and performing a weighted summation of the mel-frequency cepstral coefficients of the S2 frames according to those weights to obtain the local feature of the cough sound sample signal, wherein the weights are positively correlated with the energy coefficients of the S2 frames of the cough sound sample signal;
the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal, the mel-frequency cepstral coefficients of the S2 consecutive frames of the sound signal with the largest sum of energy coefficients;
and determining the weights of the mel-frequency cepstral coefficients of the S2 frames of the sound signal based on the energy coefficients of those frames, and performing a weighted summation of the mel-frequency cepstral coefficients of the S2 frames according to those weights to obtain the local feature of the sound signal, wherein the weights are positively correlated with the energy coefficients of the S2 frames of the sound signal.
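A minimal sketch of the local-feature computation described above follows. The text only requires that each frame's weight be positively correlated with its energy coefficient; taking the weight as the frame's share of the window's total energy is an assumption made here for illustration, and it presumes the energy coefficients in the window sum to a positive value. The function name and argument layout are hypothetical.

```python
def local_feature(mfcc, energies, s2):
    """Energy-weighted sum of the MFCC vectors of the S2 consecutive
    frames whose energy coefficients have the largest sum.

    mfcc     -- list of per-frame coefficient lists (one list per frame)
    energies -- list of per-frame energy coefficients, same length as mfcc
    s2       -- number of consecutive frames to use
    """
    n = len(energies)
    if not 1 <= s2 <= n:
        raise ValueError("s2 must be between 1 and the number of frames")
    # Start index of the window with the largest energy sum (first on ties).
    best_start = max(range(n - s2 + 1),
                     key=lambda i: sum(energies[i:i + s2]))
    frames = mfcc[best_start:best_start + s2]
    window_energy = energies[best_start:best_start + s2]
    total = sum(window_energy)
    if total <= 0:
        raise ValueError("this sketch assumes a positive energy sum")
    # ASSUMPTION: weight = share of the window's energy (sums to 1).
    weights = [e / total for e in window_energy]
    dim = len(frames[0])
    return [sum(w * f[k] for w, f in zip(weights, frames))
            for k in range(dim)]


feature = local_feature(
    mfcc=[[1.0, 0.0], [2.0, 2.0], [4.0, 6.0], [0.0, 1.0]],
    energies=[1, 1, 3, 0],
    s2=2,
)
```

In this toy input the window covering the second and third frames has the largest energy sum, so the result leans toward the louder third frame.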
Optionally, if the signal features include an overall trend feature, the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:
performing dimension reduction on the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal using a linear discriminant analysis algorithm to obtain the overall trend feature of the cough sound sample signal;
the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
performing dimension reduction on the mel-frequency cepstral coefficient feature parameter matrix of the sound signal using the linear discriminant analysis algorithm to obtain the overall trend feature of the sound signal.
Optionally, the cough signal feature model based on the support vector data description algorithm includes one or more sub-signal feature models among an energy feature model, a local feature model, and an overall trend feature model, each based on the support vector data description algorithm;
if the cough signal feature model based on the support vector data description algorithm includes multiple sub-signal feature models, the determining whether the signal features match the pre-acquired cough signal feature model based on the support vector data description algorithm includes:
determining, for each sub-signal feature in the signal features, whether it matches the corresponding pre-acquired sub-signal feature model based on the support vector data description algorithm.
In a second aspect, embodiments of the present application further provide a cough sound recognition device, including:
a sound input unit for receiving a sound signal;
a signal processing unit for performing analog signal processing on the sound signal;
the signal processing unit is connected to an operation processing unit that is internal or external to the cough sound recognition device, and the operation processing unit comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform:
sampling a sound signal and obtaining a Mel frequency cepstrum coefficient characteristic parameter matrix of the sound signal;
extracting signal features from a mel-frequency cepstrum coefficient feature parameter matrix of the sound signal;
confirming whether the signal characteristics are matched with a pre-acquired cough signal characteristic model based on a support vector data description algorithm or not;
if so, the sound signal is confirmed as a cough sound.
Optionally, the at least one processor is further capable of performing:
and pre-acquiring the cough signal characteristic model based on the support vector data description algorithm.
Optionally, the pre-acquiring the cough signal feature model based on the support vector data description algorithm includes:
collecting a preset number of cough sound sample signals and obtaining a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signals;
extracting the signal features from a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal;
and taking the signal characteristics of the cough sound sample signal as input, training a support vector data description algorithm model to obtain the cough signal characteristic model based on the support vector data description algorithm.
Optionally, the signal features include one or more sub-signal features of an energy feature, a local feature, and an overall trend feature.
Optionally, if the signal features include energy features, the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal, the energy coefficients of the consecutive frames that make up a preset proportion of the cough sound sample signal and have the largest sum of energy coefficients;
warping the energy coefficients of the selected consecutive frames of the cough sound sample signal to a preset length based on a dynamic time warping algorithm, to obtain the energy feature of the cough sound sample signal;
the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal, the energy coefficients of the consecutive frames that make up the preset proportion of the sound signal and have the largest sum of energy coefficients;
and warping the energy coefficients of the selected consecutive frames of the sound signal to the preset length based on the dynamic time warping algorithm, to obtain the energy feature of the sound signal.
Optionally, if the signal feature includes a local feature, the extracting the signal feature from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal, the mel-frequency cepstral coefficients of the S2 consecutive frames of the cough sound sample signal with the largest sum of energy coefficients, wherein S2 is a positive integer;
determining the weights of the mel-frequency cepstral coefficients of the S2 frames of the cough sound sample signal based on the energy coefficients of those frames, and performing a weighted summation of the mel-frequency cepstral coefficients of the S2 frames according to those weights to obtain the local feature of the cough sound sample signal, wherein the weights are positively correlated with the energy coefficients of the S2 frames of the cough sound sample signal;
the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal, the mel-frequency cepstral coefficients of the S2 consecutive frames of the sound signal with the largest sum of energy coefficients;
and determining the weights of the mel-frequency cepstral coefficients of the S2 frames of the sound signal based on the energy coefficients of those frames, and performing a weighted summation of the mel-frequency cepstral coefficients of the S2 frames according to those weights to obtain the local feature of the sound signal, wherein the weights are positively correlated with the energy coefficients of the S2 frames of the sound signal.
Optionally, if the signal features include an overall trend feature, the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:
performing dimension reduction on the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal using a linear discriminant analysis algorithm to obtain the overall trend feature of the cough sound sample signal;
the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
performing dimension reduction on the mel-frequency cepstral coefficient feature parameter matrix of the sound signal using the linear discriminant analysis algorithm to obtain the overall trend feature of the sound signal.
Optionally, the cough signal feature model based on the support vector data description algorithm includes one or more sub-signal feature models among an energy feature model, a local feature model, and an overall trend feature model, each based on the support vector data description algorithm;
if the cough signal feature model based on the support vector data description algorithm includes multiple sub-signal feature models, the determining whether the signal features match the pre-acquired cough signal feature model based on the support vector data description algorithm includes:
determining, for each sub-signal feature in the signal features, whether it matches the corresponding pre-acquired sub-signal feature model based on the support vector data description algorithm.
In a third aspect, embodiments of the present application further provide a storage medium storing executable instructions that, when executed by a cough sound recognition device, cause the cough sound recognition device to perform the above-described method.
In a fourth aspect, embodiments of the present application also provide a program product comprising a program stored on a storage medium, the program comprising program instructions which, when executed by a cough sound recognition device, cause the cough sound recognition device to perform the above-described method.
The cough sound recognition method, device, and storage medium can recognize cough sounds, so a user's cough condition can be monitored from the sounds the user makes, without the user wearing any detection component. Because the recognition algorithm is based on MFCC feature parameters and an SVDD model, its complexity is low, its computational load is small, and its hardware requirements are modest, which reduces the manufacturing cost of the product.
Drawings
One or more embodiments are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings. Like reference numerals in the drawings indicate similar elements, and the figures are not to be taken in a limiting sense unless otherwise indicated.
FIG. 1 is a schematic structural diagram of an application environment of embodiments of the present application;
FIG. 2 is a time-amplitude plot of a cough sound signal;
FIG. 3 is a time-frequency plot of a cough sound signal;
FIG. 4 is a schematic diagram of Mel frequency filtering in the MFCC coefficients calculation process;
FIG. 5 is a schematic flow chart of obtaining, in advance, a feature model based on a support vector data description algorithm in the cough sound recognition method according to an embodiment of the present application;
FIG. 6 is a flow chart of a cough sound recognition method according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a cough sound recognition device according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a cough sound recognition apparatus provided in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of protection of the present application.
The embodiments of the present application provide a cough sound recognition scheme based on mel-frequency cepstral coefficient (Mel Frequency Cepstral Coefficients, MFCC) feature parameters and a support vector data description (Support Vector Data Description, SVDD) model, which is suitable for the application environment shown in FIG. 1. The application environment includes a user 10 and a cough sound recognition device 20, which is configured to receive a sound made by the user 10 and to recognize it to determine whether it is a cough sound.
Further, after recognizing a sound as a cough sound, the cough sound recognition device 20 may record and process the cough sound to output cough condition information of the user 10, which may include the number of cough sounds, the duration of the cough sounds, and the decibel level of the cough sounds. For example, a counter included in the cough sound recognition device may count cough sounds as they are detected; a timer included in the device may measure the duration of each detected cough sound; and a decibel detection means included in the device may measure the decibel level of each detected cough sound.
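As a rough illustration of the counter, timer, and decibel detection described above, the following sketch keeps a running log of detected coughs. The class and field names are hypothetical, and the level is computed in decibels relative to full scale (amplitude 1.0) from the RMS of the samples, which is an assumption; the text does not say how the decibel value is measured.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CoughLog:
    """Running cough statistics (hypothetical structure)."""
    count: int = 0
    total_duration_s: float = 0.0
    levels_db: list = field(default_factory=list)

    def record(self, samples, sample_rate_hz):
        """Update count, duration, and level for one detected cough.

        samples        -- non-empty list of amplitudes in [-1.0, 1.0]
        sample_rate_hz -- sampling frequency, e.g. 8000
        """
        rms = math.sqrt(sum(x * x for x in samples) / len(samples))
        self.count += 1                                   # counter
        self.total_duration_s += len(samples) / sample_rate_hz  # timer
        # ASSUMPTION: level in dB relative to full scale; the small floor
        # avoids log10(0) for an all-zero segment.
        self.levels_db.append(20.0 * math.log10(max(rms, 1e-12)))


log = CoughLog()
log.record([0.1] * 8000, sample_rate_hz=8000)  # one 1-second cough
```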
The recognition principle for cough sounds is similar to that of speech recognition: the input sound is processed and then compared with a sound model to obtain a recognition result. Recognition can be divided into two stages, a cough sound model training stage and a cough sound recognition stage. In the training stage, a certain number of cough sound samples are collected, the MFCC feature parameters of the cough sound signals are calculated, signal features are extracted from the MFCC feature parameters, and a model is trained on the signal features based on the SVDD algorithm to obtain a reference feature model of the cough sound. In the recognition stage, the MFCC feature parameters of the sound to be judged are calculated, the signal features corresponding to the feature model are extracted, and it is determined whether those signal features match the feature model; if they match, the sound is judged to be a cough sound, and otherwise a non-cough sound. The overall process mainly comprises preprocessing, feature extraction, model training, pattern matching, and decision.
The preprocessing step comprises sampling the cough sound signal and calculating the MFCC coefficients of the cough sound signal. In the feature extraction step, the energy feature, overall trend feature, and local feature of the cough sound signal are selected from the MFCC coefficient matrix as inputs for obtaining the SVDD models. In the model training step, three SVDD models, namely an SVDD energy feature model, an SVDD local feature model, and an SVDD overall trend feature model, are trained on the three types of features extracted from the MFCC coefficient matrix of the cough sound signal. In the pattern matching and decision step, the three SVDD models are used to identify whether a new sound signal is a cough sound signal: first, the MFCC coefficient matrix of the sound signal is calculated; then the energy feature, overall trend feature, and local feature of the sound signal are extracted from the MFCC coefficient matrix; finally, it is determined whether each of the three features matches its corresponding SVDD model. If all three match, the sound signal is judged to be a cough sound signal; otherwise, it is judged not to be a cough sound signal.
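The pattern matching and decision step can be sketched as below. This is a simplified illustration under stated assumptions: each trained SVDD model is reduced to a plain Euclidean hypersphere (a center and a radius in feature space), whereas a real SVDD model is typically expressed through support vectors and a kernel; the class name, dictionary keys, and numbers are hypothetical; and a sound is accepted only when all three features fall inside their spheres.

```python
import math
from dataclasses import dataclass


@dataclass
class SphereModel:
    """Simplified stand-in for a trained SVDD model (ASSUMPTION:
    Euclidean hypersphere, no kernel)."""
    center: list
    radius: float

    def matches(self, feature):
        # A feature matches when it lies inside or on the sphere.
        return math.dist(feature, self.center) <= self.radius


def is_cough(features, models):
    """Return True only if every feature matches its own model."""
    return all(models[name].matches(features[name]) for name in models)


# Hypothetical models for the three feature types.
models = {
    "energy": SphereModel(center=[0.0, 0.0], radius=1.0),
    "local": SphereModel(center=[1.0, 1.0], radius=0.5),
    "trend": SphereModel(center=[0.0], radius=2.0),
}
inside = {"energy": [0.5, 0.0], "local": [1.0, 1.2], "trend": [1.0]}
outside = {"energy": [2.0, 0.0], "local": [1.0, 1.2], "trend": [1.0]}
```

Requiring all three models to agree trades some sensitivity for fewer false alarms, since a non-cough sound would have to resemble a cough in energy, local, and overall trend features at once.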
Combining MFCC with SVDD to identify cough sounds can reduce algorithm complexity and computational load, and can significantly improve the accuracy of cough sound recognition.
An embodiment of the present application provides a cough sound recognition method that can be used in the cough sound recognition device 20 described above. The method requires a feature model based on the support vector data description algorithm (an SVDD-based feature model) to be obtained in advance. The SVDD-based feature model may be preconfigured, or it may be obtained by training through steps 101 to 103 below. Once the model has been trained, cough sounds can be recognized based on it. Further, if the accuracy of cough sound recognition with the model becomes unacceptable because of a change of scene or for other reasons, the model may be reconfigured or retrained.
As shown in FIG. 5, obtaining the feature model based on the support vector data description algorithm in advance includes:
step 101: collecting a preset number of cough sound sample signals and obtaining a mel frequency cepstrum coefficient characteristic parameter matrix of the cough sound sample signals;
A cough sound sample signal s(n) is obtained by sampling, and the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal is obtained from it. Mel-frequency cepstral coefficients are used mainly to extract features from sound data and to reduce the dimensionality of computation. For example, for a frame of 512 dimensions (sampling points), MFCC processing can extract the 40 most important dimensions, thereby achieving dimensionality reduction. The calculation of mel-frequency cepstral coefficients generally includes pre-emphasis, framing, windowing, fast Fourier transform, mel filter bank filtering, and discrete cosine transform.
The obtaining of the mel frequency cepstrum coefficient characteristic parameter matrix of the cough sound sample signal specifically comprises the following steps:
(1) pre-emphasis
The purpose of pre-emphasis is to boost the high-frequency part of the signal and flatten its spectrum, so that the spectrum can be computed with the same signal-to-noise ratio over the whole band from low to high frequencies. Pre-emphasis also offsets the effect of the vocal cords and lips during sound production: it compensates for the high-frequency components that the sound-producing system suppresses and makes the high-frequency formants more prominent. It is implemented by passing the sampled cough sound sample signal s(n) through a first-order finite impulse response (FIR) high-pass digital filter whose transfer function is:
$$H(z) = 1 - a\,z^{-1} \tag{1}$$
where z is the z-transform variable and a is the pre-emphasis coefficient, usually a constant between 0.9 and 1.0. In the time domain, the filter maps the cough sound sample signal s(n) to s(n) - a·s(n-1).
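The pre-emphasis step can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `pre_emphasis`, the default coefficient 0.97 (one value inside the stated 0.9-1.0 range), and the choice to pass the first sample through unchanged are assumptions.

```python
def pre_emphasis(signal, a=0.97):
    """Apply the first-order FIR high-pass filter y(n) = s(n) - a * s(n-1).

    `a` is the pre-emphasis coefficient; 0.97 is an assumed default.
    The first sample has no predecessor and is passed through unchanged.
    """
    if not signal:
        return []
    out = [signal[0]]
    for n in range(1, len(signal)):
        out.append(signal[n] - a * signal[n - 1])
    return out
```

A constant (purely low-frequency) input is strongly attenuated after the first sample, which is the intended high-pass behaviour.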
(2) Framing
Every P consecutive sampling points of the cough sound sample signal s(n) are grouped into one observation unit, called a frame. P may be 256 or 512, covering about 20-30 ms. To avoid excessive change between two adjacent frames, adjacent frames may overlap; the overlap region contains M sampling points, where M is usually about 1/2 or 1/3 of P. The sampling frequency of a sound signal is typically 8 kHz or 16 kHz. At 8 kHz, a frame length of 256 sampling points corresponds to a duration of 256/8000 × 1000 = 32 ms.
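The framing rule and the duration arithmetic above can be illustrated as follows. This is a sketch under stated assumptions: the names `frame_signal` and `frame_duration_ms` are hypothetical, and trailing samples that do not fill a whole frame are simply dropped, which the text does not specify.

```python
def frame_signal(signal, frame_len=256, overlap=128):
    """Split `signal` into frames of `frame_len` samples (P in the text).

    Adjacent frames share `overlap` samples (M in the text).
    Assumption: an incomplete trailing frame is discarded.
    """
    hop = frame_len - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_len")
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames


def frame_duration_ms(frame_len, sample_rate_hz):
    """Duration of one frame in milliseconds."""
    return frame_len * 1000 / sample_rate_hz
```

For example, `frame_duration_ms(256, 8000)` reproduces the 32 ms figure given in the text.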
(3) Windowing
Each frame is multiplied by a Hamming window to increase the continuity at the left and right ends of the frame. Let the framed signal be S(n), n = 0, 1, …, P-1, where P is the frame size. After multiplication by the Hamming window, S'(n) = S(n) × W(n), where
$$W(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{l-1}\right),\quad 0 \le n \le l-1 \tag{2}$$
and l represents the window length.
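A minimal sketch of the windowing step, assuming the standard Hamming coefficients 0.54 and 0.46 and a window length equal to the frame length. The function names are hypothetical.

```python
import math


def hamming_window(length):
    """Hamming window W(n) = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]  # avoids division by zero for a one-sample window
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def apply_window(frame):
    """Multiply a frame sample-by-sample by a Hamming window of equal length."""
    window = hamming_window(len(frame))
    return [s * w for s, w in zip(frame, window)]
```

The window tapers both ends of the frame towards 0.08 and leaves the middle nearly unchanged, which reduces spectral leakage in the following Fourier transform.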
(4) Fast Fourier transform (FFT)
Because the characteristics of a signal are often hard to see from its time-domain waveform, the signal is usually converted into an energy distribution in the frequency domain, where different energy distributions represent the characteristics of different sounds. After multiplication by the Hamming window, a fast Fourier transform is therefore applied to each framed and windowed signal to obtain the spectrum of each frame. The squared modulus of the spectrum then gives the power spectrum of the sound signal.
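The power spectrum of one frame can be sketched as below. To stay dependency-free, the sketch uses a direct O(P²) discrete Fourier transform as a stand-in for the FFT; both produce the same spectrum, and a real implementation would use an FFT routine. The function name is hypothetical.

```python
import cmath
import math


def power_spectrum(frame):
    """Return |X(k)|^2 for k = 0..P-1 using a direct DFT.

    This is a slow stand-in for an FFT, used here only for clarity.
    """
    p = len(frame)
    spectrum = []
    for k in range(p):
        x_k = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / p)
                  for n in range(p))
        spectrum.append(abs(x_k) ** 2)
    return spectrum
```

For a pure cosine that completes two cycles in an 8-sample frame, the energy is concentrated in bin 2 (and its mirror bin 6), which illustrates how the frequency-domain view exposes structure that is hard to read from the waveform.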
(5) Triangular band-pass filtering
The power spectrum is passed through a bank of mel-scale triangular filters. A filter bank of M filters is defined (the number of filters is close to the number of critical bands; this M is unrelated to the overlap length M used in framing). Each filter is triangular with center frequency f(m), m = 1, 2, …, M, and M may be 22-26. The spacing between adjacent f(m) narrows as m decreases and widens as m increases, see fig. 4.
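The frequency response of the m-th triangular filter is defined as:
$$H_m(k) = \begin{cases} 0, & k < f(m-1) \\ \dfrac{2\,(k - f(m-1))}{(f(m+1) - f(m-1))\,(f(m) - f(m-1))}, & f(m-1) \le k \le f(m) \\ \dfrac{2\,(f(m+1) - k)}{(f(m+1) - f(m-1))\,(f(m+1) - f(m))}, & f(m) < k \le f(m+1) \\ 0, & k > f(m+1) \end{cases} \tag{3}$$
where f(m-1), f(m), and f(m+1) are the lower edge, center, and upper edge of the m-th filter, and f(0) and f(M+1) are the lower and upper edges of the whole filter bank.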
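The filter bank layout can be sketched as follows. Several points here are assumptions that the text does not state: the mel mapping mel = 2595·log10(1 + f/700), the placement of the center frequencies at equal spacing on the mel scale, and the area-normalised triangle (peak 2/(right - left)). All function names are hypothetical.

```python
import math


def hz_to_mel(freq_hz):
    """Assumed mel mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(num_filters, f_low, f_high):
    """Return f(0)..f(M+1) in Hz: M centers plus the two outer edges,
    equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_filters + 2)]


def triangular_response(freq, left, center, right):
    """Area-normalised triangle: zero outside (left, right),
    peak 2 / (right - left) at `center`."""
    if freq <= left or freq >= right:
        return 0.0
    peak = 2.0 / (right - left)
    if freq <= center:
        return peak * (freq - left) / (center - left)
    return peak * (right - freq) / (right - center)
```

With 24 filters between 0 and 4000 Hz, the gaps between successive center frequencies grow with m, matching the statement that the spacing of f(m) widens as m increases.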
(6) Discrete cosine transform
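The logarithmic energy output by each filter of the filter bank is calculated as:
$$s(m) = \ln\left(\sum_{k=0}^{P-1} |X(k)|^2\,H_m(k)\right),\quad 1 \le m \le M \tag{4}$$
where |X(k)|^2 is the power spectrum of the frame obtained in step (4) and H_m(k) is the frequency response of the m-th triangular filter.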
A discrete cosine transform (DCT) of the logarithmic energies s(m) yields the MFCC coefficients:
$$C(n) = \sum_{m=1}^{M} s(m)\cos\left(\frac{\pi n\,(m - 0.5)}{M}\right),\quad n = 1, 2, \dots, L \tag{5}$$
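Step (6) can be sketched as follows, assuming natural-log filter bank energies and the DCT form C(n) = Σ s(m)·cos(πn(m - 0.5)/M), which is a common MFCC convention and not necessarily the exact formula of equation (5). The function names and the small floor inside the logarithm are assumptions.

```python
import math


def log_filterbank_energies(power_spec, filters):
    """Log energy of each filter output.

    `filters[m][k]` is the response of filter m at spectrum bin k.
    A small floor (assumed) avoids log(0) for silent frames.
    """
    energies = []
    for response in filters:
        energy = sum(p * h for p, h in zip(power_spec, response))
        energies.append(math.log(max(energy, 1e-12)))
    return energies


def dct_mfcc(log_energies, num_coeffs):
    """DCT of the log energies: returns C(1)..C(num_coeffs)."""
    m_total = len(log_energies)
    return [
        sum(s * math.cos(math.pi * n * (m - 0.5) / m_total)
            for m, s in enumerate(log_energies, start=1))
        for n in range(1, num_coeffs + 1)
    ]
```

Applying `dct_mfcc` to every frame and stacking the results row by row gives the N×L coefficient matrix discussed in step 102.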
step 102: extracting the signal features from a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal;
As can be seen from equation (5), the MFCC coefficients form an N×L matrix, where N is the number of frames of the audio signal and L is the MFCC coefficient length. Because this matrix has a high dimension, and because sound signals differ in length so that the number of rows N varies, the MFCC coefficient matrix cannot be used directly as input for obtaining the SVDD model. Effective features therefore need to be extracted from the MFCC coefficient matrix before being input to the SVDD model.
To extract the effective features, the MFCC coefficient matrix must be reduced in dimension. Reducing it directly, however, may lose effective features of the cough sound signal, so the features are instead extracted by taking the time-domain and frequency-domain characteristics of the cough sound signal into account.
Fig. 2 is a time-amplitude diagram (time-domain diagram) of the cough sound signal. As fig. 2 shows, a cough is short and clearly bursty: a single cough sound usually lasts less than 550 ms, and even for patients with serious throat or bronchial diseases it usually lasts about 1000 ms. In terms of energy, the cough sound signal is concentrated mainly in the first half of the signal. The energy coefficients of the signal segment in which energy is relatively concentrated can therefore be selected as the energy feature characterizing the cough sound sample signal. For example, the set of energy coefficients of the first 1/2 of the cough sound sample signal is selected as the energy feature, and the energy feature is used as input to build an SVDD model for identifying sound signals.
Because cough sound sample signals differ in length, the number of rows N of the parameter matrix differs, and so does the length of the energy coefficient sequence. The energy coefficients therefore need to be normalized to a common length.
Specifically, extracting the energy feature from the mel frequency cepstrum coefficient characteristic parameter matrix of a cough sound sample signal includes:
selecting, from the mel frequency cepstrum coefficient characteristic parameter matrix of the cough sound sample signal, the energy coefficients of the run of consecutive frames that covers a preset proportion of the signal and has the largest sum of energy coefficients;
and warping the energy coefficients of these consecutive frames to a preset length based on the dynamic time warping (DTW) algorithm to obtain the energy feature of the cough sound sample signal.
In a specific application, given the energy distribution of the cough sound signal, the consecutive frames of the preset proportion with the largest energy coefficient sum may be the first 1/2, the first 4/7, or the first 5/9 of the cough sound sample signal, and so on. The preset length can be set according to the actual application.
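The two sub-steps above can be sketched as follows. The text does not say how DTW is used to reach the preset length, so this sketch makes an explicit assumption: the selected energy coefficients are aligned by DTW to a hypothetical reference template of the preset length, and the values aligned to each template position are averaged. The per-frame energy coefficients are assumed to be available as a plain list, and all function names are hypothetical.

```python
def select_max_energy_run(energies, proportion=0.5):
    """Return the run of consecutive frames, covering `proportion` of the
    signal, whose energy coefficients have the largest sum."""
    run_len = max(1, int(len(energies) * proportion))
    best_start = max(range(len(energies) - run_len + 1),
                     key=lambda i: sum(energies[i:i + run_len]))
    return energies[best_start:best_start + run_len]


def dtw_path(seq, template):
    """Classic DTW alignment path between two 1-D sequences."""
    n, m = len(seq), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = abs(seq[i - 1] - template[j - 1]) + step
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    return path[::-1]


def warp_to_length(seq, template):
    """Warp `seq` to len(template) by averaging the values that DTW
    aligns to each template position (assumed scheme)."""
    buckets = [[] for _ in template]
    for i, j in dtw_path(seq, template):
        buckets[j].append(seq[i])
    return [sum(b) / len(b) for b in buckets]
```

Whatever the input length, `warp_to_length` always returns a vector of the template's length, which is the property needed before the energy feature can be fed to an SVDD model.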
As fig. 2 also shows, most cough sound signals (about 90%) change in essentially the same way: after the cough pulse occurs, the signal energy falls rapidly, faster for a dry cough and more slowly for a wet cough. The variation trend of the cough sound signal therefore characterizes it well. An overall trend feature, which reflects the variation trend of the signal, can be extracted from the MFCC coefficient matrix of the cough sound signal and used as input to build an SVDD model for identifying sound signals.
Specifically, the overall trend feature of the cough sound sample signal can be obtained by reducing the dimension of its mel frequency cepstrum coefficient characteristic parameter matrix with the linear discriminant analysis (LDA) algorithm.
Fig. 3 is a time-frequency diagram (spectrogram) of the cough sound signal. As fig. 3 shows, the spectral energy is likewise concentrated at the beginning of the signal, and the frequency range is wide (generally concentrated in 200-6000 Hz). The MFCC coefficients of the several frames in which the spectral energy is concentrated can therefore be selected as a local feature characterizing the cough sound signal, and the local feature is used as input to build an SVDD model for identifying sound signals. Specifically, the local feature may be obtained as follows: select the several frames of the cough sound sample signal in which energy is most concentrated, assign a weight to the MFCC coefficients of each frame, and add the weighted coefficients to obtain the local feature of the cough sound sample signal. Because the weight of the MFCC coefficients of a frame is positively correlated with the energy coefficient of that frame, the weights can be determined from the energy coefficients. That is: from the mel frequency cepstrum coefficient characteristic parameter matrix of the cough sound sample signal, select the mel frequency cepstrum coefficients of the S2 consecutive frames with the largest sum of energy coefficients, where S2 is a positive integer; then determine the weight of each of the S2 frames from its energy coefficient, and compute the weighted sum of the mel frequency cepstrum coefficients of the S2 frames with these weights to obtain the local feature of the cough sound sample signal.
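The local feature computation can be sketched as below. The text only requires the weights to be positively correlated with the energy coefficients; making them directly proportional to the energy (normalised to sum to 1) is an assumption, as are the function name and the fallback to uniform weights when the total energy is not positive.

```python
def local_feature(mfcc_matrix, energies, s2):
    """Energy-weighted sum of the MFCC rows of the S2 consecutive frames
    with the largest energy sum.

    `mfcc_matrix[t]` is the MFCC vector of frame t and `energies[t]` its
    energy coefficient.
    """
    n = len(mfcc_matrix)
    if len(energies) != n or not 1 <= s2 <= n:
        raise ValueError("need one energy per frame and 1 <= s2 <= frames")
    start = max(range(n - s2 + 1), key=lambda i: sum(energies[i:i + s2]))
    run = energies[start:start + s2]
    total = sum(run)
    if total > 0:
        weights = [e / total for e in run]   # assumed: proportional to energy
    else:
        weights = [1.0 / s2] * s2            # assumed fallback
    dim = len(mfcc_matrix[0])
    return [sum(w * mfcc_matrix[start + t][d] for t, w in enumerate(weights))
            for d in range(dim)]
```

The result has length L regardless of how many frames the signal contains, so it can be used directly as SVDD input.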
The analysis above shows that the energy feature, the local feature, and the overall trend feature each reflect characteristics of the cough sound signal. One or more of these sub-signal features are extracted from the MFCC coefficient matrix of the cough sound sample signal and used as input to build an SVDD model for identifying sound signals, which greatly improves the accuracy of cough sound recognition and reduces the false recognition rate. To improve accuracy further, the energy feature, the local feature, and the overall trend feature may all be extracted from the MFCC coefficient matrix. When all three are extracted from the MFCC coefficient matrix of the cough sound sample signal and used as input to train the SVDD model, the recognition rate for cough sounds can exceed 95%.
Other methods, such as DTW or principal component analysis (PCA), may also be used to reduce the dimension of the MFCC coefficients of the cough sound sample signal. However, when the MFCC coefficients are reduced with the PCA algorithm and the SVDD model is trained on the reduced parameters, the resulting SVDD model distinguishes cough sounds from noise poorly: the cough sound recognition rate is about 85%, and the noise misrecognition rate is as high as 65%.
Step 103: taking the signal features of the cough sound sample signal as input and training a support vector data description algorithm model to obtain the cough signal feature model based on the support vector data description algorithm.
When the signal features include the energy feature, the local feature, and the overall trend feature, each is used as input to train its own SVDD model: an SVDD model of the energy feature (energy feature model), an SVDD model of the local feature (local feature model), and an SVDD model of the overall trend feature (overall trend feature model). The resulting cough signal feature model based on the support vector data description algorithm consists of the energy feature model, the local feature model, and the overall trend feature model.
The basic principle of SVDD is to compute a spherical decision boundary for the input samples that divides the whole space into two parts: the space inside the boundary, regarded as the accepted part, and the space outside the boundary, regarded as the rejected part. This makes SVDD a one-class classifier for the samples.
Specifically, SVDD solves for the smallest sphere with center a and radius R:
$$\min_{R,\,a,\,\xi}\; F(R, a, \xi) = R^2 + C\sum_i \xi_i$$
such that the sphere satisfies the following condition (for data x_i of more than 3 dimensions the sphere is a hypersphere, that is, the counterpart in a space of more than 3 dimensions of a circle in 2-dimensional space and a sphere in 3-dimensional space):
$$\|x_i - a\|^2 \le R^2 + \xi_i,\quad \xi_i \ge 0$$
When this condition is satisfied, the data points of the training data set are contained in the sphere, up to the slack ξ_i. Here x_i denotes the input sample data, namely the cough sound sample signals, ξ_i are slack variables, and C is the penalty constant on the slack.
With the objective and the constraints in place, the problem can be solved with the Lagrange multiplier method:
$$L(R, a, \xi, \alpha, \gamma) = R^2 + C\sum_i \xi_i - \sum_i \alpha_i\left(R^2 + \xi_i - \|x_i - a\|^2\right) - \sum_i \gamma_i\,\xi_i \tag{6}$$
where α_i ≥ 0 and γ_i ≥ 0. Taking the partial derivatives with respect to the parameters R, a, and ξ_i and setting them equal to 0 gives:
$$\sum_i \alpha_i = 1 \tag{7}$$
$$a = \sum_i \alpha_i\,x_i \tag{8}$$
$$C - \alpha_i - \gamma_i = 0 \tag{9}$$
Substituting (7), (8), and (9) into equation (6) gives the dual problem:
$$\max_{\alpha}\; \sum_i \alpha_i\,(x_i \cdot x_i) - \sum_{i,j} \alpha_i\,\alpha_j\,(x_i \cdot x_j) \tag{10}$$
where
$$0 \le \alpha_i \le C,\quad \sum_i \alpha_i = 1$$
The vector inner products above can be evaluated with a kernel function K, namely:
$$\max_{\alpha}\; \sum_i \alpha_i\,K(x_i, x_i) - \sum_{i,j} \alpha_i\,\alpha_j\,K(x_i, x_j) \tag{11}$$
The calculation above yields the center a and the radius R, which determine the SVDD model. Training in this way yields the centers a1, a2, a3 and the radii R1, R2, R3 of three SVDD models, corresponding to the energy feature model, the local feature model, and the overall trend feature model respectively, which completes the training process.
During training, the size and extent of the hypersphere are controlled so that it contains as many sample points as possible while its radius is kept as small as possible, which gives the best classification.
Specifically, in the embodiment of the present application each model corresponds to one hypersphere. On the premise that the hypersphere contains all cough sound signals, its boundary is optimized to minimize its radius, finally giving the cough signal feature model based on the support vector data description algorithm that best meets the requirements, so that recognition with this model of the signal features extracted from a sound signal is highly accurate.
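Once the multipliers α_i are known, the center is a = Σ α_i·φ(x_i) in the kernel feature space, so the squared distance from a new sample z to the center can be computed with kernel evaluations only: K(z, z) - 2·Σ α_i·K(z, x_i) + ΣΣ α_i·α_j·K(x_i, x_j). The sketch below illustrates this identity; the Gaussian kernel and its width `sigma` are assumptions, since the text does not name a specific kernel, and the function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Assumed kernel: K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def squared_distance_to_center(z, samples, alphas, kernel):
    """||phi(z) - a||^2 with a = sum_i alpha_i * phi(x_i)."""
    cross = sum(a * kernel(z, x) for a, x in zip(alphas, samples))
    const = sum(ai * aj * kernel(xi, xj)
                for ai, xi in zip(alphas, samples)
                for aj, xj in zip(alphas, samples))
    return kernel(z, z) - 2.0 * cross + const
```

The radius R can then be taken as the distance from the center to any training sample lying exactly on the boundary, that is, a sample with 0 < α_i < C.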
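The idea of the smallest hypersphere that contains all training samples can be illustrated with a simple iterative scheme. This sketch does not solve the SVDD dual with a quadratic programming solver; it swaps in the Badoiu-Clarkson iteration, which approximates the minimum enclosing ball in the input space (no kernel, no slack variables) by repeatedly moving the center towards the farthest sample. The function name and iteration count are assumptions.

```python
import math


def enclosing_ball(points, iterations=2000):
    """Approximate the minimum enclosing ball of `points`.

    Badoiu-Clarkson iteration: at step i, move the center a fraction
    1 / (i + 1) of the way towards the currently farthest point.
    Returns (center, radius); every point lies within `radius`.
    """
    center = list(points[0])
    for i in range(1, iterations + 1):
        far = max(points, key=lambda p: math.dist(p, center))
        center = [c + (f - c) / (i + 1) for c, f in zip(center, far)]
    radius = max(math.dist(p, center) for p in points)
    return center, radius
```

For the four corners of a square centred on the origin, the returned center converges to the origin and the radius to half the diagonal, which is the smallest sphere containing all four samples.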
As shown in fig. 6, the cough sound recognition method includes:
step 201, sampling a sound signal and obtaining a Mel frequency cepstrum coefficient characteristic parameter matrix of the sound signal;
In practical applications, a sound input unit (for example, a microphone) may be provided on the cough sound recognition device 20 to collect the sound signal, which is amplified, filtered, and otherwise processed and then converted into a digital signal. The digital signal may be sampled and processed by a computing unit local to the cough sound recognition device 20, or uploaded over a network to a cloud server, an intelligent terminal, or another server for processing.
For the technical details of obtaining the mel frequency cepstrum coefficient characteristic parameter matrix of the sound signal, refer to step 101; they are not repeated here.
Step 202, extracting signal characteristics from a Mel frequency cepstrum coefficient characteristic parameter matrix of the sound signal;
When the pre-acquired feature model based on the support vector data description algorithm includes an energy feature model, a local feature model, and an overall trend feature model, one or more of the energy feature, the local feature, and the overall trend feature may be extracted from the characteristic parameter matrix of the sound signal. To improve recognition accuracy, all three features may be extracted. For the specific calculation of the energy feature, the local feature, and the overall trend feature of the sound signal, refer to step 102; it is not repeated here.
Step 203, confirming whether the signal characteristics match with a pre-acquired cough signal characteristic model based on a support vector data description algorithm;
When the pre-acquired feature model based on the support vector data description algorithm includes an energy feature model, a local feature model, and an overall trend feature model, it is judged separately whether the energy feature, the local feature, and the overall trend feature obtained in step 202 conform to their models, that is, whether the energy feature conforms to the energy feature model, the local feature to the local feature model, and the overall trend feature to the overall trend feature model. As discussed for step 103, the three models are hypersphere models with centers a1, a2, a3 and radii R1, R2, R3 respectively. To make the judgment, the distances D1, D2, and D3 from the energy feature, the local feature, and the overall trend feature to the centers a1, a2, and a3 are calculated. Only when all three features lie within the boundaries of their SVDD models (that is, D1 < R1, D2 < R2, and D3 < R3) is the sound signal judged to be a cough sound.
Step 204, if they match, confirming that the sound signal is a cough sound.
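The matching rule of steps 203 and 204 (D1 < R1, D2 < R2, and D3 < R3) can be sketched as follows. The sketch assumes each model is stored as an explicit center and radius and that distances are plain Euclidean distances in the input space, which corresponds to a linear kernel; `SphereModel` and `is_cough` are hypothetical names.

```python
import math
from typing import NamedTuple, Sequence


class SphereModel(NamedTuple):
    """One trained SVDD model: hypersphere center and radius."""
    center: Sequence[float]
    radius: float


def is_cough(features, models):
    """Return True only if every feature lies strictly inside the
    hypersphere of its corresponding model (D_i < R_i for all i)."""
    if len(features) != len(models):
        raise ValueError("need exactly one feature vector per model")
    return all(math.dist(f, m.center) < m.radius
               for f, m in zip(features, models))
```

Requiring all three features to fall inside their spheres makes the decision conservative: a sound that resembles a cough in energy profile but not in local spectrum or overall trend is rejected.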
The cough sound recognition method can identify cough sounds, so the cough condition can be monitored by monitoring the sounds made by the user, without the user wearing any detection component. Moreover, because a recognition algorithm based on MFCC characteristic parameters and an SVDD model is adopted, the algorithm complexity is low and the amount of computation is small, which lowers the hardware requirements and reduces the manufacturing cost of the product.
Accordingly, an embodiment of the present application further provides a cough sound recognition apparatus for use in the cough sound recognition device 20. The apparatus includes:
the sampling and characteristic parameter obtaining module 301 is configured to sample a sound signal and obtain a mel frequency cepstrum coefficient characteristic parameter matrix of the sound signal;
a signal feature extraction module 302, configured to extract a signal feature from a mel-frequency cepstrum coefficient feature parameter matrix of the sound signal;
a feature matching module 303, configured to confirm whether the signal feature matches a pre-acquired cough signal feature model based on a support vector data description algorithm;
and a confirmation module 304, configured to confirm the sound signal as a cough sound if the signal feature matches a pre-acquired cough signal feature model based on a support vector data description algorithm.
The cough sound recognition apparatus provided by the embodiment of the present application can identify cough sounds, so the cough condition can be monitored by monitoring the sounds made by the user, without the user wearing any detection component. Moreover, because a recognition algorithm based on MFCC characteristic parameters and an SVDD model is adopted, the algorithm complexity is low and the amount of computation is small, which lowers the hardware requirements and reduces the manufacturing cost of the product.
Optionally, in other embodiments of the apparatus, the apparatus further comprises:
the feature model presetting module is used for acquiring the cough signal feature model based on the support vector data description algorithm in advance;
the feature model preset module is specifically configured to:
collecting a preset number of cough sound sample signals and obtaining a mel frequency cepstrum coefficient characteristic parameter matrix of the cough sound sample signals;
extracting the signal features from a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal;
and training a support vector data description algorithm model by taking the signal characteristics of the cough sound sample signal as input so as to obtain a cough signal characteristic model based on the support vector data description algorithm.
Optionally, in certain embodiments of the apparatus, the signal features include one or more sub-signal features among the energy feature, the local feature, and the overall trend feature.
Optionally, in some embodiments of the apparatus, if the signal features include energy features, the extracting the signal features from a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:
selecting, from the mel frequency cepstrum coefficient characteristic parameter matrix of the cough sound sample signal, the energy coefficients of the run of consecutive frames that covers a preset proportion of the signal and has the largest sum of energy coefficients;
warping the energy coefficients of these consecutive frames of the cough sound sample signal to a preset length based on the dynamic time warping algorithm;
the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
selecting, from the mel frequency cepstrum coefficient characteristic parameter matrix of the sound signal, the energy coefficients of the run of consecutive frames that covers the preset proportion of the signal and has the largest sum of energy coefficients;
and warping the energy coefficients of these consecutive frames of the sound signal to a preset length based on the dynamic time warping algorithm.
Optionally, in some embodiments of the apparatus, if the signal features include local features, the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:
selecting a mel frequency cepstrum coefficient of a continuous S2-frame cough sound sample signal with the maximum sum of energy coefficients from the mel frequency cepstrum coefficient characteristic parameter matrix of the cough sound sample signal, wherein S2 is a positive integer;
determining the weight of the mel frequency cepstrum coefficient of the S2 frame cough sound sample signal based on the energy coefficient of the S2 frame cough sound sample signal, and carrying out weighted summation on the mel frequency cepstrum coefficient of the S2 frame cough sound sample signal according to the weight of the mel frequency cepstrum coefficient of the S2 frame cough sound sample signal to obtain local characteristics of the cough sound sample signal, wherein the weight of the mel frequency cepstrum coefficient of the S2 frame cough sound sample signal is positively correlated with the energy coefficient of the S2 frame cough sound sample signal;
the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
selecting the mel frequency cepstrum coefficient of the S2 frame sound signal with the maximum sum of energy coefficients from the mel frequency cepstrum coefficient characteristic parameter matrix of the sound signal;
And determining the weight of the Mel frequency cepstrum coefficient of the S2 frame sound signal based on the energy coefficient of the S2 frame sound signal, and carrying out weighted summation on the Mel frequency cepstrum coefficient of the S2 frame sound signal according to the weight of the Mel frequency cepstrum coefficient of the S2 frame sound signal to obtain the local characteristics of the sound signal, wherein the weight of the Mel frequency cepstrum coefficient of the S2 frame sound signal positively correlates with the energy coefficient of the S2 frame sound signal.
Optionally, in some embodiments of the apparatus, if the signal features include the overall trend feature, the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal includes:
performing dimension reduction processing on the mel frequency cepstrum coefficient characteristic parameter matrix of the cough sound sample signal by adopting a linear discriminant analysis algorithm to obtain the overall trend characteristic of the cough sound sample signal;
the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
and performing dimension reduction processing on the Mel frequency cepstrum coefficient characteristic parameter matrix of the sound signal by adopting a linear discriminant analysis algorithm to obtain the integral trend characteristic of the sound signal.
Optionally, in some embodiments of the apparatus, the cough signal feature model based on the support vector data description algorithm includes one or more sub-signal feature models based on the support vector data description algorithm, selected from an energy feature model, a local feature model, and an overall trend feature model, each based on the support vector data description algorithm;
if the cough signal feature model based on the support vector data description algorithm includes multiple sub-signal feature models based on the support vector data description algorithm, determining whether the signal features match the pre-acquired cough signal feature model based on the support vector data description algorithm includes:
determining, for each sub-signal feature among the signal features, whether it matches the corresponding pre-acquired sub-signal feature model based on the support vector data description algorithm.
It should be noted that, the above device may execute the method provided by the embodiment of the present application, and has the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in this embodiment may be found in the methods provided in the embodiments of the present application.
An embodiment of the present application also provides a cough sound recognition device. As shown in fig. 8, the cough sound recognition device 20 includes a sound input unit 21, a signal processing unit 22, and an operation processing unit 23. The sound input unit 21 receives the sound signal and may be, for example, a microphone. The signal processing unit 22 performs signal processing on the sound signal: it may apply analog signal processing such as amplification and filtering, followed by analog-to-digital conversion, and send the obtained digital signal to the operation processing unit 23.
The signal processing unit 22 is connected to the operation processing unit 23, which may be built into the cough sound recognition device 20 or located outside it (fig. 8 illustrates the built-in case). The operation processing unit 23 may also be a remotely located server, for example a cloud server, an intelligent terminal, or another server communicatively connected to the cough sound recognition device 20 through a network.
The arithmetic processing unit 23 includes:
at least one processor 232 (one processor is illustrated in fig. 8) and a memory 231. The processor 232 and the memory 231 may be connected by a bus or by other means; fig. 8 takes a bus connection as an example.
The memory 231 is used for storing nonvolatile software programs, nonvolatile computer-executable programs, and modules, such as the program instructions/modules corresponding to the cough sound recognition method in the embodiment of the present application (for example, the sampling and characteristic parameter obtaining module 301 shown in fig. 7). By running the nonvolatile software programs, instructions, and modules stored in the memory 231, the processor 232 executes various functional applications and data processing, that is, implements the cough sound recognition method of the above method embodiment.
The memory 231 may include a program storage area and a data storage area. The program storage area may store an operating system and the application program required by at least one function; the data storage area may store data created through use of the cough sound recognition device, and the like. In addition, the memory 231 may include high-speed random access memory and may also include nonvolatile memory, such as at least one magnetic disk storage device, flash memory device, or other nonvolatile solid-state storage device. In some embodiments, the memory 231 optionally includes memory located remotely from the processor 232, and such remote memory may be connected to the cough sound recognition device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 231 and, when executed by the one or more processors 232, perform the cough sound recognition method of any of the method embodiments described above, for example method steps 101-103 of fig. 5 and method steps 201-204 of fig. 6 described above, and implement the functions of modules 301-304 in fig. 7.
The cough sound recognition device provided by the embodiment of the present application can identify cough sounds, so the cough condition can be monitored by monitoring the sounds made by the user, without the user wearing any detection component. Moreover, because a recognition algorithm based on MFCC characteristic parameters and an SVDD model is adopted, the algorithm complexity is low and the amount of computation is small, which lowers the hardware requirements and reduces the manufacturing cost of the product.
The cough sound recognition device can execute the method provided by the embodiment of the present application, and has the functional modules and beneficial effects corresponding to executing the method. For technical details not described in detail in this embodiment, refer to the methods provided in the embodiments of the present application.
An embodiment of the present application provides a storage medium storing computer-executable instructions that, when executed by one or more processors (for example, the processor 232 in fig. 8), cause the one or more processors to perform the cough sound recognition method of any of the method embodiments described above, for example method steps 101-103 of fig. 5 and method steps 201-204 of fig. 6 described above, and to implement the functions of modules 301-304 in fig. 7.
The embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a general purpose hardware platform, but may also be implemented by means of hardware. Those skilled in the art will appreciate that all or part of the processes implementing the methods of the above embodiments may be implemented by a computer program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and where the program may include processes implementing the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), or the like.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; the technical features of the above embodiments or in the different embodiments may also be combined under the idea of the present application, the steps may be implemented in any order, and there are many other variations of the different aspects of the present application as described above, which are not provided in details for the sake of brevity; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions from the scope of the technical solutions of the embodiments of the present application.

Claims (3)

1. A cough sound recognition method, the method comprising:
sampling a sound signal and obtaining a mel-frequency cepstral coefficient feature parameter matrix of the sound signal;
extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal;
wherein the signal features include energy features, local features, and overall trend features;
collecting a preset number of cough sound sample signals and obtaining a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signals;
extracting the signal features from a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal;
taking the signal characteristics of the cough sound sample signal as input, training a support vector data description algorithm model to obtain a cough signal characteristic model based on the support vector data description algorithm;
the cough signal feature model based on the support vector data description algorithm comprises an energy feature model based on the support vector data description algorithm, a local feature model based on the support vector data description algorithm and an overall trend feature model based on the support vector data description algorithm;
if the energy features match the energy feature model, the local features match the local feature model, and the overall trend features match the overall trend feature model, then the sound signal is confirmed to be cough sound;
wherein, if the signal features include energy features, the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal, the energy coefficients of a preset proportion of consecutive frames of the sound signal whose sum of energy coefficients is the largest;
warping the energy coefficients of the consecutive frames of the sound signal to a preset length based on a dynamic time warping algorithm to obtain the energy features of the sound signal;
the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal, the energy coefficients of the preset proportion of consecutive frames of the cough sound sample signal whose sum of energy coefficients is the largest;
warping the energy coefficients of the consecutive frames of the cough sound sample signal to a preset length based on a dynamic time warping algorithm to obtain the energy features of the cough sound sample signal;
wherein, if the signal features include local features, the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal, the mel-frequency cepstral coefficients of the S2 frames of the sound signal whose sum of energy coefficients is the largest;
determining weights of the mel-frequency cepstral coefficients of the S2 frames of the sound signal based on the energy coefficients of the S2 frames of the sound signal, and performing a weighted summation of the mel-frequency cepstral coefficients of the S2 frames of the sound signal according to those weights to obtain the local features of the sound signal, wherein the weights of the mel-frequency cepstral coefficients of the S2 frames of the sound signal are positively correlated with the energy coefficients of the S2 frames of the sound signal;
the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal, the mel-frequency cepstral coefficients of the S2 consecutive frames of the cough sound sample signal whose sum of energy coefficients is the largest, wherein S2 is a positive integer;
determining weights of the mel-frequency cepstral coefficients of the S2 frames of the cough sound sample signal based on the energy coefficients of the S2 frames of the cough sound sample signal, and performing a weighted summation of the mel-frequency cepstral coefficients of the S2 frames of the cough sound sample signal according to those weights to obtain the local features of the cough sound sample signal, wherein the weights of the mel-frequency cepstral coefficients of the S2 frames of the cough sound sample signal are positively correlated with the energy coefficients of the S2 frames of the cough sound sample signal;
wherein, if the signal features include overall trend features, the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
performing dimension reduction on the mel-frequency cepstral coefficient feature parameter matrix of the sound signal using a linear discriminant analysis algorithm to obtain the overall trend features of the sound signal;
the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal comprises:
performing dimension reduction on the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal using a linear discriminant analysis algorithm to obtain the overall trend features of the cough sound sample signal.
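The energy-feature step of claim 1 (pick the run of consecutive frames, covering a preset proportion of the signal, whose energy coefficients sum to the largest value, then bring that run to a preset length) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the rounding of the window width are hypothetical, and plain linear interpolation is used as a simple stand-in for the dynamic time regulation (dynamic time warping) step, whose details are not given in this section.

```python
from typing import List, Sequence


def max_energy_window(energies: Sequence[float], proportion: float) -> List[float]:
    """Return the contiguous run of frames, covering `proportion` of all
    frames, whose energy coefficients have the largest sum."""
    if not energies:
        raise ValueError("energies must not be empty")
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    n = len(energies)
    width = max(1, round(n * proportion))  # assumption: round to nearest frame count
    window_sum = sum(energies[:width])
    best_sum, best_start = window_sum, 0
    # Slide the window one frame at a time, updating the sum incrementally.
    for start in range(1, n - width + 1):
        window_sum += energies[start + width - 1] - energies[start - 1]
        if window_sum > best_sum:
            best_sum, best_start = window_sum, start
    return list(energies[best_start:best_start + width])


def resample_linear(values: Sequence[float], length: int) -> List[float]:
    """Stretch or shrink `values` to `length` points by linear interpolation
    (a stand-in for the time-warping step, not DTW itself)."""
    if not values or length < 1:
        raise ValueError("values must be non-empty and length must be >= 1")
    if len(values) == 1 or length == 1:
        return [float(values[0])] * length
    step = (len(values) - 1) / (length - 1)
    out = []
    for i in range(length):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1.0 - frac) + values[hi] * frac)
    return out
```

For example, `resample_linear(max_energy_window([1, 2, 9, 8, 7, 1], 0.5), 5)` selects the frames `[9, 8, 7]` and stretches them to five points, so that signals of different durations yield energy features of the same length.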
2. A cough sound recognition device, characterized in that the cough sound recognition device comprises:
a sound input unit for receiving a sound signal;
a signal processing unit for performing analog signal processing on the sound signal;
the signal processing unit is connected with an operation processing unit internal or external to the cough sound recognition device, and the operation processing unit comprises:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform:
sampling a sound signal and obtaining a mel-frequency cepstral coefficient feature parameter matrix of the sound signal;
extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal;
wherein the signal features include energy features, local features, and overall trend features;
collecting a preset number of cough sound sample signals and obtaining a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signals;
extracting the signal features from a mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal;
taking the signal characteristics of the cough sound sample signal as input, training a support vector data description algorithm model to obtain a cough signal characteristic model based on the support vector data description algorithm;
the cough signal feature model based on the support vector data description algorithm comprises an energy feature model based on the support vector data description algorithm, a local feature model based on the support vector data description algorithm and an overall trend feature model based on the support vector data description algorithm;
if the energy features match the energy feature model, the local features match the local feature model, and the overall trend features match the overall trend feature model, then the sound signal is confirmed to be cough sound;
wherein, if the signal features include energy features, the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal, the energy coefficients of a preset proportion of consecutive frames of the sound signal whose sum of energy coefficients is the largest;
warping the energy coefficients of the consecutive frames of the sound signal to a preset length based on a dynamic time warping algorithm to obtain the energy features of the sound signal;
the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal, the energy coefficients of the preset proportion of consecutive frames of the cough sound sample signal whose sum of energy coefficients is the largest;
warping the energy coefficients of the consecutive frames of the cough sound sample signal to a preset length based on a dynamic time warping algorithm to obtain the energy features of the cough sound sample signal;
wherein, if the signal features include local features, the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal, the mel-frequency cepstral coefficients of the S2 frames of the sound signal whose sum of energy coefficients is the largest;
determining weights of the mel-frequency cepstral coefficients of the S2 frames of the sound signal based on the energy coefficients of the S2 frames of the sound signal, and performing a weighted summation of the mel-frequency cepstral coefficients of the S2 frames of the sound signal according to those weights to obtain the local features of the sound signal, wherein the weights of the mel-frequency cepstral coefficients of the S2 frames of the sound signal are positively correlated with the energy coefficients of the S2 frames of the sound signal;
the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal comprises:
selecting, from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal, the mel-frequency cepstral coefficients of the S2 consecutive frames of the cough sound sample signal whose sum of energy coefficients is the largest, wherein S2 is a positive integer;
determining weights of the mel-frequency cepstral coefficients of the S2 frames of the cough sound sample signal based on the energy coefficients of the S2 frames of the cough sound sample signal, and performing a weighted summation of the mel-frequency cepstral coefficients of the S2 frames of the cough sound sample signal according to those weights to obtain the local features of the cough sound sample signal, wherein the weights of the mel-frequency cepstral coefficients of the S2 frames of the cough sound sample signal are positively correlated with the energy coefficients of the S2 frames of the cough sound sample signal;
wherein, if the signal features include overall trend features, the extracting signal features from the mel-frequency cepstral coefficient feature parameter matrix of the sound signal comprises:
performing dimension reduction on the mel-frequency cepstral coefficient feature parameter matrix of the sound signal using a linear discriminant analysis algorithm to obtain the overall trend features of the sound signal;
the extracting the signal features from the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal comprises:
performing dimension reduction on the mel-frequency cepstral coefficient feature parameter matrix of the cough sound sample signal using a linear discriminant analysis algorithm to obtain the overall trend features of the cough sound sample signal.
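The local-feature step recited in claims 1 and 2 (take the S2 frames with the largest energy sum and combine their MFCC vectors with weights that grow with each frame's energy) can be sketched as follows. This is a minimal sketch under stated assumptions: `local_feature` is a hypothetical name, the S2 frames are taken as a contiguous run, the energy coefficients are assumed non-negative, and the weights are simply each frame's share of the window energy, which is one of many possible positively correlated weightings.

```python
from typing import List, Sequence


def local_feature(
    mfcc: Sequence[Sequence[float]],
    energies: Sequence[float],
    s2: int,
) -> List[float]:
    """Energy-weighted sum of the MFCC vectors of the S2 consecutive frames
    whose energy coefficients have the largest sum.

    mfcc[i] is the MFCC vector of frame i; energies[i] is its energy coefficient.
    """
    n = len(mfcc)
    if n == 0 or n != len(energies):
        raise ValueError("mfcc and energies must be non-empty and equally long")
    if not 1 <= s2 <= n:
        raise ValueError("s2 must be between 1 and the number of frames")

    # Start index of the contiguous S2-frame window with the largest energy sum.
    best_start = max(range(n - s2 + 1), key=lambda s: sum(energies[s:s + s2]))
    frames = mfcc[best_start:best_start + s2]
    window_energy = energies[best_start:best_start + s2]

    total = sum(window_energy)
    if min(window_energy) < 0 or total <= 0:
        raise ValueError("this sketch assumes non-negative energies with a positive sum")

    # Assumption: weight = frame energy / window energy (weights sum to 1).
    weights = [e / total for e in window_energy]
    dim = len(frames[0])
    return [sum(w * frame[k] for w, frame in zip(weights, frames)) for k in range(dim)]
```

With `mfcc = [[1, 0], [0, 2], [4, 4], [0, 0]]`, `energies = [1, 1, 3, 0]`, and `s2 = 2`, the window is frames 1 and 2 with weights 0.25 and 0.75, so the louder frame dominates the resulting vector.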
3. A storage medium storing executable instructions that, when executed by a cough sound recognition device, cause the cough sound recognition device to perform the method of claim 1.
CN201780008985.4A 2017-07-31 2017-07-31 Cough sound recognition method, device, and storage medium Active CN108701469B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/095263 WO2019023879A1 (en) 2017-07-31 2017-07-31 Cough sound recognition method and device, and storage medium

Publications (2)

Publication Number Publication Date
CN108701469A CN108701469A (en) 2018-10-23
CN108701469B true CN108701469B (en) 2023-06-20

Family

ID=63844118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780008985.4A Active CN108701469B (en) 2017-07-31 2017-07-31 Cough sound recognition method, device, and storage medium

Country Status (2)

Country Link
CN (1) CN108701469B (en)
WO (1) WO2019023879A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108369813B (en) * 2017-07-31 2022-10-25 深圳和而泰智能家居科技有限公司 Specific voice recognition method, apparatus and storage medium
CN109360584A (en) * 2018-10-26 2019-02-19 平安科技(深圳)有限公司 Cough monitoring method and device based on deep learning
CN109498228B (en) * 2018-11-06 2021-03-30 林枫 Lung rehabilitation treatment device based on cough sound feedback
CN109567806A (en) * 2018-11-08 2019-04-05 广州军区广州总医院 A kind of traumatic Cervical cord injuries patient coughs sound evaluation system and evaluation method
CN109782666A (en) * 2019-01-22 2019-05-21 山东钰耀弘圣智能科技有限公司 A kind of rabbit epidemic disease multi-antenna and method
JP7312037B2 (en) * 2019-06-25 2023-07-20 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ Cough detection device, operating method and program for cough detection device
CN111179967B (en) * 2019-12-17 2022-05-24 华南理工大学 Algorithm, medium and equipment for linear classification of true and false cough sounds of patients with cervical and spinal cord injuries
EP3839971A1 (en) * 2019-12-19 2021-06-23 Koninklijke Philips N.V. A cough detection system and method
CN111524537B (en) * 2020-03-24 2023-04-14 苏州数言信息技术有限公司 Cough and sneeze identification method aiming at real-time voice flow
CN112233700A (en) * 2020-10-09 2021-01-15 平安科技(深圳)有限公司 Audio-based user state identification method and device and storage medium
CN112331231B (en) * 2020-11-24 2024-04-19 南京农业大学 Broiler feed intake detection system based on audio technology
CN113746583A (en) * 2021-09-18 2021-12-03 鹰潭市广播电视传媒集团有限责任公司 Remote management system, method, device and storage medium of public broadcasting equipment
CN114330454A (en) * 2022-01-05 2022-04-12 东北农业大学 Live pig cough sound identification method based on DS evidence theory fusion characteristics

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103489446A (en) * 2013-10-10 2014-01-01 福州大学 Twitter identification method based on self-adaption energy detection under complex environment
CN103730130A (en) * 2013-12-20 2014-04-16 中国科学院深圳先进技术研究院 Detection method and system for pathological voice
CN104321015A (en) * 2012-03-29 2015-01-28 昆士兰大学 A method and apparatus for processing patient sounds
CN105095624A (en) * 2014-05-15 2015-11-25 中国电子科技集团公司第三十四研究所 Method for identifying optical fibre sensing vibration signal
CN105147252A (en) * 2015-08-24 2015-12-16 四川长虹电器股份有限公司 Heart disease recognition and assessment method
CN105761720A (en) * 2016-04-19 2016-07-13 北京地平线机器人技术研发有限公司 Interaction system based on voice attribute classification, and method thereof
CN106251880A (en) * 2015-06-03 2016-12-21 创心医电股份有限公司 Identify method and the system of physiological sound
CN106847262A (en) * 2016-12-28 2017-06-13 华中农业大学 A kind of porcine respiratory disease automatic identification alarm method
CN106847293A (en) * 2017-01-19 2017-06-13 内蒙古农业大学 Facility cultivation sheep stress behavior acoustical signal monitoring method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7727161B2 (en) * 2003-04-10 2010-06-01 Vivometrics, Inc. Systems and methods for monitoring cough
WO2006132596A1 (en) * 2005-06-07 2006-12-14 Matsushita Electric Industrial Co., Ltd. Method and apparatus for audio clip classification
US8532800B2 (en) * 2007-05-24 2013-09-10 Mavs Lab. Inc. Uniform program indexing method with simple and robust audio feature enhancing methods
CN101894551B (en) * 2010-07-02 2012-05-09 华南理工大学 Device for automatically identifying cough
CN102664011B (en) * 2012-05-17 2014-03-12 吉林大学 Method for quickly recognizing speaker

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Automatic Detection and Recognition of Pig Wasting Diseases Using Sound Data in Audio Surveillance Systems; Yongwha Chung et al.; Sensors 2013; 2013-09-25; pp. 12929-12942 *
Research on feature extraction and classification of sound signals in facility sheep houses; 宣传忠; China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology; 2017-01-15 (No. 01); pp. 51-65 *

Also Published As

Publication number Publication date
WO2019023879A1 (en) 2019-02-07
CN108701469A (en) 2018-10-23

Similar Documents

Publication Publication Date Title
CN108701469B (en) Cough sound recognition method, device, and storage medium
CN108369813B (en) Specific voice recognition method, apparatus and storage medium
CN109074822B (en) Specific voice recognition method, apparatus and storage medium
KR102635469B1 (en) Method and apparatus for recognition of sound events based on convolutional neural network
CN110111769B (en) Electronic cochlea control method and device, readable storage medium and electronic cochlea
CN110755108A (en) Heart sound classification method, system and device based on intelligent stethoscope and readable storage medium
CN109147798B (en) Speech recognition method, device, electronic equipment and readable storage medium
CN110123367A (en) Computer equipment, recognition of heart sound device, method, model training apparatus and storage medium
CN108847253B (en) Vehicle model identification method, device, computer equipment and storage medium
CN105147252A (en) Heart disease recognition and assessment method
CN103514877A (en) Vibration signal characteristic parameter extracting method
CN109065043A (en) A kind of order word recognition method and computer storage medium
CN108682433A (en) The heart sound kind identification method of first-order difference coefficient based on MFCC
CN113870903A (en) Pathological voice recognition method, device, equipment and storage medium
CN108937857A (en) A kind of identification and appraisal procedure of cardiechema signals
CN111145726B (en) Deep learning-based sound scene classification method, system, device and storage medium
CN112329819A (en) Underwater target identification method based on multi-network fusion
CN111862991A (en) Method and system for identifying baby crying
Kamińska et al. Comparison of perceptual features efficiency for automatic identification of emotional states from speech
WO2023077592A1 (en) Intelligent electrocardiosignal processing method
CN115064182A (en) Fan fault feature identification method of self-adaptive Mel filter in strong noise environment
CN111192569B (en) Double-microphone voice feature extraction method and device, computer equipment and storage medium
CN107993666A (en) Audio recognition method, device, computer equipment and readable storage medium storing program for executing
WO2014027962A1 (en) Device, system and method for detection of fluid accumulation
US11763805B2 (en) Speaker recognition method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20221028

Address after: 1010-1011, 10 / F, block D, Shenzhen Aerospace Science and Technology Innovation Research Institute building, no.6, Keji south 10 road, high tech South Zone, Nanshan District, Shenzhen, Guangdong 518000

Applicant after: SHENZHEN H&T INTELLIGENT CONTROL Co.,Ltd.

Address before: 1002, 10 / F, block D, Shenzhen Aerospace Science and Technology Innovation Research Institute building, no.6, Keji south 10 road, high tech South Zone, Nanshan District, Shenzhen, Guangdong 518000

Applicant before: SHENZHEN H&T SMART HOME TECHNOLOGY Co.,Ltd.

GR01 Patent grant