WO2019023879A1 - Cough sound recognition method, device and storage medium - Google Patents

Cough sound recognition method, device and storage medium

Info

Publication number
WO2019023879A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
sound
coughing
characteristic
mel frequency
Prior art date
Application number
PCT/CN2017/095263
Other languages
English (en)
French (fr)
Inventor
刘洪涛
冯澍婷
孟亚彬
Original Assignee
深圳和而泰智能家居科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳和而泰智能家居科技有限公司 filed Critical 深圳和而泰智能家居科技有限公司
Priority to PCT/CN2017/095263 priority Critical patent/WO2019023879A1/zh
Priority to CN201780008985.4A priority patent/CN108701469B/zh
Publication of WO2019023879A1 publication Critical patent/WO2019023879A1/zh

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 90/00 Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Definitions

  • the embodiments of the present application relate to sound processing technologies, and in particular, to a cough sound recognition method, device, and storage medium.
  • Cough is an indicator of the therapeutic effect or progression of certain diseases (such as asthma).
  • Detailed and accurate information on cough status (such as the number of coughs per hour, cough time, etc.) has important clinical guiding significance for disease diagnosis.
  • intelligent cough monitoring devices are more accurate than manual identification of coughs.
  • the current intelligent cough monitoring device is mainly used for medical monitoring purposes, and requires the patient to wear complicated equipment for monitoring, which undoubtedly brings inconvenience to the user.
  • DTW Dynamic Time Warping
  • the cough sound recognition method based on the DTW algorithm has high algorithm complexity, large calculation amount, and higher requirements on hardware devices.
  • the purpose of the present application is to provide a cough sound recognition method, device and storage medium, which can recognize cough sound, and has a simple algorithm, a small amount of calculation, and low requirements on hardware devices.
  • an embodiment of the present application provides a cough sound recognition method for a recognition device, the method comprising: sampling a sound signal and acquiring a characteristic parameter matrix of Mel frequency cepstral coefficients of the sound signal; extracting a signal feature from the characteristic parameter matrix; confirming whether the signal feature matches a pre-acquired cough signal feature model based on a support vector data description algorithm; and, if it matches, confirming that the sound signal is a coughing sound.
  • the method further includes:
  • the cough signal feature model based on the support vector data description algorithm is acquired in advance.
  • the pre-acquisition of the cough signal feature model based on the support vector data description algorithm includes:
  • the support vector data description algorithm model is trained to obtain the cough signal feature model based on the support vector data description algorithm.
  • the signal feature comprises one or more sub-signal features of the energy feature, the local feature, and the overall trend feature.
  • extracting the signal feature from the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal includes:
  • Extracting a signal characteristic from a characteristic parameter matrix of a Mel frequency cepstral coefficient of the sound signal comprising:
  • the energy coefficients of the continuous frames of the sound signal are normalized to a preset length based on a dynamic time warping algorithm to obtain the energy characteristic of the sound signal.
  • extracting the signal feature from the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal includes:
  • Extracting a signal characteristic from a characteristic parameter matrix of a Mel frequency cepstral coefficient of the sound signal comprising:
  • extracting the signal feature from a Mel frequency cepstral coefficient feature parameter matrix of the coughing sound sample signal includes:
  • a linear discriminant analysis algorithm is used to perform dimensionality reduction on the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal to obtain the overall trend characteristic of the cough sound sample signal;
  • Extracting a signal characteristic from a characteristic parameter matrix of a Mel frequency cepstral coefficient of the sound signal comprising:
  • the linear discriminant analysis algorithm is used to perform dimensionality reduction on the characteristic parameter matrix of the Mel frequency cepstral coefficient of the sound signal, and the overall trend characteristic of the sound signal is obtained.
  • the cough signal feature model based on the support vector data description algorithm includes an energy feature model based on a support vector data description algorithm, a local feature model based on a support vector data description algorithm, and an overall trend feature based on a support vector data description algorithm.
  • if the cough signal feature model based on the support vector data description algorithm includes a plurality of sub-signal feature models based on the support vector data description algorithm, confirming whether the signal feature matches the pre-acquired cough signal feature model based on the support vector data description algorithm includes:
  • the embodiment of the present application further provides a coughing voice recognition device, where the cough voice recognition device includes:
  • a sound input unit for receiving a sound signal
  • a signal processing unit configured to perform analog signal processing on the sound signal
  • the signal processing unit is connected to an internal or external operation processing unit of the cough sound recognition device, and the operation processing unit includes:
  • At least one processor and,
  • the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to: sample a sound signal and acquire a characteristic parameter matrix of Mel frequency cepstral coefficients of the sound signal; extract a signal feature from the characteristic parameter matrix; confirm whether the signal feature matches a pre-acquired cough signal feature model based on a support vector data description algorithm; and, if it matches, confirm that the sound signal is a coughing sound.
  • the at least one processor is further capable of:
  • the cough signal feature model based on the support vector data description algorithm is acquired in advance.
  • the pre-acquisition of the cough signal feature model based on the support vector data description algorithm includes:
  • the support vector data description algorithm model is trained to obtain the cough signal feature model based on the support vector data description algorithm.
  • the signal feature comprises one or more sub-signal features of the energy feature, the local feature, and the overall trend feature.
  • extracting the signal feature from the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal includes:
  • Extracting a signal characteristic from a characteristic parameter matrix of a Mel frequency cepstral coefficient of the sound signal comprising:
  • the energy coefficients of the continuous frames of the sound signal are normalized to a preset length based on a dynamic time warping algorithm to obtain the energy characteristic of the sound signal.
  • extracting the signal feature from the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal includes:
  • Extracting a signal characteristic from a characteristic parameter matrix of a Mel frequency cepstral coefficient of the sound signal comprising:
  • if the signal feature includes an overall trend feature, extracting the signal feature from the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal includes:
  • a linear discriminant analysis algorithm is used to perform dimensionality reduction on the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal to obtain the overall trend characteristic of the cough sound sample signal;
  • Extracting a signal characteristic from a characteristic parameter matrix of a Mel frequency cepstral coefficient of the sound signal comprising:
  • the linear discriminant analysis algorithm is used to perform dimensionality reduction on the characteristic parameter matrix of the Mel frequency cepstral coefficient of the sound signal, and the overall trend characteristic of the sound signal is obtained.
  • the cough signal feature model based on the support vector data description algorithm includes an energy feature model based on a support vector data description algorithm, a local feature model based on a support vector data description algorithm, and an overall trend feature based on a support vector data description algorithm.
  • if the cough signal feature model based on the support vector data description algorithm includes a plurality of sub-signal feature models based on the support vector data description algorithm, confirming whether the signal feature matches the pre-acquired cough signal feature model based on the support vector data description algorithm includes:
  • the embodiment of the present application further provides a storage medium, where the storage medium stores executable instructions, when the executable instructions are executed by a coughing voice recognition device, causing the cough voice recognition device to perform the foregoing method .
  • the embodiment of the present application further provides a program product, where the program product includes a program stored on a storage medium, where the program includes program instructions, when the program instruction is executed by a coughing voice recognition device, The coughing sound recognition device performs the above method.
  • the coughing voice recognition method, device and storage medium provided by the embodiments of the present application can recognize the coughing sound, so that the coughing condition can be monitored by monitoring the sound emitted by the user without the user wearing any detecting component. Because the recognition algorithm based on MFCC characteristic parameters and SVDD model is adopted, the algorithm has low complexity and less calculation, which has low hardware requirements and reduces product manufacturing cost.
  • FIG. 1 is a schematic structural diagram of an application environment of each embodiment of the present application.
  • Figure 2 is a time-amplitude diagram of a coughing sound signal
  • Figure 3 is a time-frequency diagram of a coughing sound signal
  • FIG. 4 is a schematic diagram of the Mel frequency filtering process in the MFCC coefficient calculation
  • FIG. 5 is a schematic flowchart of obtaining a feature model based on a support vector data description algorithm in a cough voice recognition method according to an embodiment of the present application
  • FIG. 6 is a schematic flow chart of a cough sound recognition method provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a coughing voice recognition device according to an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a cough sound recognition apparatus according to an embodiment of the present application.
  • the embodiment of the present application proposes a coughing sound recognition scheme based on Mel Frequency Cepstral Coefficient (MFCC) feature parameters and a Support Vector Data Description (SVDD) model, which is applicable to the application environment shown in FIG. 1
  • the application environment includes a user 10 and a coughing voice recognition device 20 for receiving a sound from the user 10 and identifying the sound to determine if the sound is a coughing sound.
  • MFCC Mel Frequency Cepstral Coefficients
  • SVDD Support Vector Data Description
  • the coughing sound recognition device 20 may also record and process the coughing sound to output coughing information of the user 10; the coughing information may include the number of coughing sounds, the length of the coughing sounds and the decibel level of the coughing sounds.
  • a counter may be included in the coughing sound recognition device for counting the coughing sounds when a coughing sound is detected; a timer may be included for measuring the duration of the coughing sound when a coughing sound is detected; and a decibel detecting means may be included for detecting the decibel level of the coughing sound when a coughing sound is detected.
  • the recognition principle of the coughing sound in the embodiment of the present application is similar to the principle of the speech recognition, and the input sound is processed and compared with the sound model to obtain the recognition result. It can be divided into two stages, namely the coughing sound model training stage and the coughing sound recognition stage.
  • the coughing sound model training stage mainly collects a certain number of coughing sound samples, calculates the MFCC characteristic parameters of the coughing sound signal, extracts the signal features from the MFCC characteristic parameters, and trains the signal features based on the SVDD algorithm to obtain a coughing sound reference.
  • Feature model mainly collects a certain number of coughing sound samples, calculates the MFCC characteristic parameters of the coughing sound signal, extracts the signal features from the MFCC characteristic parameters, and trains the signal features based on the SVDD algorithm to obtain a coughing sound reference.
  • in the coughing sound recognition stage, the MFCC feature parameters are calculated for the sound that needs to be judged, the signal features corresponding to the feature models are extracted, and it is then judged whether the signal features match the feature models; if they match, the sound is judged to be a coughing sound, otherwise it is judged to be a non-coughing sound.
  • the identification process mainly includes preprocessing, feature extraction, model training, pattern matching and decision making.
  • the coughing sound signal is sampled and the MFCC coefficient of the coughing sound signal is calculated.
  • in the feature extraction step, the energy characteristics, the overall trend characteristics and the local features of the cough sound signal are selected from the MFCC coefficient matrix and used as inputs for acquiring the SVDD models.
  • the model training step according to the three types of features extracted from the MFCC coefficient matrix of coughing sound signals, three SVDD models are trained, which are SVDD energy feature model, SVDD local feature model and SVDD overall trend feature model.
  • three SVDD models are utilized to identify whether the new sound signal is a coughing sound signal.
  • the MFCC coefficient matrix of the sound signal is calculated, then the energy characteristics, the overall trend characteristics and the local features of the sound signal are extracted from the MFCC coefficient matrix, and then the three characteristics are respectively matched to the SVDD energy feature model, the SVDD local feature model and the SVDD.
  • the overall trend feature model if matched, determines that the sound signal is a coughing sound signal, otherwise, the sound signal is determined not to be a coughing sound signal.
  • the MFCC combined with SVDD to identify cough sounds can simplify the complexity of the algorithm, reduce the amount of calculations, and significantly improve the accuracy of cough sound recognition.
  • the embodiment of the present application provides a coughing voice recognition method, which can be used in the cough voice recognition device 20, and the cough voice recognition method needs to obtain a feature model based on a support vector data description algorithm in advance, that is, a feature model based on the SVDD algorithm.
  • the feature model based on the SVDD algorithm may be pre-configured or may be trained by the methods in steps 101 to 103; after the feature model based on the SVDD algorithm has been trained, it can subsequently be used to identify coughing sounds; further, if the accuracy of the SVDD-based feature model in identifying cough sounds becomes unacceptable due to a scene change or other reasons, the feature model based on the SVDD algorithm can be reconfigured or retrained.
  • the feature model obtained by the support vector data description algorithm in advance includes:
  • Step 101 Collect a preset number of coughing sound sample signals and acquire a characteristic parameter matrix of a Mel frequency cepstral coefficient of the coughing sound sample signal;
  • the coughing sound sample signal s(n) is sampled, and the Mel frequency cepstral coefficient characteristic parameter matrix of the coughing sound sample signal is obtained according to the coughing sound sample signal.
  • the Mel frequency cepstrum coefficient is mainly used for sound data feature extraction and reduction operation dimensions. For example, for a frame with 512 dimensions (sampling points), after processing by MFCC, the most important 40-dimensional data can be extracted, and the purpose of dimensionality reduction is also achieved.
  • the calculation of the Mel frequency cepstral coefficient generally includes: pre-emphasis, framing, windowing, fast Fourier transform, mel filter bank and discrete cosine transform.
  • the purpose of pre-emphasis is to raise the high-frequency portion and flatten the spectrum of the signal, so that the spectrum can be computed with the same signal-to-noise ratio over the entire band from low frequency to high frequency. At the same time, it also eliminates the effect of the vocal cords and lips during sound production, compensates for the high-frequency part of the sound signal that is suppressed by the vocal system, and highlights the high-frequency formants.
  • the implementation method is that the sampled cough sound sample signal s(n) is pre-emphasized by a first-order finite-length unit impulse response (FIR) high-pass digital filter, and the transfer function is:
  • where z denotes the input signal, whose time-domain representation is the coughing sound sample signal s(n), and a is the pre-emphasis coefficient, generally a constant from 0.9 to 1.0.
  • Each P sample points in the cough sound sample signal s(n) are grouped into one observation unit, called a frame.
  • the value of P can be 256 or 512, and the time covered is about 20 to 30 ms.
  • the overlapping area contains M sampling points, and the value of M may be about 1/2 or 1/3 of P.
  • each frame must also undergo a fast Fourier transform to obtain the energy distribution in the spectrum. Performing fast Fourier transform on each frame signal after the frame is windowed to obtain the spectrum of each frame.
  • the power spectrum of the sound signal is obtained by modulo the square of the spectrum of the speech signal.
  • the energy spectrum is filtered through a set of Mel scale triangular filter banks.
  • a filter bank with M filters (the number of filters is close to the number of critical bands).
  • the interval between each f(m) decreases as the value of m decreases, and widens as the value of m increases. Please refer to FIG.
  • the frequency response of the triangular filter is defined as:
  • the MFCC coefficients are obtained by applying a discrete cosine transform (DCT) to the logarithmic energies s(m):
  • Step 102 Extract the signal feature from a characteristic parameter matrix of a Mel frequency cepstral coefficient of the coughing sound sample signal
  • the MFCC coefficient is a coefficient matrix of N*L, where N is the number of sound signal frames and L is the length of the MFCC coefficients. Since the MFCC coefficient matrix has a high dimension and the length of the sound signal is inconsistent, the number of matrix rows N is different, and the MFCC coefficient matrix cannot be used as a direct input to obtain the SVDD model. Therefore, it is necessary to further extract effective features from the MFCC coefficient matrix for direct input to the SVDD model.
  • the effective features are extracted from the coefficient matrix.
  • Figure 2 is the time-magnitude diagram of the coughing sound signal (time domain diagram).
  • the coughing sound signal is very short and markedly sudden; the duration of a single coughing sound is usually less than 550 ms, and even for patients with severe throat and bronchial diseases the duration generally remains around 1000 ms. From the energy point of view, the energy of the coughing sound signal is mainly concentrated in the first half of the signal.
  • the energy coefficient of the signal segment with relatively concentrated energy can be selected as the energy feature to characterize the coughing sound sample signal, for example, selecting a set of energy coefficients of the first 1/2 partial signal from the coughing sound sample signal as an energy characteristic, Using the energy signature as an input, the SVDD model is established to identify the sound signal.
  • extracting energy characteristics from the characteristic parameter matrix of the Mel frequency cepstral coefficient of the coughing sound sample signal includes:
  • the energy coefficient of the coughing sound sample signal is obtained by normalizing the energy coefficient of the continuous frame coughing sound sample signal to a preset length based on the DTW algorithm.
  • the preset proportion of continuous frames of the coughing sound sample signal with the largest sum of energy coefficients may be the first 1/2 part, the first 4/7 part or the first 5/9 part of the coughing sound sample signal, and so on.
  • the preset length can be set according to the actual application.
  • most coughing sound signals (about 90%) follow basically the same trend: after the coughing pulse occurs, the signal energy decreases rapidly, the rate of decline being faster for a dry cough and slightly slower for a wet cough. Therefore, the trend of the coughing sound signal can well characterize the coughing sound signal; the overall trend characteristic (which reflects the trend of the signal) can be extracted from the MFCC coefficient matrix of the coughing sound signal and used as an input to establish an SVDD model to identify the sound signal.
  • the overall trend characteristic of the coughing sound sample signal can be obtained by using a Linear Discriminant Analysis (LDA) algorithm to perform dimensionality reduction on the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal.
  • LDA linear discriminant analysis algorithm
  • Figure 3 is a time-frequency diagram (spectral map) of the coughing sound signal.
  • the spectrum energy is also concentrated in the beginning of the signal, and the frequency distribution is wider (generally concentrated in the range of 200 to 6000 Hz). Therefore, the MFCC coefficient of several frames of the spectral energy concentration in the coughing sound sample signal can be selected as a local feature to characterize the coughing sound signal, and the local feature is taken as an input, and the SVDD model is established to recognize the sound signal.
  • the local feature can be obtained by selecting a few frames of the most concentrated energy from the coughing sound sample signal, and then assigning different weights to the MFCC coefficients of the frame signals and adding them to obtain a partial of the coughing sound sample signal. feature.
  • the weight values can be determined based on the energy coefficients of the coughing sound sample signal. That is, the Mel frequency cepstral coefficients of the continuous S2 frames of the coughing sound sample signal having the largest sum of energy coefficients are selected from the Mel frequency cepstral coefficient characteristic parameter matrix of the coughing sound sample signal, where S2 is a positive integer; the weights of the Mel frequency cepstral coefficients of the S2 frames are determined based on the energy coefficients of the S2 frames, and a weighted summation of the Mel frequency cepstral coefficients of the S2 frames is performed according to these weights to obtain the local feature of the coughing sound sample signal.
  • the energy characteristics, local features and overall trend characteristics can reflect the characteristics of the coughing sound signal, and extract one or more sub-signal characteristics from the MFCC coefficient matrix of the coughing sound sample signal to extract energy characteristics, local features, and overall trend characteristics.
  • the SVDD model is established to identify the sound signal, which greatly improves the accuracy of coughing sound recognition and reduces the false recognition rate.
  • to improve the accuracy of cough sound recognition, the energy features, local features and overall trend features can be extracted simultaneously from the MFCC coefficient matrix of the cough sound sample signal and used together as input.
  • the recognition rate of the cough sound can reach 95% or more.
  • other dimensionality reduction methods can also be used to reduce the MFCC coefficients of the cough sound sample signals, for example the DTW algorithm, Principal Component Analysis (PCA) or other algorithms.
  • when the PCA algorithm is used to reduce the MFCC coefficients of the coughing sound sample signals and the reduced parameters are used to train the SVDD model, the resulting SVDD model of the coughing sound signal discriminates only poorly between coughing sounds and noise: the cough sound recognition rate is about 85%, and the noise misrecognition rate reaches 65%.
  • Step 103 Taking the signal feature of the coughing sound sample signal as an input, training the support vector data description algorithm model to obtain the cough signal feature model based on the support vector data description algorithm.
  • the signal characteristics include energy features, local features, and overall trend characteristics
  • the energy features, local features and overall trend features are respectively used as inputs to train the SVDD models, i.e. the SVDD model of the energy feature (energy feature model), the SVDD model of the local feature (local feature model) and the SVDD model of the overall trend feature (overall trend feature model).
  • the cough signal feature model based on the support vector data description algorithm composed of the energy feature model, the local feature model and the overall trend feature model is obtained.
  • the basic principle of SVDD is to calculate a spherical decision boundary for the input sample, divide the whole space into two parts, one part is the space inside the boundary, which is regarded as an acceptable part; the other part is the space outside the boundary, which is regarded as rejected. section. This gives SVDD a classification feature for a class of samples.
  • the optimization goal of SVDD is to find a minimum sphere with a center of a and a radius of R:
  • for data of more than three dimensions, this spherical surface is a hypersphere; a hypersphere is the counterpart, in a space of more than three dimensions, of a curve in 2D space and a spherical surface in 3D space.
  • Satisfying this condition means that the data points in the training data set are included in the sphere, where x i represents the input sample data, that is, the cough sound sample signal.
  • the inner product of the above vector can be solved by the kernel function K, namely:
  • the value of the center a and the radius R can be obtained, that is, the SVDD model is determined.
  • after the centers and radii of the three SVDD models are obtained by training, corresponding respectively to the energy feature model, the local feature model and the overall trend feature model, the training process is completed.
  • the energy feature model, the local feature model and the overall trend feature model each correspond to a hypersphere; under the premise of containing all cough sound signals, the hypersphere boundary is optimized so that its radius is minimized, and the cough signal feature model based on the support vector data description algorithm that best meets the requirements is finally obtained, so that the accuracy is high when this model is used to identify the signal features extracted from a sound signal.
  • the cough sound recognition method includes:
  • Step 201 sampling a sound signal and acquiring a characteristic parameter matrix of a Mel frequency cepstral coefficient of the sound signal;
  • a sound input unit for example, a microphone
  • the sound signal is amplified, filtered, and the like, and then converted into a digital signal.
  • the digital signal may be sampled and processed in the computing processing unit local to the coughing voice recognition device 20, or may be uploaded to a cloud server, a smart terminal, or other server for processing.
  • step 101 For the technical details of obtaining the characteristic parameter matrix of the Mel frequency cepstral coefficient of the sound signal, refer to step 101, and details are not described herein again.
  • Step 202 Extract a signal feature from a characteristic parameter matrix of a Mel frequency cepstral coefficient of the sound signal.
  • where the pre-acquired feature model includes the energy feature model, the local feature model and the overall trend feature model, one or more of the energy feature, the local feature and the overall trend feature may be extracted from the feature parameter matrix of the sound signal.
  • to improve recognition accuracy, all three features, namely the energy feature, the local feature and the overall trend feature, can be extracted.
  • Step 203 Confirm whether the signal feature matches a pre-acquired cough signal feature model based on a support vector data description algorithm
  • the feature model based on the support vector data description algorithm acquired in advance includes the energy feature model, the local feature model, and the overall trend feature model, respectively, whether the energy feature, the local feature, and the overall trend feature acquired in step 202 conform to the feature model are respectively determined. That is, whether the energy feature conforms to the energy feature model, whether the local feature conforms to the local feature model, and whether the overall trend feature conforms to the overall trend feature model. It can be seen from the discussion of step 103 that the energy feature model, the local feature model and the overall trend feature model are hypersphere models with centers a1, a2, and a3 and radiuses R1, R2, and R3, respectively.
  • the distances D1, D2 and D3 from the energy feature, the local feature and the overall trend feature to the centers a1, a2 and a3 can be calculated separately; only when all three features lie within the boundaries of the SVDD models (i.e. D1 < R1, D2 < R2, D3 < R3) can the sound sample be judged to be a coughing sound.
  • Step 204 If it matches, confirm that the sound signal is a coughing sound.
  • the coughing voice recognition method can recognize the coughing sound, so that the coughing condition can be monitored by monitoring the sound emitted by the user without the user wearing any detecting component. Because the recognition algorithm based on MFCC characteristic parameters and SVDD model is adopted, the algorithm has low complexity and less calculation, which has low hardware requirements and reduces product manufacturing cost.
  • the embodiment of the present application further provides a coughing sound recognition apparatus, which can be used in the recognition device 20, the apparatus comprising:
  • the sampling and feature parameter obtaining module 301 is configured to sample the sound signal and obtain a characteristic parameter matrix of the Mel frequency cepstral coefficient of the sound signal;
  • the signal feature extraction module 302 is configured to extract a signal feature from the characteristic parameter matrix of Mel frequency cepstral coefficients of the sound signal;
  • a feature matching module 303 configured to confirm whether the signal feature matches a pre-acquired cough signal feature model based on a support vector data description algorithm
  • the confirmation module 304 is configured to confirm that the sound signal is a coughing sound if the signal feature matches a pre-acquired cough signal feature model based on a support vector data description algorithm.
  • the coughing voice recognition device provided by the embodiment of the present application can recognize the coughing sound, so that the coughing condition can be monitored by monitoring the sound emitted by the user without the user wearing any detecting component. Because the recognition algorithm based on MFCC characteristic parameters and SVDD model is adopted, the algorithm has low complexity and less calculation, which has low hardware requirements and reduces product manufacturing cost.
  • the device further includes:
  • a feature model preset module configured to pre-acquire the cough signal feature model based on the support vector data description algorithm
  • the feature model preset module is specifically configured to:
  • the support vector data description algorithm model is trained to obtain a cough signal feature model based on the support vector data description algorithm.
  • the signal feature comprises one or more sub-signal features of an energy feature, a local feature, and an overall trend feature.
  • extracting the signal feature from the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal includes:
  • the energy coefficients of the continuous frames of the coughing sound sample signal are normalized to a preset length based on a dynamic time warping algorithm to obtain the energy characteristic of the coughing sound sample signal;
  • Extracting a signal characteristic from a characteristic parameter matrix of a Mel frequency cepstral coefficient of the sound signal comprising:
  • the energy coefficients of the continuous frames of the sound signal are normalized to a preset length based on a dynamic time warping algorithm to obtain the energy characteristic of the sound signal.
  • if the signal feature includes a local feature, extracting the signal feature from the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal includes:
  • Extracting a signal characteristic from a characteristic parameter matrix of a Mel frequency cepstral coefficient of the sound signal comprising:
  • extracting the signal feature from the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal includes:
  • a linear discriminant analysis algorithm is used to perform dimensionality reduction on the characteristic parameter matrix of Mel frequency cepstral coefficients of the coughing sound sample signal to obtain the overall trend characteristic of the cough sound sample signal;
  • Extracting a signal characteristic from a characteristic parameter matrix of a Mel frequency cepstral coefficient of the sound signal comprising:
  • the linear discriminant analysis algorithm is used to perform dimensionality reduction on the characteristic parameter matrix of the Mel frequency cepstral coefficient of the sound signal, and the overall trend characteristic of the sound signal is obtained.
  • the cough signal feature model based on a support vector data description algorithm includes an energy feature model based on a support vector data description algorithm, and a local feature model based on a support vector data description algorithm And one or more sub-signal feature models based on the support vector data description algorithm in the overall trend feature model based on the support vector data description algorithm;
  • if the cough signal feature model based on the support vector data description algorithm includes a plurality of sub-signal feature models based on the support vector data description algorithm, confirming whether the signal feature matches the pre-acquired cough signal feature model includes: respectively determining whether each of the plurality of sub-signal features in the signal feature matches the corresponding pre-acquired sub-signal feature model based on the support vector data description algorithm.
  • the foregoing apparatus can perform the method provided by the embodiments of the present application, and has the corresponding functional modules and beneficial effects for performing the method.
  • the cough voice recognition device 20 includes a voice input unit 21, a signal processing unit 22, and an operation processing unit 23.
  • the sound input unit 21 is configured to receive a sound signal, and the sound input unit may be, for example, a microphone or the like.
  • the signal processing unit 22 is configured to perform signal processing on the sound signal; the signal processing unit 22 may perform analog signal processing on the sound signal, such as amplification, filtering and analog-to-digital conversion, and send the resulting digital signal to the operation processing unit 23.
  • the signal processing unit 22 is connected to the operation processing unit 23, which may be built into or external to the cough sound recognition device; FIG. 8 illustrates, as an example, the case in which the operation processing unit is built into the coughing sound recognition device.
  • the arithmetic processing unit 23 may be built in the coughing sound recognition device 20 or may be externally disposed outside the coughing sound recognition device 20, and the arithmetic processing is performed.
  • the unit 23 may also be a remotely located server, such as a cloud server, smart terminal or other server communicatively coupled to the cough voice recognition device 20 over a network.
  • the operation processing unit 23 includes:
  • At least one processor 232 (illustrated by one processor in FIG. 8) and a memory 231, the processor 232 and the memory 231 may be connected by a bus or the like, and the bus connection is taken as an example in FIG.
  • the memory 231 is configured to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the cough sound recognition method in the embodiments of the present application (for example, the sampling and feature parameter acquisition module 301 shown in FIG. 7).
  • the processor 232 executes various functional applications and data processing by executing non-volatile software programs, instructions, and modules stored in the memory 231, that is, the cough sound recognition method of the above-described method embodiments.
  • the memory 231 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application required for at least one function; the storage data area may store data created according to use of the cough sound recognition device, and the like. Further, the memory 231 may include a high speed random access memory, and may also include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, or other nonvolatile solid state storage device. In some embodiments, the memory 231 can optionally include a memory remotely located relative to the processor 232 that can be connected to the coughing voice recognition device via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • the one or more modules are stored in the memory 231 and, when executed by the one or more processors 232, perform the cough sound recognition method in any of the above method embodiments, for example performing method steps 101 to 103 in FIG. 5 and method steps 201 to 204 in FIG. 6 described above, and implementing the functions of modules 301 to 304 in FIG. 7.
  • the coughing voice recognition device provided by the embodiment of the present application can recognize the coughing sound, so that the coughing condition can be monitored by monitoring the sound emitted by the user without the user wearing any detecting component. Because the recognition algorithm based on MFCC characteristic parameters and SVDD model is adopted, the algorithm has low complexity and less calculation, which has low hardware requirements and reduces product manufacturing cost.
  • the coughing sound recognition device can perform the method provided by the embodiments of the present application, and has the corresponding functional modules and beneficial effects for performing the method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present application.
  • Embodiments of the present application provide a storage medium storing computer-executable instructions that are executed by one or more processors (for example, one processor 232 in FIG. 8), so that the one or more processors may perform the cough sound recognition method in any of the above method embodiments, for example performing method steps 101 to 103 in FIG. 5 and method steps 201 to 204 in FIG. 6 described above, and implementing the functions of modules 301 to 304 in FIG. 7.
  • the embodiments described above are merely illustrative, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, ie may be located in one Places, or they can be distributed to multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • the embodiments can be implemented by means of software plus a general hardware platform, and of course, by hardware.
  • a person skilled in the art can understand that all or part of the process of implementing the above embodiments can be completed by a computer program to instruct related hardware, and the program can be stored in a computer readable storage medium. When executed, the flow of an embodiment of the methods as described above may be included.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Public Health (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

A cough sound recognition method, device and storage medium. The method comprises: sampling a sound signal and obtaining a Mel frequency cepstral coefficient (MFCC) characteristic parameter matrix of the sound signal (201); extracting signal features from the MFCC characteristic parameter matrix of the sound signal (202); confirming whether the signal features match a pre-acquired cough signal feature model based on the support vector data description (SVDD) algorithm (203); and, if they match, confirming that the sound signal is a cough sound (204). The method and device can recognize cough sounds, so that coughing can be monitored by monitoring the sounds made by the user, without the user having to wear any detection component. Because the recognition algorithm is based on MFCC characteristic parameters and an SVDD model, its complexity and amount of calculation are low, so the hardware requirements are low and the manufacturing cost of the product is reduced.

Description

Cough sound recognition method, device and storage medium
Technical Field
The embodiments of the present application relate to sound processing technology, and in particular to a cough sound recognition method, device and storage medium.
Background
Cough is an evaluation indicator of the therapeutic effect or progression of certain diseases (such as asthma). Detailed and accurate information on cough status (such as the number of coughs per hour, cough time, etc.) has important clinical guiding significance for disease diagnosis. Studies have pointed out that intelligent cough monitoring devices are more accurate than manual identification of coughs. Current intelligent cough monitoring devices are mainly used for medical monitoring purposes and require the patient to wear complicated equipment, which undoubtedly brings inconvenience to the user.
At present, some research combines the characteristics of cough sounds with speech recognition technology to build a cough model, and uses a model matching method based on the Dynamic Time Warping (DTW) algorithm to recognize isolated cough sounds of a specific person. Coughing can thus be monitored by monitoring the sound made by the user, without the user wearing any detection component.
In the process of implementing the present application, the inventors found that the related art has at least the following problem: the cough sound recognition method based on the DTW algorithm has high algorithmic complexity and a large amount of calculation, and places higher requirements on hardware devices.
Summary of the Invention
The purpose of the present application is to provide a cough sound recognition method, device and storage medium that can recognize cough sounds with a simple algorithm, a small amount of calculation and low requirements on hardware devices.
To achieve the above purpose, in a first aspect, an embodiment of the present application provides a cough sound recognition method for a recognition device, the method comprising:
sampling a sound signal and obtaining a Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal;
extracting signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal;
confirming whether the signal features match a pre-acquired cough signal feature model based on the support vector data description algorithm;
if they match, confirming that the sound signal is a cough sound.
Optionally, the method further comprises:
pre-acquiring the cough signal feature model based on the support vector data description algorithm.
Optionally, the pre-acquiring of the cough signal feature model based on the support vector data description algorithm comprises:
collecting a preset number of cough sound sample signals and obtaining the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signals;
extracting the signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signals;
taking the signal features of the cough sound sample signals as input and training a support vector data description algorithm model to obtain the cough signal feature model based on the support vector data description algorithm.
Optionally, the signal features include one or more sub-signal features among an energy feature, a local feature and an overall trend feature.
Optionally, if the signal features include an energy feature, extracting the signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal comprises:
selecting, from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal, the energy coefficients of a preset proportion of consecutive frames of the cough sound sample signal whose energy coefficients have the largest sum;
normalizing the energy coefficients of the consecutive frames of the cough sound sample signal to a preset length based on a dynamic time warping algorithm to obtain the energy feature of the cough sound sample signal;
and extracting signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal comprises:
selecting, from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal, the energy coefficients of a preset proportion of consecutive frames of the sound signal whose energy coefficients have the largest sum;
normalizing the energy coefficients of the consecutive frames of the sound signal to a preset length based on a dynamic time warping algorithm to obtain the energy feature of the sound signal.
Optionally, if the signal features include a local feature, extracting the signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal comprises:
selecting, from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal, the Mel frequency cepstral coefficients of the S2 consecutive frames of the cough sound sample signal whose energy coefficients have the largest sum, S2 being a positive integer;
determining the weights of the Mel frequency cepstral coefficients of the S2 frames of the cough sound sample signal based on the energy coefficients of the S2 frames, and performing a weighted summation of the Mel frequency cepstral coefficients of the S2 frames according to these weights to obtain the local feature of the cough sound sample signal, the weights being positively correlated with the energy coefficients of the S2 frames;
and extracting signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal comprises:
selecting, from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal, the Mel frequency cepstral coefficients of the S2 frames of the sound signal whose energy coefficients have the largest sum;
determining the weights of the Mel frequency cepstral coefficients of the S2 frames of the sound signal based on the energy coefficients of the S2 frames, and performing a weighted summation of the Mel frequency cepstral coefficients of the S2 frames according to these weights to obtain the local feature of the sound signal, the weights being positively correlated with the energy coefficients of the S2 frames.
Optionally, if the signal features include an overall trend feature, extracting the signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal comprises:
performing dimensionality reduction on the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal using a linear discriminant analysis algorithm to obtain the overall trend feature of the cough sound sample signal;
and extracting signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal comprises:
performing dimensionality reduction on the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal using a linear discriminant analysis algorithm to obtain the overall trend feature of the sound signal.
Optionally, the cough signal feature model based on the support vector data description algorithm includes one or more sub-signal feature models based on the support vector data description algorithm among an energy feature model based on the support vector data description algorithm, a local feature model based on the support vector data description algorithm and an overall trend feature model based on the support vector data description algorithm;
if the cough signal feature model based on the support vector data description algorithm includes multiple sub-signal feature models based on the support vector data description algorithm, confirming whether the signal features match the pre-acquired cough signal feature model based on the support vector data description algorithm comprises:
respectively determining whether each of the multiple sub-signal features among the signal features matches the corresponding pre-acquired sub-signal feature model based on the support vector data description algorithm.
In a second aspect, an embodiment of the present application further provides a cough sound recognition device, the cough sound recognition device comprising:
a sound input unit configured to receive a sound signal;
a signal processing unit configured to perform analog signal processing on the sound signal;
the signal processing unit being connected to an operation processing unit built into or external to the cough sound recognition device, the operation processing unit comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform:
sampling a sound signal and obtaining a Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal;
extracting signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal;
confirming whether the signal features match a pre-acquired cough signal feature model based on the support vector data description algorithm;
if they match, confirming that the sound signal is a cough sound.
Optionally, the at least one processor is further able to perform:
pre-acquiring the cough signal feature model based on the support vector data description algorithm.
Optionally, the pre-acquiring of the cough signal feature model based on the support vector data description algorithm comprises:
collecting a preset number of cough sound sample signals and obtaining the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signals;
extracting the signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signals;
taking the signal features of the cough sound sample signals as input and training a support vector data description algorithm model to obtain the cough signal feature model based on the support vector data description algorithm.
Optionally, the signal features include one or more sub-signal features among an energy feature, a local feature and an overall trend feature.
Optionally, if the signal features include an energy feature, extracting the signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal comprises:
selecting, from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal, the energy coefficients of a preset proportion of consecutive frames of the cough sound sample signal whose energy coefficients have the largest sum;
normalizing the energy coefficients of the consecutive frames of the cough sound sample signal to a preset length based on a dynamic time warping algorithm to obtain the energy feature of the cough sound sample signal;
and extracting signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal comprises:
selecting, from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal, the energy coefficients of a preset proportion of consecutive frames of the sound signal whose energy coefficients have the largest sum;
normalizing the energy coefficients of the consecutive frames of the sound signal to a preset length based on a dynamic time warping algorithm to obtain the energy feature of the sound signal.
Optionally, if the signal features include a local feature, extracting the signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal comprises:
selecting, from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal, the Mel frequency cepstral coefficients of the S2 consecutive frames of the cough sound sample signal whose energy coefficients have the largest sum, S2 being a positive integer;
determining the weights of the Mel frequency cepstral coefficients of the S2 frames of the cough sound sample signal based on the energy coefficients of the S2 frames, and performing a weighted summation of the Mel frequency cepstral coefficients of the S2 frames according to these weights to obtain the local feature of the cough sound sample signal, the weights being positively correlated with the energy coefficients of the S2 frames;
and extracting signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal comprises:
selecting, from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal, the Mel frequency cepstral coefficients of the S2 frames of the sound signal whose energy coefficients have the largest sum;
determining the weights of the Mel frequency cepstral coefficients of the S2 frames of the sound signal based on the energy coefficients of the S2 frames, and performing a weighted summation of the Mel frequency cepstral coefficients of the S2 frames according to these weights to obtain the local feature of the sound signal, the weights being positively correlated with the energy coefficients of the S2 frames.
Optionally, if the signal features include an overall trend feature, extracting the signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal comprises:
performing dimensionality reduction on the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal using a linear discriminant analysis algorithm to obtain the overall trend feature of the cough sound sample signal;
and extracting signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal comprises:
performing dimensionality reduction on the Mel frequency cepstral coefficient characteristic parameter matrix of the sound signal using a linear discriminant analysis algorithm to obtain the overall trend feature of the sound signal.
Optionally, the cough signal feature model based on the support vector data description algorithm includes one or more sub-signal feature models based on the support vector data description algorithm among an energy feature model based on the support vector data description algorithm, a local feature model based on the support vector data description algorithm and an overall trend feature model based on the support vector data description algorithm;
if the cough signal feature model based on the support vector data description algorithm includes multiple sub-signal feature models based on the support vector data description algorithm, confirming whether the signal features match the pre-acquired cough signal feature model based on the support vector data description algorithm comprises:
respectively determining whether each of the multiple sub-signal features among the signal features matches the corresponding pre-acquired sub-signal feature model based on the support vector data description algorithm.
In a third aspect, an embodiment of the present application further provides a storage medium storing executable instructions which, when executed by a cough sound recognition device, cause the cough sound recognition device to perform the above method.
In a fourth aspect, an embodiment of the present application further provides a program product, the program product comprising a program stored on a storage medium, the program comprising program instructions which, when executed by a cough sound recognition device, cause the cough sound recognition device to perform the above method.
The cough sound recognition method, device and storage medium provided by the embodiments of the present application can recognize cough sounds, so that coughing can be monitored by monitoring the sounds made by the user without the user wearing any detection component. Because the recognition algorithm is based on MFCC characteristic parameters and an SVDD model, its complexity and amount of calculation are low, so the hardware requirements are low and the manufacturing cost of the product is reduced.
Brief Description of the Drawings
One or more embodiments are exemplarily illustrated by the figures in the corresponding drawings; these exemplary illustrations do not constitute a limitation on the embodiments. Elements with the same reference numerals in the drawings denote similar elements, and unless otherwise stated, the figures in the drawings are not drawn to scale.
FIG. 1 is a schematic structural diagram of the application environment of the embodiments of the present application;
FIG. 2 is a time-amplitude diagram of a cough sound signal;
FIG. 3 is a time-frequency diagram of a cough sound signal;
FIG. 4 is a schematic diagram of the Mel frequency filtering process in the MFCC coefficient calculation;
FIG. 5 is a schematic flowchart of pre-obtaining a feature model based on the support vector data description algorithm in the cough sound recognition method provided by an embodiment of the present application;
FIG. 6 is a schematic flowchart of the cough sound recognition method provided by an embodiment of the present application;
FIG. 7 is a schematic structural diagram of the cough sound recognition apparatus provided by an embodiment of the present application;
FIG. 8 is a schematic structural diagram of the cough sound recognition device provided by an embodiment of the present application.
Detailed Description
To make the purpose, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The embodiments of the present application propose a cough sound recognition scheme based on Mel Frequency Cepstral Coefficient (MFCC) characteristic parameters and a Support Vector Data Description (SVDD) model, which is applicable to the application environment shown in FIG. 1. The application environment includes a user 10 and a cough sound recognition device 20. The cough sound recognition device 20 is configured to receive a sound made by the user 10 and recognize the sound to determine whether it is a cough sound.
Further, after the sound is recognized as a cough sound, the cough sound recognition device 20 may also record and process the cough sound to output cough status information of the user 10. The cough status information may include the number of cough sounds, the duration of the cough sounds and the decibel level of the cough sounds. For example, a counter may be included in the cough sound recognition device to count cough sounds when a cough sound is detected; a timer may be included to record the duration of the cough sound when a cough sound is detected; and a decibel detection means may be included to measure the decibel level of the cough sound when a cough sound is detected.
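For illustration only, the following minimal Python sketch shows one way the cough status information described above (count, duration, decibel level) could be accumulated; the class and field names are illustrative assumptions rather than elements of the disclosed device 20.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class CoughEvent:
    # one detected cough: start time (s), duration (s), peak level (dB)
    start_time: float
    duration: float
    peak_db: float

@dataclass
class CoughLog:
    # accumulates the cough status information: number, durations, decibel levels
    events: List[CoughEvent] = field(default_factory=list)

    def add(self, event: CoughEvent) -> None:
        self.events.append(event)

    @property
    def count(self) -> int:
        return len(self.events)

    def coughs_per_hour(self, monitored_hours: float) -> float:
        # e.g. the "number of coughs per hour" statistic mentioned in the background
        return self.count / monitored_hours if monitored_hours > 0 else 0.0
```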
The principle of cough sound recognition in the embodiments of the present application is similar to that of speech recognition: the input sound is processed and then compared with a sound model to obtain a recognition result. It can be divided into two stages, a cough sound model training stage and a cough sound recognition stage. The cough sound model training stage mainly consists of collecting a certain number of cough sound samples, calculating the MFCC characteristic parameters of the cough sound signals, extracting signal features from the MFCC characteristic parameters, and training models on those signal features based on the SVDD algorithm to obtain reference feature models of the cough sound. In the cough sound recognition stage, the MFCC characteristic parameters of the sound to be judged are calculated, the signal features corresponding to the feature models are extracted, and it is then judged whether the signal features match the feature models; if they match, the sound is judged to be a cough sound, otherwise it is judged to be a non-cough sound. The recognition process mainly includes preprocessing, feature extraction, model training, pattern matching and decision.
In the preprocessing step, the cough sound signal is sampled and its MFCC coefficients are calculated. In the feature extraction step, the energy feature, overall trend feature and local feature of the cough sound signal are selected from the MFCC coefficient matrix and used as inputs for obtaining SVDD models. In the model training step, three SVDD models are trained from the three types of features extracted from the MFCC coefficient matrix of the cough sound signals: an SVDD energy feature model, an SVDD local feature model and an SVDD overall trend feature model. In the pattern matching and decision step, the three SVDD models are used to identify whether a new sound signal is a cough sound signal: the MFCC coefficient matrix of the sound signal is first calculated, then the energy feature, overall trend feature and local feature of the sound signal are extracted from the MFCC coefficient matrix, and it is then judged whether all three features match the SVDD energy feature model, the SVDD local feature model and the SVDD overall trend feature model respectively; if they do, the sound signal is judged to be a cough sound signal, otherwise it is judged not to be a cough sound signal.
The scheme of combining MFCC with SVDD to recognize cough sounds can simplify the algorithm, reduce the amount of calculation, and significantly improve the accuracy of cough sound recognition.
An embodiment of the present application provides a cough sound recognition method, which can be used in the above cough sound recognition device 20. The cough sound recognition method requires a feature model based on the support vector data description algorithm, i.e. an SVDD-based feature model, to be obtained in advance. The SVDD-based feature model may be pre-configured, or it may be trained by the method in steps 101 to 103. After the SVDD-based feature model is obtained by training, it can subsequently be used to recognize cough sounds. Further, if the accuracy of the SVDD-based feature model in recognizing cough sounds becomes unacceptable due to a scene change or other reasons, the SVDD-based feature model can be reconfigured or retrained.
As shown in FIG. 5, pre-obtaining the feature model based on the support vector data description algorithm includes:
Step 101: collecting a preset number of cough sound sample signals and obtaining the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signals;
The cough sound sample signal s(n) is obtained by sampling, and the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal is obtained from it. Mel frequency cepstral coefficients are mainly used for sound data feature extraction and for reducing the computational dimensionality. For example, for a frame with 512 dimensions (sampling points), the most important 40 dimensions of data can be extracted after MFCC processing, which also achieves the purpose of dimensionality reduction. The calculation of Mel frequency cepstral coefficients generally includes pre-emphasis, framing, windowing, fast Fourier transform, Mel filter bank filtering and discrete cosine transform.
Obtaining the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal specifically includes the following steps:
(1) Pre-emphasis
The purpose of pre-emphasis is to boost the high-frequency part and flatten the spectrum of the signal, so that the spectrum can be computed with the same signal-to-noise ratio over the whole band from low to high frequencies. It also removes the effect of the vocal cords and lips during sound production, compensates the high-frequency part of the sound signal that is suppressed by the vocal system, and highlights the high-frequency formants. This is implemented by passing the sampled cough sound sample signal s(n) through a first-order finite impulse response (FIR) high-pass digital filter, whose transfer function is:
H(z) = 1 - a·z⁻¹   (1)
where z denotes the input signal (its time-domain form is the cough sound sample signal s(n)) and a is the pre-emphasis coefficient, generally a constant between 0.9 and 1.0.
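As an illustration of this step, a minimal NumPy sketch of the first-order pre-emphasis filter of equation (1); the default coefficient a = 0.97 is an assumption within the stated 0.9 to 1.0 range.

```python
import numpy as np

def pre_emphasis(s: np.ndarray, a: float = 0.97) -> np.ndarray:
    """Time-domain form of H(z) = 1 - a*z^-1, i.e. y[n] = s[n] - a*s[n-1]."""
    return np.append(s[0], s[1:] - a * s[:-1])
```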
(2) Framing
Every P sampling points of the cough sound sample signal s(n) are grouped into one observation unit, called a frame. P may be 256 or 512, covering roughly 20 to 30 ms. To avoid excessive variation between two adjacent frames, an overlap region may be kept between them; this overlap contains M sampling points, and M may be about 1/2 or 1/3 of P. The sampling frequency of a sound signal is typically 8 kHz or 16 kHz; at 8 kHz, a frame length of 256 sampling points corresponds to 256/8000 × 1000 = 32 ms.
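A minimal sketch of the framing step under the values stated above (P = 256 samples and overlap M = P/2, so the hop between frame starts is 128 samples at 8 kHz); the helper name is illustrative.

```python
import numpy as np

def frame_signal(s: np.ndarray, frame_len: int = 256, hop: int = 128) -> np.ndarray:
    """Split a 1-D signal into overlapping frames of P = frame_len samples;
    consecutive frames overlap by M = frame_len - hop samples."""
    n_frames = 1 + max(0, len(s) - frame_len) // hop
    return np.stack([s[i * hop: i * hop + frame_len] for i in range(n_frames)])
```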
(3) Windowing
Each frame is multiplied by a Hamming window to increase the continuity between the left and right ends of the frame. Let the framed signal be S(n), n = 0, 1, ..., P-1, where P is the frame size; after multiplying by the Hamming window, S'(n) = S(n) × W(n), where
W(n) = 0.54 - 0.46·cos(2πn / (l - 1)),  0 ≤ n ≤ l - 1   (2)
where l denotes the window length.
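A matching sketch of the windowing step, applying the Hamming window W(n) of equation (2) to every frame produced by the framing step above.

```python
import numpy as np

def apply_hamming(frames: np.ndarray) -> np.ndarray:
    """Multiply each frame by W(n) = 0.54 - 0.46*cos(2*pi*n/(l-1)), l = frame length."""
    l = frames.shape[1]
    w = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(l) / (l - 1))
    return frames * w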
(4) Fast Fourier transform (FFT)
Since the characteristics of a signal are usually difficult to see from its time-domain form, it is usually converted into an energy distribution in the frequency domain for observation; different energy distributions represent the characteristics of different sounds. Therefore, after multiplication by the Hamming window, each frame must also undergo a fast Fourier transform to obtain the energy distribution over the spectrum. A fast Fourier transform is performed on each framed and windowed signal to obtain the spectrum of each frame, and the power spectrum of the sound signal is obtained by taking the squared modulus of the spectrum.
(5) Triangular band-pass filtering
The energy spectrum is filtered by a bank of Mel-scale triangular filters. A filter bank with M filters is defined (the number of filters is close to the number of critical bands); the filters are triangular, with center frequencies f(m), m = 1, 2, ..., M, and M may be taken as 22 to 26. The spacing between the f(m) narrows as m decreases and widens as m increases, as shown in FIG. 4.
The frequency response of a triangular filter is defined as:
H_m(k) = 0                                 for k < f(m-1)
H_m(k) = (k - f(m-1)) / (f(m) - f(m-1))    for f(m-1) ≤ k ≤ f(m)
H_m(k) = (f(m+1) - k) / (f(m+1) - f(m))    for f(m) ≤ k ≤ f(m+1)
H_m(k) = 0                                 for k > f(m+1)   (3)
where Σ_m H_m(k) = 1.
(6) Discrete cosine transform
The logarithmic energy output by each filter in the bank is computed as:
s(m) = ln( Σ_{k=0}^{N-1} |X(k)|² · H_m(k) ),  0 ≤ m ≤ M   (4)
where X(k) denotes the spectrum of the frame obtained by the fast Fourier transform.
The MFCC coefficients are then obtained by applying a discrete cosine transform (DCT) to the logarithmic energies s(m):
C(n) = Σ_{m=0}^{M-1} s(m) · cos( πn(m + 0.5) / M ),  n = 1, 2, ..., L   (5)
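For illustration, a compact NumPy/SciPy sketch of steps (4) to (6): power spectrum, Mel-scale triangular filter bank, log energies and DCT, returning the N × L coefficient matrix used below. The filter count, FFT size and number of retained coefficients are illustrative assumptions, not values fixed by the present application.

```python
import numpy as np
from scipy.fftpack import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc_matrix(frames, sample_rate=8000, n_filters=24, n_ceps=13, n_fft=512):
    """Power spectrum -> Mel filter bank -> log energies -> DCT.
    Returns an N x L coefficient matrix (N frames, L = n_ceps)."""
    # power spectrum of each windowed frame
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # triangular Mel filter bank with centre frequencies f(m)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        rise = np.arange(bins[m - 1], bins[m])
        fall = np.arange(bins[m], bins[m + 1])
        fbank[m - 1, rise] = (rise - bins[m - 1]) / max(bins[m] - bins[m - 1], 1)
        fbank[m - 1, fall] = (bins[m + 1] - fall) / max(bins[m + 1] - bins[m], 1)
    # log filter-bank energies s(m), then DCT to obtain the MFCC coefficients
    log_energy = np.log(np.maximum(spec @ fbank.T, 1e-10))
    return dct(log_energy, type=2, axis=1, norm='ortho')[:, :n_ceps]
```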
Step 102: extracting the signal features from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal;
From equation (5), the MFCC coefficients form an N × L coefficient matrix, where N is the number of frames of the sound signal and L is the length of the MFCC coefficients. Because the MFCC coefficient matrix has a high dimensionality, and because the number of rows N differs when sound signals have different lengths, the MFCC coefficient matrix cannot be used directly as the input for obtaining the SVDD model. It is therefore necessary to further extract effective features from the MFCC coefficient matrix for direct input into the SVDD model.
To further extract effective features from the MFCC coefficient matrix, the matrix must be reduced in dimensionality; however, reducing the MFCC matrix directly may lose effective features of the cough sound signal, so effective features can be extracted from the MFCC coefficient matrix by taking the time-domain and frequency-domain characteristics of the cough sound signal into account.
Referring to FIG. 2, the time-amplitude diagram (time-domain diagram) of a cough sound signal, it can be seen that a cough sound signal is very short and markedly sudden; the duration of a single cough sound is usually less than 550 ms, and even for patients with severe throat and bronchial diseases the duration of a single cough generally remains around 1000 ms. In terms of energy, the energy of the cough sound signal is mainly concentrated in the first half of the signal. Therefore, the energy coefficients of the signal segment in which the energy is relatively concentrated can be selected as the energy feature that characterizes the cough sound sample signal; for example, a set of energy coefficients of the first 1/2 of the cough sound sample signal is selected as the energy feature, and this energy feature is used as input to establish an SVDD model for recognizing the sound signal.
Since the differing lengths of the cough sound sample signals lead to differing numbers of rows N in the parameter matrix, the lengths of the energy coefficient sequences also differ. The energy coefficients therefore need to be normalized to the same length.
Specifically, extracting the energy feature from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal includes:
selecting, from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal, the energy coefficients of a preset proportion of consecutive frames of the cough sound sample signal whose energy coefficients have the largest sum;
normalizing the energy coefficients of these consecutive frames to a preset length based on the DTW algorithm to obtain the energy feature of the cough sound sample signal.
In a specific application, given the energy distribution of cough sound signals, the preset proportion of consecutive frames whose energy coefficients have the largest sum may be the first 1/2, the first 4/7 or the first 5/9 of the cough sound sample signal, and so on. The preset length can be set according to the actual application.
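A minimal sketch of the energy-feature extraction just described. Using the 0th MFCC coefficient as the per-frame energy coefficient and using linear interpolation in place of the DTW-based length normalization are simplifying assumptions for illustration; neither choice is fixed by the text above.

```python
import numpy as np

def energy_feature(mfcc: np.ndarray, ratio: float = 0.5, target_len: int = 20) -> np.ndarray:
    """Pick the contiguous block of frames (a preset ratio of all frames) whose
    energy coefficients sum to the maximum, then normalise it to a fixed length."""
    energy = mfcc[:, 0]                      # assumed per-frame energy coefficient
    n = len(energy)
    win = max(1, min(n, int(round(n * ratio))))
    start = int(np.argmax([energy[i:i + win].sum() for i in range(n - win + 1)]))
    segment = energy[start:start + win]
    # length normalisation (linear interpolation standing in for DTW-based warping)
    return np.interp(np.linspace(0, win - 1, target_len), np.arange(win), segment)
```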
As can be seen from FIG. 2, most cough sound signals (about 90%) follow essentially the same trend: after the cough pulse occurs, the signal energy decreases rapidly, the rate of decrease being faster for a dry cough and slightly slower for a wet cough. The trend of the cough sound signal therefore characterizes the cough sound signal well, and an overall trend feature (which reflects the trend of the signal) can be extracted from the MFCC coefficient matrix of the cough sound signal and used as input to establish an SVDD model for recognizing the sound signal.
Specifically, the overall trend feature of the cough sound sample signal can be obtained by performing dimensionality reduction on the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal using a Linear Discriminant Analysis (LDA) algorithm.
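A hedged sketch of the overall-trend feature using scikit-learn's LDA. LDA is supervised, so the transform is assumed to have been fitted beforehand on labelled frame-level MFCC vectors (e.g. cough vs. non-cough frames); averaging the projected frames into a fixed-length vector is an illustrative choice that the text above does not specify.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def overall_trend_feature(mfcc: np.ndarray, lda: LinearDiscriminantAnalysis) -> np.ndarray:
    """Project the N x L MFCC matrix with a previously fitted LDA transform and
    summarise the projected trajectory as a fixed-length trend descriptor."""
    projected = lda.transform(mfcc)          # N x k, k < L
    return projected.mean(axis=0)

# fitting the transform on hypothetical labelled frame data X_frames, y_frames:
# lda = LinearDiscriminantAnalysis(n_components=1).fit(X_frames, y_frames)
```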
FIG. 3 is the time-frequency diagram (spectrogram) of a cough sound signal. As can be seen from FIG. 3, the spectral energy is also concentrated at the beginning of the signal, and the frequency distribution is wide (generally concentrated within 200 to 6000 Hz). Therefore, the MFCC coefficients of the few frames of the cough sound sample signal in which the spectral energy is concentrated can be selected as a local feature that characterizes the cough sound signal, and this local feature is used as input to establish an SVDD model for recognizing the sound signal. Specifically, the local feature can be obtained as follows: select the few frames with the most concentrated energy from the cough sound sample signal, then assign different weights to the MFCC coefficients of these frames and add them to obtain the local feature of the cough sound sample signal. Because the weights of the Mel frequency cepstral coefficients of the cough sound sample signal are positively correlated with its energy coefficients, the weight values can be determined from the energy coefficients of the cough sound sample signal. That is: select, from the Mel frequency cepstral coefficient characteristic parameter matrix of the cough sound sample signal, the Mel frequency cepstral coefficients of the S2 consecutive frames whose energy coefficients have the largest sum, S2 being a positive integer; then determine the weights of the Mel frequency cepstral coefficients of these S2 frames based on their energy coefficients, and perform a weighted summation of the Mel frequency cepstral coefficients of the S2 frames according to these weights to obtain the local feature of the cough sound sample signal.
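A minimal sketch of this local-feature extraction: the S2 consecutive frames with the largest energy sum are selected and their MFCC vectors are combined with weights positively correlated with their energy coefficients. Treating the 0th coefficient as the energy coefficient and normalizing the weights to sum to one are illustrative assumptions.

```python
import numpy as np

def local_feature(mfcc: np.ndarray, s2: int = 5) -> np.ndarray:
    """Energy-weighted sum of the MFCC vectors of the S2 most energetic
    consecutive frames, yielding a length-L local feature vector."""
    energy = mfcc[:, 0]                      # assumed per-frame energy coefficient
    n = len(energy)
    s2 = min(s2, n)
    start = int(np.argmax([energy[i:i + s2].sum() for i in range(n - s2 + 1)]))
    block = mfcc[start:start + s2]
    w = np.maximum(energy[start:start + s2], 1e-6)   # keep weights positive
    w = w / w.sum()                                   # illustrative normalisation
    return (w[:, None] * block).sum(axis=0)
```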
通过上述分析可知,能量特征、局部特征和整体趋势特征都能反映咳嗽声音信号的特性,因此可以从咳嗽声音样本信号的MFCC系数矩阵中提取能量特征、局部特征、整体趋势特征中的一种或多种子信号特征,并将该一种或多种子信号特征作为输入,建立SVDD模型对声音信号进行识别,从而大大提高咳嗽声音识别的准确率,同时降低误识别率。为进一步提高咳嗽声音识别的准确率,可以在MFCC系数矩阵中同时提取能量特征、局部特征和整体趋势特征。在从咳嗽声音样本信号的MFCC系数矩阵中同时提取能量特征、局部特征和整体趋势特征作为输入,训练SVDD模型对声音信号进行识别时,咳嗽声音的识别率可以达到95%以上。
也可以采用其他降维方法对咳嗽声音样本信号的MFCC系数进行降维,例如采用DTW、主成分分析(Principal Component Analysis,PCA)等算法。但在采用PCA算法对咳嗽声音样本信号的MFCC系数进行降维并利用降维后的参数训练SVDD模型的场合,所获得的咳嗽声音信号的SVDD模型对咳嗽声音与噪声的区分度很小,咳嗽声音识别率约为85%,噪声误识别率达到65%。
步骤103:将所述咳嗽声音样本信号的信号特征作为输入,训练支持向量数据描述算法模型,以获取所述基于支持向量数据描述算法的咳嗽信号特征模型。
在所述信号特征包括能量特征、局部特征和整体趋势特征的场合,分别将所述能量特征、局部特征和整体趋势特征作为输入,训练SVDD模型,即训练能量特征的SVDD模型(能量特征模型)、局部特征的SVDD模型(局部特征模型)和整体趋势特征的SVDD模型(整体趋势特征模型)。从而获得由能量特征模型、局部特征模型和整体趋势特征模型组成的基于支持向量数据描述算法的咳嗽信号特征模型。
SVDD的基本原理是为输入样本计算一个球状的决策边界,将整个空间划分为两部分:一部分是边界内的空间,看作可接受的部分;另一部分则是边界外的空间,看作被拒绝的部分。这就使SVDD具有对单一类别样本进行分类(单类分类)的能力。
具体地,SVDD的优化目标是求一个中心为a、半径为R的最小球面:
min F(R, a, ξi) = R² + C·Σi ξi
使得这个球面满足如下约束(对于维度高于3的数据xi,该球面即为超球面;超球面是指3维以上空间中的球面,在2维空间中对应的是闭合曲线,在3维空间中就是普通的球面):
‖xi - a‖² ≤ R² + ξi,  ξi ≥ 0,  i = 1, 2, …, n
满足这个条件就是说要把训练数据集中的数据点都包含在球面里,其中xi表示输入样本数据,即咳嗽声音样本信号。
在确定了优化目标和约束条件之后,可以采用拉格朗日(Lagrangian)乘子法求解:
L(R, a, ξi; αi, γi) = R² + C·Σi ξi - Σi αi·[R² + ξi - (xi·xi - 2a·xi + a·a)] - Σi γi·ξi   (6)
其中αi≥0,γi≥0,分别对参数R,a,ξi求偏导并令导数等于0得到:
∂L/∂R = 0 ⇒ Σi αi = 1   (7)
∂L/∂a = 0 ⇒ a = Σi αi·xi   (8)
∂L/∂ξi = 0 ⇒ C - αi - γi = 0   (9)
将上面(7)、(8)、(9)代入式(6)中,便可得到其对偶问题:
max L = Σi αi·(xi·xi) - Σi Σj αi·αj·(xi·xj)
其中0≤αi≤C,
Σi αi = 1
上面的向量内积可以通过核函数K解决,即:
L = Σi αi·K(xi, xi) - Σi Σj αi·αj·K(xi, xj)
通过上述计算过程可以得到中心a和半径R的取值,也即确定了SVDD模型。分别利用上述计算过程训练得到3个SVDD模型的中心a1、a2、a3和半径R1、R2、R3(分别对应能量特征模型、局部特征模型和整体趋势特征模型)后,训练过程完成。
在训练过程中,一方面通过控制超球的大小和范围使超球面包含尽可能多的样本点,另一方面又要求超球面的半径尽可能小,从而达到最优的分类效果。
具体地,在本申请实施例中,能量特征模型、局部特征模型和整体趋势特征模型,每一模型对应一个超球面,并在包含所有咳嗽声音信号前提下,优化超球面边界,使得它的半径达到最小,最终得到最符合要求的基于支持向量数据描述算法的咳嗽信号特征模型,从而使得利用该基于支持向量数据描述算法的咳嗽信号特征模型对提取到的声音信号的信号特征进行识别时准确率高。
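作为原理性的示意,下面按上述对偶问题用scipy.optimize.minimize(SLSQP)求解αi,再由边界支持向量计算半径R,训练得到单个SVDD模型;对能量特征、局部特征和整体趋势特征分别调用一次即可得到三个模型。其中高斯核、C与σ的取值均为示例(C需不小于1/样本数,否则约束不可行);工程上也可以用sklearn中的OneClassSVM等现成单类分类器近似替代,此处仅为展示原理:

```python
import numpy as np
from scipy.optimize import minimize

def rbf_kernel(X, Y, sigma=1.0):
    """高斯核矩阵 K(x, y) = exp(-||x - y||^2 / sigma^2)。"""
    d2 = np.sum(X ** 2, axis=1)[:, None] + np.sum(Y ** 2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / sigma ** 2)

def train_svdd(X, C=0.1, sigma=1.0):
    """X: 每行为一个训练样本(例如全部咳嗽样本的能量特征)。返回 (alpha, R2, X, sigma)。"""
    n = X.shape[0]                      # 注意需满足 C >= 1/n, 否则约束 sum(alpha)=1 不可行
    K = rbf_kernel(X, X, sigma)

    def neg_dual(alpha):                # 对偶目标取负, 转化为最小化问题
        return -(alpha @ np.diag(K) - alpha @ K @ alpha)

    res = minimize(neg_dual, np.full(n, 1.0 / n), method='SLSQP',
                   bounds=[(0.0, C)] * n,
                   constraints=({'type': 'eq', 'fun': lambda a: np.sum(a) - 1.0},))
    alpha = res.x
    sv = np.where((alpha > 1e-6) & (alpha < C - 1e-6))[0]       # 边界支持向量
    k = sv[0] if len(sv) else int(np.argmax(alpha))
    R2 = K[k, k] - 2.0 * alpha @ K[:, k] + alpha @ K @ alpha    # 半径平方 R^2
    return alpha, R2, X, sigma

def svdd_distance2(model, z):
    """计算待测特征z到球心a的距离平方 D^2, 与 R^2 比较即可判断是否落在超球面内。"""
    alpha, _, X, sigma = model
    K = rbf_kernel(X, X, sigma)
    kz = rbf_kernel(z[None, :], X, sigma)[0]
    return float(1.0 - 2.0 * alpha @ kz + alpha @ K @ alpha)    # 高斯核下 K(z, z) = 1

# model_energy = train_svdd(energy_features)   # 同理可训练局部特征模型与整体趋势特征模型
```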
如图6所示,所述咳嗽声音识别方法包括:
步骤201:采样声音信号并获取所述声音信号的梅尔频率倒谱系数特征参数矩阵;
在实际应用中,可以在所述咳嗽声音识别设备20上设置声音输入单元(例如麦克风)来采集声音信号,对声音信号进行放大、滤波等处理后转换成数字信号。该数字信号可以在咳嗽声音识别设备20本地的运算处理单元中进行采样及其他计算处理,也可以通过网络上传到云端服务器、智能终端或者其他服务器中进行处理。
其中,获取声音信号的梅尔频率倒谱系数特征参数矩阵的技术细节请参照步骤101,在此不再赘述。
步骤202:从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征;
在预先获取的基于支持向量数据描述算法的特征模型包括能量特征模型、局部特征模型和整体趋势特征模型的场合,可以从声音信号的特征参数矩阵中提取能量特征、局部特征和整体趋势特征中的一种或者几种。为提高识别准确率,可以全部提取三个特征,即能量特征、局部特征和整体趋势特征。其中,所述声音信号的能量特征、局部特征和整体趋势特征的具体计算方法请参照步骤102,在此不再赘述。
步骤203:确认所述信号特征是否匹配预先获取的基于支持向量数据描述算法的咳嗽信号特征模型;
在预先获取的基于支持向量数据描述算法的特征模型包括能量特征模型、局部特征模型和整体趋势特征模型的场合,分别判断步骤202中获取的能量特征、局部特征和整体趋势特征是否符合特征模型,即能量特征是否符合能量特征模型、局部特征是否符合局部特征模型以及整体趋势特征是否符合整体趋势特征模型。由步骤103的论述可知,能量特征模型、局部特征模型和整体趋势特征模型分别是中心为a1、a2、a3,半径为R1、R2、R3的超球面模型。在判断能量特征、局部特征和整体趋势特征是否符合特征模型时,可以分别计算能量特征、局部特征和整体趋势特征到中心a1、a2、a3的距离D1、D2、D3,只有当三个特征全部在SVDD模型边界内(即D1<R1,D2<R2,D3<R3)时,才能判定该声音样本为咳嗽声音。
步骤204:如果匹配,则确认所述声音信号为咳嗽声音。
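上述判决逻辑可以概括为如下示意片段(D1、D2、D3与R1、R2、R3分别为三个特征到各自球心的距离及对应模型的半径;这里直接比较距离平方与半径平方,效果等价):

```python
def is_cough(d2_list, r2_list):
    """能量、局部、整体趋势三个特征均落在各自SVDD超球面内时, 才判定该声音信号为咳嗽声音。"""
    return all(d2 < r2 for d2, r2 in zip(d2_list, r2_list))

# is_cough([D1**2, D2**2, D3**2], [R1**2, R2**2, R3**2])
```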
本申请实施例提供的咳嗽声音识别方法,能对咳嗽声音进行识别,从而能够通过监测使用者发出的声音对咳嗽情况进行监测,无需使用者佩戴任何检测部件。且由于采用基于MFCC特征参数和SVDD模型的识别算法,算法复杂度低、计算量少,从而对硬件要求低,降低了产品制造成本。
相应地,本申请实施例还提供了一种咳嗽声音识别装置,用于上述咳嗽声音识别设备20,所述装置包括:
采样及特征参数获取模块301,用于采样声音信号并获取所述声音信号的梅尔频率倒谱系数特征参数矩阵;
信号特征提取模块302,用于从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征;
特征匹配模块303,用于确认所述信号特征是否匹配预先获取的基于支持向量数据描述算法的咳嗽信号特征模型;
确认模块304,用于如果所述信号特征匹配预先获取的基于支持向量数据描述算法的咳嗽信号特征模型,则确认所述声音信号为咳嗽声音。
本申请实施例提供的咳嗽声音识别装置,能对咳嗽声音进行识别,从而能够通过监测使用者发出的声音对咳嗽情况进行监测,无需使用者佩戴任何检测部件。且由于采用基于MFCC特征参数和SVDD模型的识别算法,算法复杂度低、计算量少,从而对硬件要求低,降低了产品制造成本。
可选的,在所述装置的其他实施例中,所述装置还包括:
特征模型预设模块,用于预先获取所述基于支持向量数据描述算法的咳嗽信号特征模型;
所述特征模型预设模块,具体用于:
采集预设数量的咳嗽声音样本信号并获取所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵;
从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征;
将所述咳嗽声音样本信号的信号特征作为输入,训练支持向量数据描述算法模型,以获取基于支持向量数据描述算法的咳嗽信号特征模型。
其中,可选的,在所述装置的某些实施例中,所述信号特征包括能量特征、局部特征以及整体趋势特征中的一种或多种子信号特征。
可选的,在所述装置的某些实施例中,若所述信号特征包括能量特征,所述从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征,包括:
从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的预设比例的连续帧咳嗽声音样本信号的能量系数;
将所述连续帧咳嗽声音样本信号的能量系数基于动态时间归整算法归整到预设长度;
所述从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征,包括:
从所述声音信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的预设比例的连续帧声音信号的能量系数;
将所述连续帧声音信号的能量系数基于动态时间归整算法归整到预设长度。
可选的,在所述装置的某些实施例中,若所述信号特征包括局部特征,所述从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征,包括:
从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的连续S2帧咳嗽声音样本信号的梅尔频率倒谱系数,所述S2为正整数;
基于所述S2帧咳嗽声音样本信号的能量系数确定所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数的权重,并根据所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数的权重对所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数进行加权求和,获得所述咳嗽声音样本信号的局部特征,所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数的权重与所述S2帧咳嗽声音样本信号的能量系数正相关;
所述从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征,包括:
从所述声音信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的S2帧声音信号的梅尔频率倒谱系数;
基于所述S2帧声音信号的能量系数确定所述S2帧声音信号的梅尔频率倒谱系数的权重,并根据所述S2帧声音信号的梅尔频率倒谱系数的权重对所述S2帧声音信号的梅尔频率倒谱系数进行加权求和,获得所述声音信号的局部特征,所述S2帧声音信号的梅尔频率倒谱系数的权重与所述S2帧声音信号的能量系数正相关。
可选的,在所述装置的某些实施例中,若所述信号特征包括整体趋势特征,所述从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征,包括:
采用线性判别分析算法对所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵进行降维处理,获得所述咳嗽声音样本信号的整体趋势特征;
所述从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征,包括:
采用线性判别分析算法对所述声音信号的梅尔频率倒谱系数特征参数矩阵进行降维处理,获得所述声音信号的整体趋势特征。
可选的,在所述装置的某些实施例中,所述基于支持向量数据描述算法的咳嗽信号特征模型包括基于支持向量数据描述算法的能量特征模型,基于支持向量数据描述算法的局部特征模型以及基于支持向量数据描述算法的整体趋势特征模型中的一种或多种基于支持向量数据描述算法的子信号特征模型;
若所述基于支持向量数据描述算法的咳嗽信号特征模型包括多种基于支持向量数据描述算法的子信号特征模型,所述确认所述咳嗽信号特征是否匹配预先获取的基于支持向量数据描述算法的咳嗽信号特征模型,包括:
分别确定所述信号特征中的多种子信号特征是否均匹配预先获取的多种所述基于支持向量数据描述算法的子信号特征模型。
需要说明的是,上述装置可执行本申请实施例所提供的方法,具备执行方法相应的功能模块和有益效果。未在本实施例中详尽描述的技术细节,可参见本申请实施例所提供的方法。
本申请实施例还提供了一种咳嗽声音识别设备,如图8所示,所述咳嗽声音识别设备20包括声音输入单元21、信号处理单元22和运算处理单元23。其中:声音输入单元21,用于接收声音信号,所述声音输入单元可以例如是麦克风等。信号处理单元22,用于对所述声音信号进行信号处理;所述信号处理单元22可以对所述声音信号进行放大、滤波、数模转换等模拟信号处理,将获得的数字信号发送给运算处理单元23。
所述信号处理单元22与咳嗽声音识别设备内置或者外置的运算处理单元23相连(图8以运算处理单元内置在咳嗽声音识别设备中为例说明)。运算处理单元23可以内置在咳嗽声音识别设备20中,也可以设置在咳嗽声音识别设备20外部;所述运算处理单元23还可以是远程设置的服务器,例如通过网络与咳嗽声音识别设备20通信连接的云端服务器、智能终端或者其他服务器。
所述运算处理单元23包括:
至少一个处理器232(图8中以一个处理器举例说明)和存储器231,处理器232和存储器231可以通过总线或者其他方式连接,图8中以通过总线连接为例。
存储器231用于存储非易失性软件程序、非易失性计算机可执行程序以及模块,如本申请实施例中的咳嗽声音识别方法对应的程序指令/模块(例如,附图7所示的采样及特征参数获取模块301)。处理器232通过运行存储在存储器231中的非易失性软件程序、指令以及模块,从而执行各种功能应用以及数据处理,即实现上述方法实施例的咳嗽声音识别方法。
存储器231可以包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需要的应用程序;存储数据区可存储根据咳嗽声音识别装置使用所创建的数据等。此外,存储器231可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。在一些实施例中,存储器231可选包括相对于处理器232远程设置的存储器,这些远程存储器可以通过网络连接至咳嗽声音识别装置。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
所述一个或者多个模块存储在所述存储器231中,当被所述一个或者多个处理器232执行时,执行上述任意方法实施例中的咳嗽声音识别方法,例如,执行以上描述的图5中的方法步骤101-103,图6中的方法步骤201至步骤204;实现图7中的模块301-304的功能。
本申请实施例提供的咳嗽声音识别设备,能对咳嗽声音进行识别,从而能够通过监测使用者发出的声音对咳嗽情况进行监测,无需使用者佩戴任何检测部件。且由于采用基于MFCC特征参数和SVDD模型的识别算法,算法复杂度低、计算量少,从而对硬件要求低,降低了产品制造成本。
上述咳嗽声音识别设备可执行本申请实施例所提供的方法,具备执行方法相应的功能模块和有益效果。未在本实施例中详尽描述的技术细节,可参见本申请实施例所提供的方法。
本申请实施例提供了一种存储介质,所述存储介质存储有计算机可执行指令,该计算机可执行指令被一个或多个处理器执行(例如图8中的一个处理器232),可使得上述一个或多个处理器可执行上述任意方法实施例中的咳嗽声音识别方法,例如,执行以上描述的图5中的方法步骤101-103,图6中的方法步骤201至步骤204;实现图7中的模块301-304的功能。
以上所描述的实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。
通过以上的实施例的描述,本领域普通技术人员可以清楚地了解到各实施例可借助软件加通用硬件平台的方式来实现,当然也可以通过硬件。本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程是可以通过计算机程序来指令相关的硬件来完成,所述的程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,所述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)或随机存储记忆体(Random Access Memory,RAM)等。
最后应说明的是:以上实施例仅用以说明本申请的技术方案,而非对其限制;在本申请的思路下,以上实施例或者不同实施例中的技术特征之间也可以进行组合,步骤可以以任意顺序实现,并存在如上所述的本申请的不同方面的许多其它变化,为了简明,它们没有在细节中提供;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。

Claims (17)

  1. 一种咳嗽声音识别方法,其特征在于,所述方法包括:
    采样声音信号并获取所述声音信号的梅尔频率倒谱系数特征参数矩阵;
    从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征;
    确认所述信号特征是否匹配预先获取的基于支持向量数据描述算法的咳嗽信号特征模型;
    如果匹配,则确认所述声音信号为咳嗽声音。
  2. 根据权利要求1所述的咳嗽声音识别方法,其特征在于,所述方法还包括:
    预先获取所述基于支持向量数据描述算法的咳嗽信号特征模型。
  3. 根据权利要求2所述的咳嗽声音识别方法,其特征在于,所述预先获取所述基于支持向量数据描述算法的咳嗽信号特征模型,包括:
    采集预设数量的咳嗽声音样本信号并获取所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵;
    从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征;
    将所述咳嗽声音样本信号的信号特征作为输入,训练支持向量数据描述算法模型,以获取所述基于支持向量数据描述算法的咳嗽信号特征模型。
  4. 根据权利要求3所述的咳嗽声音识别方法,其特征在于,所述信号特征包括能量特征、局部特征以及整体趋势特征中的一种或多种子信号特征。
  5. 根据权利要求4所述的咳嗽声音识别方法,其特征在于,若所述信号特征包括能量特征,所述从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征,包括:
    从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的预设比例的连续帧咳嗽声音样本信号的能量系数;
    将所述连续帧咳嗽声音样本信号的能量系数基于动态时间归整算法归整到预设长度获得所述咳嗽声音样本信号的能量特征;
    所述从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征,包括:
    从所述声音信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的预设比例的连续帧声音信号的能量系数;
    将所述连续帧声音信号的能量系数基于动态时间归整算法归整到预设长度获得所述声音信号的能量特征。
  6. 根据权利要求4所述的咳嗽声音识别方法,其特征在于,若所述信号特征包括局部特征,所述从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征,包括:
    从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的连续S2帧咳嗽声音样本信号的梅尔频率倒谱系数,所述S2为正整数;
    基于所述S2帧咳嗽声音样本信号的能量系数确定所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数的权重,并根据所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数的权重对所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数进行加权求和,获得所述咳嗽声音样本信号的局部特征,所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数的权重与所述S2帧咳嗽声音样本信号的能量系数正相关;
    所述从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征,包括:
    从所述声音信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的S2帧声音信号的梅尔频率倒谱系数;
    基于所述S2帧声音信号的能量系数确定所述S2帧声音信号的梅尔频率倒谱系数的权重,并根据所述S2帧声音信号的梅尔频率倒谱系数的权重对所述S2帧声音信号的梅尔频率倒谱系数进行加权求和,获得所述声音信号的局部特征,所述S2帧声音信号的梅尔频率倒谱系数的权重与所述S2帧声音信号的能量系数正相关。
  7. 根据权利要求4所述的咳嗽声音识别方法,其特征在于,若所述信号特征包括整体趋势特征,所述从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征,包括:
    采用线性判别分析算法对所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵进行降维处理,获得所述咳嗽声音样本信号的整体趋势特征;
    所述从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征,包括:
    采用线性判别分析算法对所述声音信号的梅尔频率倒谱系数特征参数矩阵进行降维处理,获得所述声音信号的整体趋势特征。
  8. 根据权利要求4所述的咳嗽声音识别方法,其特征在于,所述基于支持向量数据描述算法的咳嗽信号特征模型包括基于支持向量数据描述算法的能量特征模型,基于支持向量数据描述算法的局部特征模型以及基于支持向量数据描述算法的整体趋势特征模型中的一种或多种基于支持向量数据描述算法的子信号特征模型;
    若所述基于支持向量数据描述算法的咳嗽信号特征模型包括多种基于支持向量数据描述算法的子信号特征模型,所述确认所述信号特征是否匹配预先获取的基于支持向量数据描述算法的咳嗽信号特征模型,包括:
    分别确定所述信号特征中的多种子信号特征是否均匹配预先获取的所述多种基于支持向量数据描述算法的子信号特征模型。
  9. 一种咳嗽声音识别设备,其特征在于,所述咳嗽声音识别设备包括:
    声音输入单元,用于接收声音信号;
    信号处理单元,用于对所述声音信号进行模拟信号处理;
    所述信号处理单元与咳嗽声音识别设备内置或者外置的运算处理单元相连,所述运算处理单元包括:
    至少一个处理器;以及,
    与所述至少一个处理器通信连接的存储器;其中,
    所述存储器存储有可被所述至少一个处理器执行的指令,所述指令被所述至少一个处理器执行,以使所述至少一个处理器能够执行:
    采样声音信号并获取所述声音信号的梅尔频率倒谱系数特征参数矩阵;
    从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征;
    确认所述信号特征是否匹配预先获取的基于支持向量数据描述算法的咳嗽信号特征模型;
    如果匹配,则确认所述声音信号为咳嗽声音。
  10. 根据权利要求9所述的咳嗽声音识别设备,其特征在于,所述至少一个处理器还能够执行:
    预先获取所述基于支持向量数据描述算法的咳嗽信号特征模型。
  11. 根据权利要求10所述的咳嗽声音识别设备,其特征在于,所述预先获取所述基于支持向量数据描述算法的咳嗽信号特征模型,包括:
    采集预设数量的咳嗽声音样本信号并获取所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵;
    从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征;
    将所述咳嗽声音样本信号的信号特征作为输入,训练支持向量数据描述算法模型,以获取所述基于支持向量数据描述算法的咳嗽信号特征模型。
  12. 根据权利要求11所述的咳嗽声音识别设备,其特征在于,所述信号特征包括能量特征、局部特征以及整体趋势特征中的一种或多种子信号特征。
  13. 根据权利要求12所述的咳嗽声音识别设备,其特征在于,若所述信号特征包括能量特征,所述从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征,包括:
    从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的预设比例的连续帧咳嗽声音样本信号的能量系数;
    将所述连续帧咳嗽声音样本信号的能量系数基于动态时间归整算法归整到预设长度获得所述咳嗽声音样本信号的能量特征;
    所述从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征,包括:
    从所述声音信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的预设比例的连续帧声音信号的能量系数;
    将所述连续帧声音信号的能量系数基于动态时间归整算法归整到预设长度获得所述声音信号的能量特征。
  14. 根据权利要求12所述的咳嗽声音识别设备,其特征在于,若所述信号特征包括局部特征,所述从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征,包括:
    从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的连续S2帧咳嗽声音样本信号的梅尔频率倒谱系数,所述S2为正整数;
    基于所述S2帧咳嗽声音样本信号的能量系数确定所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数的权重,并根据所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数的权重对所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数进行加权求和,获得所述咳嗽声音样本信号的局部特征,所述S2帧咳嗽声音样本信号的梅尔频率倒谱系数的权重与所述S2帧咳嗽声音样本信号的能量系数正相关;
    所述从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征,包括:
    从所述声音信号的梅尔频率倒谱系数特征参数矩阵中选择能量系数之和最大的S2帧声音信号的梅尔频率倒谱系数;
    基于所述S2帧声音信号的能量系数确定所述S2帧声音信号的梅尔频率倒谱系数的权重,并根据所述S2帧声音信号的梅尔频率倒谱系数的权重对所述S2帧声音信号的梅尔频率倒谱系数进行加权求和,获得所述声音信号的局部特征,所述S2帧声音信号的梅尔频率倒谱系数的权重与所述S2帧声音信号的能量系数正相关。
  15. 根据权利要求12所述的咳嗽声音识别设备,其特征在于,若所述信号特征包括整体趋势特征,所述从所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵中提取所述信号特征,包括:
    采用线性判别分析算法对所述咳嗽声音样本信号的梅尔频率倒谱系数特征参数矩阵进行降维处理,获得所述咳嗽声音样本信号的整体趋势特征;
    所述从所述声音信号的梅尔频率倒谱系数特征参数矩阵中提取信号特征,包括:
    采用线性判别分析算法对所述声音信号的梅尔频率倒谱系数特征参数矩阵进行降维处理,获得所述声音信号的整体趋势特征。
  16. 根据权利要求12所述的咳嗽声音识别设备,其特征在于,所述基于支持向量数据描述算法的咳嗽信号特征模型包括基于支持向量数据描述算法的能量特征模型,基于支持向量数据描述算法的局部特征模型以及基于支持向量数据描述算法的整体趋势特征模型中的一种或多种基于支持向量数据描述算法的子信号特征模型;
    若所述基于支持向量数据描述算法的咳嗽信号特征模型包括多种基于支持向量数据描述算法的子信号特征模型,所述确认所述信号特征是否匹配预先获取的基于支持向量数据描述算法的咳嗽信号特征模型,包括:
    分别确定所述信号特征中的多种子信号特征是否均匹配预先获取的多种所述基于支持向量数据描述算法的子信号特征模型。
  17. 一种存储介质,其特征在于,所述存储介质存储有可执行指令,所述可执行指令被咳嗽声音识别设备执行时,使所述咳嗽声音识别设备执行权利要求1-8任意一项所述的方法。

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2017/095263 WO2019023879A1 (zh) 2017-07-31 2017-07-31 咳嗽声音识别方法、设备和存储介质
CN201780008985.4A CN108701469B (zh) 2017-07-31 2017-07-31 咳嗽声音识别方法、设备和存储介质

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/095263 WO2019023879A1 (zh) 2017-07-31 2017-07-31 咳嗽声音识别方法、设备和存储介质

Publications (1)

Publication Number Publication Date
WO2019023879A1 true WO2019023879A1 (zh) 2019-02-07

Family

ID=63844118

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/095263 WO2019023879A1 (zh) 2017-07-31 2017-07-31 咳嗽声音识别方法、设备和存储介质

Country Status (2)

Country Link
CN (1) CN108701469B (zh)
WO (1) WO2019023879A1 (zh)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019023877A1 (zh) * 2017-07-31 2019-02-07 深圳和而泰智能家居科技有限公司 特定声音识别方法、设备和存储介质
CN109360584A (zh) * 2018-10-26 2019-02-19 平安科技(深圳)有限公司 基于深度学习的咳嗽监测方法及装置
CN109498228B (zh) * 2018-11-06 2021-03-30 林枫 基于咳嗽音反馈的肺康复治疗装置
CN109567806A (zh) * 2018-11-08 2019-04-05 广州军区广州总医院 一种创伤性颈髓损伤患者咳嗽音评价系统及评价方法
CN109782666A (zh) * 2019-01-22 2019-05-21 山东钰耀弘圣智能科技有限公司 一种家兔疫病远程自动监测系统与方法
JP7312037B2 (ja) * 2019-06-25 2023-07-20 パナソニック インテレクチュアル プロパティ コーポレーション オブ アメリカ 咳検出装置、咳検出装置の作動方法及びプログラム
CN111179967B (zh) * 2019-12-17 2022-05-24 华南理工大学 颈脊髓损伤患者真假咳嗽音线性分类算法、介质和设备
CN111524537B (zh) * 2020-03-24 2023-04-14 苏州数言信息技术有限公司 针对实时语音流的咳嗽及打喷嚏识别方法
CN113746583A (zh) * 2021-09-18 2021-12-03 鹰潭市广播电视传媒集团有限责任公司 公共播音设备的远程管理系统、方法、装置和存储介质
CN114330454A (zh) * 2022-01-05 2022-04-12 东北农业大学 一种基于ds证据理论融合特征的生猪咳嗽声音识别方法

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006132596A1 (en) * 2005-06-07 2006-12-14 Matsushita Electric Industrial Co., Ltd. Method and apparatus for audio clip classification
CN101894551A (zh) * 2010-07-02 2010-11-24 华南理工大学 一种咳嗽自动识别方法及装置
CN102664011A (zh) * 2012-05-17 2012-09-12 吉林大学 一种快速说话人识别方法
CN106847262A (zh) * 2016-12-28 2017-06-13 华中农业大学 一种猪呼吸道疾病自动识别报警方法

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7727161B2 (en) * 2003-04-10 2010-06-01 Vivometrics, Inc. Systems and methods for monitoring cough
US8532800B2 (en) * 2007-05-24 2013-09-10 Mavs Lab. Inc. Uniform program indexing method with simple and robust audio feature enhancing methods
AU2013239327B2 (en) * 2012-03-29 2018-08-23 The University Of Queensland A method and apparatus for processing patient sounds
CN103489446B (zh) * 2013-10-10 2016-01-06 福州大学 复杂环境下基于自适应能量检测的鸟鸣识别方法
CN103730130B (zh) * 2013-12-20 2019-03-01 中国科学院深圳先进技术研究院 一种病理嗓音的检测系统
CN105095624B (zh) * 2014-05-15 2017-08-01 中国电子科技集团公司第三十四研究所 一种光纤传感振动信号的识别方法
US9687208B2 (en) * 2015-06-03 2017-06-27 iMEDI PLUS Inc. Method and system for recognizing physiological sound
CN105147252A (zh) * 2015-08-24 2015-12-16 四川长虹电器股份有限公司 心脏疾病识别及评估方法
CN105761720B (zh) * 2016-04-19 2020-01-07 北京地平线机器人技术研发有限公司 一种基于语音属性分类的交互系统及其方法
CN106847293A (zh) * 2017-01-19 2017-06-13 内蒙古农业大学 设施养殖羊应激行为的声信号监测方法

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3839971A1 (en) * 2019-12-19 2021-06-23 Koninklijke Philips N.V. A cough detection system and method
WO2021122672A1 (en) * 2019-12-19 2021-06-24 Koninklijke Philips N.V. A cough detection system and method
WO2021189903A1 (zh) * 2020-10-09 2021-09-30 平安科技(深圳)有限公司 基于音频的用户状态识别方法、装置、电子设备及存储介质
CN112331231A (zh) * 2020-11-24 2021-02-05 南京农业大学 基于音频技术的肉鸡采食量检测系统
CN112331231B (zh) * 2020-11-24 2024-04-19 南京农业大学 基于音频技术的肉鸡采食量检测系统

Also Published As

Publication number Publication date
CN108701469B (zh) 2023-06-20
CN108701469A (zh) 2018-10-23


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17920145; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 17920145; Country of ref document: EP; Kind code of ref document: A1)