CN112309423A - Respiratory tract symptom detection method based on smart phone audio perception in driving environment - Google Patents

Respiratory tract symptom detection method based on smart phone audio perception in driving environment

Info

Publication number
CN112309423A
Authority
CN
China
Prior art keywords
sub
sound
frame
sound signals
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011216514.2A
Other languages
Chinese (zh)
Inventor
李凡 (Li Fan)
吴玥 (Wu Yue)
解亚东 (Xie Yadong)
杨松 (Yang Song)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202011216514.2A priority Critical patent/CN112309423A/en
Publication of CN112309423A publication Critical patent/CN112309423A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/08 Detecting, measuring or recording devices for evaluating the respiratory organs
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/68 Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient
    • A61B5/6887 Arrangements of detecting, measuring or recording means, e.g. sensors, in relation to patient mounted on external non-worn devices, e.g. non-medical devices
    • A61B5/6898 Portable consumer electronic devices, e.g. music players, telephones, tablet computers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/66 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for extracting parameters related to health condition
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B2503/00 Evaluating a particular growth phase or type of persons or animals
    • A61B2503/20 Workers
    • A61B2503/22 Motor vehicles operators, e.g. drivers, pilots, captains

Abstract

The invention discloses a respiratory tract symptom detection method based on smartphone audio perception in a driving environment. The method collects in-vehicle sounds with the microphone of a smartphone, filters out vehicle driving noise with an adaptive sub-band spectral entropy method, extracts acoustic features from the denoised sounds and feeds them to a trained neural network, which judges whether the collected sounds contain respiratory symptoms such as coughing, sneezing and nose inhalation and records the number of occurrences of each symptom. The invention does not depend on pre-installed professional medical equipment, has low cost, strong anti-interference performance and no privacy-leakage problem, and is suitable for detection environments with stable driving noise and a short distance between driver and passengers. By adopting a denoising method based on adaptive sub-band spectral entropy to eliminate the influence of various driving noises, the system is robust to environmental noise and can accurately and efficiently detect and classify the three typical respiratory tract symptoms.

Description

Respiratory tract symptom detection method based on smart phone audio perception in driving environment
Technical Field
The invention relates to a respiratory tract symptom detection method, in particular to one based on the audio sensing capability of a smartphone's built-in audio sensors, namely its loudspeaker and microphone, in a driving environment. It is mainly used to monitor whether drivers and passengers exhibit three typical respiratory tract symptoms: coughing, sneezing and nose inhalation. The invention belongs to the technical field of mobile computing applications.
Background
Among the respiratory symptoms closely related to human health, coughing, sneezing and nose inhalation (sniffling) are the most common in daily life. Although these symptoms may appear negligible, they correlate with more than 100 diseases, such as the common cold, flu and allergies, as well as more severe respiratory diseases such as pneumonia, asthma and chronic lung disease. Most of these respiratory diseases are curable, but they still need to be discovered as early as possible, especially infectious ones. Detection of respiratory symptoms can therefore not only help individuals find health problems, but also help prevent infectious diseases and promote public health.
Currently, methods for detecting respiratory symptoms rely primarily on specialized medical equipment deployed in hospitals and medical facilities and connected to medical systems. For example, a respiration monitoring device detects the air flowing in and out of the patient's mouth to judge whether the patient coughs, or a device with an accelerometer is mounted on the patient's chest to test for abnormal breathing conditions.
However, these methods are generally costly, hard to deploy and applicable only in hospitals and medical institutions. In the field of mobile computing there are several methods that detect respiratory symptoms with audio sensors: a microphone device worn by the user collects the surrounding sounds to determine whether the user coughs, or the microphone of the user's mobile phone collects the surrounding sounds to detect coughing, sneezing, sniffling and similar behaviors. These methods, however, resist interference poorly and are applicable only in relatively quiet indoor environments. In a driving environment, particularly in commercial vehicles such as taxis, the small space and the close distance between passengers and driver make it easy for infectious respiratory diseases to spread. Because of the noise in the driving environment and the difficulty of deploying dedicated equipment, existing methods are not suitable for detecting respiratory symptoms such as coughing, sneezing and nose inhalation while driving.
In view of the foregoing, there is a need for a method that uses the audio sensor in the driver's smartphone to detect whether the driver and passengers in a driving environment exhibit respiratory symptoms.
Disclosure of Invention
The invention aims to solve the problems of high cost and poor anti-interference performance in detecting the respiratory symptoms of driver and passengers in a driving environment, and provides a method that uses the smartphone audio sensor to detect respiratory symptoms such as coughing, sneezing and nose inhalation of the driver or passengers.
The core idea of the invention is as follows: collect in-vehicle sounds with the microphone of a smartphone, filter out vehicle driving noise with an adaptive sub-band spectral entropy method, extract acoustic features from the denoised sounds and feed them to a trained neural network, which judges whether the collected sounds contain respiratory symptoms such as coughing, sneezing and nose inhalation and records the number of occurrences of each symptom. The method is particularly suitable for small cars, where driving noise is stable and the distance between driver and passengers is short.
The purpose of the invention is realized by the following technical scheme:
A respiratory tract symptom detection method based on smartphone audio perception in a driving environment comprises the following steps:
step 1: the method comprises the steps of collecting sound signals of coughing, sneezing and nose sucking of different drivers and passengers in a driving environment by using a microphone of a smart phone, and filtering automobile driving noise in the collected sound signals based on an adaptive subband spectral entropy denoising method, namely an ABSE denoising method.
Specifically, the implementation method of step 1 is as follows:
step 1.1: the smart phone is placed in a vehicle to collect sound signals of three behaviors of coughing, sneezing and nose sucking of different drivers and passengers.
Step 1.2: Divide each sound signal collected in step 1.1 into sub-segments of equal length, perform a fast Fourier transform (FFT) on the first n sub-segments (for example, 2 to 10), then calculate their average energy spectrum and initialize the ABSE threshold T_s = μ_θ + α·σ_θ, where μ_θ and σ_θ are the mean and the standard deviation of the ABSE values of the first n sub-segments, H_b(l) is the ABSE value of the l-th sub-segment, and α is a weight selected according to experimental results.
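To make the initialization concrete, here is a minimal Python sketch (numpy only). The even split of FFT bins into sub-bands, the band count, and reading μ_θ and σ_θ as the mean and standard deviation of the first n ABSE values are assumptions, since the patent does not fix these details:

```python
import numpy as np

def subband_spectral_entropy(segment, n_bands=32):
    """Sub-band spectral entropy H_b of one sub-segment (sketch)."""
    energy = np.abs(np.fft.rfft(segment)) ** 2        # energy spectrum
    bands = np.array_split(energy, n_bands)           # assumed even sub-band split
    band_energy = np.array([b.sum() for b in bands])
    p = band_energy / (band_energy.sum() + 1e-12)     # normalized band probabilities
    return -np.sum(p * np.log(p + 1e-12))             # spectral entropy

def init_abse_threshold(sound, fs=48000, seg_len=0.2, n=10, alpha=0.1):
    """Initialize T_s = mu + alpha * sigma from the first n sub-segments
    and return the initial average energy spectrum E (step 1.2 sketch)."""
    hop = int(seg_len * fs)
    segs = [sound[i * hop:(i + 1) * hop] for i in range(n)]
    h = np.array([subband_spectral_entropy(s) for s in segs])
    avg_energy = np.mean([np.abs(np.fft.rfft(s)) ** 2 for s in segs], axis=0)
    return h.mean() + alpha * h.std(), avg_energy
```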
Step 1.3: The ABSE value of the sound signal of the next sub-segment is calculated and compared with the threshold obtained in step 1.2. If the ABSE value of the sub-segment exceeds the threshold, an FFT is performed on the sub-segment and its energy spectrum is calculated; the average energy spectrum obtained in step 1.2 is subtracted from the energy spectrum of the sub-segment, and an inverse fast Fourier transform (IFFT) yields the denoised sound signal of the sub-segment. If the ABSE value of the sub-segment does not exceed the threshold, the average energy spectrum is updated with the energy spectrum of the sub-segment.
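One iteration of this decision can be sketched in the same Python setting, reusing the subband_spectral_entropy helper above; keeping the noisy phase during spectral subtraction and clamping negative energies to zero are assumptions, and the 0.7/0.3 update weights are borrowed from the embodiment in the detailed description below:

```python
def denoise_subsegment(segment, threshold, avg_energy):
    """One step-1.3 iteration (sketch): returns (denoised-or-None, updated E)."""
    spec = np.fft.rfft(segment)
    energy = np.abs(spec) ** 2
    if subband_spectral_entropy(segment) > threshold:   # symptom sound present
        clean_energy = np.maximum(energy - avg_energy, 0.0)
        # rebuild the spectrum with the original phase and subtracted magnitude
        clean_spec = np.sqrt(clean_energy) * np.exp(1j * np.angle(spec))
        return np.fft.irfft(clean_spec, n=len(segment)), avg_energy
    # noise only: fold this sub-segment into the running noise spectrum
    return None, 0.7 * avg_energy + 0.3 * energy
```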
Step 1.4: Repeat step 1.3 until all sound signals are denoised. The denoised sound signals are passed through a high-pass filter to remove the low-frequency band; the sound segments containing coughing, sneezing and nose-inhalation sounds are taken out of the filtered signals and cut into separate signal frames, each containing one respiratory tract symptom, and each signal frame is labeled with the corresponding behavior.
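For the high-pass stage, a conventional Butterworth design is one option; the patent names no filter type or order, so both are assumptions here, and the 800 Hz cutoff is borrowed from step 3.2 of the embodiment:

```python
from scipy.signal import butter, sosfiltfilt

def highpass(signal, fs=48000, cutoff=800, order=4):
    """Step-1.4 high-pass filter (sketch, assumed Butterworth of order 4)."""
    sos = butter(order, cutoff, btype='highpass', fs=fs, output='sos')
    return sosfiltfilt(sos, signal)
```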
Step 2: From the denoised, labeled signal frames obtained in step 1, extract per-frame mixed acoustic features based on Mel-frequency cepstral coefficients (MFCC) and Gammatone-frequency cepstral coefficients (GFCC), and train a classifier based on a long short-term memory (LSTM) neural network with these features.
Specifically, the implementation method of step 2 is as follows:
step 2.1: dividing each signal frame containing the respiratory tract symptom obtained in the step 1 into sub-frames with the same length, calculating 12-dimensional MFCC features of each sub-frame, and splicing the first 10-dimensional MFCC features of each sub-frame into an MFCC feature vector of the frame.
Step 2.2: Each signal frame containing a respiratory tract symptom obtained in step 1 is divided into sub-frames of equal length, the 31-dimensional GFCC features of each sub-frame are calculated, and the first 20 GFCC dimensions of every sub-frame are spliced into the GFCC feature vector of the frame.
Step 2.3: The MFCC vector obtained in step 2.1 and the GFCC vector obtained in step 2.2 are spliced into a mixed feature vector, which is then fed into a 3-layer LSTM network for training to obtain the classifier of the three respiratory symptom sounds in the driving environment.
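The splicing of steps 2.1-2.3 can be sketched as follows. librosa ships an MFCC implementation but no GFCC, so the Gammatone-cepstrum routine is injected as a function argument (an assumption), and averaging librosa's internal frames down to one vector per sub-frame is likewise a simplification:

```python
import numpy as np
import librosa

def frame_features(frame, sr, sub_len=0.07, hop=0.04, gfcc_fn=None):
    """Spliced MFCC+GFCC vector of one symptom frame (sketch). Sub-frame
    length 0.07 s and hop 0.04 s (0.03 s overlap) follow the embodiment."""
    n_sub, n_hop = int(sub_len * sr), int(hop * sr)
    mfcc_parts, gfcc_parts = [], []
    for start in range(0, len(frame) - n_sub + 1, n_hop):
        sub = frame[start:start + n_sub]
        # 12-dim MFCC per sub-frame, keep the first 10 dimensions
        mfcc = librosa.feature.mfcc(y=sub, sr=sr, n_mfcc=12).mean(axis=1)
        mfcc_parts.append(mfcc[:10])
        if gfcc_fn is not None:
            # 31-dim GFCC per sub-frame, keep the first 20 dimensions
            gfcc_parts.append(np.asarray(gfcc_fn(sub, sr))[:20])
    # all MFCC parts first, then all GFCC parts, as in steps 2.1-2.3
    return np.concatenate(mfcc_parts + gfcc_parts)
```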
Step 3: In practical application, a microphone of the smartphone in the vehicle is used to continuously collect in-vehicle sound signals. The car driving noise is removed from the collected signals with the methods of steps 1.2 and 1.3, and the denoised signals are segmented and padded so that each piece becomes an equal-length signal frame. Then the acoustic features of each signal frame are extracted with the feature-extraction method of step 2, and the features are sent to the trained classifier for judgment. Once the classifier determines a coughing, sneezing or nose-inhalation behavior, the corresponding respiratory symptom is recorded and its cumulative number of occurrences updated.
Specifically, the implementation method of step 3 is as follows:
step 3.1: the speaker sampling rate of the user's handset is set to 48kHz, and the handset microphone continues to receive the sound signal in the car.
Step 3.2: For the sound signals collected in step 3.1, the driving noise is removed with the methods of steps 1.2 and 1.3, and the sound sub-segments whose ABSE value exceeds the threshold are selected. If the total duration of a run of consecutive above-threshold sub-segments exceeds a time threshold T_1, the run is divided into overlapping sub-frames of fixed length. If the total duration is less than another time threshold T_2, the run is discarded. If the total duration lies between T_2 and T_1, the run is extended to the fixed frame length. Each frame is then passed through a high-pass filter.
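The framing policy of step 3.2 reduces to a few duration comparisons; here is a sketch, with zero-padding standing in for the embodiment's padding with neighbouring signal:

```python
import numpy as np

def to_fixed_frames(run, fs, T1, T2, frame_len, overlap):
    """Step-3.2 framing of one run of above-threshold sub-segments (sketch)."""
    d = len(run) / fs
    if d < T2:                                    # too short: discard
        return []
    n_frame = int(frame_len * fs)
    if d > T1:                                    # long: overlapping fixed frames
        hop = n_frame - int(overlap * fs)
        return [run[i:i + n_frame]
                for i in range(0, len(run) - n_frame + 1, hop)]
    pad = n_frame - len(run)                      # T2 <= d <= T1: pad to one frame
    return [np.pad(run, (pad // 2, pad - pad // 2))]
```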
Step 3.3: For each fixed-length filtered frame obtained in step 3.2, the MFCC feature vector of the frame is calculated as in step 2.1 and the GFCC feature vector as in step 2.2; the two vectors are spliced into the frame's mixed feature vector, which is then sent to the trained LSTM network for classification to determine whether the frame contains a coughing, sneezing or nose-inhalation behavior.
Advantageous effects
1. Compared with the prior art, the method detects the respiratory symptoms of driver and passengers merely by continuously receiving sound signals in the driving environment through the smartphone's microphone. The invention therefore does not depend on pre-installed professional medical equipment, has low cost, strong anti-interference performance and no privacy-leakage problem, and is suitable for detection environments with stable driving noise and a short distance between driver and passengers.
2. Exploiting the difference between the sound-signal characteristics of typical respiratory symptoms and those of driving noise, the invention adopts a denoising method based on adaptive sub-band spectral entropy to eliminate the influence of various driving noises, making the system robust to environmental noise.
3. The method extracts mixed acoustic features tailored to the different sound-signal characteristics of the three typical respiratory symptoms and, combined with neural-network deep-learning techniques, detects and classifies the three typical respiratory symptoms accurately and efficiently.
Drawings
FIG. 1 is a schematic diagram of the method of the present invention.
FIG. 2 shows the accuracy of different methods for detecting respiratory symptoms according to embodiments of the present invention.
FIG. 3 is a confusion matrix for the detection of the different respiratory tract symptoms according to an embodiment of the present invention.
FIG. 4 shows recall rates of different respiratory symptoms in different scenarios according to embodiments of the present invention.
Detailed Description
The method of the present invention will be described in further detail with reference to the following examples and the accompanying drawings.
As shown in fig. 1, a respiratory tract symptom detection method based on smartphone audio perception in a driving environment includes the following steps:
step 1: a microphone of the smart phone is used for collecting sound signals of coughing, sneezing and nose sucking of different drivers and passengers in a driving environment, and a denoising method based on adaptive subband spectral entropy (ABSE) is designed for filtering automobile driving noise in the collected sound signals.
Step 1.1: 16 volunteers were recruited as drivers or passengers to drive or ride in the test vehicles; the volunteers placed the smartphone in the vehicle and collected the sound signals of the three behaviors of coughing, sneezing and nose inhalation while the vehicle was driven.
Step 1.2: Divide each sound signal collected in step 1.1 into non-overlapping sub-segments of 0.2 s, take the first 10 sub-segments, calculate the average energy spectrum E of their sounds after a fast Fourier transform (FFT), and initialize the ABSE threshold T_s = μ_θ + α·σ_θ, where μ_θ and σ_θ are the mean and the standard deviation of the ABSE values of the first 10 sub-segments and H_b(l) denotes the ABSE value of the l-th sub-segment. The weight α is set to 0.1.
Step 1.3: The ABSE value of the sound signal of the next sub-segment is calculated and compared with the threshold obtained in step 1.2. If the ABSE value of the sub-segment exceeds the threshold, an FFT is performed on the sub-segment and its energy spectrum is calculated; the average energy spectrum obtained in step 1.2 is subtracted from it, and an inverse fast Fourier transform (IFFT) of the result yields the denoised sound signal of the sub-segment. If the ABSE value of the sub-segment does not exceed the threshold, the average energy spectrum is updated according to the energy spectrum of the sub-segment: E_new = 0.7·E + 0.3·E_current, where E_current is the energy spectrum of the current sub-segment.
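Plugging the embodiment's constants (0.2 s sub-segments, n = 10, α = 0.1, the 0.7/0.3 update) into the helpers sketched under the summary above gives a minimal denoising loop; the synthetic `sound` array is a stand-in for a real in-car recording:

```python
import numpy as np

fs = 48000
sound = np.random.randn(fs * 5)                  # stand-in for a real recording
threshold, E = init_abse_threshold(sound, fs=fs, seg_len=0.2, n=10, alpha=0.1)
hop = int(0.2 * fs)
denoised = []
for i in range(10, len(sound) // hop):           # skip the 10 init sub-segments
    out, E = denoise_subsegment(sound[i * hop:(i + 1) * hop], threshold, E)
    if out is not None:                          # above threshold: symptom sound
        denoised.append(out)
```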
Step 2: Audio signals generated while the gasoline car runs are collected, and a classifier based on a long short-term memory (LSTM) neural network is trained.
Step 2.1: Each frame containing one respiratory symptom obtained in step 1 is divided into sub-frames of 0.07 s, with a 0.03 s overlap between adjacent sub-frames. The 12-dimensional MFCC features of each sub-frame are calculated, and the first 10 MFCC dimensions of every sub-frame are spliced into the 120-dimensional MFCC feature vector of the frame.
Step 2.2: Each frame containing one respiratory symptom obtained in step 1 is divided into sub-frames of 0.07 s, with a 0.03 s overlap between adjacent sub-frames. The 31-dimensional GFCC features of each sub-frame are calculated, and the first 20 GFCC dimensions of every sub-frame are spliced into the 240-dimensional GFCC feature vector of the frame.
Step 2.3: The MFCC vector obtained in step 2.1 and the GFCC vector obtained in step 2.2 are spliced into a 360-dimensional mixed feature vector, which is then fed into a 3-layer LSTM network for training to obtain the classifier of the three respiratory symptom sounds in the driving environment. The LSTM network comprises 2 LSTM layers and 1 fully connected layer, uses Tanh as the activation function, adds a batch-normalization layer after each LSTM layer, and uses the cross-entropy cost function as the loss function. The timestep count of the LSTM network is set to 6, i.e. each input consists of the feature vector of the current sub-frame and the feature vectors of the 5 sub-frames before it. For the t-th timestep, the LSTM layer maps the input x_t to a compressed vector h_t via h_t = δ(W_0·[h_{t-1}, x_t] + b_0)·tanh(S_t), where W_0 and b_0 denote a weight matrix and a bias vector respectively, S_t denotes the state at the t-th timestep, h_{t-1} denotes the compressed vector of the previous timestep, and δ() denotes the activation function. After training, the classifier of the three typical respiratory symptoms is obtained.
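A PyTorch sketch of this network follows; the hidden width is an assumption (the patent gives no layer sizes), and whether each timestep receives a per-sub-frame vector or a slice of the spliced 360-dimensional vector is left open through the `input_dim` parameter:

```python
import torch
import torch.nn as nn

class SymptomLSTM(nn.Module):
    """2 LSTM layers + 1 fully connected layer, BatchNorm after each LSTM
    layer and Tanh activations, per step 2.3 (widths are assumptions)."""
    def __init__(self, input_dim=30, hidden=64, n_classes=3):
        super().__init__()
        self.lstm1 = nn.LSTM(input_dim, hidden, batch_first=True)
        self.bn1 = nn.BatchNorm1d(hidden)
        self.lstm2 = nn.LSTM(hidden, hidden, batch_first=True)
        self.bn2 = nn.BatchNorm1d(hidden)
        self.act = nn.Tanh()
        self.fc = nn.Linear(hidden, n_classes)    # cough / sneeze / nose inhalation

    def forward(self, x):                          # x: (batch, 6, input_dim)
        h, _ = self.lstm1(x)
        h = self.act(self.bn1(h.transpose(1, 2)).transpose(1, 2))
        h, _ = self.lstm2(h)
        h = self.act(self.bn2(h.transpose(1, 2)).transpose(1, 2))
        return self.fc(h[:, -1])                   # logits of the last timestep

model = SymptomLSTM()
loss_fn = nn.CrossEntropyLoss()                    # the cross-entropy cost above
```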
Step 3: In practical application, the microphone of the smartphone in the vehicle continuously collects in-vehicle sound signals. The car driving noise is removed from the collected signals with the method of step 1.2, and the denoised signals are segmented and padded so that each piece becomes an equal-length frame. Then the acoustic features of each frame are extracted with the feature-extraction method of step 2, and the features are sent to the trained classifier for judgment. Once the classifier determines a coughing, sneezing or nose-inhalation behavior, the corresponding respiratory symptom is recorded and its cumulative number of occurrences updated.
Step 3.1: In practical applications, the microphone sampling rate of the user's smartphone is set to 44.1 kHz, and the smartphone microphone continuously receives sound signals from the vehicle interior.
Step 3.2: For the sound signals collected in step 3.1, the driving noise is removed with the methods of steps 1.2 and 1.3, and the sound sub-segments whose ABSE value exceeds the threshold are selected. Let d denote the total duration of a run of consecutive above-threshold sub-segments. If d > 0.4 s, the run is divided into sub-frames of 0.4 s with a 0.2 s overlap; if d < 0.2 s, the run is discarded; if 0.2 s < d < 0.4 s, (0.4 - d)/2 s of sound signal is prepended and appended to the run to form a frame of 0.4 s. Each frame is passed through a high-pass filter to remove sounds below 800 Hz.
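With the generic to_fixed_frames and highpass helpers sketched earlier, the embodiment's constants read as a short usage example; the 0.3 s `run` below is a synthetic stand-in for one above-threshold stretch:

```python
import numpy as np

fs = 44100
run = np.random.randn(int(0.3 * fs))     # 0.2 s < d < 0.4 s: will be padded
frames = to_fixed_frames(run, fs=fs, T1=0.4, T2=0.2, frame_len=0.4, overlap=0.2)
frames = [highpass(f, fs=fs, cutoff=800) for f in frames]
```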
Step 3.3: For each fixed-length filtered frame obtained in step 3.2, the 120-dimensional MFCC feature vector of the frame is calculated as in step 2.1 and the 240-dimensional GFCC feature vector as in step 2.2; the two vectors are spliced into the frame's 360-dimensional mixed feature vector, which is then sent to the trained LSTM network for classification to judge whether the frame contains a coughing, sneezing or nose-inhalation behavior.
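An end-to-end inference sketch ties the pieces together. How the spliced feature vector is folded into the network's 6 timesteps is not spelled out in the text, so the plain reshape below, and the requirement that the model's `input_dim` match the resulting chunk width, are assumptions:

```python
import torch

LABELS = ["cough", "sneeze", "nose inhalation"]

def classify_frame(frame, model, gfcc_fn, sr=44100):
    """Step-3.3 sketch: features -> trained LSTM -> symptom label."""
    feats = frame_features(frame, sr=sr, gfcc_fn=gfcc_fn)
    x = torch.tensor(feats, dtype=torch.float32).reshape(1, 6, -1)
    model.eval()                                   # inference mode for BatchNorm
    with torch.no_grad():
        return LABELS[model(x).argmax(dim=1).item()]
```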
Examples
To test the performance of the method, it was compiled into an Android application and deployed on Android phones of different models. 16 volunteers were recruited as drivers and passengers, driving and riding in the test vehicle in different real scenarios.
First, the overall accuracy of the method in a driving environment was tested. FIG. 2 shows the overall accuracy of this method and of two other methods for detecting respiratory symptoms (SymDetector and CoughSense). As can be seen from the figure, the overall accuracy of this method for detecting the three typical respiratory symptoms is 93.91%, while the other two methods reach only 70.55% and 67.64%, which fully demonstrates the higher accuracy of this method in a driving environment.
Next, the accuracy of the LSTM-based classifiers of the three typical respiratory symptoms was tested. FIG. 3 shows the confusion matrix of the classifier. As can be seen from the figure, the recognition accuracy of each respiratory symptom is above 93.64%, and the average recognition accuracy is 95.52%. Very little data falls into the wrong category; mainly, quiet respiratory symptoms are easily misclassified when the smartphone is far away from the user. This demonstrates the high accuracy of the method.
Finally, the detection accuracy of the method was tested in different driving scenarios. FIG. 4 shows the detection recall of each type of respiratory symptom on city streets, highways and country roads and in parking lots. Parking lots are the quietest, so the detection recall of the three types of respiratory symptoms is highest there; driving noise on the highway is loud, and the unevenness of country roads easily makes the vehicle bump, so the detection recall in these two settings is slightly lower. Nevertheless, the detection recall of the three respiratory symptoms is no lower than 88.37% in all scenarios, which shows the high generality of the invention.
The above-described embodiments further illustrate the present invention and are not intended to limit its scope, which is to be accorded the widest interpretation consistent with the principles and spirit of the present invention.

Claims (3)

1. A respiratory tract symptom detection method based on smartphone audio perception in a driving environment, characterized by comprising the following steps:
step 1: collecting sound signals of coughing, sneezing and nose sucking of different drivers and passengers in a driving environment by using a microphone of a smart phone, and filtering automobile driving noise in the collected sound signals based on an adaptive subband spectral entropy (ABSE) denoising method;
step 1.1: placing the smart phone in a vehicle, and collecting sound signals of three behaviors of coughing, sneezing and nose sucking of different drivers and passengers;
step 1.2: dividing each sound signal collected in step 1.1 into sub-segments of equal length, performing a fast Fourier transform on the first n sub-segments, calculating the average energy spectrum of the sub-segment sounds, and initializing the ABSE threshold T_s = μ_θ + α·σ_θ;
wherein μ_θ and σ_θ are the mean and the standard deviation of the ABSE values of the first n sub-segments; H_b(l) is the ABSE value of the l-th sub-segment; α represents a weight value;
step 1.3: calculating the ABSE value of the sound signal of the next sub-segment and comparing it with the threshold obtained in step 1.2; if the ABSE value of the sub-segment exceeds the threshold, performing an FFT on the sub-segment and calculating its energy spectrum, subtracting the average energy spectrum obtained in step 1.2 from the energy spectrum of the sub-segment, and performing an inverse fast Fourier transform to obtain the denoised sound signal of the sub-segment; if the ABSE value of the sub-segment does not exceed the threshold, updating the average energy spectrum according to the energy spectrum of the sub-segment;
step 1.4: repeating step 1.3 until all the sound signals are denoised; filtering the denoised sound signals with a high-pass filter to remove the low-frequency band, taking the sound segments containing coughing, sneezing and nose inhalation out of the filtered sound signals, cutting them into separate signal frames, each signal frame containing one respiratory tract symptom, and marking the signal frames with the corresponding behaviors;
step 2: for the denoised, labeled signal frames obtained in step 1, extracting per-frame mixed acoustic features based on Mel-frequency cepstral coefficients (MFCC) and Gammatone-frequency cepstral coefficients (GFCC), and training a classifier based on a long short-term memory (LSTM) neural network with these features;
and step 3: in practical application, continuously collecting in-vehicle sound signals with a microphone of the smartphone in the vehicle; removing the car driving noise from the collected sound signals with the method of step 1, and segmenting and padding the denoised sound signals so that each piece of sound signal becomes an equal-length signal frame; then extracting the acoustic features of each signal frame with the method of step 2, and sending the features into the trained classifier for judgment; once the classifier determines a coughing, sneezing or nose-inhalation behavior, recording the corresponding respiratory symptom and the cumulative number of occurrences.
2. The respiratory tract symptom detection method based on smartphone audio perception in a driving environment according to claim 1, wherein step 2 comprises the following steps:
step 2.1: dividing each signal frame containing a respiratory tract symptom obtained in step 1 into sub-frames of equal length, calculating the 12-dimensional MFCC features of each sub-frame, and splicing the first 10 MFCC dimensions of every sub-frame into the MFCC feature vector of the frame;
step 2.2: dividing each signal frame containing a respiratory tract symptom obtained in step 1 into sub-frames of equal length, calculating the 31-dimensional GFCC features of each sub-frame, and splicing the first 20 GFCC dimensions of every sub-frame into the GFCC feature vector of the frame;
step 2.3: splicing the MFCC vector obtained in step 2.1 and the GFCC vector obtained in step 2.2 into a mixed feature vector, and then sending the mixed feature vector into a 3-layer LSTM network for training to obtain the classifier of the three respiratory symptom sounds in the driving environment;
the LSTM network comprises 2 LSTM layers and 1 full-connection layer, Tanh is used as an activation function, a batch normalization layer is added behind each LSTM layer, and a cross entropy cost function is used as a loss function; the timestamp value of the LSTM network is set to be 6, namely, the input of each time is the feature vector of the current subframe and the feature vectors of 5 subframes before the current subframe; for the tth timeout, the LSTM layer utilizes ht=δ(W0[ht-1,xt+b0])·tanh(St) Will input xtMapping to a compressed vector htWherein W is0And b0Respectively representing a weight matrix and an offset vector, StRepresents the state of the tth timestamp, δ () represents the activation function; h ist-1Representing the compressed vector corresponding to the previous timestamp.
3. The respiratory tract symptom detection method based on smartphone audio perception in a driving environment according to claim 1, wherein step 3 comprises the following steps:
step 3.1: continuously receiving sound signals in the car by using a microphone of a mobile phone of a user;
step 3.2: for the sound signals collected in step 3.1, firstly removing the driving noise from the collected sound signals, and selecting the sound sub-segments whose ABSE value exceeds the threshold; if the total duration of a run of consecutive above-threshold sound sub-segments exceeds a time threshold T_1, dividing the run into overlapping sub-frames of fixed length; if the total duration is less than another time threshold T_2, discarding the run; if the total duration is greater than T_2 and less than T_1, extending the run to the fixed frame length; and then passing each frame through a high-pass filter;
step 3.3: calculating the MFCC feature vector of each fixed-length signal frame obtained in step 3.2, then calculating the GFCC feature vector of the frame, splicing the two vectors into the mixed feature vector of the frame, sending the mixed feature vector into the trained LSTM network for classification, and judging whether the frame contains a coughing, sneezing or nose-inhalation behavior.
CN202011216514.2A 2020-11-04 2020-11-04 Respiratory tract symptom detection method based on smart phone audio perception in driving environment Pending CN112309423A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011216514.2A CN112309423A (en) 2020-11-04 2020-11-04 Respiratory tract symptom detection method based on smart phone audio perception in driving environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011216514.2A CN112309423A (en) 2020-11-04 2020-11-04 Respiratory tract symptom detection method based on smart phone audio perception in driving environment

Publications (1)

Publication Number Publication Date
CN112309423A true CN112309423A (en) 2021-02-02

Family

ID=74325622

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011216514.2A Pending CN112309423A (en) 2020-11-04 2020-11-04 Respiratory tract symptom detection method based on smart phone audio perception in driving environment

Country Status (1)

Country Link
CN (1) CN112309423A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103413113A (en) * 2013-01-15 2013-11-27 上海大学 Intelligent emotional interaction method for service robot
US20160210988A1 (en) * 2015-01-19 2016-07-21 Korea Institute Of Science And Technology Device and method for sound classification in real time
CN110383375A (en) * 2017-02-01 2019-10-25 瑞爱普健康有限公司 Method and apparatus for the cough in detection noise background environment
CN110719553A (en) * 2018-07-13 2020-01-21 国际商业机器公司 Smart speaker system with cognitive sound analysis and response
CN110853620A (en) * 2018-07-25 2020-02-28 音频分析有限公司 Sound detection
CN110390952A (en) * 2019-06-21 2019-10-29 江南大学 City sound event classification method based on bicharacteristic 2-DenseNet parallel connection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Ke et al.: "Research on environmental sound classification system based on fusion features and convolutional neural networks", Journal of Northwestern Polytechnical University (西北工业大学学报) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112951267A (en) * 2021-02-23 2021-06-11 恒大新能源汽车投资控股集团有限公司 Passenger health monitoring method and vehicle-mounted terminal
JP7291319B2 (en) 2021-07-27 2023-06-15 上海交通大学医学院付属第九人民医院 Evaluation method and apparatus for difficult airway based on speech technique by machine learning

Similar Documents

Publication Publication Date Title
CN112309423A (en) Respiratory tract symptom detection method based on smart phone audio perception in driving environment
CN104916289A (en) Quick acoustic event detection method under vehicle-driving noise environment
CN102499699B (en) Vehicle-mounted embedded-type road rage driving state detection device based on brain electrical signal and method
Vij et al. Smartphone based traffic state detection using acoustic analysis and crowdsourcing
CN109816987B (en) Electronic police law enforcement snapshot system for automobile whistling and snapshot method thereof
CN110600054B (en) Sound scene classification method based on network model fusion
CN111261189B (en) Vehicle sound signal feature extraction method
CN107179119A (en) The method and apparatus of sound detection information and the vehicle including the device are provided
CN109949823A (en) A kind of interior abnormal sound recognition methods based on DWPT-MFCC and GMM
CN106409298A (en) Identification method of sound rerecording attack
CN109965889B (en) Fatigue driving detection method by using smart phone loudspeaker and microphone
CN110880328B (en) Arrival reminding method, device, terminal and storage medium
CN109741609B (en) Motor vehicle whistling monitoring method based on microphone array
CN109009125A (en) Driver&#39;s fine granularity monitoring of respiration method and system based on audio frequency of mobile terminal
Lee et al. Acoustic hazard detection for pedestrians with obscured hearing
CN113793624B (en) Acoustic scene classification method
CN115052761B (en) Method and device for detecting tire abnormality
Evans Automated vehicle detection and classification using acoustic and seismic signals
Kubo et al. Design of ultra low power vehicle detector utilizing discrete wavelet transform
Qi et al. A low-cost driver and passenger activity detection system based on deep learning and multiple sensor fusion
CN206671813U (en) Pure electric or hybrid pedestrian caution sound control system
CN112230208B (en) Automobile running speed detection method based on smart phone audio perception
Sobreira-Seoane et al. Automatic classification of traffic noise
CN109389994A (en) Identification of sound source method and device for intelligent transportation system
CN110956977A (en) Real-time positioning system and method for automobile whistling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20210202)