Disclosure of Invention
The invention provides a convolutional-neural-network-based pulmonary disease auscultation system, a signal processing method, and equipment for solving the above problems. The acquired respiratory sound signals undergo preprocessing, data conversion, and data enhancement, so that the accuracy of feature extraction and of the classification results can be improved for small-sample data.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, the invention provides a convolutional neural network-based pulmonary disease auscultation system, comprising:
the preprocessing module is configured to preprocess the acquired breath sound signals and normalize the breath sound signals in sequence;
the data conversion module is configured to extract audio features from the normalized data and generate Mel frequency spectrum data;
a data enhancement module configured to perform data amplification on the generated mel-frequency spectrum data by using a data enhancement model;
the convolutional neural network module is configured to perform feature extraction on the amplified Mel frequency spectrum data by using the trained deep learning model;
and the feature classification module is configured to classify the extracted features by using the trained low-discrepancy forest classifier to obtain a classification result.
As a further improvement, the system further comprises:
a training model module configured to train the convolutional neural network module and the low-discrepancy forest classifier using a public or autonomously acquired breath sound signal data set to find the optimal model parameters.
As an alternative implementation, the low-discrepancy forest classifier takes K decision trees as base classifiers and obtains a combined classifier after ensemble learning; when a sample to be classified is given, the classification result output by the low-discrepancy forest classifier is decided by a vote over the classification results of the individual decision trees.
As an alternative embodiment, an acquisition device is further included for acquiring and storing the breathing sound signal.
As an alternative embodiment, the deep learning model and the low-discrepancy forest classifier are cascaded.
As an alternative embodiment, the training process of the low-discrepancy forest classifier includes:
(a) generating a low-discrepancy sequence, based on a low-discrepancy sequence sampling method, according to the number of samples of the training data set;
(b) acquiring the ranks of all elements of the low-discrepancy sequence according to the ascending or descending principle, and generating a rank sequence;
(c) taking a subset of the training data set of a set size as the training set of each decision tree;
(d) setting the number of decision trees in the low-discrepancy forest;
(e) drawing a random integer within the sample-number range as the initial sample index of a decision tree;
(f) starting from that random integer, taking a set number of consecutive elements from the rank sequence;
(g) using the taken elements as sample indices, taking the corresponding number of samples from the input rearranged data with the T column deleted, to form the training samples of one decision tree;
(h) constructing a decision tree and training it with those training samples;
(i) repeating steps (e)-(h) until the set number of decision trees have been constructed and trained.
As an alternative, the data enhancement model is a variational auto-encoder; when the auto-encoder is trained, the data information is encoded into the latent space and then decoded to reconstruct the original data.
In an alternative embodiment, the deep learning model is a lightweight convolutional neural network, and includes an input layer, two convolutional layers, a pooling layer, a convolutional layer, a flattening layer, and a fully-connected layer, which are connected in sequence.
A second aspect of the invention provides a signal processing method of the convolutional-neural-network-based pulmonary disease auscultation system, comprising the following steps:
preprocessing the acquired breath sound signals, and sequentially normalizing the breath sound signals;
extracting audio features from the normalized data to generate Mel frequency spectrum data;
amplifying the data of the generated Mel frequency spectrum data by using a data enhancement model;
carrying out feature extraction on the amplified Mel frequency spectrum data by using the trained deep learning model;
and classifying the extracted features by using the trained low-discrepancy forest classifier to obtain a classification result.
A third aspect of the present invention provides a terminal device comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor; when executed by the processor, the computer instructions perform the following steps:
preprocessing the acquired breath sound signals, and sequentially normalizing the breath sound signals;
extracting audio features from the normalized data to generate Mel frequency spectrum data;
amplifying the data of the generated Mel frequency spectrum data by using a data enhancement model;
carrying out feature extraction on the amplified Mel frequency spectrum data by using the trained deep learning model;
and classifying the extracted features by using the trained low-discrepancy forest classifier to obtain a classification result.
Compared with the prior art, the invention has the beneficial effects that:
The invention uses breath sounds as the diagnostic basis and achieves accurate feature classification for small samples. The system has a small computational load and uses a lightweight, high-accuracy classification model, so it can be deployed on low-cost embedded computer equipment, enabling remote diagnosis and treatment and bringing artificial-intelligence medical services to remote or underdeveloped areas.
In feature processing, the invention adopts the Mel spectrum analysis method; the extracted features are markedly superior to the results of most low-order feature-extraction methods and facilitate further feature extraction by the lightweight network.
The invention fuses a deep learning method with the low-discrepancy sequence forest, achieving very high accuracy and stability while requiring very few computing resources, so that the whole model can be deployed on inexpensive embedded devices.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
The specific embodiments are as follows:
The invention is further described with reference to the following figures and examples.
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, and it should be understood that when the terms "comprises" and/or "comprising" are used in this specification, they specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof, unless the context clearly indicates otherwise. It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The embodiments will be described in detail below with reference to the accompanying drawings.
Example 1
In one or more embodiments, as shown in fig. 1-7, a convolutional neural network-based pulmonary disease auscultation system is provided, comprising:
the preprocessing module is configured to preprocess the acquired breath sound signals and normalize the breath sound signals in sequence;
a data conversion module configured to extract audio features from the normalized data and generate Mel spectrogram data;
a data enhancement module configured to perform data amplification on the generated mel-frequency spectrum data by using a data enhancement model;
the convolutional neural network module is configured to perform feature extraction on the amplified Mel frequency spectrum data by using the trained deep learning model;
and the feature classification module is configured to classify the extracted features by using the trained low-discrepancy forest classifier to obtain a classification result.
In this embodiment, data conversion and data enhancement are performed in the feature-extraction stage, which improves the processing efficiency of small-sample data, improves signal-processing efficiency, reduces the recognition failure rate, and improves the adaptability of the system.
As a further improvement, the system further comprises:
a training model module configured to train the convolutional neural network module using a public or autonomously acquired breath sound signal data set to find the optimal model parameters, and/or to train the low-discrepancy forest classifier with the acquired breath sound signal data set.
Also included is a data acquisition module, coupled to the breath sound collection device and configured to acquire the breath sound signal.
The system implementation, as shown in fig. 1(a) and fig. 1(b), mainly includes two parts: model training and model deployment. In this embodiment, the system can be deployed on an embedded computer such as a Raspberry Pi, an NVIDIA Jetson Nano, or an RK3399 board to make a portable respiratory-disease diagnosis device. In other embodiments, however, the system may be deployed on other terminal devices and is not limited to the ones listed above.
In this embodiment, the training part includes using an open data set, and performing model training and verification in a ten-fold cross-validation manner to obtain an optimal model structure and coefficients.
Specifically, in the model-training part, this embodiment uses the open breath sound data set collected by the 2017 ICBHI (International Conference on Biomedical and Health Informatics, hereinafter abbreviated as ICBHI) and performs detection at the abnormality level and at the pathology level, respectively. The sample statistics for the abnormality level are shown in table 1. The sample statistics for the pathology level are shown in table 2.
TABLE 1
TABLE 2
As shown in table 1 and table 2, the data sets have an unbalanced class distribution and few samples; to improve the processing effect, this embodiment expands the data sets as a further improvement. Specifically, a variational auto-encoder (VAE) model may be used for data enhancement.
When the autoencoder is trained, the data information is encoded into the latent space and then decoded to reconstruct the original data.
The latent space is a mathematical representation space. An autoencoder consists of two components: an encoder and a decoder. The encoder compresses the data from the high-dimensional input down to the bottleneck layer, which has the smallest number of neurons; the decoder then takes this encoded input and converts it back to the original input shape. The latent space is the space in which the data lies at the bottleneck layer.
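The encoder-bottleneck-decoder shape flow described above can be sketched minimally as follows. This is an illustrative sketch only: the weights are random and untrained, and the dimensions (64-dimensional input, 8-dimensional bottleneck) are assumptions, not the trained VAE of this embodiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 64-dim input, 8-dim bottleneck (latent space).
d_in, d_latent = 64, 8
W_enc = rng.standard_normal((d_in, d_latent)) * 0.1   # encoder weights
W_dec = rng.standard_normal((d_latent, d_in)) * 0.1   # decoder weights

def encode(x):
    # Encoder: high-dimensional input -> bottleneck (latent) representation
    return np.tanh(x @ W_enc)

def decode(z):
    # Decoder: latent representation -> reconstruction in the input shape
    return z @ W_dec

x = rng.standard_normal(d_in)
z = encode(x)          # a point in the latent space
x_hat = decode(z)      # reconstruction with the original input shape
print(z.shape, x_hat.shape)
```

The bottleneck dimension being much smaller than the input dimension is what forces the encoder to learn a compressed representation.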
As shown in fig. 2, the architecture of the data enhancement network is shown. The sample conditions for each class after enhancement are shown in tables 3 and 4.
TABLE 3
TABLE 4
Training adopts the ten-fold cross-validation method: the samples of the data set are randomly shuffled and then evenly divided into ten parts. In each round, 1 part is selected as the test set and the remaining 9 parts are provided as the training data set for convolutional-neural-network learning and low-discrepancy forest training; after training, the test set is fed in to obtain the test accuracy. This loop runs ten times, and finally the average accuracy is taken as the prediction accuracy of the model.
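The ten-fold split above can be sketched in plain Python (illustrative; the actual shuffling seed and fold bookkeeping are implementation details not specified in the text):

```python
import random

def ten_fold_splits(n_samples, seed=0):
    """Shuffle sample indices and split them into 10 (train, test) folds,
    as in the ten-fold cross-validation described above."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_size = n_samples // 10
    folds = [idx[i * fold_size:(i + 1) * fold_size] for i in range(10)]
    # Give any leftover samples to the last fold
    folds[-1].extend(idx[10 * fold_size:])
    splits = []
    for k in range(10):
        test = folds[k]
        train = [i for f, fold in enumerate(folds) if f != k for i in fold]
        splits.append((train, test))
    return splits

splits = ten_fold_splits(100)
train, test = splits[0]
print(len(train), len(test))  # 90 10
```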
In this embodiment, the convolutional neural network and the low-discrepancy forest model form a cascade model, as shown in fig. 3; the flow of model training and testing is shown in fig. 4(a) and 4(b), and the training process is as follows:
1. A training data set D is obtained, and its number of samples is recorded as N.
2. The breath sound audio files are normalized.
The first sample (i.e. one breath sound audio file) is normalized so that every audio data point lies within the interval [-1, 1].
3. Audio features are extracted from the normalized audio data; optionally, the Mel spectrum analysis method may be used.
The audio features are extracted by the Mel spectrum analysis method, for which the following formula can be adopted:

m = 2595 · log10(1 + f / 700)

where f represents the frequency of the sound in Hz and m is the corresponding value on the Mel scale.
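The Mel conversion can be written directly; the constants here follow the common HTK-style form of the formula, which is an assumption since the original expression was not legible in the source text:

```python
import math

def hz_to_mel(f):
    """Convert a frequency f (Hz) to the Mel scale using the common
    formula m = 2595 * log10(1 + f / 700). The exact constants of the
    patent's formula are assumed."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

print(hz_to_mel(700))   # 2595 * log10(2), roughly 781
print(hz_to_mel(1000))  # roughly 1000: 1 kHz maps near 1000 mel
```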
Specifically, generation of the Mel spectrum is completed with the librosa development kit. The parameters are shown in table 5 below:
TABLE 5
4. Steps 2-3 are repeated to complete the Mel spectrum conversion of all samples of the training data set, and the results are stored in a temporary file D1.
Steps 1 to 4 constitute the data preprocessing process.
5. Reading in D1, extracting features by using a convolutional neural network, and outputting N × 64 neuron data, wherein the structure and parameters of a convolutional neural network model are shown in the following table 6:
TABLE 6
6. Storing N x 64 neuron data output by the convolutional neural network feature extraction in the last step and the classification labels of the original training set data into a temporary file D2;
7. performing dimensionality reduction on the feature data D2 output by the convolutional neural network;
specifically, all 64 feature columns are reduced to one column T using a dimension reduction method. T is appended to the last column of D2. The dimension reduction method can be PCA (principal component analysis), FA (factor analysis), kPCA (nonlinear principal component analysis), tSVD (truncated singular value decomposition), and the like.
8. Reordering the characteristic data D2 according to the obtained numerical value after the dimension reduction processing to obtain new characteristic data D3;
optionally, all samples of D2 are rearranged in ascending or descending order, depending on the size of the T column values. The T column is then deleted and stored as a new temporary file D3.
Steps 6-8 are performed on the output data of the convolutional neural network before it is fed into the low-discrepancy forest classifier, making the data better suited to the low-discrepancy forest algorithm and improving data-processing efficiency.
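Steps 7-8 (reduce the 64 feature columns to one column T, sort the rows by T, then drop T) can be sketched with numpy alone, using PCA via SVD as the dimension-reduction method. The synthetic D2 matrix and its size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
D2 = rng.standard_normal((20, 64))   # synthetic stand-in for the N x 64 CNN output

# Step 7: PCA down to one component -> column T (first principal-component scores)
centered = D2 - D2.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
T = centered @ Vt[0]

# Step 8: rearrange all rows by the T values (ascending), then discard T
order = np.argsort(T)
D3 = D2[order]

print(D3.shape)  # same shape as D2; only the row order changed
```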
The present embodiment uses a lightweight convolutional neural network, as shown in table 6, which includes an input layer, two convolutional layers, a pooling layer, a convolutional layer, a flattening layer and a fully-connected layer, connected in sequence. The result is then output through the fully-connected layer and combined with the low-discrepancy forest algorithm to obtain the final result.
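The spatial shapes flowing through this layer sequence (two conv layers, pooling, another conv layer, flatten) can be traced with a small helper. All sizes here, including the input resolution, kernel sizes, and channel count, are assumptions for illustration, since table 6 is not reproduced in the text:

```python
def conv2d_out(h, w, k, stride=1, pad=0):
    """Output height/width of a convolution with a square k x k kernel."""
    return (h + 2 * pad - k) // stride + 1, (w + 2 * pad - k) // stride + 1

def pool2d_out(h, w, k):
    """Output height/width of a k x k pooling layer with stride k."""
    return h // k, w // k

h, w = 128, 128                 # assumed Mel-spectrogram input size
h, w = conv2d_out(h, w, 3)      # conv layer 1 (3x3 kernel)
h, w = conv2d_out(h, w, 3)      # conv layer 2 (3x3 kernel)
h, w = pool2d_out(h, w, 2)      # 2x2 pooling layer
h, w = conv2d_out(h, w, 3)      # conv layer 3 (3x3 kernel)
flat = h * w * 16               # flattening layer, assuming 16 channels
print(h, w, flat)               # feature-map size feeding the fully-connected layer
```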
In this embodiment, a low-discrepancy forest classifier model is used as the final classifier. The low-discrepancy forest (Best Discrepancy Sequence Forest) is a combined classifier obtained by ensemble learning with K decision trees {h(x, Θk), k = 1, ..., K} as base classifiers. When a sample to be classified is given, the classification result output by the low-discrepancy forest is decided by a simple majority vote over the classification results of the individual decision trees, as shown in fig. 3, where Θk is a sequence of random variables.
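The majority vote over the per-tree results can be sketched as follows (illustrative; class labels are made-up examples):

```python
from collections import Counter

def forest_predict(tree_predictions):
    """Combine the per-tree class predictions by simple majority vote,
    as the low-discrepancy forest does for each sample."""
    votes = Counter(tree_predictions)
    return votes.most_common(1)[0][0]

# e.g. 5 decision trees voting on one sample's class label
print(forest_predict(["healthy", "COPD", "COPD", "healthy", "COPD"]))
```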
Specifically, the low-discrepancy forest algorithm flow is as follows:
9.1) According to the number of samples N of the training data set D, generate a low-discrepancy sequence BDS with N elements using the following formula:

BDS(n) = frac(n · π), n = 1, 2, ..., N

That is, each element of the low-discrepancy sequence is the fractional part of the product of a natural number n and the circle ratio π (i.e. 3.141592653589793238462) or another transcendental number (e.g. e = 2.718281828459045235360), keeping the digits after the decimal point (here, 21 digits). Here n runs over the consecutive natural numbers starting from 1 up to the sample number N of the training data set D. For example, BDS = {0.142, 0.283, 0.425, ...} (note: only the 3 digits after the decimal point are kept here for convenience of illustration).
9.2) Acquire the rank of every element of the BDS sequence according to the ascending or descending principle, generating a rank sequence R. For example, the first number in the low-discrepancy sequence is 0.142 and ranks r1 among the N elements in ascending order; the second number, 0.283, ranks r2; the third number, 0.425, ranks r3; and so on. The generated rank sequence is therefore R = [r1, r2, r3, ...].
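Steps 9.1-9.2 can be sketched directly; the choice of π as the multiplier follows the example values in the text:

```python
import math

def bds_sequence(n_samples):
    """Low-discrepancy sequence: fractional part of n * pi for n = 1..N."""
    return [math.pi * n % 1.0 for n in range(1, n_samples + 1)]

def rank_sequence(seq):
    """1-based ascending rank of each element of the sequence (step 9.2)."""
    order = sorted(range(len(seq)), key=lambda i: seq[i])
    ranks = [0] * len(seq)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

bds = bds_sequence(5)
print([round(v, 3) for v in bds])  # matches the BDS example in the text
print(rank_sequence(bds))
```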
9.3) As is customary for common ensemble methods, each decision tree uses 65% of the samples of the data set as its training set; this size is stored as the temporary quantity d = 65% · N.
9.4) Set the number K of decision trees in the low-discrepancy forest; in this embodiment K = 100, that is, 100 decision trees form one low-discrepancy forest.
9.5) generating a random integer x between 1 and N as an initial sample index of a decision tree.
9.6) Starting from x, take d consecutive elements from the sequence R generated in step 9.2).
9.7) Using the taken elements as sample indices, take d samples from the D3 data set generated in step 8 to form a temporary data set as the training sample set of one decision tree.
9.8) To construct a decision tree, any decision tree algorithm such as ID3, CART or C4.5 can be used. When splitting each node of the decision tree, a subset of feature columns is drawn at random with equal probability from all the features, usually of size √m, and the optimal attribute is selected from this subset to split the node, where m is the total number of features. The temporary data set of step 9.7) is used for training, and the parameters of the trained decision tree are stored in a temporary container P.
9.9) Loop steps 9.5-9.8 to construct and train K decision trees in total, and store P together with the parameters of the lightweight convolutional neural network of step 7 into an ONNX pre-trained model file.
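Steps 9.5-9.7 (random start, d consecutive rank elements, used as sample indices) can be sketched as follows. Wrapping around the end of R is an assumption, since the text does not say how x + d > N is handled:

```python
import random

def draw_tree_training_indices(R, d, rng):
    """Pick a random start x in R (step 9.5) and take d consecutive
    elements (step 9.6) as the sample indices into D3 that form one
    decision tree's training set (step 9.7). Wrap-around is assumed."""
    N = len(R)
    x = rng.randrange(N)
    return [R[(x + i) % N] for i in range(d)]

rng = random.Random(0)
R = [3, 1, 4, 2, 5, 8, 6, 7]   # example rank sequence
idx = draw_tree_training_indices(R, d=5, rng=rng)
print(len(idx))  # 5
```

Because consecutive ranks of a low-discrepancy sequence spread out across the whole index range, each tree's sample is well distributed rather than clustered.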
At this point, one round of training is finished; the held-out test data set is used for a classification test, and the remaining 9 rounds of training and testing are then carried out in the ten-fold cross-validation manner. The experimental results show that the model constructed in this embodiment achieves over 99% on indices such as respiratory-disease classification accuracy and F-score.
System deployment follows. As the electronic stethoscope, a common commercially available 3.5 mm headphone-jack electronic stethoscope can be used, or a pickup can be added to the head of an ordinary stethoscope. In a quiet environment, respiratory sound signals are continuously collected from 6 positions on the human body (such as the left and right chest), converted into electric signals and stored, for example as a wav file.
the electronic stethoscope is connected to a raspberry pi 4B card computer (or other similar inexpensive embedded computers) and the wav file is normalized by an algorithm. And extracting features of the preprocessed audio signal by using a Mel frequency spectrum method, sending the feature map into a trained cascade model combining a convolutional neural network and a low-diversity forest, and calculating to obtain a final recognition result to complete preliminary self-diagnosis.
As shown in fig. 1(b), the pulmonary auscultation system based on convolutional neural network performs a preliminary self-diagnosis, and the signal processing procedure of the system is as follows:
the data acquisition module acquires a breathing sound signal; such as the waveforms illustrated in fig. 5.
The preprocessing module preprocesses the acquired breath sound signals and normalizes them in sequence; the normalization method is the same as in step 2 of the model-training process.
the data conversion module extracts audio features from the normalized data to generate Mel frequency spectrum data; the mel-frequency spectrum data generation method is the same as step 3 in the model training process, and as shown in fig. 6, mel-frequency spectrum data after the extraction of the exemplary waveform of fig. 5 is obtained.
The data enhancement module utilizes a data enhancement model to perform data amplification on the generated Mel frequency spectrum data; a Variational Auto Encoder (VAE) model may be used for data enhancement.
The convolutional neural network module uses the trained deep learning model to extract features from the amplified Mel spectrum data: it receives the enhancement-processed Mel spectrum data, inputs it into the lightweight convolutional neural network module configured in this embodiment for feature extraction, and outputs N × 64 neuron data.
and the feature classification module classifies the extracted features by using the trained low-difference forest classifier to obtain a classification result. The classification algorithm performs steps 9.1-9.9.
Example 2
Based on embodiment 1, this embodiment provides a terminal device, including a memory, a processor, and computer instructions stored in the memory and executable on the processor; when executed by the processor, the computer instructions perform the following steps:
preprocessing the acquired breath sound signals, and sequentially normalizing the breath sound signals;
extracting audio features from the normalized data to generate Mel frequency spectrum data;
amplifying the data of the generated Mel frequency spectrum data by using a data enhancement model;
carrying out feature extraction on the amplified Mel frequency spectrum data by using the trained deep learning model;
and classifying the extracted features by using the trained low-discrepancy forest classifier to obtain a classification result.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, it is not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made without inventive efforts by those skilled in the art based on the technical solution of the present invention.