CN111104834A - Application method of cross-contrast neural network in intelligent detection of heart sound - Google Patents
- Publication number
- CN111104834A (application number CN201811281412.1A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- cross
- contrast
- image
- application method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2218/00—Aspects of pattern recognition specially adapted for signal processing
- G06F2218/02—Preprocessing
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
Abstract
The invention discloses an application method of a cross-contrast neural network in the intelligent detection of heart sounds. The invention combines a deep neural network with the IBS theory and proposes a new network architecture, the cross-contrast neural network. The new architecture uses convolution filters to extract the texture features of the sound spectrum and then computes the similarity of the features of two images by comparison. Experiments on a phonocardiogram database show that the method improves classification accuracy while giving an intuitive statistical explanation.
Description
Technical Field
The invention relates to the application of statistical analysis and deep learning in the intelligent detection of heart sounds.
Background
In 2012, the AlexNet network made a historic breakthrough in ILSVRC, greatly surpassing traditional image classification methods; this was the first time a deep neural network was used for large-scale image classification. After AlexNet, a series of convolutional neural network models emerged, continually improving performance on the ImageNet dataset. Deep learning and neural networks have developed rapidly in signal analysis, particularly image analysis, and have greatly improved its accuracy.
The application of neural networks to medical images presents two major problems. On the one hand, neural network structures are very complex and easily overfit on small sample sets, which medical sample sets usually are. On the other hand, medical image classification is concerned not only with improving accuracy but also with interpretability. Unlike many classification and segmentation tasks in computer vision, medical computer-aided applications usually must be consistent with clinical knowledge before they can be used at scale. Based on this, we propose a new network architecture, the cross-contrast neural network (CCNN). The new architecture uses convolution filters to extract the texture features of the sound spectrum and then computes the similarity of the features of two images by comparison. This method improves the utilization of the samples and also partly explains, from a statistical point of view, why the class of an image can be judged. Experiments on a phonocardiogram database show that the method improves classification accuracy while giving an intuitive statistical explanation.
Disclosure of Invention
In view of the above disadvantages of prior methods, the present invention is directed to a new network architecture, the cross-contrast neural network (CCNN). The new architecture uses convolution filters to extract the texture features of the sound spectrum and then computes the similarity of the features of two images by comparison. The heart sound signal is first converted into an image and then fed into the subsequent network for processing. There are many ways to convert collected heart sound signals into images; this scheme converts the heart sound into an image through the short-time Fourier transform.
The network architecture has two parts:
The first part is the architectural part of the network, divided into a four-layer structure, each layer being a convolutional layer composed of several filters. In the second part, the image features obtained by the k-th network layer are extracted for training, as follows:
step 1, carrying out probability statistics on the extracted features;
step 2, obtaining the normalized MIBS value using the modified IBS formula, combining it with the label, and calculating a loss value;
and 3, optimizing the loss value, and performing model training until the training is finished.
The invention has the following beneficial effects: the deep neural network is combined with the statistical IBS theory, images are fed into the network in pairwise combinations, and the traditional IBS theory is modified and added to the network's decision as prior knowledge, addressing the problems that the sample size is small, accuracy is hard to guarantee, and statistical characteristics cannot be found. This method improves the utilization of the samples and partly explains, from a statistical point of view, why the class of an image can be judged. Experiments on a phonocardiogram database show that the method improves classification accuracy while giving an intuitive statistical explanation.
Drawings
FIG. 1 is a network architecture diagram of the method of the present invention
FIG. 2 is a diagram of a convolutional neural network model
FIG. 3 is a flow chart of a model algorithm
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and detailed description.
The image multi-task classification method based on the cross-contrast neural network is applied to the intelligent detection of heart sounds. Fig. 1 shows the network configuration of the present invention: the captured heart sound signal is first converted into an image, for example by the short-time Fourier transform method described below. The sound signals are converted into normalized time-frequency graphs through the short-time Fourier transform, and features are then extracted from these images by the cross-contrast neural network, yielding responses to the different inputs. The specific formula is as follows:
(for example, the parameters can be set as fs = 2 kHz, window size 512, Hamming window, hop size 256)
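As a hedged sketch of this conversion step (the patent's own STFT formula is not reproduced in this text), the parameters above map naturally onto `scipy.signal.stft`; the function name `heart_sound_to_image` and the log-magnitude normalization are illustrative choices, not mandated by the text:

```python
import numpy as np
from scipy.signal import stft

def heart_sound_to_image(x, fs=2000, nperseg=512, noverlap=256):
    """Convert a 1-D heart-sound signal into a normalized time-frequency image."""
    # STFT with a Hamming window of 512 samples and a 256-sample hop
    _, _, Z = stft(x, fs=fs, window='hamming', nperseg=nperseg, noverlap=noverlap)
    mag = np.log1p(np.abs(Z))   # log magnitude compresses the dynamic range
    # rescale to [0, 1] so the result can be treated as a normalized image
    return (mag - mag.min()) / (mag.max() - mag.min() + 1e-12)

# toy example: a 4-second synthetic tone sampled at 2 kHz
x = np.sin(2 * np.pi * 50 * np.arange(8000) / 2000.0)
img = heart_sound_to_image(x)   # shape: (frequency bins, time frames)
```

With `nperseg=512` the image has 257 frequency rows; a real phonocardiogram recording would be loaded in place of the synthetic tone.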
As shown in fig. 2, a convolutional neural network structure is adopted. Suppose we extract the convolution result of the i-th layer, and the number of filters in this layer is t. Each filter can be seen as a pattern of some texture, and the convolution result represents the response of the input image over the t texture patterns. To measure the strength of the input image's response to the t filters, we set a threshold and count, on each of the t feature maps, the number of pixels whose value exceeds the threshold; these counts serve as the extracted texture feature values. Each pixel in the t feature maps is compared with the threshold and mapped to 1 if its value is greater than or equal to the threshold, and to 0 otherwise. The number of 1s in each of the t maps then yields a series of probabilities w_i (i = 1, 2, 3, ..., N).
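The thresholding step above can be sketched as follows; the function name `texture_probabilities` and the default threshold value are illustrative assumptions:

```python
import numpy as np

def texture_probabilities(feature_maps, threshold=0.5):
    """feature_maps: array of shape (t, H, W), the outputs of t convolution filters.

    For each filter, count the pixels whose activation is >= threshold and divide
    by the map size, giving the probability that the filter's texture appears in
    the input image."""
    binary = (feature_maps >= threshold).astype(np.float64)  # 1 where response is strong
    counts = binary.sum(axis=(1, 2))                         # number of 1s per map
    return counts / (feature_maps.shape[1] * feature_maps.shape[2])

rng = np.random.default_rng(0)
maps = rng.random((8, 16, 16))       # hypothetical output of an 8-filter layer
w = texture_probabilities(maps)      # the probabilities w_i, i = 1, ..., t
```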
In at least one embodiment of the invention, the set is composed of images of selected categories drawn from a database. The number of images per category in the database may differ slightly, but should not be too unbalanced; otherwise the trained network may be biased.
Fig. 3 shows the flow chart of the algorithm. We use the image features output from any layer as the input of the subsequent IBS-formula processing stage. For example, we can take the output of the i-th layer: for a set of inputs (two time-frequency graphs of heart sounds) we set a threshold and reduce the feature maps of the 4th convolutional layer to two feature vectors of size 1 × t. The magnitude of each feature vector represents the strength of the input image's response to the t filters, so measuring the similarity of the two images' textures reduces to measuring the distance between the two vectors. Since we do not care about the magnitudes themselves, only whether the response strengths of the two images over the t textures show a clear rank correlation, we do not use value-sensitive metrics such as Euclidean or Manhattan distance. Instead, we use a modified IBS model to measure the energy required to move the scatter points onto the 45-degree line. We update the parameters of the entire network using the cross entropy between the IBS value and the true label as the loss function.
Next we explain the principle of the M-IBS similarity measure in detail. If two images belong to different classes, they should have different texture distributions; if they belong to the same class, they should have similar texture characteristics. Statistically, this appears as a significant correlation between the 1 × t-dimensional vectors extracted by the filters, and we therefore use IBS to measure this correlation, given two extracted 1 × t-dimensional texture feature vectors (e.g., taking the fourth-layer results gives the feature vectors below):
we define the calculation of the NorIBS value as follows:
wherein
Here w_k denotes the convolution result of the k-th filter of layer 4-1. p_i(w_k) denotes, for the i-th (i = 1, 2) image in a set of inputs (two time-frequency images), the proportion of pixels in the output of the k-th filter of layer 4-1 whose value is greater than the given threshold; it can be seen as the probability that the texture represented by the k-th filter appears in the original image. R_i(w_k) denotes, for image i, the rank of the k-th filter's output among the 512 values. F(w_k) denotes the normalized entropy of the k-th filter's responses on the two images. N denotes the number of filters; in this embodiment, N = 512. If the time-frequency graphs of two heart sounds belong to the same class, the rank values of their texture features should be close and the NorIBS value tends to 0; if the two 512-dimensional feature vectors are plotted as a scatter plot, the points should be distributed around the 45-degree line. Conversely, if the two time-frequency graphs do not belong to the same class, the NorIBS value is greater than 0, and rank differences on filters with larger entropy increase NorIBS more; on the scatter plot, the points are scattered irregularly across the two-dimensional space. When a pair of heart-sound time-frequency graphs is fed into the network, it is given the label 0 or 1: label 0 means that the two inputs come from the same class, and label 1 means that they come from different classes. Since we back-propagate the cross entropy between the true label and the IBS prediction as the loss to compute gradient updates for the network, and the upper limit of IBS is not strictly 1, we modify the original IBS model. Specifically, after calculating NorIBS, we randomly shuffle the probability values and define the distance after shuffling as RevIBS:
RevIBS = NorIBS(random(p1(wk)), random(p2(wk)))    (6)
α is a learning factor, which can be set to 2-4.
For images belonging to the same class, NorIBS is close to 0 while its shuffled RevIBS value is much larger; this difference is further amplified by the exponential operation, and ModIBS tends to 0. For images of different classes, the original scatter is already dispersed, so the distribution changes little before and after shuffling, the difference between NorIBS and RevIBS tends to 0, and ModIBS tends to 1. ModIBS is thus mapped to a value between 0 and 1. We use cross entropy as the loss function and modify the model parameters with batch gradient training.
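Because equations (3)-(5) and (7) are not reproduced in this text, the following is only a rough sketch of the NorIBS/RevIBS/ModIBS pipeline under stated assumptions: the rank-disagreement form of `nor_ibs`, the entropy weighting, and the exponential form `exp(-alpha * (RevIBS - NorIBS))` are guesses chosen merely to reproduce the behaviour described above (same class: ModIBS near 0; different class: ModIBS near 1), and all function names are illustrative:

```python
import numpy as np

def nor_ibs(p1, p2):
    """Assumed form: entropy-weighted rank disagreement between two
    texture-probability vectors; identical rankings give exactly 0."""
    n = len(p1)
    r1 = np.argsort(np.argsort(p1))   # rank of each filter's response
    r2 = np.argsort(np.argsort(p2))
    q = (np.asarray(p1) + np.asarray(p2)) / 2.0
    # normalized binary entropy per filter, in [0, 1] (assumption for F(w_k))
    f = -(q * np.log(q + 1e-12) + (1 - q) * np.log(1 - q + 1e-12)) / np.log(2)
    return float(np.sum(f * np.abs(r1 - r2)) / (n * n))

def mod_ibs(p1, p2, alpha=2.0, rng=None):
    """Compare NorIBS with its value after random shuffling (RevIBS, eq. (6));
    the exponential squashing into (0, 1] is an assumed form of ModIBS."""
    rng = np.random.default_rng(rng)
    nor = nor_ibs(p1, p2)
    rev = nor_ibs(rng.permutation(p1), rng.permutation(p2))
    return float(np.exp(-alpha * (rev - nor)))
```

For two identical vectors `nor_ibs` is 0 while shuffling typically increases the disagreement, so `mod_ibs` falls toward 0, matching the same-class behaviour described in the text.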
loss = -[(1 - label)·log(1 - IBS) + label·log(IBS)]    (8)
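A minimal sketch of this pairwise loss, assuming the conventional leading minus sign (so that the quantity is minimized) and the labels as defined above, 0 for a same-class pair and 1 for a different-class pair; the name `ccnn_loss` is illustrative:

```python
import numpy as np

def ccnn_loss(ibs, label):
    """Binary cross entropy between the ModIBS prediction in (0, 1) and the
    pair label (0 = same class, 1 = different class)."""
    eps = 1e-12
    ibs = float(np.clip(ibs, eps, 1.0 - eps))   # guard the logarithms
    return -((1 - label) * np.log(1.0 - ibs) + label * np.log(ibs))
```

A same-class pair (label 0) is thus penalized when ModIBS is near 1, and a different-class pair (label 1) is penalized when ModIBS is near 0.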
For a test picture to be classified, partial samples of each category in the training set are paired with the picture under test to form input pairs, ModIBS is calculated for each pair, and the category of the training-set sample with the highest similarity is selected as the prediction result.
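The prediction rule above can be sketched as follows. Because the exact ModIBS formulas are not reproduced in this text, a simple mean-absolute-rank-difference distance stands in for ModIBS here (lower means more similar); all names and the toy data are illustrative:

```python
import numpy as np

def rank_distance(p1, p2):
    """Stand-in for ModIBS: mean absolute rank difference between two
    texture-probability vectors (0 for identical rankings)."""
    r1 = np.argsort(np.argsort(p1))
    r2 = np.argsort(np.argsort(p2))
    return float(np.mean(np.abs(r1 - r2)))

def predict_class(test_vec, reference_sets):
    """Pair the test image's feature vector with reference samples from each
    class and predict the class whose samples are most similar on average."""
    scores = {label: np.mean([rank_distance(test_vec, r) for r in refs])
              for label, refs in reference_sets.items()}
    return min(scores, key=scores.get)

# toy reference sets: one class with rising texture probabilities, one falling
rng = np.random.default_rng(1)
base_a = np.linspace(0.1, 0.9, 32)
base_b = base_a[::-1].copy()
refs = {"normal": [base_a + rng.normal(0, 0.01, 32) for _ in range(3)],
        "abnormal": [base_b + rng.normal(0, 0.01, 32) for _ in range(3)]}
test = base_a + rng.normal(0, 0.01, 32)
pred = predict_class(test, refs)   # expected to match the "normal" references
```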
The main innovation of this new network architecture is:
1) the new network architecture utilizes a filter to extract the texture features of the sound frequency spectrum, and calculates the similarity of the features of the two images in a comparison mode.
2) Experiments show that the method can improve the classification accuracy and simultaneously give visual statistical explanation.
It should be understood that although the present description refers to embodiments, not every embodiment contains only a single technical solution; this manner of description is for clarity only. Those skilled in the art should take the description as a whole, and the technical solutions in the embodiments may be combined appropriately to form other embodiments understood by those skilled in the art.
The above-listed detailed description is only a specific description of a possible embodiment of the present invention, and they are not intended to limit the scope of the present invention, and equivalent embodiments or modifications made without departing from the technical spirit of the present invention should be included in the scope of the present invention.
Claims (4)
1. An application method of a cross-contrast neural network in the intelligent detection of heart sounds, characterized in that a cross-contrast neural network and an IBS formula are applied, trained and tested.
2. The method of claim 1, characterized in that collected heart sound signals are converted into images, the images are fed into the network in pairwise combinations, and the traditional IBS theory is modified and added to the network's decision as prior knowledge, addressing the problems that the sample size is small, accuracy is difficult to guarantee, and statistical characteristics cannot be found.
3. The method of claim 1, characterized in that, using the cross-contrast neural network model trained on images, the image features output from any layer can be used as the input of the subsequent IBS-formula processing stage; the texture features of the sound spectrum are extracted by the filters, and the similarity of the features of the two images is calculated by comparison.
4. The method of claim 1, characterized in that the class of an image is determined by comparing it with images of known class from each category.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811281412.1A CN111104834A (en) | 2018-10-25 | 2018-10-25 | Application method of cross-contrast neural network in intelligent detection of heart sound |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111104834A true CN111104834A (en) | 2020-05-05 |
Family
ID=70419372
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811281412.1A Pending CN111104834A (en) | 2018-10-25 | 2018-10-25 | Application method of cross-contrast neural network in intelligent detection of heart sound |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111104834A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107122809A (en) * | 2017-04-24 | 2017-09-01 | 北京工业大学 | Neural network characteristics learning method based on image own coding |
CN107693044A (en) * | 2017-11-15 | 2018-02-16 | 广东顺德西安交通大学研究院 | Surveillance of Coronary Heart diagnostic device |
CN107945817A (en) * | 2017-11-15 | 2018-04-20 | 广东顺德西安交通大学研究院 | Heart and lung sounds signal sorting technique, detection method, device, medium and computer equipment |
Non-Patent Citations (1)
Title |
---|
ZHANG Bingqing, "Research on the Application of Medical Imaging in Predicting the Degree of Liver Fibrosis", China Master's Theses Full-text Database (Medicine and Health Sciences) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
DD01 | Delivery of document by public notice |
Addressee: Zhu Haochuan Document name: First notice of examination opinions |
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20200505 |