CN113920071A - New coronavirus image identification method based on convolutional neural network algorithm

Info

Publication number: CN113920071A
Application number: CN202111134598.XA
Authority: CN
Inventors: 季文杰
Original assignee: Individual
Current assignee: Individual
Priority date: 2021-09-27
Filing date: 2021-09-27
Publication date: 2022-01-11

Abstract

The invention discloses a new coronavirus image identification method based on a convolutional neural network algorithm. Three different classifiers, namely a support vector machine (SVM), k-nearest neighbors (KNN) and a convolutional neural network (CNN), are used to simulate and evaluate the proposed COVID-19 detection model, and images of COVID-19 patients, SARS patients and normal persons are collected to validate the model. First, an early COVID-19 prediction method based on image processing is provided, using preprocessing, region growing segmentation and deep learning classification. After segmentation, a hybrid feature extraction method combining DWT, geometric features and texture features is applied, and classification is then performed with the support vector machine and KNN learning algorithms. In a second scenario, the CNN algorithm is applied to COVID-19 classification. The method will help medical systems to quickly improve the detection efficiency of COVID-19.

Description

New coronavirus image identification method based on convolutional neural network algorithm

Technical Field

The invention relates to the field of deep learning and new coronavirus image identification, in particular to a new coronavirus image identification method based on a convolutional neural network algorithm.

Background

Coronaviruses comprise a large number of virus species that can cause disease in animals and humans, many of which cause respiratory infections in humans, ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS).

The most common symptoms of COVID-19 are fever, fatigue and dry cough. Some patients may also experience pain, nasal congestion, chills, sore throat and/or diarrhea, and some may be infected with the virus without showing any symptoms. The symptoms of COVID-19 are usually mild and develop gradually. Most people (about 80%) recover from the disease without special treatment, while about one-sixth of those infected with COVID-19 may develop severe symptoms.

Chest image screening has become a common diagnostic tool for pneumonia, and images also play an important role in the quantitative assessment and monitoring of COVID-19. On these images, areas of COVID-19 infection at the initial stage can be identified by ground-glass opacities (GGO) in the lungs, and areas at the later stage can be identified by lung consolidation. Several studies have shown that, compared with reverse transcription polymerase chain reaction (RT-PCR) testing, images are more sensitive and effective for COVID-19 screening, and that chest imaging is more sensitive for detecting COVID-19 even in the absence of clinical symptoms. Physicians typically diagnose whether a patient's lungs are infected with novel coronavirus pneumonia by viewing the patient's images.

Many machine learning algorithms have been used to enhance disease identification in medical systems. Biomedical images from different devices, such as X-ray, computed tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET), provide important information for distinguishing normal from abnormal patients; this information is extracted from the images, and diseases and infections are then identified using machine learning algorithms.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a new coronavirus image identification method based on a convolutional neural network algorithm.

The technical scheme adopted by the invention uses three different classifiers, namely a support vector machine (SVM), k-nearest neighbors (k-NN) and a deep learning convolutional neural network (CNN) algorithm, to simulate and evaluate the proposed COVID-19 detection model, which is validated by collecting images of COVID-19 patients, SARS patients and normal persons. The method is carried out using two scenarios and a set of evaluation metrics.

In the present invention, a standard data set of patients diagnosed with COVID-19, patients diagnosed with SARS and normal patients is collected, and the method is performed using two scenarios, as follows:

Step 1, denoising the COVID-19 image using a Gaussian filter;

Step 2, after denoising, dividing the processing into two scenarios: in the first scenario, a region of interest (ROI) is segmented from the COVID-19 image using a region growing algorithm, significant features are then extracted with a hybrid feature extraction method combining the gray level co-occurrence matrix (GLCM) method, the gradient method and the discrete wavelet transform (DWT) method, and the features are processed by classification algorithms (SVM and k-NN); in the second scenario, the COVID-19 image is classified using a deep learning algorithm (CNN);

Step 3, evaluating the COVID-19 classification and detection model using the evaluation metrics.
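The region growing segmentation named in step 2 is not detailed in this text. The following is a minimal sketch under stated assumptions: a single seed pixel, 4-connectivity, and a fixed intensity tolerance relative to the seed. The function name `region_grow` and its parameters are illustrative, not taken from the patent.

```python
from collections import deque


def region_grow(image, seed, tolerance):
    """Grow a region from `seed`, adding 4-connected pixels whose intensity
    is within `tolerance` of the seed intensity. Returns a binary mask."""
    rows, cols = len(image), len(image[0])
    seed_value = image[seed[0]][seed[1]]
    mask = [[0] * cols for _ in range(rows)]
    mask[seed[0]][seed[1]] = 1
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not mask[nr][nc]:
                # Assumed similarity criterion: absolute difference to the seed.
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    mask[nr][nc] = 1
                    queue.append((nr, nc))
    return mask


# Toy 4x4 "image": a bright region on the right, dark background on the left.
image = [
    [10, 12, 90, 95],
    [11, 13, 92, 94],
    [10, 80, 91, 93],
    [12, 11, 10, 96],
]
roi_mask = region_grow(image, seed=(0, 2), tolerance=10)
```

In practice the seed and tolerance would have to be chosen per image; the text does not say how.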

In the invention, in step 1, Gaussian filtering is used to remove distortion and blur from the image and improve image quality. The Gaussian function is calculated using formula (1):

$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right) \qquad (1)$$

where x is the distance from the origin on the horizontal axis, y is the distance from the origin on the vertical axis, and σ is the standard deviation of the Gaussian distribution.
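As an illustration of step 1, the sketch below builds a discrete kernel from the standard two-dimensional Gaussian, G(x, y) = exp(−(x² + y²) / (2σ²)) / (2πσ²), and applies it to a small image. The kernel size, the normalization to unit sum and the replicated-border handling are assumptions; the text does not specify them.

```python
import math


def gaussian_kernel(size, sigma):
    """Sample the 2-D Gaussian on a size x size grid and normalize to sum 1."""
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


def gaussian_filter(image, size=3, sigma=1.0):
    """Smooth `image` with the Gaussian kernel (border pixels are replicated)."""
    kernel = gaussian_kernel(size, sigma)
    half = size // 2
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(size):
                for j in range(size):
                    rr = min(max(r + i - half, 0), rows - 1)  # clamp to the image
                    cc = min(max(c + j - half, 0), cols - 1)
                    acc += kernel[i][j] * image[rr][cc]
            out[r][c] = acc
    return out


# A single noisy spike in a flat region is pulled towards its neighbours.
noisy = [[10, 10, 10], [10, 100, 10], [10, 10, 10]]
smoothed = gaussian_filter(noisy)
```

A larger σ spreads the weights further from the center and smooths more strongly.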

In the invention, in step 2, texture features are extracted from the COVID-19 image using the gray level co-occurrence matrix method, and important features are obtained using five statistical measures (mean, energy, contrast, uniformity and correlation), as shown in the following formulas:

the mean measures the average intensity of the COVID-19 image pixels,

energy measures the normalized histogram of the COVID-19 image pixels,

contrast measures the local intensity variance of the COVID-19 image pixels,

uniformity measures the homogeneity of the COVID-19 image pixels,

correlation measures the gray-level correlation distribution,

where x and y are the pixel positions in the cell, m and n are the numbers of columns and rows, respectively, μ is the mean, and σ is the standard deviation.
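The five formulas are not reproduced in this text. The sketch below therefore uses commonly cited GLCM definitions as an assumption: a horizontal offset of one pixel, a non-symmetric matrix normalized to probabilities, and the 1 / (1 + (i − j)²) form for uniformity (homogeneity). The names `glcm` and `texture_features` are illustrative.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix p[i][j]: probability that a pixel with
    gray level i has a neighbour with gray level j at offset (dy, dx)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            nr, nc = r + dy, c + dx
            if 0 <= nr < rows and 0 <= nc < cols:
                counts[image[r][c]][image[nr][nc]] += 1
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def texture_features(image, levels):
    """Mean, energy, contrast, uniformity and correlation (assumed definitions).
    Correlation is undefined for a constant image (zero standard deviation)."""
    p = glcm(image, levels)
    cells = [(i, j, p[i][j]) for i in range(levels) for j in range(levels)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    pixels = [v for row in image for v in row]
    return {
        "mean": sum(pixels) / len(pixels),
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "uniformity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "correlation": sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells) / (sd_i * sd_j),
    }


# Toy image quantized to 4 gray levels (0..3).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
features = texture_features(image, levels=4)
```

Real images would first be quantized to a small number of gray levels, and several offsets are often averaged; neither choice is stated in the text.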

In the invention, in step 2, the gradient is a vector consisting of magnitude and direction. The gradient is calculated by the gradient method, with derivatives taken in the horizontal and vertical directions. The Prewitt operator is used to detect edges and extract the vector values; the operator detects edges using the extreme values of the gray-level change at each point around an image pixel;

P is the original image, and G_x and G_y are the image gradient values detected along the horizontal and vertical edges; G_x and G_y are calculated as follows:
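The formulas for G_x and G_y are not reproduced in this text. A minimal sketch, assuming the commonly used 3×3 Prewitt masks (their sign and orientation here are an assumption, and `prewitt` is an illustrative name), with magnitude √(G_x² + G_y²) and direction atan2(G_y, G_x):

```python
import math

# Assumed Prewitt masks: PREWITT_X responds to horizontal intensity change
# (vertical edges), PREWITT_Y to vertical intensity change (horizontal edges).
PREWITT_X = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_Y = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]


def apply_mask(image, mask, r, c):
    """Weighted sum of the 3x3 neighbourhood centred on (r, c)."""
    return sum(mask[i][j] * image[r + i - 1][c + j - 1]
               for i in range(3) for j in range(3))


def prewitt(image):
    """Return gradient magnitude and direction; border pixels are left at 0."""
    rows, cols = len(image), len(image[0])
    magnitude = [[0.0] * cols for _ in range(rows)]
    direction = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = apply_mask(image, PREWITT_X, r, c)
            gy = apply_mask(image, PREWITT_Y, r, c)
            magnitude[r][c] = math.hypot(gx, gy)
            direction[r][c] = math.atan2(gy, gx)
    return magnitude, direction


# Vertical edge: dark on the left, bright on the right.
image = [[0, 0, 10, 10] for _ in range(4)]
magnitude, direction = prewitt(image)
```

The magnitude and direction maps can then be summarized (for example by their statistics) to form the gradient part of the hybrid feature vector; the text does not say which summary is used.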

In the present invention, in step 2, features are extracted using the discrete wavelet transform method. In the wavelet decomposition process, four subband images (HH, HL, LL and LH) represent the data at each scale. The LL subband, which is the approximation component, is the target for extracting image features, while the HH, LH and HL subbands capture the detail components of the image. Features are extracted from the image using a 3-level decomposition with the Haar wavelet.
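The 3-level Haar decomposition can be sketched as follows. This is an illustration under assumptions: it uses the averaging form of the Haar filters (sum and difference divided by 2) rather than the orthonormal 1/√2 scaling, it requires image sides divisible by 2³, and the LH/HL naming follows one of several conventions. The function names are illustrative.

```python
def haar_step(values):
    """One 1-D Haar step: pairwise averages (low-pass) and differences (high-pass)."""
    low = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return low, high


def transpose(block):
    return [list(col) for col in zip(*block)]


def split_rows(block):
    """Apply haar_step to every row; return the low and high halves."""
    pairs = [haar_step(row) for row in block]
    return [p[0] for p in pairs], [p[1] for p in pairs]


def haar_2d(image):
    """One 2-D level: filter along rows, then along columns."""
    low, high = split_rows(image)
    ll, lh = (transpose(b) for b in split_rows(transpose(low)))
    hl, hh = (transpose(b) for b in split_rows(transpose(high)))
    return ll, lh, hl, hh


def haar_decompose(image, levels=3):
    """Repeatedly decompose the LL subband; return the final LL and all details."""
    ll, details = image, []
    for _ in range(levels):
        ll, lh, hl, hh = haar_2d(ll)
        details.append((lh, hl, hh))
    return ll, details


# 8x8 ramp image: three levels reduce the LL subband to a single coefficient.
image = [[r * 8 + c for c in range(8)] for r in range(8)]
ll3, details = haar_decompose(image, levels=3)
```

With this averaging normalization the level-3 LL coefficients are block means of the original image, which is why LL is called the approximation component.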

In the invention, in step 2, the model for detecting COVID-19 by three-class classification (COVID-19, SARS and normal persons) is built using the SVM algorithm, as shown in the following formula:

$$K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\right)$$

where $\lVert x - x' \rVert^{2}$ is the squared Euclidean distance between the two feature vectors x and x', and 2σ² is a parameter.
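As an illustration, assuming the SVM kernel is the Gaussian radial basis function (RBF) kernel mentioned in the Detailed Description, K(a, b) = exp(−‖a − b‖² / (2σ²)), the kernel and the Gram matrix an SVM solver would consume can be sketched as follows. The feature values are made up for the example.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian RBF kernel; sigma controls how quickly similarity decays."""
    return math.exp(-squared_distance(a, b) / (2 * sigma ** 2))


def gram_matrix(samples, sigma=1.0):
    """Pairwise kernel values for a list of feature vectors."""
    return [[rbf_kernel(a, b, sigma) for b in samples] for a in samples]


# Hypothetical 2-D feature vectors (not data from the patent).
samples = [(0.0, 0.0), (1.0, 1.0), (3.0, 4.0)]
gram = gram_matrix(samples)
```

A three-class problem such as COVID-19, SARS and normal is usually handled by combining binary SVMs (one-vs-one or one-vs-rest); the text does not say which scheme is used.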

In the invention, in step 2, non-parametric classification is performed using the k-NN algorithm, which uses the Euclidean distance to find the closest points between features, as follows:

$$d = \sqrt{(x_1 - x_2)^{2} + (y_1 - y_2)^{2}}$$

where x_1 − x_2 and y_1 − y_2 are the coordinate differences between the two Euclidean vectors (x_1, y_1) and (x_2, y_2).
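A minimal sketch of k-NN classification with the Euclidean distance d = √((x1 − x2)² + (y1 − y2)²), generalized to feature vectors of any length. The training points, labels and the choice k = 3 are hypothetical and serve only to show the voting step.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_features, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training samples."""
    ranked = sorted(zip(train_features, train_labels),
                    key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors for the three classes.
train_features = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8), (9.0, 1.0), (8.8, 1.2)]
train_labels = ["normal", "normal", "COVID-19", "COVID-19", "SARS", "SARS"]

prediction = knn_predict(train_features, train_labels, query=(5.2, 5.1), k=3)
```

Because k-NN stores the training set and defers all computation to query time, no model is fitted in advance.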

In the invention, in step 2, a deep learning algorithm (CNN) is used to classify COVID-19 images. The CNN consists of four layers, namely an input layer, a convolutional layer, a pooling layer and a fully connected layer, and is used to extract features from the COVID-19 image. The operation steps of the CNN are as follows:

Input layer: the COVID-19 image is input and the dataset image labels are extracted; this layer also normalizes the data scale to speed up training on the dataset, and serves as a preprocessing step to improve accuracy;

Convolutional layer: the non-linear feature maps of the convolutional layer are passed through multiple network layers to obtain better results; this layer uses an activation function called the rectified linear unit (ReLU), which is applied to the output of the previous layer; the ReLU function is defined as the positive part of its argument, as follows:

$$f(x) = x^{+} = \max(0, x);$$

Pooling layer: this layer improves training by reducing the dimensionality of the data and compressing it, thereby improving classification accuracy; it performs downsampling along the spatial dimensions of a given input, thereby reducing the number of parameters in the activations;

Fully connected layer: this layer computes its output as the dot product between the weights of the neurons locally connected to the input and a small region of the input volume; the ReLU is applied to the output of the previous layer's activation.
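The four layers described above can be sketched as a single forward pass. This is a toy illustration, not the patent's network: the 5×5 input, the one 2×2 convolution kernel, the 2×2 max pooling and the fixed weights are all assumptions, and no training is shown.

```python
def relu(x):
    """Rectified linear unit: f(x) = max(0, x)."""
    return max(0.0, x)


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` without padding (cross-correlation form)."""
    kh, kw = len(kernel), len(kernel[0])
    rows, cols = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(cols)] for r in range(rows)]


def max_pool_2x2(fmap):
    """Downsample by taking the maximum of each non-overlapping 2x2 block."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]


def dense(inputs, weights, biases):
    """Fully connected layer: one dot product plus bias per output neuron."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Input layer: scale raw intensities (0..8 here) to the range [0, 1].
raw = [[r + c for c in range(5)] for r in range(5)]
image = [[v / 8 for v in row] for row in raw]

# Convolutional layer followed by ReLU.
kernel = [[-1, 0], [0, 1]]
feature_map = [[relu(v) for v in row] for row in conv2d_valid(image, kernel)]

# Pooling layer, then flatten for the fully connected layer.
pooled = max_pool_2x2(feature_map)
flat = [v for row in pooled for v in row]

# Fully connected layer producing one score per class (weights are made up).
weights = [[1, 1, 1, 1], [1, -1, 1, -1], [-1, -1, -1, -1]]
biases = [0.0, 0.5, 0.0]
scores = dense(flat, weights, biases)
```

In a trained network the kernel and the dense weights are learned from the labeled images, and the class with the highest score is reported.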

In the invention, in step 3, statistical evaluation metrics (namely accuracy, specificity, sensitivity and recall) are used to evaluate the proposed COVID-19 classification and detection model, and are calculated as follows:

each detected image is finally assigned to one of TP, TN, FP and FN, where TP is true positive, TN is true negative, FP is false positive and FN is false negative.
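The metric formulas are not reproduced in this text. The sketch below assumes the standard definitions in terms of TP, TN, FP and FN; under these definitions sensitivity and recall are the same quantity. The counts in the example are hypothetical.

```python
def evaluation_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics (assumed definitions)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "specificity": tn / (tn + fp),   # true negative rate
        "sensitivity": tp / (tp + fn),   # true positive rate
        "recall": tp / (tp + fn),        # identical to sensitivity
    }


# Hypothetical counts for a COVID-19 versus non-COVID-19 evaluation.
metrics = evaluation_metrics(tp=45, tn=40, fp=10, fn=5)
```

For the three-class setting (COVID-19, SARS, normal), these counts would typically be computed per class in a one-vs-rest fashion; the text does not specify this.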

Drawings

FIG. 1 is a flow chart framework of the present invention

FIG. 2 is a sample of a standard data set of the present invention

FIG. 3 is the Prewitt operator mask of the present invention

FIG. 4 is a flowchart of the wavelet decomposition procedure of the present invention

FIG. 5 is a block diagram of a convolutional neural network of the present invention

Detailed Description

It should be noted that the embodiments and features of the embodiments can be combined with each other without conflict, and the present application will be further described in detail with reference to the drawings and specific embodiments.

As shown in FIG. 1, a new coronavirus image identification method based on a convolutional neural network algorithm uses three different classifiers, namely a support vector machine (SVM), k-nearest neighbors (k-NN) and a deep learning convolutional neural network (CNN) algorithm, to simulate and evaluate the proposed COVID-19 detection model, which is validated by collecting images of COVID-19 patients, SARS patients and normal persons. The method is carried out using two scenarios and a set of evaluation metrics.

Step 1, denoising the COVID-19 image using a Gaussian filter;

Step 2, after denoising, dividing the processing into two scenarios: in the first scenario, a region of interest (ROI) is segmented from the COVID-19 image using a region growing algorithm, significant features are then extracted with a hybrid feature extraction method combining the gray level co-occurrence matrix (GLCM) method, the gradient method and the discrete wavelet transform method, and the features are processed by classification algorithms (SVM and k-NN); in the second scenario, the COVID-19 image is classified using a deep learning algorithm (CNN);

The images need to be pre-processed before being classified by the algorithms. The main purpose of the pre-processing stage is to suppress any distortion in the images, thereby improving the ability to obtain important features. Furthermore, the pre-processing step is very important for correcting low contrast and high noise levels. A detailed description of these pre-processing stages is given below.

In order to improve the segmentation quality, a morphological segmentation method is adopted. When the COVID-19 image is segmented, small spots that are not lesions are left behind. These small spots must be removed in order to obtain the proper information from the image. Morphological segmentation, or over-segmentation, avoids confusion between isolated artifacts and objects of interest by preventing the detection of very small non-lesion regions. Thus, morphological segmentation is used to remove very small objects from the binary COVID-19 image, while preserving the shape and size of larger objects. Fig. 3 shows the various stages of the segmentation method, by which the morphological method extracts and isolates the shapes with the largest area.
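The removal of very small objects from the binary image can be sketched as an area filter over connected components. This is an assumption about how the cleanup might be done, not the patent's stated procedure: it uses 4-connectivity and a hypothetical `min_area` threshold, and it keeps larger objects pixel-for-pixel.

```python
from collections import deque


def remove_small_objects(mask, min_area):
    """Keep only 4-connected foreground components with at least `min_area` pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cleaned = [[0] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0][c0] or seen[r0][c0]:
                continue
            # Collect one connected component with a breadth-first search.
            component, queue = [], deque([(r0, c0)])
            seen[r0][c0] = True
            while queue:
                r, c = queue.popleft()
                component.append((r, c))
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and mask[nr][nc] and not seen[nr][nc]):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            if len(component) >= min_area:
                for r, c in component:
                    cleaned[r][c] = 1
    return cleaned


# One 2x2 object and two isolated single-pixel spots.
mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
cleaned = remove_small_objects(mask, min_area=3)
```

Unlike a morphological opening, this filter does not erode the boundary of the objects it keeps, which matches the stated goal of preserving the shape and size of larger objects.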

The gray level co-occurrence matrix method is a feature extraction method for extracting important features from a biomedical image by utilizing statistical analysis, and the method is used for extracting texture features from a COVID-19 image.

In step 2, texture features are extracted from the COVID-19 image using the gray level co-occurrence matrix method, and important features are obtained using five statistical measures (mean, energy, contrast, uniformity and correlation), as shown in the following formulas:

energy measures the normalized histogram of the COVID-19 image pixels,

uniformity measures the homogeneity of the COVID-19 image pixels,

correlation measures the gray-level correlation distribution,

where x and y are the pixel positions in the cell, m and n are the numbers of columns and rows, respectively, μ is the mean, and σ is the standard deviation.

In the invention, in step 2, the gradient is a vector composed of magnitude and direction. The gradient is calculated by the gradient method, with derivatives taken in the horizontal and vertical directions, and the Prewitt operator is used to detect edges and extract the vector values, as shown in figure 4. The operator detects edges using the extreme values of the gray-level change at each point around an image pixel;

In the present invention, in step 2, the discrete wavelet transform (DWT) method is more efficient than the wavelet transform because it works with dyadic scales and positions. Fig. 5 shows the wavelet decomposition process, in which four subband images (HH, HL, LL and LH) represent the data at each scale. In the invention, the LL subband, which is the approximation component, is the target for extracting image features, while the HH, LH and HL subbands capture the detail components of the image. Features are extracted from the image using a 3-level decomposition with the Haar wavelet.

In the invention, the support vector machine algorithm in step 2 is a powerful machine learning algorithm for classification and regression. It is used for classification, either two-class (binary) or multi-class, and is applied to a training data set to obtain a classification model with an optimal decision function. It maps non-linearly separable data from a low-dimensional space to a high-dimensional space, and classifies data by finding a hyperplane that separates two classes of data.

With a larger margin, the support vector machine algorithm can achieve a lower error rate. The invention builds a model for detecting COVID-19 by three-class classification (COVID-19, SARS and normal persons), in which the radial basis function (RBF) kernel achieves higher accuracy, as shown in the following formula:

$$K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\right)$$

where $\lVert x - x' \rVert^{2}$ is the squared Euclidean distance between the two feature vectors x and x', and 2σ² is a parameter.

In the invention, in step 2, non-parametric classification is performed using the k-NN algorithm. k-NN is a non-parametric classification algorithm with lower complexity than other classification algorithms, and is therefore called a lazy algorithm. The k-NN algorithm classifies data using the features of the nearest neighbors, and uses the Euclidean distance to find the closest points between features, as follows:

$$d = \sqrt{(x_1 - x_2)^{2} + (y_1 - y_2)^{2}}$$

where x_1 − x_2 and y_1 − y_2 are the coordinate differences between the two Euclidean vectors (x_1, y_1) and (x_2, y_2).

In the invention, in step 2, a deep learning algorithm (CNN) is used to classify COVID-19 images. Deep learning is a part of machine learning and involves techniques that simulate the neural system of the human brain. Deep learning uses algorithms to emulate reasoning and extract information from data, and contains hidden layers of mathematical functions for analyzing specific patterns in specific data. Deep learning is becoming increasingly important in the classification and detection of images and video, and has many uses both in the medical field (for example, X-ray identification) and in computer vision applications. The application of deep learning in the medical field improves image quality and the prediction of different types of diseases. The convolutional neural network (CNN) is a feed-forward neural network and the most common deep learning algorithm. The CNN consists of four layers, namely an input layer, a convolutional layer, a pooling layer and a fully connected layer, and is used to extract features from COVID-19 images. The operation of the CNN is as follows:

$$f(x) = x^{+} = \max(0, x);$$

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various equivalent changes, modifications, substitutions and alterations can be made herein without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims

1. A new coronavirus image identification method based on a convolutional neural network algorithm, characterized in that three different classifiers, namely a support vector machine (SVM), k-nearest neighbors (k-NN) and a deep learning convolutional neural network (CNN) algorithm, are used to simulate and evaluate the proposed COVID-19 detection model, which is validated by collecting images of COVID-19 patients, SARS patients and normal persons.

2. The method of claim 1, wherein standard data sets of patients diagnosed with COVID-19, patients diagnosed with SARS and normal patients are collected, and two scenarios are used to perform the method, comprising the following steps:

Step 1, denoising the COVID-19 image using a Gaussian filter;

Step 2, after denoising, dividing the processing into two scenarios, wherein in the first scenario, a region of interest (ROI) is segmented from the COVID-19 image using a region growing algorithm, significant features are then extracted with a hybrid feature extraction method combining the gray level co-occurrence matrix (GLCM) method, the gradient method and the discrete wavelet transform (DWT) method, and the features are processed by classification algorithms (SVM and k-NN); in the second scenario, the COVID-19 image is classified using a deep learning algorithm (CNN);

3. The method for identifying the new coronavirus image based on the convolutional neural network algorithm as claimed in claim 2, wherein in step 1, Gaussian filtering is used to remove distortion and blur from the image and improve the image quality, and the Gaussian function is calculated using formula (1):

$$G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right) \qquad (1)$$

4. The method for identifying the new coronavirus image based on the convolutional neural network algorithm as claimed in claim 2, wherein in step 2, the gray level co-occurrence matrix method is used to extract texture features from the COVID-19 image, and five statistical measures (mean, energy, contrast, uniformity and correlation) are used to obtain important features, as shown in the following formulas:

energy measures the normalized histogram of the COVID-19 image pixels,

uniformity measures the homogeneity of the COVID-19 image pixels,

correlation measures the gray-level correlation distribution,

where x and y are the pixel positions in the cell, m and n are the numbers of columns and rows, respectively, μ is the mean, and σ is the standard deviation.

5. The method of claim 4, wherein the gradient is a vector consisting of magnitude and direction; the gradient is calculated by the gradient method, with derivatives taken in the horizontal and vertical directions; the Prewitt operator is used to detect edges and extract the vector values, and the operator detects edges using the extreme values of the gray-level change at each point around an image pixel;

6. The new coronavirus image identification method based on the convolutional neural network algorithm as set forth in claim 5, characterized in that, in step 2, features are extracted using the discrete wavelet transform method; in the wavelet decomposition process, four subband images (HH, HL, LL and LH) represent the data at each scale; the LL subband, which is the approximation component, is the target for extracting image features, while the HH, LH and HL subbands capture the detail components of the image; and features are extracted from the image using a 3-level decomposition with the Haar wavelet.

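7. The method for recognizing a new coronavirus image based on the convolutional neural network algorithm as set forth in claim 6, wherein in step 2, the model for detecting COVID-19 by three-class classification (COVID-19, SARS and normal persons) is built using the SVM algorithm, as shown in the following formula:

$$K(x, x') = \exp\left(-\frac{\lVert x - x' \rVert^{2}}{2\sigma^{2}}\right)$$

where $\lVert x - x' \rVert^{2}$ is the squared Euclidean distance between the two feature vectors x and x', and 2σ² is a parameter.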

8. The method according to claim 7, wherein in step 2, the k-NN algorithm is used to perform non-parametric classification, and the Euclidean distance is used to find the closest points between features, as follows:

$$d = \sqrt{(x_1 - x_2)^{2} + (y_1 - y_2)^{2}}$$

where x_1 − x_2 and y_1 − y_2 are the coordinate differences between the two Euclidean vectors (x_1, y_1) and (x_2, y_2).

9. The method for identifying a new coronavirus image based on the convolutional neural network algorithm as claimed in claim 8, wherein in step 2, COVID-19 images are classified using a deep learning algorithm (CNN); the CNN consists of four layers, namely an input layer, a convolutional layer, a pooling layer and a fully connected layer, and is used to extract features from the COVID-19 image; and the CNN operates as follows:

Input layer: the COVID-19 image is input and the dataset image labels are extracted; this layer also normalizes the data scale to speed up training on the dataset, and serves as a preprocessing step to improve accuracy;

Convolutional layer: the non-linear feature maps of the convolutional layer are passed through multiple network layers to obtain better results; this layer uses an activation function called the rectified linear unit (ReLU), which is applied to the output of the previous layer; the ReLU function is defined as the positive part of its argument, as follows:

$$f(x) = x^{+} = \max(0, x);$$

Pooling layer: this layer improves training by reducing the dimensionality of the data and compressing it, thereby improving classification accuracy; it performs downsampling along the spatial dimensions of a given input, thereby reducing the number of parameters in the activations;

Fully connected layer: this layer computes its output as the dot product between the weights of the neurons locally connected to the input and a small region of the input volume; the ReLU is applied to the output of the previous layer's activation.

10. The method for identifying a new coronavirus image based on the convolutional neural network algorithm as claimed in claim 2, wherein in step 3, statistical evaluation metrics (namely accuracy, specificity, sensitivity and recall) are used to evaluate the proposed COVID-19 classification and detection model, and are calculated as follows: