CN111444960A

CN111444960A - Skin disease image classification system based on multi-mode data input

Info

Publication number: CN111444960A
Application number: CN202010222477.XA
Authority: CN
Inventors: 朱平; 覃智威; 刘钊; 凌闻元
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2020-03-26
Filing date: 2020-03-26
Publication date: 2020-07-24

Abstract

A skin disease image classification system based on multi-modal data input. The invention makes full use of the text supplementary data that accompanies the image data to improve the accuracy of skin disease image classification. The input data is divided into image data and text data. After the image data is standardized and the text data is encoded, an image feature extractor based on a deep transfer convolutional neural network and a text feature extractor based on a multilayer perceptron unit are constructed. A feature fusion method feeds the image features and text features jointly into a comprehensive fully connected layer, which outputs the class label of the data and thereby performs the skin disease image classification task.

Description

Skin disease image classification system based on multi-mode data input

Technical Field

The invention relates to the technology in the field of image processing, in particular to a skin disease image classification system based on multi-mode data input.

Background

In the field of image processing, image recognition and classification are common tasks: they use technical means to determine the specific class label to which a given image belongs. Dermoscopic images of pigmented skin lesions differ significantly from natural images, and skin lesion images themselves vary in imaging method, viewing angle, size, illumination, background noise and the like. Moreover, there are many types of skin disease, images of some different types differ only slightly, and, for the reasons above, different images of the same disease type may differ greatly. Classifying skin disease images is therefore a very challenging task.

In research on skin disease image classification, a Convolutional Neural Network (CNN) is generally used for feature extraction and classification of images. However, image classification models based on conventional convolutional neural networks accept only single-modality image input, and all features are extracted and mapped from the image alone. In the clinical diagnosis of skin diseases, the acquisition of skin disease images is often accompanied by important text and numerical data, which may come from medical records, examination reports, records of diagnosis and treatment, and the like. Image classification based on conventional convolutional neural networks also performs worse when there are many types of skin disease, when the differences between individual disease types are not obvious, and when the data of the disease types is imbalanced; adding the text and numerical data to the classification process as supplementary information helps to improve classification accuracy. In view of these problems, constructing a multi-modal classification model that can efficiently process both images and text values is one of the key technologies for making full use of diverse medical data and improving the accuracy of skin disease image classification in the context of big data.

Disclosure of Invention

The invention provides a skin disease image classification system based on multi-modal data input, which makes full use of the text supplementary data accompanying the image data to improve the accuracy of skin disease image classification. The input data is divided into image data and text data. After the image data is standardized and the text data is encoded, an image feature extractor based on a Deep Transfer Convolutional Neural Network (DTCNN) and a text feature extractor based on a Multilayer Perceptron unit (MLP) are constructed. A feature fusion method then feeds the image features and text features jointly into a comprehensive fully connected layer, which outputs the class label of the data and thereby performs the skin disease image classification task.

The invention is realized by the following technical scheme:

The invention relates to a skin disease image classification system based on multi-modal data input, comprising a data preprocessing module, an image feature extraction module, a text parameter feature extraction module, a feature fusion module and a classification output module, wherein: the data preprocessing module receives multi-modal data comprising image data and the text data corresponding to it, obtains a pixel value matrix through preprocessing and standardizes it, encodes the text parameters in the text data into a numerical matrix, and generates a training sample set and a test sample set from the processed data; the deep transfer convolutional neural network in the image feature extraction module is trained on the training sample set and then extracts image features; the multilayer perceptron unit in the text parameter feature extraction module is trained on the training sample set and then extracts text parameter features; the feature fusion module fuses the image features and the text parameter features through a feature fusion method and outputs them jointly to the classification output module; and the comprehensive fully connected layer of the classification output module uses a softmax activation function to output the classification result of the image.

The multi-modal comprehensive classification model built from the training set data is applied to the test set to obtain the class label of each skin disease image in the test set; these labels are compared with the true labels to obtain the overall accuracy, average sensitivity and average precision of the classification predictions.

The standardization refers to: scaling the values X of the pixel value matrix to the interval 0-1 according to

$$X_{norm} = \frac{X - \mu_X}{\sigma_X}$$

Wherein: $X_{norm}$ is the normalized matrix of pixel values, $X$ is the initial matrix of pixel values, $\mu_X$ is the mean of all pixel value matrices, and $\sigma_X$ is the standard deviation of all pixel value matrices.
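The standardization above can be sketched in plain Python. This is a minimal illustration and not the implementation of the embodiment: the function name `standardize` and the tiny single-channel images are assumptions made for the example. The mean and standard deviation are computed over all images together, as the definition of the symbols states, so the standardized values have zero mean and unit standard deviation over the data set.

```python
import math

def standardize(images):
    """Standardize pixel values with the mean and standard deviation of ALL images."""
    pixels = [p for img in images for row in img for p in row]
    mu = sum(pixels) / len(pixels)
    sigma = math.sqrt(sum((p - mu) ** 2 for p in pixels) / len(pixels))
    return [[[(p - mu) / sigma for p in row] for row in img] for img in images]

# Two tiny 2x2 single-channel "images" (hypothetical data).
images = [[[0, 64], [128, 255]], [[32, 96], [160, 224]]]
normalized = standardize(images)
```

For RGB images the same computation would be applied to the three channel matrices.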

The encoding refers to: converting the text data into numerical data by mapping each item of text information to a serial number.

The ratio of the training sample set to the test sample set is preferably: 8:2.

The deep transfer convolutional neural network refers to: using a transfer learning method to transfer a model established in another task domain to the target task domain, so that the target task is completed quickly and efficiently with the adjusted model and the difficulty of building a model from scratch is avoided. Specifically: a convolutional neural network model pre-trained on a natural image data set is selected, the final output layer of the pre-trained model is changed so that it outputs classification features of skin disease images, and the transferred model is fine-tuned with the training set image data of the embodiment to finally obtain the deep transfer convolutional neural network.

The multilayer perceptron unit is an Artificial Neural Network (ANN) comprising: an input layer for receiving the numerical text data, an output layer for outputting the text parameter features, and hidden layers arranged between the input layer and the output layer for extracting features of the input data, the layers being fully connected through neurons, wherein: each layer of the network comprises a certain number of neurons, each layer processes and transmits information through specific weights, thresholds and activation functions, the input layer receives the numerical input, and the output layer outputs the result obtained from feature extraction and combination.

The feature fusion refers to: combining the extracted image features and text parameter features to obtain a comprehensive feature vector of the image and text data.

The comprehensive fully connected layer refers to: the last few layers of the convolutional neural network, which map the distributed feature representation learned by the convolutional layers to the class space and act as a classifier. "Comprehensive" means that the fully connected layer of the invention receives the comprehensive feature input containing both image and text data for the final classification.

The softmax activation function refers to: in a multi-class classification task, the activation function of the final fully connected layer, which maps the output values of a plurality of neurons to (0,1) such that the output values sum to 1. From the perspective of probability, for the output of each sample, the class with the largest value (the largest probability) is selected as the predicted class of the sample, specifically:

$$S_i = \frac{e^{i}}{\sum_j e^{j}}$$

wherein: $e^{i}$ is the exponentiated output value of the i-th neuron (the exponent denotes that neuron's raw output), $\sum_j e^{j}$ is the sum of the exponentiated outputs of all neurons in the fully connected layer, and $S_i$ is the predicted class probability output by the i-th neuron of the fully connected layer.
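As an illustration of the softmax mapping, the following minimal sketch computes the probabilities for three hypothetical neuron outputs. The name `z` for the raw outputs and the example values are assumptions; subtracting the maximum before exponentiating is a common numerical safeguard that leaves the result unchanged.

```python
import math

def softmax(z):
    """Map raw neuron outputs z to probabilities in (0, 1) that sum to 1."""
    m = max(z)  # subtracting the max avoids overflow and does not change the result
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # hypothetical raw outputs of three neurons
probs = softmax(logits)
predicted_class = probs.index(max(probs))  # class with the largest probability
```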

The overall accuracy refers to: based on the true class labels of the samples, the ratio of the number of correctly predicted samples to the total number of samples, namely:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

wherein: TP is the number of samples that are actually positive and correctly predicted as positive by the model; TN is the number of samples that are actually negative and correctly predicted as negative; FP is the number of samples that are actually negative and incorrectly predicted as positive; FN is the number of samples that are actually positive and incorrectly predicted as negative. In a multi-class problem, for a specific class, samples of that class form the positive class and all other samples form the negative class.

The average sensitivity refers to: for each class, the proportion of actually positive samples that the model correctly predicts as positive is the sensitivity; the average sensitivity is the mean of the sensitivities of all individual classes:

$$Sensitivity = \frac{TP}{TP + FN}$$

Wherein: TP is the number of samples that are actually positive and correctly predicted as positive by the model; FN is the number of samples that are actually positive and incorrectly predicted as negative.

The average precision refers to: among the samples that the model predicts as positive, the proportion that truly belong to the positive class is the precision; the average precision is the mean of the precisions of all individual classes:

$$Precision = \frac{TP}{TP + FP}$$

Wherein: TP is the number of samples that are actually positive and correctly predicted as positive by the model; FP is the number of samples that are actually negative and incorrectly predicted as positive.
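The three evaluation metrics can be computed from true and predicted labels as in the sketch below, which treats each class in turn as the positive class and averages the per-class values. The function names and the six-sample label lists are hypothetical; counting the precision of a class that is never predicted as 0 is an assumption of this sketch, since the text does not address that case.

```python
def per_class_counts(y_true, y_pred, cls):
    """One-vs-rest TP, FP and FN counts for class `cls`."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return tp, fp, fn

def evaluate(y_true, y_pred):
    """Return overall accuracy, average sensitivity and average precision."""
    classes = sorted(set(y_true))
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    sens, prec = [], []
    for c in classes:
        tp, fp, fn = per_class_counts(y_true, y_pred, c)
        sens.append(tp / (tp + fn))
        # A class that is never predicted has no defined precision; use 0 (assumption).
        prec.append(tp / (tp + fp) if tp + fp else 0.0)
    return accuracy, sum(sens) / len(sens), sum(prec) / len(prec)

y_true = [0, 0, 0, 1, 1, 2]  # hypothetical true labels
y_pred = [0, 0, 1, 1, 1, 2]  # hypothetical predictions
accuracy, avg_sensitivity, avg_precision = evaluate(y_true, y_pred)
```

Averaging per class gives small classes the same weight as large ones, which matters for imbalanced data such as the 7-class data set of the embodiment.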

Technical effects

The invention as a whole solves the problem that accuracy is low when a conventional convolutional neural network classifies skin disease images, because the text and numerical data accompanying the images is ignored and only the image data is processed, while there are many types of skin disease, the differences between individual disease types are not obvious, and the data of the disease types is imbalanced.

Compared with the prior art, the invention uses the text and numerical data of the multi-modal input to improve the classification accuracy of skin disease images: skin disease image data and text data are processed together, image features and text parameter features are extracted by a deep transfer convolutional neural network and a multilayer perceptron unit respectively, and fusing the features exploits the complementary information and interaction between the modalities, which effectively improves the accuracy of skin disease image classification. A conventional convolutional neural network with single-modality image input ignores the non-image text data in the classification process, although such data often provides valuable reference information in medical practice.

Drawings

FIG. 1 is a schematic diagram of the system of the present invention;

FIG. 2 is a flow chart of skin condition image classification based on multi-modal data input;

FIG. 3 is a schematic diagram of samples of the 7 classes of skin disease in the embodiment;

FIG. 4 is a schematic structural diagram of a Transfer-ResNet50 feature extractor in the embodiment;

FIG. 5 is a schematic diagram of the multilayer perceptron unit in the embodiment.

Detailed Description

As shown in FIG. 1, a skin disease image classification system based on multi-modal data input comprises a data preprocessing module, an image feature extraction module, a text parameter feature extraction module, a feature fusion module and a classification output module, wherein: the image feature extraction module and the text parameter feature extraction module together make use of the skin disease image data and the related text data, receiving multi-modal input of images and text values and performing feature extraction; the deep transfer convolutional neural network in the image feature extraction module represents and extracts skin disease image features; the multilayer perceptron unit in the text parameter feature extraction module extracts text parameter features; and the feature fusion module combines the image features and the text parameter features, which together improve the classification accuracy of skin disease images.

The image feature extraction module adopts a transfer learning method: a Residual Network (ResNet) pre-trained on a natural image data set is selected, the network structure and the weights of all layers of ResNet50 are transferred, the last fully connected layer of the original residual network is removed, and three batch normalization layers, one dropout layer and two fully connected layers with ReLU activation functions are added, establishing the deep transfer convolutional neural network Transfer-ResNet50.

In the residual network, the weights of each layer of ResNet50 are used as initial values during training and the first 10 layers of ResNet50 are frozen, i.e. the neuron parameters of the first 10 layers are not updated; the neuron parameters of the remaining 40 layers are trained on the skin disease image data set and then used to extract features from the skin disease images, finally outputting a 64-dimensional skin disease image feature vector.

The text parameter feature extraction module uses a multilayer perceptron unit to extract features of the text parameters. The multilayer perceptron unit has an input layer that receives the encoded numerical vectors, two hidden layers with 64 and 32 neurons respectively and ReLU as the activation function, and a feature output layer with 4 neurons and ReLU as the activation function; it finally outputs a 4-dimensional text parameter feature vector.

This embodiment takes the classification of dermoscopic images containing 7 types of pigmented skin lesions as an example, and specifically includes the following steps:

step 1, dividing an original skin disease data set into image data and corresponding text parameter data, wherein:

the image data are 10015 dermoscopic images covering 7 types of pigmented skin lesions: 327 solar keratosis images, 514 basal cell tumor images, 1099 seborrheic keratosis images, 115 skin fibroma images, 6705 melanocytic nevus images, 1113 melanoma images and 142 vascular skin lesion images; samples of each class are shown in FIG. 3;

the text data includes the number of the skin disease image, the skin lesion number, and the diagnosis confirmation type.

Step 2, respectively preprocessing the image data and the text data, specifically:

2.1) For the skin disease image data, the pixel value matrices of the three RGB color channels of each image are acquired, $X = [X_R, X_G, X_B]$, and the mean $\mu_X$ and standard deviation $\sigma_X$ are calculated from the pixel value matrices of all images. The formula

$$X_{norm} = \frac{X - \mu_X}{\sigma_X}$$

is used to normalize the pixel value matrix of each input image, scaling the matrix values to [0, 1].

2.2) For the text data, each item is encoded: the character-type image numbers, skin lesion numbers and diagnosis confirmation types are mapped to corresponding numerical serial numbers, converting the character-type data into numerical data so that each input sample becomes a numerical vector.
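A minimal sketch of this serial-number encoding is given below. The record values, the helper name `fit_encoder` and the choice to number values in order of first appearance are all assumptions for illustration; the text only states that each character-type value is mapped to a numerical serial number.

```python
def fit_encoder(values):
    """Assign each distinct string a serial number in order of first appearance."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return mapping

# Hypothetical records: (image number, skin lesion number, diagnosis confirmation type).
records = [
    ("IMG_001", "LES_01", "histopathology"),
    ("IMG_002", "LES_01", "histopathology"),
    ("IMG_003", "LES_02", "follow-up"),
]

# One encoder per column, then encode every record into a numerical vector.
encoders = [fit_encoder(column) for column in zip(*records)]
encoded = [[enc[v] for enc, v in zip(encoders, row)] for row in records]
```

Each record becomes a three-element numerical vector that can be fed to the multilayer perceptron unit.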

2.3) The processed image data and the corresponding text data are split: 80% forms the training sample set, used to train the classification prediction model, and the remaining 20% forms the test set, used for final model testing and comparison of results.
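The 80%/20% split can be sketched as follows. The random shuffle with a fixed seed is an assumption of this sketch; the text does not say how samples are assigned to the two sets (for example, whether the split is stratified by class).

```python
import random

def split_samples(samples, train_fraction=0.8, seed=0):
    """Shuffle the samples and split them into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

sample_ids = list(range(10015))  # one ID per image/text pair in the embodiment
train_ids, test_ids = split_samples(sample_ids)
```

With 10015 samples this yields 8012 training samples and 2003 test samples.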

Step 3, constructing the deep transfer convolutional neural network feature extractor shown in FIG. 4 by using a transfer learning method, and extracting features of the skin disease images, specifically:

3.1) selecting a Residual Network (ResNet) trained in advance based on a natural image dataset, and constructing a feature extractor by taking ResNet50 with a 50-layer Network (49 layers of which are convolutional layers) as an example.

ResNet introduces residual learning and uses identity mapping to ensure that a deep network performs at least as well as a shallower one, thereby overcoming the problems of low learning efficiency and stagnating accuracy as network depth increases. Owing to this structural innovation of the residual network, the pre-trained model ResNet50 has good image feature learning and generalization capability.
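The idea of residual learning can be illustrated with a toy sketch in which a block outputs F(x) + x. The function names and values are hypothetical, and real ResNet blocks use convolutions rather than the simple list operations shown here.

```python
def residual_block(x, residual_fn):
    """Return F(x) + x, where residual_fn plays the role of the learned mapping F."""
    return [f + xi for f, xi in zip(residual_fn(x), x)]

x = [0.5, -1.0, 2.0]

# If the learned residual is zero, the block reduces to the identity mapping,
# so stacking such a block cannot make the network worse than without it.
identity_out = residual_block(x, lambda v: [0.0] * len(v))

# A non-zero residual adds a correction on top of the input.
shifted_out = residual_block(x, lambda v: [0.1 * vi for vi in v])
```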

3.2) The network structure and weights of ResNet50 are transferred and the last fully connected layer is removed, so that the network is used only for image feature extraction and does not output classification categories; after the ResNet50 layers, three batch normalization layers, one dropout layer and two fully connected layers with ReLU activation functions are added, establishing the deep transfer convolutional neural network Transfer-ResNet50 shown in FIG. 4.

The batch normalization layer refers to: batch normalization, a training technique in deep learning. For the neurons of each hidden layer of a deep neural network, the input distribution is normalized to the standard normal distribution with mean 0 and variance 1, so that the input data of each layer has the same distribution, which accelerates the learning process of the network.
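A minimal sketch of the normalization step is shown below, assuming a batch given as a list of feature vectors. The small constant `eps` and the omission of the learnable scale and shift parameters that deep learning frameworks usually add are simplifications of this sketch.

```python
import math

def batch_norm(batch, eps=1e-5):
    """Normalize each feature over the batch to roughly zero mean and unit variance."""
    n = len(batch)
    columns = list(zip(*batch))
    means = [sum(col) / n for col in columns]
    variances = [sum((v - m) ** 2 for v in col) / n for col, m in zip(columns, means)]
    return [
        [(v - m) / math.sqrt(var + eps) for v, m, var in zip(row, means, variances)]
        for row in batch
    ]

batch = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # 3 samples, 2 features (hypothetical)
normalized_batch = batch_norm(batch)
```

Both features end up on the same scale even though the second one started ten times larger.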

The dropout layer refers to: a method for preventing model overfitting in which, during training of the deep neural network, the output of a hidden layer neuron is set to 0 with a certain probability p (p is 0.25 in this example) and the update of that neuron's parameters is suspended.

The ReLU activation function is the rectified linear unit: when x is less than 0 the output is 0, and when x is greater than 0 the output is x.
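The discarding (dropout) layer and the ReLU activation described above can be sketched as follows. Rescaling the surviving activations by 1/(1-p), known as inverted dropout, is an assumption borrowed from common deep learning frameworks and is not stated in the text; the fixed random seed is only there to make the sketch reproducible.

```python
import random

def relu(x):
    """Rectified linear unit: 0 for non-positive inputs, x otherwise."""
    return x if x > 0 else 0.0

def dropout(activations, p=0.25, rng=None):
    """Zero each activation with probability p during training.

    Survivors are rescaled by 1/(1-p) ("inverted dropout"), an assumption
    of this sketch rather than something taken from the text.
    """
    rng = rng or random.Random(0)
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]

hidden = [relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]]
dropped = dropout([1.0] * 1000, p=0.25)  # roughly a quarter of the values become 0
```

At test time dropout is switched off and all activations are passed through unchanged.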

3.3) During training, the weights of each layer of ResNet50 are used as initial values and the first 10 layers of ResNet50 are frozen, i.e. the neuron parameters of the first 10 layers are not updated; the neuron parameters of the remaining 40 layers are trained on the skin disease image data set, making use of the good feature learning capability of ResNet50 to extract features from the skin disease images and finally output a 64-dimensional feature vector.
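The effect of freezing can be illustrated with a deliberately simplified sketch in which a single number stands in for all parameters of a layer. Plain gradient descent is an assumption (the text does not name the optimizer), and the gradient values are hypothetical; the learning rate of 0.001 is the one reported for the experiments.

```python
def sgd_step(layer_weights, layer_grads, lr=0.001, n_frozen=10):
    """Apply one gradient step, leaving the first n_frozen layers untouched."""
    return [
        w if i < n_frozen else w - lr * g
        for i, (w, g) in enumerate(zip(layer_weights, layer_grads))
    ]

# One scalar stands in for each layer's parameters (a deliberate simplification).
pretrained = [1.0] * 50  # stand-in for the pre-trained weights of the 50 layers
grads = [0.5] * 50       # hypothetical gradients from one training batch
updated = sgd_step(pretrained, grads)
```

The first 10 entries keep their pre-trained values, while the other 40 move away from them as training proceeds.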

Step 4, constructing a multilayer perceptron unit shown in fig. 5 to perform feature extraction on the text parameters, specifically comprising:

an input layer is constructed to receive the encoded numerical vectors; two hidden layers are constructed with 64 and 32 neurons respectively and ReLU as the activation function; the feature output layer has 4 neurons with ReLU as the activation function; the multilayer perceptron unit extracts features from the text parameters and outputs a 4-dimensional feature vector.
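The forward pass of such a multilayer perceptron unit can be sketched in plain Python. The layer widths 64, 32 and 4 come from the text; the input width of 3 (one value per encoded text field), the random weights standing in for trained parameters, and all function names are assumptions of this sketch.

```python
import random

def dense_relu(inputs, weights, biases):
    """Fully connected layer followed by the ReLU activation."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def init_layer(n_in, n_out, rng):
    """Random weights stand in for trained parameters (illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

rng = random.Random(0)
sizes = [3, 64, 32, 4]  # 3 encoded inputs -> 64 -> 32 -> 4 text features
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

def mlp_features(encoded_row):
    """Propagate one encoded text vector through all layers."""
    activations = encoded_row
    for weights, biases in layers:
        activations = dense_relu(activations, weights, biases)
    return activations

text_features = mlp_features([2.0, 1.0, 1.0])  # hypothetical encoded sample
```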

Step 5, performing feature fusion on the image features and text parameter features obtained in steps 3 and 4: a fusion network layer combines the 64-dimensional image feature vector and the 4-dimensional text parameter feature vector into a 68-dimensional comprehensive feature vector.

Step 6, inputting the comprehensive feature vector obtained in step 5 into the comprehensive fully connected layer, which uses a softmax activation function to map the comprehensive feature vector to the 7 skin disease classes of this example and outputs the final classification result and the corresponding probabilities.
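Steps 5 and 6 can be sketched together as follows. Fusion is implemented as concatenation, which is what the dimension count 64 + 4 = 68 implies; using a single fully connected layer for the classification head, the random stand-in features and the untrained random weights are assumptions of this sketch.

```python
import math
import random

def fuse(image_features, text_features):
    """Feature fusion by concatenating the two feature vectors."""
    return list(image_features) + list(text_features)

def classify(fused, weights, biases):
    """Fully connected layer followed by softmax over the classes."""
    logits = [sum(w * x for w, x in zip(row, fused)) + b for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
image_features = [rng.random() for _ in range(64)]  # stand-in for the image feature vector
text_features = [rng.random() for _ in range(4)]    # stand-in for the text feature vector
weights = [[rng.uniform(-0.1, 0.1) for _ in range(68)] for _ in range(7)]  # untrained
biases = [0.0] * 7

fused = fuse(image_features, text_features)
probabilities = classify(fused, weights, biases)
predicted_label = probabilities.index(max(probabilities))
```

The predicted label is the index of the largest of the 7 class probabilities.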

Through the above steps, an image feature extractor based on the deep transfer convolutional neural network and a text parameter feature extractor based on the multilayer perceptron unit are constructed; features are extracted from the skin disease image data and the text data, the extracted image features and text parameter features are fused, and the classification result of the skin disease image is finally output.

Based on the 7 types of skin disease images and the corresponding text parameters of the embodiment, the comprehensive classification model is trained on a training set, and the model classification performance is verified by using a test set. Meanwhile, in order to show the performance improvement of the method on the skin disease image classification task, the traditional convolutional neural network method is used for training and testing on the same skin disease image data set, the overall classification Accuracy (Accuracy), the average Sensitivity (Sensitivity) and the average Precision (Precision) of the two methods on the test set are compared, and the comparison result is shown in table 1.

TABLE 1 comparison of skin disease image classification results

As shown in Table 1, for the multi-class skin disease image classification problem, compared with the conventional convolutional neural network method, the method improves classification accuracy by 6.94%, sensitivity by 16.80% and precision by 21.22%. All three classification evaluation metrics improve markedly, showing that making full use of multi-modal data improves the model's skin disease image classification performance.

In practical experiments using the Python programming language and the TensorFlow deep learning framework, with an initial learning rate (Learning Rate) of 0.001, 50 iterations over the data set (Epoch) and 32 samples per training batch (Batch size), the classification accuracy obtained on the 7 classes of skin disease images is 85.62%, the sensitivity is 66.90% and the precision is 81.02%.

Compared with the prior art, the invention constructs a skin disease image classification system capable of processing multi-modal data input: features are extracted from the skin disease image data and the corresponding text parameter data and then fused. This overcomes the limitation that skin disease image classification attends only to image data and provides an effective technical means for the comprehensive use of medical data. The image feature extraction module based on the deep transfer convolutional neural network extracts features of skin disease images, the text parameter feature extraction module based on the multilayer perceptron unit extracts text parameter features, and the feature fusion module combines the two types of features, exploiting the complementary information and interaction between the modal features to improve the accuracy of skin disease image classification. Compared with the conventional convolutional neural network method, the method improves classification accuracy on the 7 classes of skin disease images by 6.94%, sensitivity by 16.80% and precision by 21.22%.

The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

1. A skin disease image classification system based on multi-modal data input, comprising: a data preprocessing module, an image feature extraction module, a text parameter feature extraction module, a feature fusion module and a classification output module, wherein: the data preprocessing module receives multi-modal data comprising image data and the text data corresponding to it, obtains a pixel value matrix through preprocessing and standardizes it, encodes the text parameters in the text data into a numerical matrix, and generates a training sample set and a test sample set from the processed data; the deep transfer convolutional neural network in the image feature extraction module is trained on the training sample set and then extracts image features; the multilayer perceptron unit in the text parameter feature extraction module is trained on the training sample set and then extracts text parameter features; the feature fusion module fuses the image features and the text parameter features through a feature fusion method and outputs them jointly to the classification output module; and the comprehensive fully connected layer of the classification output module uses a softmax activation function to output the classification result.

2. The skin disease image classification system according to claim 1, wherein the deep transfer convolutional neural network refers to: using a transfer learning method to transfer a model established in another task domain to the target task domain, so that the target task is completed quickly and efficiently with the adjusted model and the difficulty of building a model from scratch is avoided; specifically: a convolutional neural network model pre-trained on a natural image data set is selected, the final output layer of the pre-trained model is changed so that it outputs classification features of skin disease images, and the transferred model is fine-tuned with the training set image data to finally obtain the deep transfer convolutional neural network.

3. The skin disease image classification system according to claim 1, wherein the multilayer perceptron unit is a feed-forward artificial neural network comprising: an input layer for receiving the numerical text data, an output layer for outputting the text parameter features, and hidden layers arranged between the input layer and the output layer for extracting features of the input data, the layers being fully connected through neurons, wherein: each layer of the network comprises a certain number of neurons, each layer processes and transmits information through specific weights, thresholds and activation functions, the input layer receives the numerical input, and the output layer outputs the result obtained from feature extraction and combination.

4. The skin disease image classification system according to claim 1, wherein the feature fusion refers to: combining the extracted image features and text parameter features to obtain a comprehensive feature vector of the image and text data.

5. The skin disease image classification system according to claim 1, wherein the comprehensive fully connected layer refers to: the last few layers of the convolutional neural network, which map the distributed feature representation learned by the convolutional layers to the class space and act as a classifier; "comprehensive" means that the fully connected layer receives the comprehensive feature input containing both image and text data for the final classification.

6. The skin disease image classification system according to claim 1, wherein the softmax activation function refers to: in a multi-class classification task, the activation function of the final fully connected layer, which maps the output values of a plurality of neurons to (0,1) such that the output values sum to 1; from the perspective of probability, for the output of each sample, the class with the largest value (the largest probability) is selected as the predicted class of the sample, specifically:

$$S_i = \frac{e^{i}}{\sum_j e^{j}}$$

7. The skin disease image classification system according to claim 1, wherein the overall accuracy refers to: based on the true class labels of the samples, the ratio of the number of correctly predicted samples to the total number of samples, namely:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

8. The skin disease image classification system according to claim 1, wherein the average sensitivity refers to: for each class, the proportion of actually positive samples that the model correctly predicts as positive is the sensitivity; the average sensitivity is the mean of the sensitivities of all individual classes:

$$Sensitivity = \frac{TP}{TP + FN}$$

Wherein: TP is the number of samples that are actually positive and correctly predicted as positive by the model; FN is the number of samples that are actually positive and incorrectly predicted as negative.

9. The skin disease image classification system according to claim 1, wherein the average precision refers to: among the samples that the model predicts as positive, the proportion that truly belong to the positive class is the precision; the average precision is the mean of the precisions of all individual classes:

$$Precision = \frac{TP}{TP + FP}$$