CN113159186A - Medical image automatic labeling method based on semi-supervised learning - Google Patents

Medical image automatic labeling method based on semi-supervised learning

Info

Publication number
CN113159186A
Authority
CN
China
Prior art keywords
training
classification network
data
inputting
primary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110441605.4A
Other languages
Chinese (zh)
Inventor
颜成钢
张二四
彭开来
朱晨瑞
孙垚棋
张继勇
李宗鹏
张勇东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202110441605.4A priority Critical patent/CN113159186A/en
Publication of CN113159186A publication Critical patent/CN113159186A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a medical image automatic labeling method based on semi-supervised learning. First, a training set is input into a primary classification network to obtain primary general deep convolution features; the data are then pre-classified according to the network output, yielding a pre-classification confusion matrix. Class training subsets are then obtained through spectral clustering and input into a secondary classification network to obtain secondary special deep convolution features; the two feature levels are fused to form the deep convolution feature of the image. Additional patient information is encoded into multi-source heterogeneous features through word2vec, fused with the deep convolution features, and passed through an SVM classifier to obtain the final output. The method can automatically label a large amount of medical data using only a small amount of labeled medical data, greatly reducing the manpower and material resources consumed by manual labeling. Compared with existing automatic labeling methods, it is more efficient and produces more accurate labeling results.

Description

Medical image automatic labeling method based on semi-supervised learning
Technical Field
The invention relates to semi-supervised automatic annotation of images, in particular to a semi-supervised learning method for automatically annotating medical images using a small number of labeled images and additional textual description information.
Background
Since the beginning of the last century, with the development of computer technology and imaging technology, medical images have become an important source of information for clinical diagnosis and play a vital role in the diagnosis of diseases such as cerebral infarction, pulmonary nodules, pulmonary embolism, and thyroid tumors. Medical images are an important reference for physicians' diagnoses, and image annotation technology can assist physicians in making a preliminary diagnosis and provide corresponding treatment schemes for reference, reducing part of the physicians' workload and promoting the intelligent transformation of medical care. Common medical images include B-mode ultrasound images, color Doppler ultrasound images, magnetic resonance images, CT images, PET images, SPECT images, digital X-ray images, X-ray fluoroscopy images, electronic endoscopy images, and the like. How to effectively manage these images, help physicians quickly find images of interest, and use existing case images to assist diagnosis and improve working efficiency are problems that have made medical image retrieval and its related technologies a research hotspot.
Generally, image retrieval techniques fall into two categories: text-based image retrieval (TBIR) and content-based image retrieval (CBIR). In TBIR, images are usually labeled manually first and then retrieved in a text-retrieval manner; the drawback is obvious: when the types and number of images are very large, the workload required to label them manually is enormous. CBIR retrieves images based on their low-level features; however, users do not directly consider low-level visual feature similarity when searching for images, but judge whether an image meets their needs according to the objects it depicts and the semantic information it expresses. This "semantic gap" limits the development of CBIR. Automatic image annotation technology emerged to relieve the limitations of both retrieval approaches: users can search as conveniently as with text retrieval without having to consider low-level image features, which has made automatic image annotation an important research direction in the field of image retrieval.
The core idea of automatic image annotation is that an algorithm automatically learns a semantic concept model from a large number of samples, and the learned model is then used to assign appropriate labels to new images. Once an image is assigned semantic labels, users can retrieve it by keywords in a manner similar to text retrieval. Automatic image annotation assigns labels according to the semantic content of the image and thus combines the advantages of TBIR and CBIR. Although scholars have made great progress in automatic image annotation, many algorithms still depend on the number of labeled images; that is, these models usually require a large amount of labeled data for training, which comes at a great labor cost. When labeled data are insufficient, the generalization performance of many models is poor. In practice, labeled data are difficult to obtain while unlabeled data are easy to obtain, so how to make full use of a large amount of unlabeled data to improve the generalization performance of an annotation model is a very challenging problem. Semi-supervised learning attempts to make full use of unlabeled data to help weak classifiers train annotation models with strong generalization capability. Therefore, how to introduce semi-supervised learning into the field of image annotation is a valuable research topic.
Disclosure of Invention
The invention addresses the fact that although a large amount of medical data is currently available, most of it lacks labels, and manual labeling requires enormous manpower and material resources. How to automatically label unlabeled data using a small amount of labeled data is therefore a question worth exploring.
In view of this situation, the invention provides a medical image automatic labeling method based on semi-supervised learning. The method combines transfer learning with deep convolution features and extracts secondary deep convolution features for samples with high similarity, achieving more accurate automatic labeling results while greatly reducing the manpower and material resources required.
The method first pre-trains a ResNet on the ImageNet data set and then fine-tunes the network with a small amount of labeled medical image data to obtain primary general deep convolution features of the data; the data are then pre-classified according to the network output to obtain a pre-classification confusion matrix. Class training subsets are then obtained through spectral clustering and input into the network to obtain secondary special deep convolution features of the data. The primary general deep convolution features and the secondary special deep convolution features are fused to form the deep convolution feature of the image. In addition, additional patient text information, such as age, sex, and blood pressure, is encoded into multi-source heterogeneous features through word2vec; these are fused with the deep convolution features, and the final output is obtained through an SVM classifier. The method specifically comprises the following steps:
Step (1): Take a ResNet pre-trained on ImageNet as the primary classification network, and replace the convolutions in the fourth and fifth convolution blocks with dilated (hole) convolutions with a dilation rate of 2. Collect medical image data and have it labeled by professional physicians to obtain a data set, which is divided into a training set and a test set. Then train the primary classification network with the labeled training set to complete its training. Input the training set into the primary classification network to obtain the primary general deep convolution features of the data.
Step (2): Pre-classify the data with the trained primary classification network to obtain a pre-classification confusion matrix, then obtain class training subsets through spectral clustering and input them into a secondary classification network for training; this network has the same structure as the primary classification network. Obtain the secondary special deep convolution features of the data, and fuse the primary general deep convolution features and the secondary special deep convolution features by concatenation to obtain the deep convolution feature of the data.
Step (3): Encode the additional information associated with the image into multi-source heterogeneous features through word2vec.
Step (4): Fuse the deep convolution features with the multi-source heterogeneous features and obtain the final labeling result through an SVM classifier.
Step (5): During training, input the training set, compute the output loss function, and adjust the network parameters through the back-propagation algorithm. In the testing stage, input the test set to obtain the labeling results.
The invention has the following beneficial effects:
The method of the invention can complete the automatic labeling of a large amount of medical data using only a small amount of labeled medical data, greatly reducing the manpower and material resources consumed by manual labeling. Compared with existing automatic labeling methods, it is more efficient and produces more accurate labeling results.
Drawings
FIG. 1 is a flow chart of an implementation of an embodiment of the present invention;
FIG. 2 is a schematic diagram of two-level hierarchical feature learning according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings and examples.
The invention provides a medical image automatic labeling method based on semi-supervised learning, which can complete automatic labeling of large-scale medical image data with only a small amount of labeled medical image data. The implementation flow is shown in FIG. 1. The method comprises the following steps:
Step (1): A ResNet pre-trained on ImageNet is used as the primary classification network, and the convolutions in the fourth and fifth convolution blocks are replaced with dilated (hole) convolutions with a dilation rate of 2. This yields a larger receptive field and denser feature responses while keeping the spatial resolution of the image unchanged, with the amount of computation also kept unchanged. Medical image data is collected and labeled by professional physicians; the resulting data set is divided into a training set and a test set. The labeled training set is then input into the primary classification network to fine-tune it, completing the training of the primary classification network. After training, the training-set image data is input into the trained primary classification network to obtain its primary general deep convolution features.
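For illustration only, this step can be sketched in PyTorch as follows. It is a minimal sketch rather than the claimed implementation: ResNet-50 is assumed (the description does not fix the ResNet depth), the handling of strides reflects our reading of "spatial resolution ... unchanged", and names such as build_primary_classifier and num_classes are placeholders.

```python
import torch.nn as nn
from torchvision import models

def build_primary_classifier(num_classes: int) -> nn.Module:
    # ImageNet-pretrained ResNet (ResNet-50 assumed here).
    net = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
    # Replace the 3x3 convolutions of the fourth and fifth convolution blocks
    # (layer3, layer4) with dilated convolutions, dilation rate 2.  Removing
    # the stride keeps the spatial resolution of the feature maps unchanged
    # while the dilation enlarges the receptive field.
    for stage in (net.layer3, net.layer4):
        for m in stage.modules():
            if isinstance(m, nn.Conv2d):
                if m.kernel_size == (3, 3):
                    m.stride, m.dilation, m.padding = (1, 1), (2, 2), (2, 2)
                elif m.stride == (2, 2):
                    m.stride = (1, 1)  # 1x1 projection on the residual branch
    # Replace the ImageNet head with one sized for the medical label set,
    # then fine-tune on the small labeled training set.
    net.fc = nn.Linear(net.fc.in_features, num_classes)
    return net
```

After fine-tuning on the labeled training set, the pooled activations before the classification head can be taken as the primary general deep convolution features.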
Step (2): The data is pre-classified with the trained primary classification network to obtain a pre-classification confusion matrix, and samples with high similarity are then grouped into class training subsets by spectral clustering. An adjacency matrix is constructed with the fully connected method: each sample is regarded as a node, the weight between two sample points that are far apart is low, and the weight between two sample points that are close together is high. With the fully connected method, the weights between all pairs of points are greater than 0. Different kernel functions can be chosen to define the edge weights; polynomial, Gaussian, and Sigmoid kernels are commonly used. Here the edge weights are defined with the Gaussian kernel function (RBF), with the following formula:
w_{ij} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)

where x_i and x_j are the feature vectors of samples i and j, and σ is the kernel bandwidth.
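For illustration, the clustering step could be realized with scikit-learn as sketched below: the fully connected RBF affinity matrix is computed over the primary general deep convolution features and then partitioned by spectral clustering. The bandwidth sigma and the number of subsets are placeholders not taken from the description.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.cluster import SpectralClustering

def cluster_training_subsets(features: np.ndarray, n_subsets: int, sigma: float = 1.0):
    """features: (N, D) primary general deep convolution features.
    Returns one subset index per sample."""
    # Fully connected adjacency: every pair of samples gets a positive weight,
    # large for close samples and small for distant ones.
    gamma = 1.0 / (2.0 * sigma ** 2)
    affinity = rbf_kernel(features, gamma=gamma)  # W_ij = exp(-||x_i - x_j||^2 / (2*sigma^2))
    sc = SpectralClustering(n_clusters=n_subsets, affinity="precomputed", random_state=0)
    return sc.fit_predict(affinity)
```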
After the class training subsets are obtained, they are input into a secondary classification network for training. After training, the training subsets are input into the secondary classification network to obtain the secondary special deep convolution features of the image data. The primary general deep convolution features and the secondary special deep convolution features are then fused by concatenation to obtain the deep convolution feature of the image data.
Step (3): The additional information associated with the image is encoded into multi-source heterogeneous features.
First, numerical data, including age, gender, blood pressure, and biochemical indicators, are used directly as part of the feature vector. Second, text data, such as the past medical history and the history of the present illness, are encoded into corresponding feature vectors with word2vec. The two feature vectors are then concatenated to obtain the feature representation of the additional information.
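A minimal sketch of this encoding, assuming gensim's word2vec and whitespace-tokenized clinical text, is given below. How the word vectors of a description are aggregated is not stated in the description, so averaging is used here as an assumption, and the field names are illustrative.

```python
import numpy as np
from gensim.models import Word2Vec

def encode_patient_info(numeric, history_text: str, w2v: Word2Vec) -> np.ndarray:
    """numeric: list of numerical fields, e.g. [age, sex, blood_pressure, ...].
    history_text: past history / present illness description.
    w2v: a Word2Vec model trained on the corpus of clinical descriptions."""
    tokens = history_text.split()  # whitespace tokenization assumed
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    text_vec = np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)
    # Concatenate the numerical fields with the text embedding to obtain the
    # multi-source heterogeneous feature of step (3).
    return np.concatenate([np.asarray(numeric, dtype=np.float32), text_vec])
```

The Word2Vec model itself would first be trained on the corpus of clinical text, e.g. Word2Vec(sentences=tokenized_histories, vector_size=100, min_count=1).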
Step (4): The deep convolution feature and the multi-source heterogeneous feature are fused: each is converted into a one-dimensional vector, the two vectors are concatenated into a new one-dimensional vector, and this vector is input into an SVM classifier for training. The objective function is:
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.} \quad y_i\left(w^{T}x_i + b\right) \ge 1,\ i = 1,\dots,N
where x_i is a training sample, y_i is its class label, w^T x + b = 0 is the separating hyperplane, and w is the hyperplane parameter.
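For illustration, the fusion and SVM training could be sketched with scikit-learn as follows, with a linear SVC standing in for the maximum-margin objective above; the regularization constant C is a placeholder not given in the description.

```python
import numpy as np
from sklearn.svm import SVC

def train_annotator(conv_feats: np.ndarray, hetero_feats: np.ndarray, labels: np.ndarray) -> SVC:
    """conv_feats: (N, D1) fused deep convolution features,
    hetero_feats: (N, D2) multi-source heterogeneous features,
    labels: (N,) annotation labels."""
    # Flatten each feature into a one-dimensional vector per sample and
    # concatenate the two vectors, as described in step (4).
    fused = np.concatenate([conv_feats.reshape(len(labels), -1),
                            hetero_feats.reshape(len(labels), -1)], axis=1)
    clf = SVC(kernel="linear", C=1.0)  # maximum-margin objective; C is a placeholder
    clf.fit(fused, labels)
    return clf
```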
Step (5): In the training stage, the training set is input, the output loss function is computed, and the network parameters are adjusted through the back-propagation algorithm.
After training is finished, the image data of the test set is input into the primary classification network, which extracts the primary general deep convolution features; class training subsets are obtained through spectral clustering and input into the secondary classification network to obtain the secondary deep convolution features, and the primary and secondary deep convolution features are fused to obtain the deep convolution feature. At the same time, the additional information is input into the word2vec network to obtain the multi-source heterogeneous features. The deep convolution features and the multi-source heterogeneous features are fused and input into the SVM classifier to obtain the final labeling result.
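A self-contained sketch of the test-time feature extraction is given below, assuming the networks were built as in the earlier sketch and that the appropriate secondary classification network for the sample has already been selected via spectral clustering; the fused vector would then be concatenated with the multi-source heterogeneous features and passed to the trained SVM classifier.

```python
import torch

@torch.no_grad()
def deep_conv_feature(primary_net, secondary_net, image):
    """Illustrative test-time extraction: pooled penultimate activations of the
    primary and secondary networks, concatenated into the deep convolution
    feature of one preprocessed image tensor of shape (1, 3, H, W)."""
    feats = []
    for net in (primary_net, secondary_net):
        net.eval()
        x = image
        for _, module in list(net.named_children())[:-1]:  # drop the fc head
            x = module(x)
        feats.append(torch.flatten(x, 1))
    return torch.cat(feats, dim=1).squeeze(0).cpu().numpy()
```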
FIG. 2 is a schematic diagram of two-level hierarchical feature learning according to an embodiment of the present invention.

Claims (6)

1. A medical image automatic labeling method based on semi-supervised learning is characterized by comprising the following steps:
step (1), using a ResNet pre-trained on ImageNet as a primary classification network, and replacing the convolutions of the fourth and fifth convolution blocks with dilated (hole) convolutions with a dilation rate of 2; collecting medical image data and having it labeled by professional physicians to obtain a data set, and dividing the data set into a training set and a test set; then training the primary classification network with the labeled training set to complete its training; inputting the training set into the primary classification network to obtain primary general deep convolution features of the data;
step (2), pre-classifying the data with the trained primary classification network to obtain a pre-classification confusion matrix, then obtaining class training subsets by spectral clustering, and inputting the class training subsets into a secondary classification network, which has the same structure as the primary classification network, for training; obtaining the secondary special deep convolution features of the data; fusing the primary general deep convolution features and the secondary special deep convolution features by concatenation to obtain the deep convolution feature of the data;
step (3), encoding the additional information associated with the image into multi-source heterogeneous features through word2vec;
step (4), performing feature fusion on the deep convolution features and the multi-source heterogeneous features, and obtaining the final labeling result through an SVM classifier;
step (5), in the training process, inputting the training set, calculating the output loss function, and adjusting the network parameters through a back-propagation algorithm; in the testing stage, inputting the test set to obtain the labeling result.
2. The medical image automatic labeling method based on semi-supervised learning as recited in claim 1, wherein the specific method in step (1) is as follows:
using a ResNet pre-trained on ImageNet as the primary classification network, and replacing the convolutions of the fourth and fifth convolution blocks with dilated (hole) convolutions with a dilation rate of 2; collecting medical image data and having it labeled by professional physicians to obtain a data set, and dividing the data set into a training set and a test set; then inputting the labeled training set into the primary classification network to fine-tune the network, completing the training of the primary classification network; and after training, inputting the training-set image data into the trained primary classification network to obtain its primary general deep convolution features.
3. The medical image automatic labeling method based on semi-supervised learning as recited in claim 2, wherein the step (2) is as follows:
pre-classifying the data with the trained primary classification network to obtain a pre-classification confusion matrix, and then grouping samples with high similarity into class training subsets by spectral clustering; constructing an adjacency matrix with the fully connected method, regarding each sample as a node, where the weight between two sample points that are far apart is low and the weight between two sample points that are close together is high; with the fully connected method, the weights between all pairs of points are greater than 0; different kernel functions can be selected to define the edge weights, commonly polynomial, Gaussian, and Sigmoid kernel functions; here the edge weights are defined with the Gaussian kernel function (RBF), with the following formula:
w_{ij} = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)

where x_i and x_j are the feature vectors of samples i and j, and σ is the kernel bandwidth;
after the class training subsets are obtained, inputting them into a secondary classification network for training; after training, inputting the training subsets into the secondary classification network to obtain the secondary special deep convolution features of the image data; and fusing the primary general deep convolution features and the secondary special deep convolution features by concatenation to obtain the deep convolution feature of the image data.
4. The medical image automatic labeling method based on semi-supervised learning as recited in claim 3, wherein the specific method in step (3) is as follows:
first, numerical data, including age, sex, blood pressure, and biochemical indicators, are used directly as part of the feature vector; second, text data, including the past medical history and the history of the present illness, are encoded into corresponding feature vectors with word2vec; the two feature vectors are then concatenated to obtain the feature representation of the additional information.
5. The medical image automatic labeling method based on semi-supervised learning as recited in claim 4, wherein the step (4) is as follows:
performing feature fusion on the deep convolution feature and the multi-source heterogeneous feature, wherein each is converted into a one-dimensional vector, the two one-dimensional vectors are concatenated into a new one-dimensional vector, and the new one-dimensional vector is input into an SVM classifier for training, with the following objective function:
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2 \quad \text{s.t.} \quad y_i\left(w^{T}x_i + b\right) \ge 1,\ i = 1,\dots,N
where x_i is a training sample, y_i is its class label, w^T x + b = 0 is the separating hyperplane, and w is the hyperplane parameter.
6. The medical image automatic labeling method based on semi-supervised learning as recited in claim 5, wherein the step (5) is as follows:
in the training stage, inputting the training set, calculating the output loss function, and adjusting the network parameters through a back-propagation algorithm;
after training, inputting the image data of the test set into the primary classification network, which extracts the primary general deep convolution features; obtaining class training subsets through spectral clustering and inputting them into the secondary classification network to obtain the secondary deep convolution features, and fusing the primary and secondary deep convolution features to obtain the deep convolution feature; meanwhile, inputting the additional information into the word2vec network to obtain the multi-source heterogeneous features; and fusing the deep convolution features with the multi-source heterogeneous features and inputting the result into the SVM classifier to obtain the final labeling result.
CN202110441605.4A 2021-04-23 2021-04-23 Medical image automatic labeling method based on semi-supervised learning Withdrawn CN113159186A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110441605.4A CN113159186A (en) 2021-04-23 2021-04-23 Medical image automatic labeling method based on semi-supervised learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110441605.4A CN113159186A (en) 2021-04-23 2021-04-23 Medical image automatic labeling method based on semi-supervised learning

Publications (1)

Publication Number Publication Date
CN113159186A true CN113159186A (en) 2021-07-23

Family

ID=76869825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110441605.4A Withdrawn CN113159186A (en) 2021-04-23 2021-04-23 Medical image automatic labeling method based on semi-supervised learning

Country Status (1)

Country Link
CN (1) CN113159186A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115240844A (en) * 2022-07-15 2022-10-25 北京医准智能科技有限公司 Training method and device for auxiliary diagnosis model, electronic equipment and storage medium
CN117174240A (en) * 2023-10-26 2023-12-05 中国科学技术大学 Medical image report generation method based on large model field migration
CN117174240B (en) * 2023-10-26 2024-02-09 中国科学技术大学 Medical image report generation method based on large model field migration

Similar Documents

Publication Publication Date Title
Kumar et al. An ensemble of fine-tuned convolutional neural networks for medical image classification
Guillaumin et al. Tagprop: Discriminative metric learning in nearest neighbor models for image auto-annotation
Ayesha et al. Automatic medical image interpretation: State of the art and future directions
Rahman et al. Medical image retrieval with probabilistic multi-class support vector machine classifiers and adaptive similarity fusion
WO2016095487A1 (en) Human-computer interaction-based method for parsing high-level semantics of image
Vikram et al. An approach for multimodal medical image retrieval using latent dirichlet allocation
CN113159186A (en) Medical image automatic labeling method based on semi-supervised learning
JP2008123486A (en) Method, system and program for detecting one or plurality of concepts by digital media
Kalpathy-Cramer et al. Multimodal medical image retrieval: image categorization to improve search precision
CN116186381A (en) Intelligent retrieval recommendation method and system
Wang et al. Unsupervised category discovery via looped deep pseudo-task optimization using a large scale radiology image database
CN115017884B (en) Text parallel sentence pair extraction method based on graphic multi-mode gating enhancement
Sirisha et al. Semantic interdisciplinary evaluation of image captioning models
Tang et al. Weakly supervised posture mining for fine-grained classification
Lacoste et al. IPAL Knowledge-based Medical Image Retrieval in ImageCLEFmed 2006.
CN117393098A (en) Medical image report generation method based on visual priori and cross-modal alignment network
CN111340807A (en) Nidus positioning core data extraction method, system, electronic equipment and storage medium
CN116363460A (en) High-resolution remote sensing sample labeling method based on topic model
CN111259152A (en) Deep multilayer network driven feature aggregation category divider
Gao et al. Accuracy analysis of triage recommendation based on CNN, RNN and RCNN models
Bouslimi et al. Semantic medical image retrieval in a medical social network
Wang et al. An image retrieval method of mammary cancer based on convolutional neural network
CN115862837A (en) Medical visual question-answering method based on type reasoning and semantic constraint
Hu et al. Triple-kernel gated attention-based multiple instance learning with contrastive learning for medical image analysis
Faruque et al. Teaching & Learning System for Diagnostic Imaging

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (Application publication date: 20210723)