CN114565593A - Full-field digital image classification and detection method based on semi-supervision and attention - Google Patents

Full-field digital image classification and detection method based on semi-supervision and attention

Info

Publication number
CN114565593A
CN114565593A (application CN202210208369.6A)
Authority
CN
China
Prior art keywords
full-field digital image
classification
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210208369.6A
Other languages
Chinese (zh)
Other versions
CN114565593B (en)
Inventor
薛梦凡
陈怡达
贾士绅
江浩东
杨岗
陈明皓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University
Priority to CN202210208369.6A
Publication of CN114565593A
Application granted
Publication of CN114565593B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30061 Lung
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30004 Biomedical image processing
    • G06T 2207/30096 Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a full-field digital image classification and detection method based on semi-supervision and attention. A full-field digital image classification and detection framework is constructed that directly outputs classification results and visually displays the regions of interest, helping the user judge the image class accurately while quickly locating the regions of interest. Compared with weakly supervised learning methods that need no region-of-interest annotation at all, the method greatly improves the classification accuracy of full-field digital images and detects the regions of interest precisely while requiring region-of-interest annotations for only a small number of full-field digital images, and therefore has high practical value.

Description

Full-field digital image classification and detection method based on semi-supervision and attention
Technical Field
The invention relates to the technical field of full-field digital image processing, and in particular to a full-field digital image classification and region-of-interest detection method based on semi-supervised learning and attention.
Background
A full-field digital image is an ultra-high-resolution image, scanned by a fully automatic microscope scanner and processed automatically by computer, that typically exceeds ten gigapixels. A single full-field digital image contains an enormous amount of information, so a professional must spend a great deal of time searching it for regions of interest to annotate. Judging the image class and retrieving regions of interest rest on subjective human opinion and are limited by subjectivity, fatigue, and differences in expertise; even for the best-presented full-field digital image samples, consistent results are difficult to obtain, and critical problems such as false detections and missed detections easily arise.
In recent years, artificial intelligence has been introduced into the field of full-field digital image classification with excellent results and has received unprecedented attention. Convolutional neural networks do not depend on manually defined, selected, and designed feature descriptors; they automatically mine the deep information of the full-field digital image to extract image features and complete classification, offering high efficiency, high stability, and strong generalization.
The inventors found that in current deep-learning-based full-field digital image classification methods, a professional must first annotate the full-field digital image, after which the regions of interest are extracted and fed into a network that is trained to complete the classification task. The advantage of this approach is high accuracy; however, it requires a large full-field digital image dataset with annotated regions of interest. Because annotating regions of interest in full-field digital images consumes a great deal of time and human effort, such methods are largely limited by the inability to build large-scale full-field digital image datasets. Other researchers classify using datasets without region-of-interest annotations, but because the spatial, textural, and other characteristics of the regions of interest cannot be extracted effectively, the resulting classification models have low accuracy. In addition, both kinds of method only complete the classification task and do not detect the regions of interest, so a user judging the image class cannot quickly locate the regions of interest and must still spend considerable time.
Therefore, a full-field digital image classification and region-of-interest detection method is needed that achieves high classification accuracy without a large-scale dataset of region-of-interest annotations.
Disclosure of Invention
The invention aims to solve the technical problem that existing deep-learning-based full-field digital image classification methods are limited by the lack of datasets with large-scale region-of-interest annotations, and provides a full-field digital image classification and region-of-interest detection method based on semi-supervised learning and attention that greatly improves classification accuracy while requiring only a small amount of data with region-of-interest annotations.
The method comprises the following steps:
Step S1: collect full-field digital images and preprocess them.
Step S2: pre-train the feature extraction network Resnet18 for extracting features from the full-field digital images, with the following specific steps (a sketch follows step S25):
Step S21: select a portion of the full-field digital images and standard control samples; frame each region of interest with an annotation box, and frame the content portion of each standard control sample with an annotation box;
Step S22: from the region-of-interest annotation boxes, generate a mask with the same size and position as the regions of interest on the preprocessed full-field digital image;
Step S23: use a sliding window to cut the preprocessed full-field digital image into many small n × n image blocks, where n is the pixel width and height of a block;
Step S24: overlay the mask on the preprocessed full-field digital image, discard the small image blocks at non-overlapping positions, and keep the small image blocks at overlapping positions;
Step S25: feed the small image blocks kept in step S24 into the Resnet18 network for training, and save and output the network structure and parameters after training.
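The patent fixes the tiling and mask-overlap rule only loosely, so the following Python sketch of steps S23-S24 makes two assumptions not stated in the original: the mask is a binary array aligned with the preprocessed image, and a block is kept when at least half of its pixels fall inside the mask.

```python
import numpy as np

def tile_and_filter(image: np.ndarray, mask: np.ndarray, n: int = 256):
    """Cut an image into n x n blocks and keep those overlapping the mask (steps S23-S24)."""
    kept = []
    h, w = image.shape[:2]
    for y in range(0, h - n + 1, n):              # non-overlapping sliding window
        for x in range(0, w - n + 1, n):
            overlap = (mask[y:y + n, x:x + n] > 0).mean()
            if overlap >= 0.5:                    # overlap criterion (assumed)
                kept.append(((x, y), image[y:y + n, x:x + n]))
    return kept
```

The kept blocks, labeled with the class of their source image, would then be used to train Resnet18 in the usual supervised way (step S25).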
Step S3: sending all full-view digital images into the Resnet18 network pre-trained in the previous step to extract features, and the specific steps are as follows:
step S31: and automatically segmenting all full-field digital images by using opencv, filtering blank backgrounds and artificially formed holes, segmenting the blank backgrounds and the artificially formed holes into small image blocks of n multiplied by n, and storing the coordinates of each image block.
Step S32: the small image blocks are fed into a pre-trained Resnet18 network and converted into a 512-dimensional feature vector h at a fourth residual blockkI.e. the features extracted for each small image block.
Step S4: and (4) sending the features extracted in the step (S3) into a depth gating channel attention module, comprehensively generating Slide-level features, and realizing classification of the full-field digital image through a classification layer. The method comprises the following specific steps:
step S41: the feature vector hkSending the image data to an attention module of a depth gating channel to obtain an attention score a corresponding to each small image blockk,n
$$a_{k,n} = \frac{\exp\{P_{a,n}(L(\tanh(G(u_k)) \odot \sigma(J(u_k))))\}}{\sum_{j=1}^{N} \exp\{P_{a,n}(L(\tanh(G(u_j)) \odot \sigma(J(u_j))))\}}, \qquad u_k = \tanh(V(h_k)) \odot \sigma(W(h_k))$$
where a_{k,n} is the attention score of the k-th small image block for the n-th class, P_{a,n} is the linear layer belonging to the n-th class, σ(·) is the sigmoid activation function, tanh(·) is the tanh activation function, V(·), W(·), G(·), J(·), and L(·) are different linear layers, and N is the total number of image blocks;
Step S42: aggregate the feature vector of each small image block with its attention score to generate the slide-level feature h_{slide,n}:
$$h_{slide,n} = \sum_{k=1}^{N} a_{k,n}\,h_k$$
where h_{slide,n} represents the feature of the whole full-field digital image for the n-th class;
Step S43: send the slide-level feature vector h_{slide,n} into the classification layer
$$s_n = P_{c,n}(h_{slide,n})$$
where P_{c,n} denotes the classification layer of the n-th class, obtaining the classification result and completing the classification of the full-field digital image;
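The patent names the layers V, W, G, J, L, and P_{a,n} but not their exact wiring, so this PyTorch sketch is one plausible reading of the deep gated channel attention module: two stacked gated (tanh/sigmoid) pairs, a final linear map L, per-class score heads P_{a,n} with a softmax over the blocks of one slide, and the attention-weighted sum of step S42. All dimensions and the stacking order are assumptions.

```python
import torch
import torch.nn as nn

class DeepGatedAttention(nn.Module):
    """One plausible reading of the deep gated channel attention module (steps S41-S42)."""

    def __init__(self, dim: int = 512, hidden: int = 256, num_classes: int = 2):
        super().__init__()
        self.V = nn.Linear(dim, hidden)             # first gated pair
        self.W = nn.Linear(dim, hidden)
        self.G = nn.Linear(hidden, hidden)          # second, deeper gated pair
        self.J = nn.Linear(hidden, hidden)
        self.L = nn.Linear(hidden, hidden)
        self.P_a = nn.Linear(hidden, num_classes)   # per-class score heads P_{a,n}

    def forward(self, h: torch.Tensor):
        """h: (K, 512) block features of one slide ->
        a: (K, num_classes) attention scores a_{k,n},
        h_slide: (num_classes, 512) slide-level features h_{slide,n}."""
        u = torch.tanh(self.V(h)) * torch.sigmoid(self.W(h))
        v = torch.tanh(self.G(u)) * torch.sigmoid(self.J(u))
        a = torch.softmax(self.P_a(self.L(v)), dim=0)   # normalize over the K blocks
        h_slide = a.transpose(0, 1) @ h                 # h_slide[n] = sum_k a[k, n] h[k]
        return a, h_slide

# Classification (step S43): one linear head per class maps h_slide[n] to a score s_n,
# and the class with the highest score is the prediction.
```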
Step S5: extract the attention scores, generated in step S4, of all small image blocks for the class predicted by the model; with matplotlib, convert each block's attention score into a color patch of the corresponding color, overlay the patches at the corresponding positions of the original full-field digital image with a certain transparency, and obtain the region-of-interest detection heat map after blurring and smoothing; a sketch follows.
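A minimal sketch of the step S5 heat map, assuming the block coordinates from step S31, attention scores scaled to [0, 1], a matplotlib colormap, and Gaussian blurring for the smoothing; the colormap choice and blur radius are assumptions (the embodiment below uses a transparency of 0.5).

```python
import cv2
import numpy as np
from matplotlib import cm

def attention_heatmap(slide_rgb: np.ndarray, coords, scores, n: int = 256,
                      alpha: float = 0.5) -> np.ndarray:
    """Overlay per-block attention scores on the slide and smooth the result.

    coords: (x, y) top-left corners from step S31; scores: one attention score
    in [0, 1] per block for the predicted class.
    """
    heat = np.zeros(slide_rgb.shape[:2], dtype=np.float32)
    for (x, y), s in zip(coords, scores):
        heat[y:y + n, x:x + n] = s
    heat = cv2.GaussianBlur(heat, (0, 0), sigmaX=n / 4)        # soften block edges
    color = (cm.jet(heat)[..., :3] * 255).astype(np.uint8)     # score -> color patch
    return cv2.addWeighted(color, alpha, slide_rgb, 1 - alpha, 0)
```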
Preferably, the preprocessing performs color normalization on the collected full-field digital images according to an input image template.
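The patent does not name the color normalization algorithm; a common stand-in is Reinhard color transfer, which matches each LAB channel's mean and standard deviation to those of the template image. A minimal sketch under that assumption:

```python
import cv2
import numpy as np

def reinhard_normalize(image_bgr: np.ndarray, template_bgr: np.ndarray) -> np.ndarray:
    """Match the per-channel LAB statistics of image_bgr to template_bgr."""
    img = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    for c in range(3):
        m_i, s_i = img[..., c].mean(), img[..., c].std() + 1e-6
        m_r, s_r = ref[..., c].mean(), ref[..., c].std()
        img[..., c] = (img[..., c] - m_i) / s_i * s_r + m_r
    out = np.clip(img, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```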
Preferably, the transparency is 0.4-0.6.
Compared with the prior art, the beneficial effects of the invention are as follows:
(1) The method can be generalized to a variety of tasks that classify full-field digital images and detect regions of interest, and is broadly applicable.
(2) The full-field digital image classification and region-of-interest detection method based on semi-supervised learning and attention uses only a small number of full-field digital images with region-of-interest annotations to train the feature extraction model, and uses full-field digital images without region-of-interest annotations to train the classification network. This reduces dataset preparation work while greatly improving the accuracy of the classification network on the full-field digital image classification task, combining simplicity with high accuracy.
(3) The invention separates the feature extraction module from the classification module, so attention modules can be freely added or replaced in between, giving strong adaptability. After such a change the whole network need not be retrained; only the newly added attention module and classification layer require retraining, which greatly reduces training time.
(4) The deep gated channel attention network proposed by the invention captures channel information and uses deeper attention branches to sharpen the discrimination of the attention scores layer by layer, making the attention scores of the small image blocks more robust and more accurate, which effectively improves full-field digital image classification accuracy. It can be plugged in at any time, is easy to implement, and is highly practical.
(5) The invention constructs a full-field digital image classification and detection framework that directly outputs classification results and visually displays the regions of interest, helping users judge the image class accurately while quickly locating the regions of interest.
Drawings
FIG. 1 is a flowchart of the full-field digital image classification and region-of-interest detection method based on semi-supervised learning and attention.
FIG. 2 is a flowchart of the pre-training of the feature extraction network of the present invention.
Detailed Description
The present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
As shown in FIG. 1, the data collected in this embodiment for the classification and detection of lung adenocarcinoma and lung squamous carcinoma comprise 1724 lung adenocarcinoma samples, 1707 lung squamous carcinoma samples, and 30 normal tissue samples. Full-field digital pathological images with annotated lesion areas account for only 1.75% of the total sample. The feature extraction network is Resnet18. The method for classifying lung cancer pathological images and detecting lesions based on semi-supervised learning and attention comprises the following steps:
Step S1: collect the 3431 full-field digital pathological images of lung adenocarcinoma and lung squamous carcinoma together with the 30 normal tissue samples. Read the information of all pathological images and perform color normalization on all of them to eliminate color differences caused by differing stain ratios, staining, and scanning factors.
Step S2: the pre-training feature extraction network Resnet18 is used for extracting features of all lung cancer pathological images, and as shown in FIG. 2, the specific steps are as follows:
step S21: 30 lung adenocarcinoma, lung squamous carcinoma and normal tissue samples are selected respectively, the lesion area of the cancerous tissue sample is framed by a professional pathologist by a labeling frame, and the tissue area of the normal tissue sample is framed by a labeling frame.
Step S22: and generating a mask with the same size as the original pathological size through a calibration frame marked by a doctor.
Step S23: the pathological section is cut into a plurality of 256 × 256 small image blocks by using a sliding window.
Step S24: overlapping the mask with the original pathological image, eliminating the small image blocks at the non-overlapping positions, and storing the small image blocks at the overlapping positions.
Step S25: and (5) sending the small image blocks stored in the step (S24) to a Resnet18 network for training, storing and outputting the network structure and parameters thereof after training.
Step S3: sending all lung adenocarcinoma and lung squamous carcinoma full-visual field digital pathological images into a Resnet18 network pre-trained in the previous step to extract features, and specifically comprising the following steps:
step S31: pathological images of all cancerous samples are automatically segmented by opencv, background and artificially formed cavities are filtered, and only tissue parts in the pathological images are reserved. The tissue portion is divided into 256 x 256 small image blocks, stored as a pile of image blocks and their coordinates.
Step S32: the small image blocks are fed into a pre-trained Resnet18 network and converted into a 512-dimensional feature vector h at the fourth residual blockkI.e. the features extracted for each small image block.
Step S4: and (4) sending the features extracted in the step (S3) into a depth gating channel attention module, comprehensively generating Slide-level features, and realizing classification of lung cancer pathological images through a classification layer. The method comprises the following specific steps:
step S41: the feature vector hkSending the image into an attention module of a depth gating channel to obtain the attention score a corresponding to each small image blockk,n
$$a_{k,n} = \frac{\exp\{P_{a,n}(L(\tanh(G(u_k)) \odot \sigma(J(u_k))))\}}{\sum_{j=1}^{N} \exp\{P_{a,n}(L(\tanh(G(u_j)) \odot \sigma(J(u_j))))\}}, \qquad u_k = \tanh(V(h_k)) \odot \sigma(W(h_k))$$
where a_{k,n} is the attention score of the k-th small image block for the n-th class, P_{a,n} is the linear layer belonging to the n-th class, σ(·) is the sigmoid activation function, tanh(·) is the tanh activation function, V(·), W(·), G(·), J(·), and L(·) are different linear layers, and N is the total number of image blocks.
Step S42: comprehensively generating Slide-level features h by the feature vector corresponding to each small image block and the attention scoreslide,n
$$h_{slide,n} = \sum_{k=1}^{N} a_{k,n}\,h_k$$
Step S43: a Slide-level feature vector hslide,nEnter the classification level of the corresponding category
$$s_n = P_{c,n}(h_{slide,n})$$
where P_{c,n} denotes the classification layer of the n-th class, obtaining the classification result and completing the classification of the lung cancer pathological images.
Step S5: and (4) extracting the attention scores of all the small image blocks generated in the step (4) corresponding to the model prediction class, generating color blocks with corresponding colors by using the attention scores corresponding to the small image blocks by using matplotlib, covering corresponding positions on the original full-view digital image with the transparency of 0.5, and obtaining a detection heat map of the region of interest after fuzzy and smooth operation.
The above embodiments do not limit the present invention, and the present invention is not restricted to them; any embodiment that meets the requirements of the present invention falls within its scope of protection.
Those skilled in the art will appreciate that the invention may be practiced without these specific details.

Claims (3)

1. A full-field digital image classification and detection method based on semi-supervision and attention, characterized by comprising the following steps:
step S1: collecting full-field digital images and preprocessing them;
step S2: pre-training the feature extraction network Resnet18 for extracting features from the full-field digital images, with the following specific steps:
step S21: selecting a portion of the full-field digital images and standard control samples, framing each region of interest with an annotation box, and framing the content portion of each standard control sample with an annotation box;
step S22: generating, from the region-of-interest annotation boxes, a mask with the same size and position as the regions of interest on the preprocessed full-field digital image;
step S23: using a sliding window to divide the preprocessed full-field digital image into many small n × n image blocks, where n is the pixel width and height of a block;
step S24: overlaying the mask on the preprocessed full-field digital image, discarding the small image blocks at non-overlapping positions, and keeping the small image blocks at overlapping positions;
step S25: feeding the small image blocks kept in step S24 into the Resnet18 network for training, and saving and outputting the network structure and parameters after training;
step S3: feeding all full-field digital images into the Resnet18 network pre-trained in the previous step to extract features, with the following specific steps:
step S31: automatically segmenting all full-field digital images with opencv, filtering out the blank background and artificially formed holes, dividing the remaining tissue into small n × n image blocks, and storing the coordinates of each block;
step S32: feeding the small image blocks into the pre-trained Resnet18 network, each block being converted at the fourth residual block into a 512-dimensional feature vector h_k, i.e., the feature extracted for each small image block;
step S4: feeding the features h_k extracted in step S3 into the deep gated channel attention module, aggregating them into slide-level features, and classifying the full-field digital image through a classification layer, with the following specific steps:
step S41: sending each feature vector h_k into the deep gated channel attention module to obtain the attention score a_{k,n} of each small image block:
$$a_{k,n} = \frac{\exp\{P_{a,n}(L(\tanh(G(u_k)) \odot \sigma(J(u_k))))\}}{\sum_{j=1}^{N} \exp\{P_{a,n}(L(\tanh(G(u_j)) \odot \sigma(J(u_j))))\}}, \qquad u_k = \tanh(V(h_k)) \odot \sigma(W(h_k))$$
wherein a_{k,n} is the attention score of the k-th small image block for the n-th class, P_{a,n} is the linear layer belonging to the n-th class, σ(·) is the sigmoid activation function, tanh(·) is the tanh activation function, V(·), W(·), G(·), J(·), and L(·) are different linear layers, and N is the total number of image blocks;
step S42: aggregating the feature vector of each small image block with its attention score to generate the slide-level feature vector h_{slide,n}:
$$h_{slide,n} = \sum_{k=1}^{N} a_{k,n}\,h_k$$
wherein h_{slide,n} represents the feature of each full-field digital image for the n-th class;
step S43: sending the slide-level feature vector h_{slide,n} into the classification layer
$$s_n = P_{c,n}(h_{slide,n})$$
wherein P_{c,n} denotes the classification layer of the n-th class, obtaining the classification result and realizing the classification of the full-field digital image;
step S5: extracting the attention scores, generated in step S4, of all small image blocks for the class predicted by the model, using matplotlib to convert each block's attention score into a color patch of the corresponding color, overlaying the patches at the corresponding positions of the original full-field digital image with a certain transparency, and obtaining the region-of-interest detection heat map after blurring and smoothing.
2. The semi-supervised and attention-based full-field digital image classification and detection method according to claim 1, characterized in that: the preprocessing performs color normalization on the collected full-field digital images according to an input image template.
3. The semi-supervised and attention-based full-field digital image classification and detection method according to claim 1, characterized in that: the transparency is 0.4-0.7.
CN202210208369.6A 2022-03-04 2022-03-04 Full-field digital image classification and detection method based on semi-supervision and attention Active CN114565593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210208369.6A CN114565593B (en) 2022-03-04 2022-03-04 Full-field digital image classification and detection method based on semi-supervision and attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210208369.6A CN114565593B (en) 2022-03-04 2022-03-04 Full-field digital image classification and detection method based on semi-supervision and attention

Publications (2)

Publication Number Publication Date
CN114565593A (en) 2022-05-31
CN114565593B CN114565593B (en) 2024-04-02

Family

ID=81717968

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210208369.6A Active CN114565593B (en) 2022-03-04 2022-03-04 Full-field digital image classification and detection method based on semi-supervision and attention

Country Status (1)

Country Link
CN (1) CN114565593B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082743A (en) * 2022-08-16 2022-09-20 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329867A (en) * 2020-11-10 2021-02-05 宁波大学 MRI image classification method based on task-driven hierarchical attention network
CN112529042A (en) * 2020-11-18 2021-03-19 南京航空航天大学 Medical image classification method based on dual-attention multi-instance deep learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329867A (en) * 2020-11-10 2021-02-05 宁波大学 MRI image classification method based on task-driven hierarchical attention network
CN112529042A (en) * 2020-11-18 2021-03-19 南京航空航天大学 Medical image classification method based on dual-attention multi-instance deep learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082743A (en) * 2022-08-16 2022-09-20 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method
CN115082743B (en) * 2022-08-16 2022-12-06 之江实验室 Full-field digital pathological image classification system considering tumor microenvironment and construction method

Also Published As

Publication number Publication date
CN114565593B (en) 2024-04-02

Similar Documents

Publication Publication Date Title
US11681418B2 (en) Multi-sample whole slide image processing in digital pathology via multi-resolution registration and machine learning
US11615559B2 (en) Methods and systems for human imperceptible computerized color transfer
Li et al. Example-based image colorization using locality consistent sparse representation
CN106023151B (en) Tongue object detection method under a kind of open environment
CN113449727A (en) Camouflage target detection and identification method based on deep neural network
CN106934386B (en) A kind of natural scene character detecting method and system based on from heuristic strategies
CN108596038B (en) Method for identifying red blood cells in excrement by combining morphological segmentation and neural network
CN106529448A (en) Method for performing multi-visual-angle face detection by means of integral channel features
CN102103690A (en) Method for automatically portioning hair area
CN105825216A (en) Method of locating text in complex background image
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network
CN110188750A (en) A kind of natural scene picture character recognition method based on deep learning
CN111210447B (en) Hematoxylin-eosin staining pathological image hierarchical segmentation method and terminal
CN114782948B (en) Global interpretation method and system for cervical fluid-based cytological smear
Fueten et al. An artificial neural net assisted approach to editing edges in petrographic images collected with the rotating polarizer stage
CN115410258A (en) Human face expression recognition method based on attention image
CN114565593B (en) Full-field digital image classification and detection method based on semi-supervision and attention
CN111666813A (en) Subcutaneous sweat gland extraction method based on three-dimensional convolutional neural network of non-local information
DE112019004112T5 System and method for analyzing microscopic image data and for generating an annotated data set for training classifiers
CN112862789B (en) Interactive image segmentation method based on machine learning
CN112966774B (en) Picture Bert-based tissue pathology picture classification method
CN112084931B (en) DenseNet-based leukemia cell microscopic image classification method and system
CN117557558B (en) Full-slice pathological image classification method based on semi-supervised learning
CN111046869B (en) Salient region extraction method and system based on deep learning
CN115358989A (en) Leukemia judgment method and device based on deep learning and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant