CN116597285B - Pulmonary tissue pathology image processing model, construction method and image processing method - Google Patents
- Publication number
- CN116597285B (application number CN202310868244.0A)
- Authority
- CN
- China
- Prior art keywords
- mask
- encoder
- image
- layer
- model
- Prior art date
- Legal status: Active (the status listed is an assumption, not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/03—Recognition of patterns in medical or anatomical images
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention belongs to the technical field of medical image processing and specifically relates to a lung tissue pathology image processing model, a construction method, and an image processing method. A data set of lung tissue pathology images is obtained, preprocessed, and divided proportionally into a training data set and a test data set. An upstream task is established in which each image in the training data set is proportionally fused with another image to generate new samples, which are used to train a mask self-encoder (masked autoencoder) upstream model. A downstream task is then established that uses the upstream-trained mask self-encoder model, and its performance is evaluated on the test data set. The invention addresses the model complexity caused by the costly convolution operations and self-attention mechanisms of current mainstream algorithms, as well as their long training times, yielding an efficient and simplified auxiliary diagnosis system.
Description
Technical Field
The invention belongs to the technical field of medical image processing, and particularly relates to a lung tissue pathology image processing model, a construction method and an image processing method.
Background
Lung cancer, one of the most prevalent cancers in the world, poses a serious threat to human life and health. Statistically, lung cancer is the most common malignancy in men and the second most common in women. In recent years, the incidence of lung squamous cell carcinoma has been declining and accounts for about 30%-40% of lung cancers, while the incidence of lung adenocarcinoma has been rising and accounts for about 40%-55%. Lung adenocarcinoma is therefore the major subtype of lung cancer; it can be further divided into invasive lung adenocarcinoma and micro-invasive lung adenocarcinoma.
The most reliable scientific basis for a pathologist to diagnose lung cancer is the histopathology examination: the specific subtype of lung cancer is judged by observing the size, shape, position, and other characteristics of cells in the pathology image. Accurate pathological diagnosis is therefore of great importance for the treatment of patients. However, pathological diagnosis carries a heavy workload and is a complex task for pathologists; it is not only time-consuming and laborious but also inevitably prone to misdiagnosis.
With the rapid development of artificial intelligence, deep learning has been widely applied in different fields. A growing body of research demonstrates the reliability of deep learning algorithms in medical image analysis, especially in lung cancer histopathology image recognition tasks. Deep learning algorithms can therefore be used to relieve the heavy workload of pathologists and reduce missed diagnoses and misdiagnoses of lung cancer.
However, the algorithms used in previous deep learning studies on lung cancer histopathology images are mostly the convolutional neural network (CNN), which captures local information in the image, and the Vision Transformer (ViT), which captures global information. Both the convolution operation in the CNN and the self-attention mechanism in the ViT are computationally expensive, which makes the resulting models complex and hinders their deployment in an auxiliary diagnosis system.
In view of this dilemma, self-supervised learning is a viable solution. Unsupervised feature learning based on deep learning has been favored in recent years for low-level tasks such as cell and lung tumor detection and classification. Self-supervised learning can learn informative data representations from unlabeled data and has been successfully applied to image classification tasks.
Currently, many studies demonstrate the effectiveness of self-supervised learning approaches in various medical image tasks, such as medical image classification, detection, and segmentation. Some self-supervised learning methods have also been proposed for histopathology images, but the subtle yet important nuances in histopathology are difficult to distinguish, and existing self-supervised algorithms fail to extract higher-level semantic information. There is thus still a lack of effective self-supervised learning methods to extract visual representations from histopathology images and accomplish the related tasks.
Disclosure of Invention
The invention aims to solve the technical problems of the computationally complex convolution operations and self-attention mechanisms in the current mainstream algorithms (the convolutional neural network and the Vision Transformer), as well as the complexity of diagnosis systems caused by the long training times of these algorithms.
The present invention is achieved as follows:
a method for constructing a lung tissue pathology image processing model comprises the following steps:
acquiring a data set of a lung tissue pathological image, preprocessing the data set, and dividing the data set into a training data set and a testing data set according to a proportion;
establishing an upstream task, establishing a mask self-encoder upstream model in the upstream task, generating a new sample after proportional fusion and addition of each image in a training data set, training the new sample, calculating a loss value between an image sample reconstructed by the mask self-encoder upstream model and an original image sample, and increasing training rounds to minimize the loss value to obtain a trained mask self-encoder upstream model;
and establishing a downstream task, utilizing a mask self-encoder upstream model trained by the upstream task, and evaluating the performance of the mask self-encoder upstream model by adopting a test data set.
Further, the mask self-encoder upstream model comprises a mask mixing layer, an encoder layer, and a decoder layer. The mask mixing layer proportionally fuses each image in the training data set with another image to generate a new sample image, divides the new sample image into masked tiles and unmasked tiles according to a set proportion, and inputs the unmasked tiles into the encoder layer to obtain linear output blocks; the linear output blocks and the masked tiles, with position encodings added, are input into the decoder layer;
the encoder layer encodes the unmasked tiles output by the mask mixing layer through a linear projection to obtain the linear output blocks, adds position encodings to the linear output blocks, and orders the masked tiles and the linear output blocks to obtain a two-dimensional list;
the decoder layer, from the unmasked tiles output by the mask mixing layer and the two-dimensional list output by the encoder layer, restores and reconstructs all the input tiles by learning the features of the unmasked tiles, obtaining a reconstructed two-dimensional list; the reconstructed two-dimensional list is reassembled into a decoded restored image, the loss value between the restored image and the original image is calculated, training rounds are added, and the encoder-layer and decoder-layer steps are repeated until the loss value converges.
Further, proportionally dividing the new sample image into masked and unmasked tiles includes: dividing the image into a number of square tiles, partitioning the tiles into masked and unmasked tiles according to the set mask rate, performing the masking operation, and sequentially extracting the unmasked tiles into a column vector.
Further, the ordering of the masked and unmasked tiles includes: sorting them in the original tiling order and replacing each unmasked tile with its linear output block to obtain the two-dimensional list.
Further, the fusing includes: proportionally mixing each image in the training data set with another randomly selected image.
Further, the mask self-encoder downstream model comprises:
a Transformer encoder, which uses the encoder layer trained in the mask self-encoder upstream model, outputs the linear projections of the tiles as feature vectors, and adds position encodings to the feature vectors;
a learnable classification token, which aggregates the position-encoded feature vectors output by the Transformer encoder;
and a multilayer perceptron (MLP) head, which completes the classification of the image according to the features gathered by the classification token.
A lung tissue pathology image processing model, the model comprising:
a data acquisition module, which preprocesses a data set of lung tissue pathology images and divides it proportionally into a training data set and a test data set;
a mask self-encoder model, obtained by proportionally fusing each image in the training data set with another image to generate new samples and training on the new samples;
and a downstream task module, which utilizes the mask self-encoder upstream model trained in the upstream task and evaluates its performance using the test data set.
Further, the mask self-encoder upstream model comprises a mask mixing layer, an encoder layer, and a decoder layer. The mask mixing layer proportionally fuses each image in the training data set with another image to generate a new sample image, divides the new sample image into masked tiles and unmasked tiles according to a set proportion, and inputs the unmasked tiles into the encoder layer to obtain linear output blocks; the linear output blocks and the masked tiles, with position encodings added, are input into the decoder layer;
the encoder layer encodes the unmasked tiles output by the mask mixing layer through a linear projection to obtain the linear output blocks, adds position encodings to the linear output blocks, and orders the masked tiles and the linear output blocks to obtain a two-dimensional list;
the decoder layer, from the unmasked tiles output by the mask mixing layer and the two-dimensional list output by the encoder layer, restores and reconstructs all the input tiles by learning the features of the unmasked tiles, obtaining a reconstructed two-dimensional list; the reconstructed two-dimensional list is reassembled into a decoded restored image, the loss value between the restored image and the original image is calculated, training rounds are added, and the encoder-layer and decoder-layer steps are repeated until the loss value converges.
Further, the mask self-encoder downstream model comprises:
a Transformer encoder, which uses the encoder layer trained in the mask self-encoder upstream model, outputs the linear projections of the tiles as feature vectors, and adds position encodings to the feature vectors;
a learnable classification token, which aggregates the position-encoded feature vectors output by the Transformer encoder;
and a multilayer perceptron (MLP) head, which completes the classification of the image according to the features gathered by the classification token.
A method for processing a lung tissue pathology image, comprising:
tiling the images of the input test data set;
outputting the linear projections of the tiles as feature vectors and adding position encodings to the feature vectors;
summarizing the position-encoded feature vectors;
and completing the classification of the image according to the feature summary.
Compared with the prior art, the invention has the beneficial effects that:
the mask self-encoder model is adopted, the mask self-encoder can be prevented from only reading limited image information, the rapid processing and classification of a large amount of data are realized, the mask self-encoder model is easy to deploy in an auxiliary diagnosis system, and the classification of pathological pictures is realized.
The invention is suitable for the task of classifying histopathological images. And combining a mixup image enhancement technology and MAE self-supervision learning under the condition of a small amount of annotation data to fully mine advanced semantic information in the pathological image field. By mixing the images between two random samples, the overfitting is reduced and the generalization and robustness of the neural network are improved. A hybrid self-supervision visual characterization learning framework is constructed for the histopathological image, and the model is helped to deeply mine high-level semantic information of the pathological image. The method MixMAE has superiority in classification task, and in addition, the expansion experiment proves that the model has the capability of identifying other cancer pathological images.
Drawings
FIG. 1 is a flow chart of an upstream task in a method for constructing a model for processing pathological images of lung tissue according to an embodiment of the present invention;
fig. 2 is a flowchart of a downstream task in a method for constructing a lung tissue pathology image processing model, or a lung tissue pathology image processing method according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1 in combination with fig. 2, a method for constructing a lung tissue pathology image processing model includes:
Acquiring a data set of lung tissue pathology images, preprocessing the data set, and dividing it proportionally into a training data set and a test data set. The data set here comprises available pulmonary histopathology data mixed with the lung histopathology data in the public LC25000 data set to form a lung pathology data set. Preprocessing refers to data preprocessing of the mixed data set: adjusting the images to a uniform size and performing data enhancement.
The mixed lung pathology data set contains five classes of data: lung squamous cell carcinoma, lung adenocarcinoma, invasive lung adenocarcinoma, micro-invasive lung adenocarcinoma, and normal lung tissue, all resized to a uniform 224×224 pixels. Data enhancement is applied to the mixed data set (mirror flipping, rotation, scaling, height shifting, and width shifting), expanding the data volume to five times the original so that the model is trained more fully.
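The five-fold enhancement described above can be sketched with plain array operations. The snippet below is a minimal numpy illustration; the 16-pixel shift amount, the crude nearest-neighbor zoom standing in for "scaling", and the function name `augment_five_fold` are our assumptions, not details fixed by the patent (a real pipeline would use a library such as torchvision).

```python
import numpy as np

def augment_five_fold(img):
    """Return the original image plus four simple variants (mirror flip,
    rotation, crude scaling, height shift), expanding the set five-fold.
    These are simplistic numpy stand-ins for real augmentation transforms."""
    return [
        img,                                                          # original
        np.fliplr(img),                                               # mirror flip
        np.rot90(img),                                                # 90-degree rotation
        np.repeat(np.repeat(img[::2, ::2], 2, axis=0), 2, axis=1),    # crude 0.5x zoom-out, upsampled back
        np.roll(img, 16, axis=0),                                     # height shift (width shift is analogous)
    ]

dataset = [np.random.rand(224, 224, 3) for _ in range(4)]
augmented = [variant for img in dataset for variant in augment_five_fold(img)]
```

All variants keep the uniform 224×224×3 shape, so the enhanced set can be fed to the same model input.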
Establishing an upstream task: a mask self-encoder upstream model is established in the upstream task; each image in the training data set is proportionally fused with another image to generate new samples, and the model is trained on the new samples to obtain a trained mask self-encoder upstream model;
The mask self-encoder upstream model performs an image mixing operation on the preprocessed training-set pathology image data, masks tiles according to the mask rate, inputs the unmasked tiles into the encoder to extract features, inputs these together with all masked tiles into the decoder, and finally generates the target image;
Specifically, the mask self-encoder upstream model comprises a mask mixing layer, an encoder layer, and a decoder layer. The mask mixing layer proportionally fuses each image in the training data set with another image to generate a new sample image, divides it into masked tiles and unmasked tiles according to a set proportion, and inputs the unmasked tiles into the encoder layer to obtain linear output blocks; the linear output blocks and the masked tiles, with position encodings added, are input into the decoder layer, which outputs reconstructed image tiles; the loss value between the original and reconstructed images is calculated, training rounds are added, and the encoder-layer and decoder-layer steps are repeated until the loss value converges;
proportionally dividing the new sample image into masked and unmasked tiles includes: the method comprises the steps of dividing an image into a plurality of square cut blocks, dividing the cut blocks into mask cut blocks and non-mask cut blocks according to the setting of a mask rate, performing masking operation, and sequentially extracting the non-mask cut blocks into a column vector.
The encoder layer encodes the unmasked tiles output by the mask mixing layer through a linear projection to obtain linear output blocks, adds position encodings to the linear output blocks, and orders the masked tiles and the linear output blocks to obtain a two-dimensional list. The ordering follows the original tiling sequence, with each unmasked tile replaced by its linear output block.
The decoder layer, from the unmasked tiles output by the mask mixing layer and the two-dimensional list output by the encoder layer, restores and reconstructs all the input tiles by learning the features of the unmasked tiles, obtaining a reconstructed two-dimensional list; this list is reassembled into a decoded restored image, the loss value between the restored image and the original image is calculated, training rounds are added, and the encoder-layer and decoder-layer steps are repeated to minimize the loss value.
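Assembling the ordered list that the decoder consumes (encoder outputs back at their original positions, masked positions filled, position encodings added) can be sketched as below. The zero mask token and the sinusoidal encoding formula are illustrative assumptions; MAE uses a learnable mask token and learned or sinusoidal position embeddings.

```python
import numpy as np

def assemble_full_sequence(encoded, visible_idx, n_total):
    """Place encoder outputs back at their original tile positions, fill the
    masked positions with a shared mask token (zeros here, as a stand-in for
    a learnable token), and add a sinusoidal position encoding so the decoder
    knows where each tile sits in the image."""
    dim = encoded.shape[1]
    seq = np.zeros((n_total, dim))
    seq[visible_idx] = encoded
    pos = np.arange(n_total)[:, None] / (10000.0 ** (np.arange(dim)[None, :] / dim))
    return seq + np.where(np.arange(dim) % 2 == 0, np.sin(pos), np.cos(pos))

encoded = np.ones((49, 64))            # 49 visible tiles encoded as 64-dim vectors
visible_idx = np.arange(0, 196, 4)     # their positions in the original tiling order
seq = assemble_full_sequence(encoded, visible_idx, n_total=196)
```

The result is the full-length, order-preserving sequence ("two-dimensional list") of 196 vectors that the decoder reconstructs into image tiles.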
And establishing a downstream task, utilizing a mask self-encoder upstream model trained by the upstream task, and evaluating the performance of the mask self-encoder upstream model by adopting a test data set.
In the downstream task, the model learned in the upstream task is saved and loaded into a ViT model; a classification test is performed using the image information learned upstream, and the result is evaluated.
A mask self-encoder downstream model comprises:
a Transformer encoder, which uses the encoder layer trained in the mask self-encoder upstream model, outputs the linear projections of the tiles as feature vectors, and adds position encodings to the feature vectors;
a learnable classification token, which aggregates the position-encoded feature vectors output by the Transformer encoder;
and a multilayer perceptron (MLP) head, which completes the classification of the image according to the features gathered by the classification token.
In an embodiment, pathology data contains a large amount of local information. To enable the model to capture it fully, an image mixing operation is applied to the input pathology images, improving the model's learning efficiency on limited data;
Step S1: the mask self-encoder upstream model performs pixel-level image mixing of each picture in the training set with another randomly chosen picture in the training set, with a fixed mixing ratio. After the image mixing operation, the number of training-set images is unchanged;
Step S2: each mixed 3×224×224 image is divided into 14×14 tiles of size 16×16; the tiles are then masked according to the mask rate, and the masked tiles become grayscale tiles in the final model input;
Step S3: the visible (unmasked) tiles in the final model input are arranged into a column vector in their initial order and input into the encoder, which outputs encoded blocks containing the image information;
Step S4: the encoded blocks of step S3 and the masked tiles of step S2 are arranged in order into a column vector as the decoder input. The decoder reconstructs the images of the encoded and masked tiles from the information contained in the encoded blocks, producing the decoder output;
Step S5: steps S3 and S4 are repeated so that the loss between the decoder output and the original image decreases, achieving the model training objective.
In this embodiment, the performance of the upstream model is evaluated through the downstream task: the upstream model itself is assessed with a training-loss indicator, while overall model performance is evaluated with four metrics: accuracy, precision, specificity, and sensitivity.
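The four evaluation metrics can be computed from a confusion matrix as sketched below; `binary_metrics` is an illustrative helper (for the five-class task these would be computed per class, one-vs-rest, and averaged).

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Accuracy, precision, specificity, and sensitivity from the binary
    confusion matrix counts (TP, TN, FP, FN)."""
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "precision":   tp / (tp + fp),      # of predicted positives, how many are real
        "specificity": tn / (tn + fp),      # true-negative rate
        "sensitivity": tp / (tp + fn),      # true-positive rate (recall)
    }

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 0])
m = binary_metrics(y_true, y_pred)
```

With this toy prediction (one false negative, one false positive) all four metrics come out to 0.75.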
An embodiment of the invention provides a lung tissue pathology image processing model constructed in the above manner, comprising:
a data acquisition module, which preprocesses a data set of lung tissue pathology images and divides it proportionally into a training data set and a test data set;
a mask self-encoder model, obtained by proportionally fusing each image in the training data set with another image to generate new samples and training on the new samples;
and a downstream task module, which utilizes the mask self-encoder upstream model trained in the upstream task and evaluates its performance using the test data set.
The mask self-encoder upstream model comprises a mask mixing layer, an encoder layer, and a decoder layer. The mask mixing layer proportionally fuses each image in the training data set with another image to generate a new sample image, divides it into masked tiles and unmasked tiles according to a set proportion, and inputs the unmasked tiles into the encoder layer to obtain linear output blocks; the linear output blocks and the masked tiles, with position encodings added, are input into the decoder layer, which outputs reconstructed image tiles; the loss value between the original and reconstructed images is calculated, training rounds are added, and the encoder-layer and decoder-layer steps are repeated until the loss value converges;
The encoder layer uses the Vision Transformer architecture but operates only on the unmasked tiles. It encodes them by linear projection to obtain linear output blocks, adds position encodings, and orders the masked tiles and the linear output blocks to obtain a two-dimensional list;
The decoder layer uses a Transformer architecture and takes as input the set of all tiles of the picture, both masked and unmasked. From the unmasked tiles output by the mask mixing layer and the two-dimensional list output by the encoder layer, it restores and reconstructs all tiles by learning the features of the unmasked tiles, obtaining a reconstructed two-dimensional list; this list is reassembled into a decoded restored image, the loss between the restored image and the original image is calculated, training rounds are added, and the encoder-layer and decoder-layer steps are repeated to minimize the loss.
The masked-autoencoder downstream model also uses the Vision Transformer architecture; through upstream pre-training, the pre-trained upstream model is applied to the downstream classification task. It comprises the following:
a Transformer encoder, which adopts the encoder layer trained in the masked-autoencoder upstream model, outputs the linear projections of the patches as feature vectors, and adds position encodings to the feature vectors;
a learnable classifier, which summarizes the position-encoded feature vectors output by the Transformer encoder;
and a multi-layer perceptron head, which completes the classification of the image according to the summarized features output by the learnable classifier.
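A minimal NumPy sketch of these downstream components, under stated assumptions: mean pooling stands in for the learnable classifier's feature summarization, a small two-layer network stands in for the perceptron head, and all weights are hypothetical inputs rather than values from the patent:

```python
import numpy as np

def classify_from_tokens(patch_tokens, W1, b1, W2, b2):
    """Summarize position-encoded feature vectors and classify with a
    two-layer perceptron head; returns the index of the top class."""
    summary = patch_tokens.mean(axis=0)           # feature summarization (stand-in)
    hidden = np.maximum(0.0, summary @ W1 + b1)   # hidden layer with ReLU
    logits = hidden @ W2 + b2                     # class scores
    return int(np.argmax(logits))
```

In practice the summarization would be a learnable class token attended over by the Transformer encoder; mean pooling is used here only to keep the sketch self-contained.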
In an embodiment of the invention, the masked-autoencoder downstream model classifies images through the following steps: dividing the images of the input test data set into patches;
outputting the linear projections of the patches as feature vectors, and adding position encodings to the feature vectors;
summarizing the features of the position-encoded feature vectors;
and completing the classification of the images according to the feature summary.
The foregoing description of preferred embodiments is not intended to limit the invention; any modifications, equivalent substitutions and improvements made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (7)
1. A method for constructing a lung histopathology image processing model, comprising the following steps:
acquiring a data set of lung histopathology images, preprocessing the data set, and dividing it proportionally into a training data set and a test data set;
establishing an upstream task: building a masked-autoencoder upstream model within the upstream task, generating new samples by proportionally fusing the images in the training data set, training on the new samples, calculating a loss value between the image samples reconstructed by the masked-autoencoder upstream model and the original image samples, and increasing the number of training epochs until the loss value is minimized, to obtain a trained masked-autoencoder upstream model;
establishing a downstream task: using the masked-autoencoder upstream model trained in the upstream task, and evaluating the performance of the masked-autoencoder upstream model on the test data set;
wherein the masked-autoencoder upstream model comprises a mask mixing layer, an encoder layer and a decoder layer; the mask mixing layer proportionally fuses pairs of images in the training data set to generate new sample images, each new sample image is divided proportionally into masked patches and unmasked patches, the unmasked patches are input to the encoder layer to output linearly projected tokens, and the projected tokens and the mask tokens, with position encodings added, are input to the decoder layer;
the encoder layer encodes the unmasked patches output by the mask mixing layer by linear projection to obtain linearly projected tokens, adds position encodings to the tokens, and orders the mask tokens and the projected tokens to obtain a two-dimensional list;
the decoder layer restores and reconstructs all the patches input to the encoder layer from the unmasked patches output by the mask mixing layer and the two-dimensional list output by the encoder layer, by learning the features of the unmasked patches, to obtain a reconstructed two-dimensional list; the reconstructed two-dimensional list is reassembled into a decoded restored image, a loss value between the restored image and the original image is calculated, the number of training epochs is increased, and the encoder-layer and decoder-layer steps are repeated until the loss value converges.
2. The method for constructing a lung histopathology image processing model according to claim 1, wherein dividing the new sample image proportionally into masked patches and unmasked patches comprises: dividing the image into a plurality of square patches, dividing the patches into masked patches and unmasked patches according to a preset mask ratio, performing the masking operation, and sequentially flattening the unmasked patches into column vectors.
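As an illustrative sketch of the patch split described above (NumPy assumed as implementation language; the choice of which patches to mask is random, as is common masked-autoencoder practice but not fixed by the claim):

```python
import numpy as np

def split_by_mask_rate(patches, mask_rate, rng=None):
    """Divide square patches into masked / unmasked sets by the mask rate,
    and flatten each unmasked patch, in sequence, into a column vector."""
    rng = np.random.default_rng(rng)
    n = len(patches)
    masked = np.zeros(n, dtype=bool)
    masked[rng.choice(n, int(n * mask_rate), replace=False)] = True
    columns = [p.reshape(-1, 1) for p, m in zip(patches, masked) if not m]
    return masked, columns
```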
3. The method for constructing a lung histopathology image processing model according to claim 1, wherein the ordering of the mask tokens and the linearly projected tokens comprises: sorting according to the original patch order, and replacing the unmasked patches with the linearly projected tokens to obtain the two-dimensional list.
4. The method for constructing a lung histopathology image processing model according to claim 1, wherein the fusing comprises: proportionally blending each image in the training data set with any other image.
5. The method for constructing a lung histopathology image processing model according to claim 1, wherein the masked-autoencoder downstream model comprises:
a Transformer encoder, which adopts the encoder layer trained in the masked-autoencoder upstream model, outputs the linear projections of the patches as feature vectors, and adds position encodings to the feature vectors;
a learnable classifier, which summarizes the position-encoded feature vectors output by the Transformer encoder;
and a multi-layer perceptron head, which completes the classification of the image according to the summarized features output by the learnable classifier.
6. A lung histopathology image processing model, comprising:
a data acquisition module for preprocessing a data set of lung histopathology images and dividing it proportionally into a training data set and a test data set;
a masked-autoencoder model, obtained by generating new samples through proportional fusion of the images in the training data set and training on the new samples;
wherein a downstream task is established using the masked-autoencoder upstream model trained in the upstream task, and the performance of the masked-autoencoder upstream model is evaluated on the test data set;
the masked-autoencoder upstream model comprises a mask mixing layer, an encoder layer and a decoder layer; the mask mixing layer proportionally fuses pairs of images in the training data set to generate new sample images, each new sample image is divided proportionally into masked patches and unmasked patches, the unmasked patches are input to the encoder layer to output linearly projected tokens, and the projected tokens and the mask tokens, with position encodings added, are input to the decoder layer;
the encoder layer encodes the unmasked patches output by the mask mixing layer by linear projection to obtain linearly projected tokens, adds position encodings to the tokens, and orders the mask tokens and the projected tokens to obtain a two-dimensional list;
the decoder layer restores and reconstructs all the patches input to the encoder layer from the unmasked patches output by the mask mixing layer and the two-dimensional list output by the encoder layer, by learning the features of the unmasked patches, to obtain a reconstructed two-dimensional list; the reconstructed two-dimensional list is reassembled into a decoded restored image, a loss value between the restored image and the original image is calculated, the number of training epochs is increased, and the encoder-layer and decoder-layer steps are repeated until the loss value converges.
7. The lung histopathology image processing model according to claim 6, wherein the masked-autoencoder downstream model comprises:
a Transformer encoder, which adopts the encoder layer trained in the masked-autoencoder upstream model, outputs the linear projections of the patches as feature vectors, and adds position encodings to the feature vectors;
a learnable classifier, which summarizes the position-encoded feature vectors output by the Transformer encoder;
and a multi-layer perceptron head, which completes the classification of the image according to the summarized features output by the learnable classifier.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310868244.0A CN116597285B (en) | 2023-07-17 | 2023-07-17 | Pulmonary tissue pathology image processing model, construction method and image processing method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116597285A CN116597285A (en) | 2023-08-15 |
CN116597285B true CN116597285B (en) | 2023-09-22 |
Family
ID=87601195
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310868244.0A Active CN116597285B (en) | 2023-07-17 | 2023-07-17 | Pulmonary tissue pathology image processing model, construction method and image processing method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116597285B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117173543B (en) * | 2023-11-02 | 2024-02-02 | 天津大学 | Mixed image reconstruction method and system for lung adenocarcinoma and pulmonary tuberculosis |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109300128A (en) * | 2018-09-29 | 2019-02-01 | 聚时科技(上海)有限公司 | The transfer learning image processing method of structure is implied based on convolutional Neural net |
CN109345508A (en) * | 2018-08-31 | 2019-02-15 | 北京航空航天大学 | A kind of Assessing Standards For Skeletal method based on two stages neural network |
CN112150568A (en) * | 2020-09-16 | 2020-12-29 | 浙江大学 | Magnetic resonance fingerprint imaging reconstruction method based on Transformer model |
CN116030306A (en) * | 2023-02-08 | 2023-04-28 | 吉林大学 | Pulmonary tissue pathology image type auxiliary classification method based on multilayer perceptron |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10333547B2 (en) * | 2012-08-13 | 2019-06-25 | Gurulogic Microsystems Oy | Encoder and method for encoding input data using a plurality of different transformations or combinations of transformations |
US11087165B2 (en) * | 2018-11-29 | 2021-08-10 | Nec Corporation | Method and system for contextualizing automatic image segmentation and regression |
- 2023-07-17: CN application CN202310868244.0A filed; patent CN116597285B granted (Active)
Non-Patent Citations (1)
Title |
---|
MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers; Jihao Liu et al.; arXiv:2205.13137v4 [cs.CV]; Sections 1-3 and 5 of the main text *
Also Published As
Publication number | Publication date |
---|---|
CN116597285A (en) | 2023-08-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111369565B (en) | Digital pathological image segmentation and classification method based on graph convolution network | |
Feng et al. | Residual learning for salient object detection | |
Zhou et al. | Cross-level feature aggregation network for polyp segmentation | |
CN114119638A (en) | Medical image segmentation method integrating multi-scale features and attention mechanism | |
CN111311563A (en) | Image tampering detection method based on multi-domain feature fusion | |
CN113989662B (en) | Remote sensing image fine-grained target identification method based on self-supervision mechanism | |
CN113034505B (en) | Glandular cell image segmentation method and glandular cell image segmentation device based on edge perception network | |
CN116597285B (en) | Pulmonary tissue pathology image processing model, construction method and image processing method | |
EP4276684A1 (en) | Capsule endoscope image recognition method based on deep learning, and device and medium | |
CN115018824A (en) | Colonoscope polyp image segmentation method based on CNN and Transformer fusion | |
CN110930378B (en) | Emphysema image processing method and system based on low data demand | |
Wazir et al. | HistoSeg: Quick attention with multi-loss function for multi-structure segmentation in digital histology images | |
CN111444844A (en) | Liquid-based cell artificial intelligence detection method based on variational self-encoder | |
Wang et al. | FaceFormer: Aggregating global and local representation for face hallucination | |
CN112750132A (en) | White blood cell image segmentation method based on dual-path network and channel attention | |
CN110782427A (en) | Magnetic resonance brain tumor automatic segmentation method based on separable cavity convolution | |
Jiang et al. | Forest-CD: Forest change detection network based on VHR images | |
Li et al. | Image segmentation based on improved unet | |
CN115862120A (en) | Separable variation self-encoder decoupled face action unit identification method and equipment | |
CN116012395A (en) | Multi-scale fusion smoke segmentation method based on depth separable convolution | |
CN111027440A (en) | Crowd abnormal behavior detection device and method based on neural network | |
CN114511798A (en) | Transformer-based driver distraction detection method and device | |
CN117934824A (en) | Target region segmentation method and system for ultrasonic image and electronic equipment | |
CN116935044B (en) | Endoscopic polyp segmentation method with multi-scale guidance and multi-level supervision | |
Sabnam et al. | Application of generative adversarial networks in image, face reconstruction and medical imaging: challenges and the current progress |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||