CN115170857A - Pancreatic cancer image identification method based on federated transfer learning - Google Patents
Pancreatic cancer image identification method based on federated transfer learning
- Publication number: CN115170857A
- Application number: CN202210517997.2A
- Authority: CN (China)
- Prior art keywords: pancreatic cancer, identified, image, segmented, tissue
- Prior art date
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06V10/764 — Image or video recognition using classification, e.g. of video objects
- G06N20/00 — Machine learning
- G06N3/096 — Transfer learning
- G06N3/098 — Distributed learning, e.g. federated learning
- G06T7/0012 — Biomedical image inspection
- G06T7/70 — Determining position or orientation of objects or cameras
- G06V10/22 — Image preprocessing by selection of a specific region
- G06V10/774 — Generating sets of training patterns
- G06V10/82 — Image or video recognition using neural networks
- G16H30/40 — ICT for processing medical images, e.g. editing
- G16H50/20 — ICT for computer-aided diagnosis
- G16H50/70 — ICT for mining of medical data
- G06T2207/10132 — Ultrasound image
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30096 — Tumor; Lesion
- G06V2201/03 — Recognition of patterns in medical or anatomical images
Abstract
The invention provides a pancreatic cancer image identification method based on federated transfer learning, comprising the following steps: S1, acquiring a plurality of pancreatic cancer pathological images and labeling them to obtain labeled pathological images; S2, constructing a tissue classification model from the labeled pathological images based on federated transfer learning; S3, segmenting the pancreatic cancer image to be identified into segmented images and recording the position of each segmented image within the preprocessed image; S4, inputting the segmented images into the tissue classification model to obtain tissue classification results, and building a table mapping each tissue type to its positions; S5, reading from the table the positions and number of segments classified as tumor for identification. The invention establishes a classification model for multiple pancreatic tissue types based on federated transfer learning and locates the lesion site from the model's output.
Description
Technical Field
The invention relates to the technical field of pancreatic cancer image identification, and in particular to a pancreatic cancer image identification method based on federated transfer learning.
Background
Pancreatic cancer is a highly malignant digestive system tumor, one of the most lethal tumors, with a low early diagnosis rate and an extremely poor prognosis. Surgical pathological examination is the gold standard for pancreatic cancer diagnosis, but pancreatectomy carries high risk, so the main clinical route for pathological diagnosis is the minimally invasive Endoscopic Ultrasound-guided Fine Needle Aspiration (EUS-FNA), in which pancreatic tissue is sampled for pathological sections; the sensitivity and specificity of these sections are 85%-95% and 95%-98%, respectively. Rapid On-Site Evaluation (ROSE) is an important factor affecting the sensitivity of EUS-FNA pancreatic cancer diagnosis: a pathologist evaluates the rapidly stained section obtained by EUS-FNA sampling on site, judging the validity and sufficiency of the tissue section in real time. A professional pathologist must spend a great deal of time examining very large pathological sections and diagnose the tumor type and grade from expert knowledge. Today the production of pathological sections is increasingly automated, and large numbers of sections are stored as digital images, laying a data foundation for the development of computer-aided diagnosis. Tissue segmentation of pathological images is the key first step underlying subsequent identification, judgment, and quantitative analysis; its quality directly determines the quality of pathological image identification, so accurate automatic tissue segmentation is a key prerequisite for the accuracy of downstream computer-aided diagnosis.
The difficulty of automatic multi-class tissue segmentation of pathological sections is that a whole-slide pathological image is extremely large and contains many different tissue types, making automatic classification and segmentation of the tissues in a pancreatic cancer whole-slide image challenging. Chinese patent application CN201110063144.8 provides a digital image processing and pattern classification method applied to computer-aided diagnosis with pancreatic endoscopic ultrasound. By extracting texture features of the endoscopic ultrasound image and building a classifier, it creates objective, quantitative diagnostic indexes to describe and interpret the image correctly, improving the accuracy of early diagnosis from pancreatic endoscopic ultrasound images. However, that method performs image processing and pattern classification on ultrasound images, whose precision is limited and accuracy poor. For pathological images of rapidly stained pancreatic cells from EUS-FNA, research on pancreatic cancer pathological image classification based on federated transfer learning is still at an early stage. The main difficulties are that labeled high-quality data is scarce and that high-resolution pathological images contain large noise regions, which degrade the classification performance of the model.
Disclosure of Invention
The invention addresses the problems that existing federated-transfer-learning-based pancreatic cancer pathological image classification lacks tissue segmentation and lacks lesion position feedback. It provides a pancreatic cancer image identification method based on federated transfer learning, establishing a classification model for multiple pancreatic tissue types and locating the lesion site from the model's output.
To achieve this purpose, the following technical scheme is provided:
A pancreatic cancer image identification method based on federated transfer learning comprises the following steps:
S1, acquiring a plurality of pancreatic cancer pathological images, preprocessing them, and labeling the tissue types in each image to obtain labeled pathological images; the tissue types are fat, small intestine, lymph, muscle, and tumor;
S2, constructing a tissue classification model from the labeled pathological images based on federated transfer learning;
S3, preprocessing the pancreatic cancer image to be identified in the same way as in S1, segmenting the preprocessed image into a plurality of equally sized segmented images, and recording the position of each segmented image within the preprocessed image;
S4, inputting the segmented images into the tissue classification model to obtain tissue classification results, and building a table mapping each tissue type to its positions;
S5, reading from the table the positions and number of segments classified as tumor; if the number is greater than or equal to 1, outputting the image to be identified as a diseased image and framing the tumor positions on it; otherwise, outputting it as a normal image.
The method constructs a tissue classification model based on federated transfer learning to identify the pancreatic cancer image. The image to be identified is segmented and each segment's position recorded; the segments are fed one by one into the tissue classification model to obtain per-segment tissue classifications, so that the various tissues in the image are identified and the tumor regions are located in the output. This provides auxiliary support for professional diagnosticians and shortens diagnosis time.
Preferably, the preprocessing of the pathological images in S1 comprises normalization, standardization, and grayscale transformation. Normalization accelerates convergence during network training; standardization centers the image via its mean, increasing the model's generalization ability. Low-resolution pathological images are removed after grayscale transformation, and noise such as blank background regions and red blood cells is eliminated, so the tissue classification model can focus on features such as the morphology, arrangement, and heterogeneity of pancreatic cells, improving its interpretability and classification precision.
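As an illustration, the preprocessing described above might be sketched as follows. This is a minimal example, not the patent's implementation; the function name and the BT.601 grayscale weights are assumptions, since the patent does not specify a conversion formula.

```python
import numpy as np

def preprocess_patch(rgb: np.ndarray) -> np.ndarray:
    """Sketch of the S1 preprocessing: normalization, standardization,
    and grayscale transformation of an (H, W, 3) RGB patch."""
    x = rgb.astype(np.float32)
    x -= x.mean()              # normalization: center the image via its mean
    x /= x.std() + 1e-8        # standardization: unit variance aids convergence
    # grayscale conversion (BT.601 luma weights; an assumed choice)
    return x @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
```

Low-resolution patches and the background/red-blood-cell noise mentioned above would be filtered out around this step.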
Preferably, S2 specifically comprises the following steps:
S201, extracting image blocks from the labeled pathological images to obtain image blocks of the 5 labeled tissue types; all image blocks form the classification dataset;
S202, dividing the classification dataset into a training set and a test set;
S203, constructing an initial classification model based on federated transfer learning, then training and testing it with the training set and test set respectively to obtain the trained tissue classification model.
Preferably, S203 also includes a training-set augmentation step: applying 90-degree rotation, 180-degree rotation, horizontal flipping, and vertical flipping to all image blocks in the training set, and adding the augmented blocks to the training set. Because medical image sample sizes are small, the invention applies this series of augmentations to the training images to alleviate overfitting. The rotations and flips expand the training set to five times its original size, improving training precision.
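The four augmentations can be sketched with NumPy (a minimal illustration; the helper name is my own):

```python
import numpy as np

def augment(patch: np.ndarray) -> list:
    """Return the original patch plus its four augmented copies:
    90° rotation, 180° rotation, horizontal flip, vertical flip."""
    return [
        patch,
        np.rot90(patch, k=1),  # 90-degree rotation
        np.rot90(patch, k=2),  # 180-degree rotation
        np.fliplr(patch),      # horizontal flip
        np.flipud(patch),      # vertical flip
    ]
```

Applying this to every training block yields the five-fold expansion described above.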
Preferably, S3 specifically comprises the following steps:
S301, preprocessing the pancreatic cancer image to be identified in the same way as in S1;
S302, segmenting the preprocessed image equally into n rows and m columns of segmented images of the same size;
S303, recording the position of each segmented image within the preprocessed image as (n, m), meaning the nth row and mth column.
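A minimal sketch of the grid segmentation in S302-S303 (function and variable names are assumptions; the patent only specifies equal division into rows and columns with (row, column) positions):

```python
import numpy as np

def tile_image(img: np.ndarray, n_rows: int, n_cols: int) -> dict:
    """Split an image into n_rows x n_cols equal tiles,
    keyed by their 1-based (row, column) position."""
    h, w = img.shape[:2]
    th, tw = h // n_rows, w // n_cols
    return {
        (r + 1, c + 1): img[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
        for r in range(n_rows)
        for c in range(n_cols)
    }
```

With the 4 × 4 division used in the embodiment, this yields 16 tiles whose keys (1, 1) through (4, 4) serve as the recorded positions.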
Preferably, S4 specifically comprises the following steps:
S401, inputting each segmented image into the tissue classification model to obtain the probability that it belongs to each tissue category; if its tumor probability exceeds the tumor threshold, classifying the segment as tumor and taking its position in the preprocessed image as a tumor position; otherwise performing S402;
S402, classifying the segmented image into the tissue category with the highest probability;
S403, after all segmented images have been classified, building the tissue/position table, whose header columns are tissue type, count, and positions.
Preferably, the tumor threshold ranges from 50% to 60%.
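The decision logic of S401-S403 and the S5 output can be sketched as follows (a hedged illustration; the names and data layout are my own, and the per-tile probabilities would come from the tissue classification model):

```python
TISSUES = ("fat", "small intestine", "lymph", "muscle", "tumor")

def classify_tiles(probs_by_pos: dict, tumor_threshold: float = 0.5) -> dict:
    """Build the tissue/position table: a tile is labeled tumor when its tumor
    probability exceeds the threshold (S401); otherwise it takes its
    highest-probability tissue class (S402)."""
    table = {t: [] for t in TISSUES}
    for pos, probs in probs_by_pos.items():
        p = dict(zip(TISSUES, probs))
        label = "tumor" if p["tumor"] > tumor_threshold else max(p, key=p.get)
        table[label].append(pos)
    return table

def is_diseased(table: dict) -> bool:
    """S5: the image is reported diseased when at least one tile is tumor."""
    return len(table["tumor"]) >= 1
```

The positions stored under "tumor" are the ones that would be framed on the output image.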
The beneficial effects of the invention are: a tissue classification model is constructed based on federated transfer learning to identify the pancreatic cancer image; the image is segmented with positions recorded, and the segments are fed one by one into the model to obtain per-segment tissue classifications, so that the various tissues are identified and the tumor regions located in the output, providing auxiliary support for professional diagnosticians and shortening diagnosis time.
Drawings
FIG. 1 is a flow chart of a method of an embodiment;
FIG. 2 is a flow diagram of tissue classification model construction of an embodiment.
Detailed Description
The embodiment is as follows:
This embodiment provides a pancreatic cancer image identification method based on federated transfer learning; referring to FIG. 1, the method comprises the following steps:
S1, acquiring a plurality of pancreatic cancer pathological images, preprocessing them, and labeling the tissue types (fat, small intestine, lymph, muscle, and tumor) in each image to obtain labeled pathological images. The preprocessing in S1 comprises normalization, standardization, and grayscale transformation. Normalization accelerates convergence during network training; standardization centers the image via its mean, increasing the model's generalization ability. Low-resolution pathological images are removed after grayscale transformation, and noise such as blank background regions and red blood cells is eliminated, so the tissue classification model can focus on features such as the morphology, arrangement, and heterogeneity of pancreatic cells, improving its interpretability and classification precision.
In this embodiment, 25 pancreatic cancer pathological images are selected for strict 5-class tissue labeling, yielding digital pathological annotation images that form the labeled pathological images.
S2, constructing a tissue classification model from the labeled pathological images based on federated transfer learning; referring to FIG. 2, S2 specifically comprises:
S201, extracting image blocks from the labeled pathological images to obtain blocks of the 5 labeled tissue types, all of which form the classification dataset. Image-block sampling trains the model on small local patches, preserving essential local detail; the block size is set to 224 × 224 pixels for training the 5-class tissue classifier;
S202, dividing the classification dataset into a training set and a test set;
S203, constructing an initial classification model based on federated transfer learning, then training and testing it with the training set and test set respectively to obtain the trained tissue classification model.
S203 also includes a training-set augmentation step: applying 90-degree rotation, 180-degree rotation, horizontal flipping, and vertical flipping to all training image blocks and adding the augmented blocks to the training set. Because medical image sample sizes are small, this series of augmentations alleviates overfitting and expands the training set to five times its original size, improving training precision.
The initial classification model of the invention is an improved model based on a federated transfer learning algorithm, comprising a hardware perception layer, a data processing layer, a business logic layer, a cross-system cascade network layer, and a data storage unit, wherein:
the hardware perception layer provides hardware support for video monitoring;
the data processing layer processes input data through a deep neural network to produce support data for business-logic decisions;
the business logic layer maintains the model database and compares image data; it also maintains the sub-models and communicates with the shared model on the network layer and the cloud server;
the cross-system cascade network layer receives data from the business logic layer, performs transfer learning on the received model parameters, and passes the results to the data storage unit;
the data storage unit stores data;
the business logic layer comprises an image data matching module, a database maintenance module, and a model training module;
the model training module comprises the sub-model and an encryption parameter submodule; the encryption parameter submodule comprises a parameter updating element and an encryption algorithm element; the data storage unit comprises the shared model, the cloud server, a parameter decoding module, and a parameter aggregation module.
the model training process comprises the following steps: the method comprises the steps of completing updating of parameters under a submodel according to collected image data, encrypting the parameters through an encryption algorithm element after the submodel generates parameter updating, then updating and transmitting the encrypted parameters to a cloud server, firstly updating and decrypting the parameters by the cloud server, then judging whether aggregation is needed or updating of a sharing model is directly carried out according to the number of the parameters, setting a stale threshold S, pausing updating of the submodel when the updating frequency of a certain submodel exceeds an average level S, carrying out parameter aggregation and updating of the sharing model in the sharing model, and finally feeding the updating back to the submodel to complete one training.
The number of training samples available at the outset is small: a large number of pancreatic cancer images must be collected early on, and the images stored at a single hospital are far from sufficient, which is why transfer learning over image samples from multiple hospitals is needed.
To measure the performance of the model, 5000 image blocks extracted from the labeled pathological images of pancreatic cancer are reserved as the test set in this embodiment; the remaining blocks, together with 25000 image blocks obtained by data amplification, are used as the training set. After training on the training set, this embodiment evaluates the tissue-classification accuracy on the test set. All images in both sets are 224 x 224 pixels in size and are input into the model in sequence for training and testing.
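The data amplification mentioned above is spelled out in claim 4 as 90-degree rotation, 180-degree rotation, horizontal flipping and vertical flipping of each training block. A minimal sketch of that step (the function name is an assumption for illustration):

```python
import numpy as np

def amplify(block):
    """Return the four augmented copies of one image block (H x W x C):
    90-degree rotation, 180-degree rotation, horizontal flip, vertical flip."""
    return [
        np.rot90(block, k=1),   # 90-degree rotation
        np.rot90(block, k=2),   # 180-degree rotation
        np.fliplr(block),       # horizontal flip
        np.flipud(block),       # vertical flip
    ]

# Each 224 x 224 block yields four extra training samples.
block = np.zeros((224, 224, 3))
augmented = amplify(block)
```

Because the blocks are square, all four transforms preserve the 224 x 224 input size expected by the model.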
S3, preprocessing the pancreatic cancer image to be identified in the same way as in S1, segmenting the preprocessed pancreatic cancer image to be identified to obtain a plurality of segmented pancreatic cancer images to be identified of the same size, and recording the position of each segmented pancreatic cancer image to be identified in the preprocessed pancreatic cancer image to be identified; S3 specifically comprises the following steps:
s301, preprocessing the pancreatic cancer image to be identified in the same way as the preprocessing in the S1;
s302, segmenting the preprocessed pancreatic cancer image to be identified into n rows and m columns of equally sized segmented pancreatic cancer images to be identified;
s303, recording the position of each segmented pancreatic cancer image to be identified in the preprocessed pancreatic cancer image to be identified as (n, m), where (n, m) denotes the nth row and mth column; in this embodiment, the preprocessed pancreatic cancer image to be identified is segmented into 16 segmented pancreatic cancer images to be identified, in 4 rows and 4 columns.
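Steps S302-S303 can be sketched as a simple grid split that pairs each block with its 1-indexed (row, column) position (the function name is an assumption):

```python
import numpy as np

def segment_grid(image, rows, cols):
    """Split a preprocessed image into rows x cols equal blocks and
    record each block's (n, m) position, as in steps S302-S303."""
    h, w = image.shape[:2]
    bh, bw = h // rows, w // cols
    blocks = []
    for r in range(rows):
        for c in range(cols):
            patch = image[r * bh:(r + 1) * bh, c * bw:(c + 1) * bw]
            # Positions are recorded 1-indexed: (n, m) = nth row, mth column.
            blocks.append(((r + 1, c + 1), patch))
    return blocks

# The embodiment uses a 4 x 4 grid, giving 16 blocks per image.
image = np.zeros((896, 896, 3))   # assuming 4 x 4 blocks of 224 x 224
blocks = segment_grid(image, rows=4, cols=4)
```

The 896 x 896 input size here is only illustrative; any image whose dimensions divide evenly by the grid works the same way.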
S4, inputting the segmented pancreatic cancer images to be identified into the tissue classification model to obtain a plurality of tissue classification results, and establishing a correspondence table of tissue types and positions according to the plurality of tissue classification results; S4 specifically comprises the following steps:
s401, inputting each segmented pancreatic cancer image to be identified into the tissue classification model to obtain the probability that it belongs to each tissue type, and judging whether any segmented pancreatic cancer image to be identified has a tumor probability greater than the tumor set value; the tumor set value ranges from 50% to 60%, and 50% is selected in this embodiment. If so, the segmented pancreatic cancer image to be identified is classified into the tumor category, and its position in the preprocessed pancreatic cancer image to be identified is taken as the position corresponding to the tumor category; if not, S402 is performed;
s402, classifying the segmented pancreatic cancer image to be identified into a tissue type with the maximum probability;
s403, after all the segmented pancreatic cancer images to be identified are classified, a correspondence table of tissue types and positions is established; the header of the table contains the tissue type, number and position, as shown in Table 1:
Table 1. Correspondence table of tissue types and positions

| Tissue type | Number | Position |
|---|---|---|
| Tumor | 2 | (2, 3), (3, 2) |
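Steps S401-S403 can be sketched as follows. The tissue names come from S1; the probability-dictionary interface of the tissue classification model and the table representation are assumptions for illustration:

```python
from collections import defaultdict

TISSUES = ["fat", "small intestine", "lymph", "muscle", "tumor"]
TUMOR_SET_VALUE = 0.50   # the embodiment's choice from the 50%-60% range

def classify_blocks(probs_by_position, tumor_set_value=TUMOR_SET_VALUE):
    """probs_by_position maps (n, m) -> {tissue: probability}.

    S401: a block whose tumor probability exceeds the tumor set value is
    classified as tumor. S402: otherwise it takes the most probable tissue.
    S403: results are collected into a tissue/position correspondence table
    whose fields are tissue type, number and position."""
    table = defaultdict(list)
    for pos, probs in probs_by_position.items():
        if probs["tumor"] > tumor_set_value:
            label = "tumor"
        else:
            label = max(probs, key=probs.get)
        table[label].append(pos)
    return {t: {"number": len(p), "position": sorted(p)} for t, p in table.items()}

# Toy example with three blocks of a 4 x 4 grid (probabilities are made up).
probs = {
    (2, 3): {"fat": 0.1, "small intestine": 0.1, "lymph": 0.1, "muscle": 0.1, "tumor": 0.6},
    (3, 2): {"fat": 0.2, "small intestine": 0.1, "lymph": 0.05, "muscle": 0.1, "tumor": 0.55},
    (1, 1): {"fat": 0.7, "small intestine": 0.1, "lymph": 0.1, "muscle": 0.05, "tumor": 0.05},
}
table = classify_blocks(probs)
```

Note the order of the two rules matters: a block with tumor probability above the set value is labeled tumor even when another tissue has a higher probability.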
And S5, acquiring the positions and the number recorded for the tumor category in the tissue/position correspondence table, and judging whether the number is greater than or equal to 1; if so, outputting the pancreatic cancer image to be identified as a diseased image and framing the positions corresponding to the tumor on the pancreatic cancer image to be identified; if not, outputting the pancreatic cancer image to be identified as a normal image.
As can be seen from the table, in this embodiment the number recorded for the tumor category is 2, which is greater than or equal to 1, so the pancreatic cancer image to be identified is output as a diseased image, and the tumor positions (2, 3) and (3, 2) are framed on the pancreatic cancer image to be identified.
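The S5 decision itself is a one-line threshold on the tumor count. A minimal sketch, assuming the correspondence table is represented as a mapping from tissue type to its count and positions (a hypothetical representation, not the patent's data structure):

```python
def diagnose(table):
    """S5: output 'diseased' with the tumor positions to frame when the
    tumor count in the correspondence table is >= 1, otherwise 'normal'."""
    tumor = table.get("tumor", {"number": 0, "position": []})
    if tumor["number"] >= 1:
        return "diseased", tumor["position"]   # positions to frame on the image
    return "normal", []

# The embodiment's case: two tumor blocks at (2, 3) and (3, 2).
result, framed = diagnose({"tumor": {"number": 2, "position": [(2, 3), (3, 2)]}})
```

An empty or tumor-free table falls through to the "normal" branch with nothing framed.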
The method and the device construct a tissue classification model based on federated transfer learning to identify the pancreatic cancer image to be identified. The image is segmented and each segment's position is recorded, and the segmented pancreatic cancer images to be identified are input into the tissue classification model one by one to obtain a plurality of tissue classification results, so that the various tissues of the pancreatic cancer image can be identified and the tumor category can be located when the results are output, thereby providing auxiliary support for professional diagnosticians and shortening diagnosis time.
Claims (7)
1. A pancreatic cancer image identification method based on federated transfer learning, characterized by comprising the following steps:
s1, acquiring a plurality of pathological images of pancreatic cancer, preprocessing the pathological images, and labeling various tissues of the pathological images of pancreatic cancer to obtain labeled pathological images of pancreatic cancer; the various types of tissues include fat, small intestine, lymph, muscle and tumor;
s2, constructing a tissue classification model by using the labeled pathological images of pancreatic cancer based on federated transfer learning;
s3, preprocessing the pancreatic cancer image to be identified in the same way as in S1, segmenting the preprocessed pancreatic cancer image to be identified to obtain a plurality of segmented pancreatic cancer images to be identified of the same size, and recording the position of each segmented pancreatic cancer image to be identified in the preprocessed pancreatic cancer image to be identified;
s4, inputting the segmented pancreatic cancer images to be identified into the tissue classification model to obtain a plurality of tissue classification results, and establishing a correspondence table of tissue types and positions according to the plurality of tissue classification results;
and S5, acquiring the positions and the number recorded for the tumor category in the tissue/position correspondence table, and judging whether the number is greater than or equal to 1; if so, outputting the pancreatic cancer image to be identified as a diseased image and framing the positions corresponding to the tumor on the pancreatic cancer image to be identified; if not, outputting the pancreatic cancer image to be identified as a normal image.
2. The pancreatic cancer image identification method based on federated transfer learning as claimed in claim 1, wherein the preprocessing of the pathological images of pancreatic cancer in S1 comprises standardization, normalization and gray-scale transformation.
3. The pancreatic cancer image identification method based on federated transfer learning as claimed in claim 1, wherein S2 specifically comprises the following steps:
s201, extracting image blocks from the labeled pathological images of pancreatic cancer to obtain image blocks of the 5 labeled tissue types, all of which form a classification data set;
s202, dividing the classification data set into a training set and a test set;
s203, constructing an initial classification model based on federated transfer learning, and training and testing the initial classification model with the training set and the test set respectively to obtain a trained tissue classification model.
4. The pancreatic cancer image identification method based on federated transfer learning as claimed in claim 3, wherein S203 further comprises a training set amplification step: performing 90-degree rotation, 180-degree rotation, horizontal flipping and vertical flipping on all image blocks of the training set to obtain amplified image blocks, and adding the amplified image blocks to the training set.
5. The pancreatic cancer image identification method based on federated transfer learning as claimed in claim 1, wherein S3 specifically comprises the following steps:
s301, preprocessing the pancreatic cancer image to be identified in the same way as the preprocessing in S1;
s302, segmenting the preprocessed pancreatic cancer image to be identified into n rows and m columns of equally sized segmented pancreatic cancer images to be identified;
s303, recording the position of each segmented pancreatic cancer image to be identified in the preprocessed pancreatic cancer image to be identified as (n, m), wherein (n, m) denotes the nth row and mth column.
6. The pancreatic cancer image identification method based on federated transfer learning as claimed in claim 1, wherein S4 specifically comprises the following steps:
s401, inputting each segmented pancreatic cancer image to be identified into the tissue classification model to obtain the probability that it belongs to each tissue category, and judging whether any segmented pancreatic cancer image to be identified has a tumor probability greater than a tumor set value; if so, classifying that segmented pancreatic cancer image into the tumor category and taking its position in the preprocessed pancreatic cancer image to be identified as the position corresponding to the tumor category; if not, performing S402;
s402, classifying the segmented pancreatic cancer image to be identified into a tissue type with the maximum probability;
and S403, after all the segmented pancreatic cancer images to be identified are classified, establishing a correspondence table of tissue types and positions, wherein the header of the table contains the tissue type, number and position.
7. The pancreatic cancer image identification method based on federated transfer learning as claimed in claim 1, wherein the tumor set value ranges from 50% to 60%.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210517997.2A CN115170857A (en) | 2022-05-12 | 2022-05-12 | Pancreatic cancer image identification method based on federal transfer learning |
GB2305577.5A GB2620233A (en) | 2022-05-12 | 2023-04-17 | Method for recognizing pancreatic cancer image based on federated transfer learning (FTL) |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115170857A true CN115170857A (en) | 2022-10-11 |
Family
ID=83483167
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115170857A (en) |
GB (1) | GB2620233A (en) |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5321145B2 (en) * | 2009-03-04 | 2013-10-23 | 日本電気株式会社 | Image diagnosis support apparatus, image diagnosis support method, image diagnosis support program, and storage medium thereof |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115715994A (en) * | 2022-11-18 | 2023-02-28 | 深圳大学 | Image excitation ultramicro injection method, system and equipment |
CN115715994B (en) * | 2022-11-18 | 2023-11-21 | 深圳大学 | Image excitation ultramicro injection method, system and equipment |
CN116109608A (en) * | 2023-02-23 | 2023-05-12 | 智慧眼科技股份有限公司 | Tumor segmentation method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
GB2620233A8 (en) | 2024-02-21 |
GB2620233A (en) | 2024-01-03 |
GB202305577D0 (en) | 2023-05-31 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |