CN112862756A

CN112862756A - Method for identifying pathological change type and gene mutation in thyroid tumor pathological image

Info

Publication number: CN112862756A
Application number: CN202110034353.3A
Authority: CN
Inventors: 梁智勇; 陈浩; 张卉; 胡羽; 吴焕文; 林黄靖
Original assignee: Peking Union Medical College Hospital Chinese Academy of Medical Sciences
Current assignee: Peking Union Medical College Hospital Chinese Academy of Medical Sciences
Priority date: 2021-01-11
Filing date: 2021-01-11
Publication date: 2021-05-28
Anticipated expiration: 2041-01-11
Also published as: CN112862756B

Abstract

The invention discloses a method for identifying lesion regions in histopathological images of thyroid follicular tumors and for predicting gene mutations. The method is an automatic computer-aided diagnosis technique based on deep learning. Using big data and a deep convolutional neural network, it automatically locates lesion regions in images of pathological tissue sections of thyroid follicular tumors, automatically identifies the histological type and the gene mutation type of those regions, and identifies from the tissue image alone the cases that carry a RAS mutation together with mutations in other driver genes. This achieves histological classification of thyroid follicular tumors and prediction of the related gene mutations. The method provides information to clinicians, assists pathological diagnosis and clinical decision-making, and promotes the development of digital pathology and precision medicine.

Description

Method for identifying pathological change type and gene mutation in thyroid tumor pathological image

Technical Field

The invention relates to the technical field of image processing, and in particular to a method for identifying a lesion region in a pathological image of a thyroid follicular tumor and classifying the gene mutation type of that region.

Background

The thyroid gland produces thyroid hormones; thyroxine and triiodothyronine are its two active hormones. They play a crucial role in controlling human metabolism, including protein production, thermoregulation, and energy production and regulation.

Thyroid disease is the second most common disease category in endocrinology, and thyroid cancer is a common endocrine malignancy. Thyroid cancer is divided into papillary, follicular, poorly differentiated and anaplastic thyroid carcinoma.

Thyroid follicular tumors are a group of nodular thyroid tumors with a follicular growth pattern. They are classified as benign, borderline or malignant, and the morphological characteristics of these three categories are very similar.

Among these, follicular thyroid carcinoma is a malignant follicular tumor originating from thyroid follicular cells. It is usually encapsulated but shows invasive growth, and its tumor cells lack the nuclear features of papillary thyroid carcinoma.

Thyroid follicular adenoma is a benign follicular tumor of the thyroid gland. It grows within a capsule, is non-invasive, and its tumor cells lack the nuclear features of papillary thyroid carcinoma.

Thyroid tumors of uncertain malignant potential are a group of borderline thyroid follicular tumors that show suspicious capsular or vascular invasion, or nuclear features of papillary carcinoma, but do not meet the criteria for malignancy.

Many methods are commonly used in the clinical diagnosis of thyroid disease, such as clinical manifestations, imaging examinations and cytology. However, because the morphology of the three types of thyroid follicular tumor is so similar, they are difficult to distinguish by imaging or by fine-needle aspiration cytology. In most cases, therefore, suspected thyroid follicular tumors are surgically resected, and the three types are distinguished by examining pathological tissue sections of the resected specimen.

However, the evaluation of pathological morphology is highly subjective, and pathologists show relatively poor agreement when judging capsular and vascular invasion, so the pathological diagnosis of thyroid follicular tumors is very challenging. With the development of molecular pathology, many studies have tried to find a molecular marker that discriminates between the three types, but no such marker has yet been found. Nevertheless, research on molecular markers of thyroid follicular tumors suggests that patients whose tumors carry a RAS mutation together with mutations in other driver genes have a relatively poor prognosis compared with patients carrying a RAS mutation alone (for example, benign tumors are more prone to malignant transformation, or malignant tumors have a higher risk of recurrence). This suggests that clinicians should adopt a more proactive and effective follow-up plan for this group of patients.

Meanwhile, machine learning has been widely applied to medical image recognition and computer-aided diagnosis. It helps doctors diagnose a variety of diseases more accurately while saving time and labor, and has become a new form of computer-aided diagnosis. Supervised learning and deep learning are important machine learning methods: by repeatedly learning from the disease labels of images, they learn to determine the location and the type of lesions in an image.

Convolutional neural networks are a widely used deep learning method and can identify primary and metastatic tumor foci in digital pathology slide images. A convolutional neural network takes a two- or three-dimensional image as input and has a multilayer structure comprising convolutional layers, pooling layers, ReLU layers, fully connected layers and so on. Its local receptive fields, weight sharing and multiple convolution kernels greatly reduce the number of parameters, and hence the amount of computation, in the network model.

Existing convolutional neural network methods can be used for image-based diagnosis of thyroid follicular adenoma and thyroid carcinoma, but they suffer from low accuracy, limited diagnostic capability and an inability to distinguish borderline follicular tumors. Moreover, because large amounts of clinical data are lacking, no convolutional neural network method is currently available for diagnosing histopathological images of thyroid follicular tumors.

To address this problem, the invention provides a method that identifies lesion regions in pathological images of thyroid follicular tumors, classifies the gene mutation type of those regions, and identifies from the tissue image the cases that carry a RAS mutation together with mutations in other driver genes. This gives clinicians the information they need to take clinical intervention measures as early as possible.

Disclosure of Invention

The object of the invention is to address the shortcomings of the prior art by providing an automatic computer-aided diagnosis technique based on deep learning. Using big data and a deep convolutional neural network, the technique automatically locates lesion regions in pathological tissue images of thyroid follicular tumors and automatically identifies the histological type and gene mutation type of those regions. This realizes histological classification of thyroid follicular tumors and prediction of related gene mutations, and assists pathological diagnosis and clinical decision-making.

The invention provides a method, executed by a computer device, for identifying the lesion type and gene mutation in a histopathological image of a thyroid follicular tumor, comprising the following steps:

1) performing data preprocessing and data normalization on the histopathological image of the thyroid follicular tumor to obtain a processed histopathological image;

2) performing the 1st classification task on the processed histopathological image, namely abnormal region detection: if the histopathological image is normal, its data label is set to normal, the method ends, and the identification result of the image is negative; otherwise the data label is set to abnormal, the image is positive, and the method continues with the next step;

3) performing the 2nd and 3rd classification tasks on the positive histopathological image: the 2nd classification task determines the type of the abnormal region and sets the corresponding label to thyroid follicular adenoma (FA), follicular thyroid carcinoma (FTC) or thyroid tumor of uncertain malignant potential (TT-UMP); the 3rd classification task determines the genotype and sets the corresponding label to mutant or wild type. If the label is wild type, the method ends and the identification result of the histopathological image is wild type; if the label is mutant, the method continues with the next step;

4) performing the 4th and 5th classification tasks on the histopathological image of the mutant type: the 4th classification task performs gene mutation type classification I and sets the corresponding label to mutant carrying RAS or mutant not carrying RAS; the 5th classification task performs gene mutation type classification II and sets the corresponding label to mutant carrying non-RAS or mutant not carrying non-RAS. The 6th classification task then combines the two results: when the 4th task yields mutant carrying RAS and the 5th task yields mutant carrying non-RAS, the gene mutation type is set to RAS+; when the 4th task yields mutant not carrying RAS or the 5th task yields mutant not carrying non-RAS, the gene mutation type is set to non-RAS+.
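The decision cascade of steps 2) to 4), including the rule-based 6th task, can be sketched in Python as follows. This is a minimal sketch, assuming each classification task is available as a zero-argument callable returning its slide-level decision, so that later tasks are only run when the cascade reaches them; the names `run_cascade` and `SlideReport` are hypothetical and not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SlideReport:
    """Hypothetical container for the slide-level outcome of the cascade."""
    result: str                           # "negative" or "positive" (task 1)
    lesion_type: Optional[str] = None     # "FA", "FTC" or "TT-UMP" (task 2)
    genotype: Optional[str] = None        # "wild type" or "mutant" (task 3)
    mutation_class: Optional[str] = None  # "RAS+" or "non-RAS+" (task 6)


def run_cascade(task1_abnormal: Callable[[], bool],
                task2_type: Callable[[], str],
                task3_mutant: Callable[[], bool],
                task4_carries_ras: Callable[[], bool],
                task5_carries_non_ras: Callable[[], bool]) -> SlideReport:
    # Task 1: a normal slide ends the method with a negative result.
    if not task1_abnormal():
        return SlideReport(result="negative")
    # Tasks 2 and 3 both run on positive slides.
    lesion = task2_type()
    if not task3_mutant():
        return SlideReport("positive", lesion, "wild type")
    # Tasks 4 and 5 run on mutant slides; task 6 is a rule, not a network.
    carries_ras = task4_carries_ras()
    carries_non_ras = task5_carries_non_ras()
    mutation_class = "RAS+" if (carries_ras and carries_non_ras) else "non-RAS+"
    return SlideReport("positive", lesion, "mutant", mutation_class)
```

For example, `run_cascade(lambda: True, lambda: "FTC", lambda: True, lambda: True, lambda: True)` reports an FTC slide of the RAS+ type.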

Further, the data preprocessing comprises: using an adaptive threshold algorithm to retrieve the tissue region at the low-resolution Level-5 of the pyramidal digital slide image underlying the histopathological image, and locating the corresponding tissue region at the high-resolution Level-0.
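A minimal sketch of this preprocessing step, assuming Otsu's method as the adaptive threshold algorithm (the disclosure does not name one) and assuming each pyramid level halves the resolution, so that Level-5 is downsampled 32 times relative to Level-0. Real slides record their actual per-level downsample factors in their metadata (OpenSlide, for example, exposes them as `level_downsamples`), and those values should replace the `2 ** 5` default. The function names are hypothetical.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Otsu threshold of an 8-bit grayscale image: the level that maximizes
    the between-class variance of the two resulting pixel classes."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                 # probability of the class "<= t"
    mu = np.cumsum(prob * np.arange(256))   # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    safe = np.where(denom > 0, denom, 1.0)  # avoid division by zero at the ends
    sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / safe, 0.0)
    return int(np.argmax(sigma_b))


def tissue_mask(rgb_low_res: np.ndarray) -> np.ndarray:
    """Boolean tissue mask for a low-resolution (e.g. Level-5) RGB thumbnail."""
    gray = rgb_low_res.mean(axis=2).astype(np.uint8)
    return gray <= otsu_threshold(gray)     # stained tissue is darker than glass


def bbox_at_level0(mask: np.ndarray, downsample: float = 2 ** 5):
    """Map the tissue bounding box from the low-resolution level to Level-0.
    Returns (x0, y0, x1, y1) in Level-0 pixels with exclusive x1/y1, or None."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return (int(cols.min() * downsample), int(rows.min() * downsample),
            int((cols.max() + 1) * downsample), int((rows.max() + 1) * downsample))
```

Thresholding the small Level-5 image keeps this step cheap; only the resulting box is then read at Level-0.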

Further, the data normalization comprises: obtaining the microns-per-pixel (mpp) parameter of the histopathological image by reading its metadata; enlarging or reducing the histopathological image by bilinear interpolation; and normalizing the mpp of the set of histopathological images to be identified to 0.5, so that the target number of pixels per row (column) of an image is:

$$N_{\text{target}} = \frac{N_{\text{original}} \times \text{mpp}}{0.5}$$

where N_original is the original number of pixels per row (column) and mpp is the image's microns per pixel before normalization.
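The normalization can be sketched as below, assuming images are NumPy arrays and using an align-corners bilinear convention (library resizers such as OpenCV's use half-pixel centers and give slightly different values). The function names are hypothetical. The target size follows from keeping the physical length N × mpp fixed.

```python
import numpy as np

TARGET_MPP = 0.5  # microns per pixel after normalization (value from the text)


def target_size(n_pixels: int, mpp: float, target_mpp: float = TARGET_MPP) -> int:
    # Physical length is preserved: n_pixels * mpp == n_target * target_mpp
    return int(round(n_pixels * mpp / target_mpp))


def bilinear_resize(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinear resize of an HxW or HxWxC array (align-corners convention)."""
    in_h, in_w = img.shape[:2]
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]                # vertical interpolation weights
    wx = (xs - x0)[None, :]                # horizontal interpolation weights
    if img.ndim == 3:                      # broadcast weights over channels
        wy, wx = wy[..., None], wx[..., None]
    img = img.astype(np.float64)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def normalize_mpp(img: np.ndarray, mpp: float, target_mpp: float = TARGET_MPP) -> np.ndarray:
    """Rescale an image scanned at `mpp` so that it has `target_mpp`."""
    h, w = img.shape[:2]
    return bilinear_resize(img, target_size(h, mpp, target_mpp),
                           target_size(w, mpp, target_mpp))
```

For instance, a row of 1000 pixels scanned at mpp 0.25 becomes 500 pixels at mpp 0.5, while one scanned at mpp 1.0 becomes 2000 pixels.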

Further, the detection process of each of the 1st to 5th classification tasks comprises the following steps:

first, the tissue region in the histopathological image is found by preprocessing; second, the whole tissue region is divided into a number of square image blocks by a sliding window, and the deep convolutional neural network detection module is called to perform the classification task on the square image blocks one by one; then, the coordinates and bounding boxes of the detected abnormal regions are mapped back onto the whole slide to obtain a global abnormality detection result; finally, features are extracted from the global classification detection probabilities, and an XGBoost classifier produces the final result of the classification task.
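The block-wise detection and the mapping back onto the whole slide can be sketched as follows, assuming non-overlapping 224 × 224 blocks (the stride is not specified in the text) and a hypothetical `classify_block` callable standing in for the DenseNet121 detection module, which returns one probability per block. The resulting per-block probabilities are what the XGBoost stage consumes.

```python
from typing import Callable, List, Tuple

import numpy as np

BLOCK = 224  # block side length in pixels (value from the text)


def tile_coordinates(height: int, width: int, block: int = BLOCK,
                     stride: int = BLOCK) -> List[Tuple[int, int]]:
    """Top-left (y, x) corners of square blocks that fit inside the region."""
    return [(y, x)
            for y in range(0, height - block + 1, stride)
            for x in range(0, width - block + 1, stride)]


def slide_probability_map(region: np.ndarray,
                          classify_block: Callable[[np.ndarray], float],
                          block: int = BLOCK, stride: int = BLOCK):
    """Classify every block and place its score at the block's grid position."""
    h, w = region.shape[:2]
    grid = np.full(((h - block) // stride + 1, (w - block) // stride + 1), np.nan)
    probs = []
    for y, x in tile_coordinates(h, w, block, stride):
        p = classify_block(region[y:y + block, x:x + block])
        grid[y // stride, x // stride] = p  # map the block back to slide coordinates
        probs.append(p)
    return grid, np.asarray(probs)
```

The grid gives the global (whole-slide) view of the detections, and `probs` is the flat list of block probabilities used for slide-level features.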

Further, to enlarge the training data set, the detection process of each of the 1st to 5th classification tasks also includes image rotation and image mirroring operations on the histopathological images to generate a new training data set; the detection process of the 5th classification task additionally includes homologous packing and heterologous packing of the histopathological images to generate a multi-instance training data set.

Further, calling the deep convolutional neural network detection module to perform the classification task on the square image blocks one by one comprises the following steps:

first, a feature extraction network is constructed from the deep neural network and pre-trained on a large amount of image data together with the labels of the object classes contained in the images; it summarizes and extracts abstract features of an image and outputs the image's high-dimensional feature tensor; second, a classification network operates on the high-dimensional feature tensor: a global average pooling layer and a fully connected output layer classify the high-dimensional feature tensors corresponding to the positions of the image blocks output by the region selection network. The fully connected output layer processes its output with a Softmax activation function, whose formula is:

$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}$$

where e is the base of the natural logarithm, x_i is the i-th input of the Softmax activation function, and k is the total number of inputs.
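As an illustration of the classification head (global average pooling, a fully connected layer and Softmax), the following NumPy sketch maps a batch of feature tensors to class probabilities. The weights are random placeholders rather than trained parameters, and the function names are hypothetical; the feature shape (1024 channels, 7 × 7) matches DenseNet121's final feature map for a 224 × 224 input.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Subtracting the row maximum leaves the result unchanged but avoids overflow in exp.
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def classification_head(features: np.ndarray, weight: np.ndarray,
                        bias: np.ndarray) -> np.ndarray:
    """features: (N, C, H, W); weight: (C, k); bias: (k,). Returns (N, k) probabilities."""
    pooled = features.mean(axis=(2, 3))  # global average pooling -> (N, C)
    logits = pooled @ weight + bias      # fully connected output layer -> (N, k)
    return softmax(logits)               # Softmax activation -> (N, k)
```

Each output row sums to 1 and can be read as the block's class probabilities.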

Further, extracting features from the global classification detection probabilities and obtaining the final result of the classification task with the XGBoost classifier comprises the following steps:

the 1st to 5th classification tasks are run with the deep convolutional neural network, so that each thyroid follicular tumor pathological image yields prediction labels and prediction probabilities for its image blocks, whose number varies from image to image. The prediction probabilities are used to extract features for the corresponding class: the range 0 to 1 of the histogram of prediction probabilities is divided evenly into n intervals of equal width, which serve as the features; these fixed-length features are fed to the XGBoost model, and the corresponding XGBoost classifier produces a slide-level result.

Further, when the prediction probabilities are used to extract the features of the corresponding classes, the 1st, 3rd, 4th and 5th classification tasks obtain the slide-level result from the features of any one class, whereas the 2nd classification task must use the features of any two of its classes jointly to produce the slide-level result.
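A sketch of this slide-level feature construction, assuming n = 10 bins (the text leaves n unspecified) and normalizing the bin counts by the number of blocks, which is an added assumption so that slides with different block counts are comparable. `slide_features` (a hypothetical name) concatenates the histograms of the chosen class columns: one class for tasks 1, 3, 4 and 5, two classes for task 2. The resulting fixed-length vectors could then be passed to an XGBoost classifier such as `xgboost.XGBClassifier`; the source gives no hyperparameters.

```python
import numpy as np


def histogram_features(probs: np.ndarray, n_bins: int = 10) -> np.ndarray:
    """Split [0, 1] into n_bins equal intervals and count block probabilities in each."""
    counts, _ = np.histogram(probs, bins=n_bins, range=(0.0, 1.0))
    # Assumption: normalize by the number of blocks so the features sum to 1.
    return counts / max(len(probs), 1)


def slide_features(block_probs: np.ndarray, classes, n_bins: int = 10) -> np.ndarray:
    """block_probs: (num_blocks, num_classes). Concatenate histograms of the chosen classes."""
    return np.concatenate([histogram_features(block_probs[:, c], n_bins)
                           for c in classes])
```

Because every slide maps to a vector of length n × (number of chosen classes), slides with any number of blocks can be fed to the same classifier.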

Likewise, the invention also provides a computer device comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to perform the steps of the method for identifying the lesion type and gene mutation in a histopathological image of a thyroid follicular tumor according to any one of the preceding claims.

The invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method for identifying the lesion type and gene mutation in a histopathological image of a thyroid follicular tumor according to any one of the preceding claims.

Compared with the prior art, the invention has the following advantages and beneficial effects:

First, the invention is the first to classify histopathological images of thyroid follicular tumors with a machine learning method in order to judge whether the tumors are benign or malignant. It assists pathologists in diagnosis, helps reduce their workload, and improves the accuracy of thyroid follicular tumor diagnosis.

Second, the invention analyzes the histopathological image of a thyroid follicular tumor to determine whether a gene mutation is present and to predict its type. It identifies cases likely to have a poor prognosis and helps clinicians predict patient prognosis, so that medical resources can be allocated effectively.

Third, the method uses data augmentation, and the homologous and heterologous packing steps improve the accuracy of the gene mutation type labels of the image blocks. The method can therefore produce a slide-level diagnostic suggestion from a thyroid follicular tumor pathological image, providing highly sensitive and specific suggestions on abnormality and on the RAS+ type.

Fourth, the method reduces the adverse effects of different slide preparation methods and scanner imaging parameters, and is highly robust to such differences.

Drawings

The above and other objects, features and advantages of embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:

FIG. 1 shows a full flow diagram of the overall process flow of an embodiment of the present invention.

FIG. 2 shows a flowchart of tasks 1 to 4 in the overall process flow of an embodiment of the present invention.

FIG. 3 shows a flowchart of task 5 in the overall process flow of an embodiment of the invention.

FIG. 4 shows a diagram of pre-training and multi-instance learning in accordance with an embodiment of the present invention.

Detailed Description

The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given solely for the purpose of enabling those skilled in the art to better understand and to practice the invention, and are not intended to limit the scope of the invention in any way.

DenseNet is a widely used convolutional neural network architecture. Its dense connections alleviate the vanishing-gradient problem, strengthen feature propagation and reduce the number of parameters. The input of each layer is the concatenation of the outputs of all preceding layers, and the feature maps learned by a layer are passed directly as input to all subsequent layers.

DenseNet improves network efficiency by reducing the computation per layer and by reusing features: the output of each layer directly influences all subsequent layers, so that the relationship between the output and the input of the l-th layer is

$$y_l = F_l\left([x_0, x_1, \ldots, x_{l-1}], \{W_l\}\right) \qquad (1)$$

where l is the index of the current layer and y_l is the output of that layer; [x_0, x_1, ..., x_{l-1}] denotes the feature maps produced by layers 0, 1, ..., l-1, concatenated along the channel dimension; F_l denotes a nonlinear transformation composed of batch normalization (BN), ReLU, a 3x3 convolution and similar operations; and W_l denotes the parameters of F_l.
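To make the dense connectivity of Eq. (1) concrete, the following NumPy sketch builds one dense block. As a simplification, F_l is replaced by a random 1 × 1 convolution followed by ReLU instead of the trained BN, ReLU and 3 × 3 convolution of DenseNet; the sketch only illustrates the channel-wise concatenation and the resulting channel count C0 + L·g for L layers with growth rate g. The function name is hypothetical.

```python
import numpy as np


def dense_block(x0: np.ndarray, num_layers: int, growth_rate: int,
                rng: np.random.Generator) -> np.ndarray:
    """x0: (C0, H, W). Each layer sees the concatenation of all earlier feature maps."""
    features = [x0]
    for _ in range(num_layers):
        inp = np.concatenate(features, axis=0)  # [x_0, x_1, ..., x_{l-1}] along channels
        c_in = inp.shape[0]
        # Stand-in for F_l with parameters W_l: random 1x1 convolution + ReLU.
        w = rng.standard_normal((growth_rate, c_in)) / np.sqrt(c_in)
        y = np.maximum(np.einsum("oc,chw->ohw", w, inp), 0.0)
        features.append(y)                      # y_l is passed on to every later layer
    return np.concatenate(features, axis=0)
```

With 64 input channels, 6 layers and growth rate 32 (the configuration of DenseNet121's first dense block), the block outputs 64 + 6 × 32 = 256 channels.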

The deep convolutional neural network in a preferred embodiment of the invention is the DenseNet121 model. The invention uses five deep convolutional neural networks and five XGBoost classifiers to complete five classification tasks in sequence:

The 1st classification task is abnormal region detection, which sets the data label to abnormal or normal.

The 2nd classification task is abnormal region type classification, which sets the data label to thyroid follicular adenoma (FA), follicular thyroid carcinoma (FTC) or thyroid tumor of uncertain malignant potential (TT-UMP).

The 3rd classification task is genotype classification, which sets the data label to mutant or wild type.

The 4th classification task is gene mutation type classification I, which sets the data label to mutant carrying RAS or mutant not carrying RAS. A mutant carrying RAS is defined as a mutant case carrying at least a mutation of a RAS gene (KRAS, NRAS or HRAS); this group includes cases with only a RAS mutation and cases with both RAS and other gene mutations. A mutant not carrying RAS is defined as a mutant case without any RAS gene (KRAS, NRAS or HRAS) mutation, i.e., carrying mutations only in genes other than RAS.

The 5th classification task is gene mutation type classification II, which sets the data label to mutant carrying non-RAS or mutant not carrying non-RAS. A mutant carrying non-RAS is defined as a mutant case carrying a mutation in a gene other than the RAS genes (KRAS, NRAS or HRAS); a mutant not carrying non-RAS is defined as a case with mutations only in the RAS genes (KRAS, NRAS or HRAS).

Then, based on the results of the 4th and 5th classification tasks, the 6th classification task is completed by a rule: gene mutation type classification III, which sets the data label to RAS+ mutant or non-RAS+ mutant. A RAS+ mutant is defined as a case whose mutations include both a RAS gene mutation and a mutation in another gene; a non-RAS+ mutant is any mutant that does not meet the definition of a RAS+ mutant.
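The label definitions of tasks 3 to 6 can be expressed as a small function that derives every label from the set of mutated genes found in a case. This is a hypothetical helper for illustration; `mutation_labels` and the placeholder gene name `GENE_X` used in the example are not from the source, and the helper assumes that a mutant carrying non-RAS is any case with at least one mutation outside KRAS, NRAS and HRAS.

```python
RAS_GENES = {"KRAS", "NRAS", "HRAS"}


def mutation_labels(mutated_genes):
    """Derive task 3-6 ground-truth labels from the genes mutated in one case."""
    genes = {g.upper() for g in mutated_genes}
    if not genes:
        return {"task3": "wild type"}          # tasks 4-6 do not apply
    carries_ras = bool(genes & RAS_GENES)      # at least one RAS mutation
    carries_non_ras = bool(genes - RAS_GENES)  # at least one mutation outside RAS
    return {
        "task3": "mutant",
        "task4": "carrying RAS" if carries_ras else "not carrying RAS",
        "task5": "carrying non-RAS" if carries_non_ras else "not carrying non-RAS",
        "task6": "RAS+" if (carries_ras and carries_non_ras) else "non-RAS+",
    }
```

For example, a case with only an NRAS mutation is labelled non-RAS+, while `mutation_labels(["HRAS", "GENE_X"])` yields RAS+.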

This embodiment provides a method, based on a deep convolutional neural network, for identifying lesion regions in pathological images of thyroid follicular tumors. The specific flow of the method is shown in FIG. 1, the full flowchart of the overall processing flow of an embodiment of the invention.

In a preferred embodiment of the present invention, the thyroid follicular tumor pathological image first undergoes preprocessing and data normalization.

The smallest image unit that the lesion-region identification method of the invention can process is an image block of 224 × 224 pixels. Image blocks may be sampled from the whole slide with a sliding window, or from a coordinate region of the whole slide specified by an operator.

A preferred embodiment of the present invention also requires image preprocessing and image data normalization of the follicular thyroid tumor pathology image prior to identification of the lesion region.

Image preprocessing uses an adaptive threshold algorithm to retrieve the tissue region at the low-resolution Level-5 of the pyramidal digital slide image and to locate the corresponding tissue region at the high-resolution Level-0.

Because the histopathological image blocks are sampled from different thyroid follicular tumor histopathological images, and these digital images may have been scanned by different scanners, the actual physical size represented by a single pixel may differ between images owing to differences in scanner hardware and software settings. The input image data therefore also need to be normalized, so that the images in the data set have physical scales that are as similar as possible.

The data normalization of the invention comprises the following: the microns-per-pixel (mpp) parameter of an image is obtained by reading the image's metadata. The mpp value is the actual physical distance on the thyroid tissue section that each pixel corresponds to; an mpp of 1 means that each pixel corresponds to a horizontal or vertical distance of 1 micron. Using the mpp value, each image in the data set is enlarged or reduced by bilinear interpolation, which normalizes the image data with respect to actual physical distance.

A preferred embodiment of the present invention normalizes the mpp of the set of images to be identified to 0.5. The target number of pixels per row (column) of an image is therefore:

$$N_{\text{target}} = \frac{N_{\text{original}} \times \text{mpp}}{0.5}$$

where N_original is the original number of pixels per row (column) and mpp is the image's microns per pixel before normalization.

As the full flowchart of the overall processing flow in FIG. 1 shows, the thyroid follicular tumor histopathological image first undergoes the 1st classification task, abnormal region detection, which sets the data label to abnormal or normal, i.e., judges the image negative or positive. If the image is normal, the method ends and the result for the pathological image is negative; if the image is abnormal, i.e., the result is positive, detection continues.

Positive pathological images then undergo the 2nd and 3rd classification tasks. The 2nd classification task determines the type of the abnormal region and, according to the pathological features, sets the corresponding data label: thyroid follicular adenoma (FA), follicular thyroid carcinoma (FTC) or thyroid tumor of uncertain malignant potential (TT-UMP).

The 3rd classification task determines the genotype and, according to the pathological features, sets the data label to mutant or wild type. If the image is wild type, the method ends and the result for the pathological image is wild type; if it is mutant, detection continues.

Pathological images of the mutant type undergo the 4th and 5th classification tasks. The 4th classification task is gene mutation type classification I, which, according to the pathological features, sets the data label to mutant carrying RAS or mutant not carrying RAS, with both classes defined as above for the 4th classification task.

The 5th classification task is gene mutation type classification II, which, according to the pathological features, sets the data label to mutant carrying non-RAS or mutant not carrying non-RAS, with both classes defined as above for the 5th classification task.

Then, based on the results of the 4th and 5th classification tasks, the 6th classification task is performed: gene mutation type classification III. When the result of the 4th classification task is mutant carrying RAS and the result of the 5th classification task is mutant carrying non-RAS, i.e., the case contains both a RAS gene mutation and a mutation in another gene, the data label is set to RAS+ mutant; all other cases are set to non-RAS+ mutant.

The invention processes the thyroid follicular tumor pathological image by cutting it into blocks, as shown in FIGS. 2 and 3: FIG. 2 is a flowchart of tasks 1 to 4 and FIG. 3 a flowchart of task 5 in the overall process flow of an embodiment of the invention. First, the tissue region in the image is found by preprocessing. Second, the whole tissue region is divided into square image blocks with a sliding window, the deep convolutional neural network detection module is called to perform the classification task on the blocks one by one, and the coordinates and bounding boxes of the detected abnormal regions are mapped back onto the whole slide to obtain a global abnormality detection result. Finally, features are extracted from the global classification detection probabilities, and an XGBoost classifier produces the final classification result.

Note that, to enlarge the data set, all five classification tasks of the invention use image rotation and image mirroring, so that the model generalizes better from limited training data.

Different classification tasks use different data sets. In one embodiment of the invention, A denotes the original training data set of a task, and B, C and D denote training data sets after data augmentation.

Image mirroring means mirroring the images of training data set A and their annotation images vertically or horizontally at the same time, and pooling the results with A to form training data set B.

Image rotation means rotating the images of training data set B and their annotation images clockwise at the same time by 90, 180 or 270 degrees, and pooling the rotated images with B to form training data set C. Training data set C is the data used to train the neural network.
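The two augmentation steps can be sketched with NumPy as follows, assuming each training item is an (image, annotation) pair of arrays and that both the vertical and the horizontal mirror are added. `np.rot90` with a negative `k` rotates clockwise. The function names are hypothetical. Under these assumptions each original image yields 3 items in B and 12 in C.

```python
import numpy as np


def mirror_pairs(pairs):
    """Data set B: each (image, annotation) pair plus its vertical and horizontal mirrors."""
    out = []
    for img, ann in pairs:
        out.append((img, ann))
        out.append((np.flipud(img), np.flipud(ann)))  # vertical mirror
        out.append((np.fliplr(img), np.fliplr(ann)))  # horizontal mirror
    return out


def rotate_pairs(pairs):
    """Data set C: each pair in B plus clockwise rotations by 90, 180 and 270 degrees."""
    out = []
    for img, ann in pairs:
        out.append((img, ann))
        for k in (1, 2, 3):
            # k quarter-turns clockwise; the annotation receives the same transform
            out.append((np.rot90(img, k=-k), np.rot90(ann, k=-k)))
    return out
```

Applying the same transform to image and annotation keeps the pixel-level labels aligned with the augmented image.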

In the 5th task, training data set C is additionally processed by homologous packing and heterologous packing to generate a multi-instance training data set D, which enlarges the data set and improves the reliability of the data.

Homologous packing pairs and packs the image blocks sampled from each whole-slide image in the data set, so that the two blocks in each pack belong to the same slide. Heterologous packing pairs the whole-slide images in the data set two by two and then packs image blocks from each pair of slides in pairs, so that the two blocks in each pack belong to two different slides.

Because heterologous packing may pack together blocks from two slides with different labels, the labels of heterologous packs formed from slides with different labels must be reassigned, whereas those formed from slides with the same label need not be. Only one reassignment case is possible:

mutant carrying non-RAS + mutant not carrying non-RAS = mutant carrying non-RAS
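A sketch of the two packing schemes, assuming bags of two blocks, random disjoint pairing of blocks, and a boolean slide label in which True means mutant carrying non-RAS. The heterologous bag label is the logical OR of the two slide labels, which reproduces the single reassignment rule above and leaves same-label pairs unchanged. The function names are hypothetical; at application time only homologous packing would be used, since blocks from different slides must not be mixed.

```python
import random


def homologous_bags(blocks, label, rng: random.Random):
    """Pack blocks from one slide into disjoint pairs; each bag keeps the slide label.
    An odd leftover block is dropped."""
    blocks = list(blocks)
    rng.shuffle(blocks)
    return [((blocks[i], blocks[i + 1]), label)
            for i in range(0, len(blocks) - 1, 2)]


def heterologous_bags(blocks_a, label_a, blocks_b, label_b, rng: random.Random):
    """Pack one block from slide A with one block from slide B.
    True = carrying non-RAS, so the OR encodes: carrying + not carrying = carrying."""
    a, b = list(blocks_a), list(blocks_b)
    rng.shuffle(a)
    rng.shuffle(b)
    bag_label = label_a or label_b
    return [((x, y), bag_label) for x, y in zip(a, b)]
```

Passing a seeded `random.Random` makes the pairing reproducible between training runs.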

Note that because the 5th classification task is trained on a multi-instance training data set, multi-instance data must also be used when the method is applied. To ensure that whole slides do not influence one another, the data are produced only by homologous packing at application time.

The deep convolutional neural network model of a preferred embodiment of the present invention is DenseNet121, and the training method of the deep convolutional neural network model includes the following steps:

First, the feature extraction network.

The network consists of repeatedly stacked convolutional layers, sampling layers, and nonlinear activation layers. Based on the back-propagation algorithm of deep learning, it is pre-trained on a large amount of image data and the object class labels contained in those images, learns to abstract and extract image features, and outputs a high-dimensional feature tensor of the image.

The architecture of the feature extraction network is shown in Table 1; there is no nonlinear activation layer between successive cycles.

Table 1 Feature extraction network architecture

Second, the classification network, which operates on the high-dimensional feature tensor.

It consists of a global average pooling layer and a fully connected output layer, and classifies the high-dimensional feature tensors corresponding to the image-block positions in the output of the region selection network.

The architecture of the classification network is shown in Table 2.

Table 2 Classification network

The fully connected output layer processes the output with a Softmax activation function, defined as follows:

$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}$$

where e is the base of the natural logarithm, x_i is the i-th input of the Softmax activation function, and k is the total number of inputs.
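A NumPy sketch of the classification network's forward pass on one image block: global average pooling over the spatial axes, a fully connected layer, and Softmax. The 1024 x 7 x 7 feature map matches what DenseNet121's convolutional trunk produces for a 224 x 224 input, but that input size, the random weights, and the function names are placeholders, not trained parameters from the patent.

```python
import numpy as np

def softmax(x):
    """Softmax(x_i) = e^{x_i} / sum_j e^{x_j}; subtracting max(x) avoids overflow
    without changing the result."""
    z = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return z / np.sum(z, axis=-1, keepdims=True)

def classification_head(feature_map, weight, bias):
    """feature_map: (C, H, W) feature tensor of one image block.
    Global average pooling -> (C,), fully connected layer -> (k,), Softmax -> probabilities."""
    pooled = feature_map.mean(axis=(1, 2))
    logits = weight @ pooled + bias
    return softmax(logits)

rng = np.random.default_rng(0)
features = rng.standard_normal((1024, 7, 7))  # assumed DenseNet121-sized feature map
W = rng.standard_normal((2, 1024)) * 0.01      # placeholder weights for a two-class task
b = np.zeros(2)
probs = classification_head(features, W, b)
print(probs, probs.sum())
```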

For the 1st to 4th classification tasks, the model is trained with conventional gradient descent. For the 5th classification task, however, the model is first pre-trained on data set C and then trained by multi-instance learning on data set D.

FIG. 4 shows a diagram of pre-training and multi-instance learning according to an embodiment of the present invention. The feature extraction network of the pre-trained neural network is transferred to a new network for multi-instance learning, and the pre-trained classification network is discarded.
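The text does not specify how the new network combines the two image blocks of a pack, how it is optimized, or whether the transferred feature extractor is frozen. The NumPy sketch below assumes max pooling over instances, which is consistent with the relabeling rule given earlier (a pack carries a non-RAS mutation if either block does), a frozen extractor, and a new logistic head trained by gradient descent on binary cross-entropy. The fixed random projection `PROJ` is a hypothetical stand-in for the transferred DenseNet121 layers, and the data are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the transferred feature extraction network; kept frozen below.
PROJ = rng.standard_normal((16, 8))

def extract(x):
    return np.tanh(x @ PROJ)                      # (..., 16) -> (..., 8) features

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bag_forward(bags, w, b):
    """bags: (n_bags, 2, 16), i.e. packs of two image blocks.
    Returns bag probabilities (max over instances) and the features of the deciding instance."""
    feats = extract(bags)                         # (n_bags, 2, 8)
    inst_p = sigmoid(feats @ w + b)               # (n_bags, 2) instance probabilities
    top = inst_p.argmax(axis=1)
    return inst_p.max(axis=1), feats[np.arange(len(bags)), top]

def train_mil(bags, y, epochs=200, lr=0.5):
    """Gradient descent on bag-level binary cross-entropy for a new, zero-initialized head."""
    w, b = np.zeros(PROJ.shape[1]), 0.0
    losses = []
    for _ in range(epochs):
        p, chosen = bag_forward(bags, w, b)
        p = np.clip(p, 1e-7, 1 - 1e-7)
        losses.append(float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))))
        g = p - y                                 # dLoss/dlogit of each bag's deciding instance
        w = w - lr * chosen.T @ g / len(y)
        b = b - lr * g.mean()
    return w, b, losses

# Synthetic packs: a positive pack contains at least one "positive" block.
neg = rng.standard_normal((40, 2, 16)) - 1.0
pos = rng.standard_normal((40, 2, 16)) - 1.0
pos[:, 0] += 2.0
bags = np.concatenate([neg, pos])
y = np.concatenate([np.zeros(40), np.ones(40)])
w, b, losses = train_mil(bags, y)
print(round(losses[0], 3), round(losses[-1], 3))
```

With max pooling, the gradient of each pack flows only through the block that currently decides the pack's score, which is the usual behavior of max-pooling multi-instance learning.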

The final step of each classification task above is to extract features from the global (whole-slide) classification detection probabilities and use an XGBoost classifier to obtain the final classification result. A specific embodiment of this step, namely the rules and calculations for extracting the slice-level result features, is as follows.

The 1st to 5th classification tasks are all run through the deep convolutional neural network, so that for each thyroid follicular tumor pathological image a prediction label and a prediction probability are obtained for each of its image blocks, whose number varies from image to image.

The prediction probabilities are used to extract features for the corresponding class, and the corresponding XGBoost classifier produces the slice-level result. For each class, 6 parameter features are extracted: (1) the maximum of the prediction probability of the class over the whole slide; (2) the minimum of the prediction probability of the class over the whole slide; (3) the mean of the prediction probability of the class over the whole slide; (4) the median of the prediction probability of the class over the whole slide; (5) the standard deviation of the prediction probability of the class over the whole slide; (6) a histogram of the prediction probability of the class over the whole slide.

In a preferred embodiment of the present invention, among the 6 parameter features, the 1st to 5th parameter features each have a length of 1. Because the prediction probabilities lie in the interval from 0 to 1, the histogram can use 10 bins of width 0.1 from 0 to 1 as features, so the 6th parameter feature has a length of 10. Combining the 6 parameter features therefore gives a feature of length 15. That is, in this embodiment, a feature of length 15 is classified by the XGBoost model, and the output of XGBoost is taken as the slice-level result.

Those skilled in the art should note that the lengths of the parameter features are not fixed. For example, in another preferred embodiment of the present invention, the 1st to 5th parameter features again each have a length of 1, but the histogram uses 20 bins of width 0.05 from 0 to 1, so the 6th parameter feature has a length of 20 and the 6 parameter features combined have a length of 25. That is, in this embodiment, a feature of length 25 is classified by the XGBoost model, and the output of XGBoost is taken as the slice-level result.
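A NumPy sketch of the six per-class parameter features for one whole slide. The `n_bins` parameter covers both embodiments (10 bins give length 15, 20 bins give length 25). Normalizing the histogram counts by the number of image blocks is an assumption, chosen so that slides with different numbers of blocks yield comparable features; the text does not say whether raw counts or frequencies are used.

```python
import numpy as np

def slide_features(block_probs, n_bins=10):
    """Six parameter features of one class over a whole slide.

    block_probs: prediction probabilities of this class for every image block in the slide.
    Returns [max, min, mean, median, std] followed by an n_bins histogram over [0, 1],
    i.e. a vector of length 5 + n_bins.
    """
    p = np.asarray(block_probs, dtype=float)
    stats = np.array([p.max(), p.min(), p.mean(), np.median(p), p.std()])
    counts, _ = np.histogram(p, bins=n_bins, range=(0.0, 1.0))
    hist = counts / p.size  # frequencies rather than raw counts (assumption)
    return np.concatenate([stats, hist])

block_probs = [0.05, 0.12, 0.55, 0.91, 0.97]
f10 = slide_features(block_probs)             # first embodiment: 10 bins of width 0.1
f20 = slide_features(block_probs, n_bins=20)  # second embodiment: 20 bins of width 0.05
print(f10.shape, f20.shape)                   # (15,) (25,)
```

The resulting vectors would then be passed to a trained XGBoost model (for example, `xgboost.XGBClassifier`), whose prediction is the slice-level result; that step is omitted to keep the sketch dependency-free.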

For the 1st, 3rd, 4th, and 5th classification tasks, which are two-class task flows, the features of any one class are sufficient to obtain the slice-level result; for the 2nd classification task, a three-class task flow, the features of any two classes are used jointly to produce the slice-level result.
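One way to read this rule: the Softmax probabilities of each image block sum to 1, so in a two-class task one class's probabilities determine the other's, and in the three-class 2nd task any two classes determine the third. The sketch below assembles the slide-level vector accordingly; the helper names and the column order assumed for the 2nd task (FA, FTC, TT-UMP) are hypothetical.

```python
import numpy as np

def class_features(p, n_bins=10):
    """Six parameter features (length 5 + n_bins) of one class's block probabilities."""
    p = np.asarray(p, dtype=float)
    counts, _ = np.histogram(p, bins=n_bins, range=(0.0, 1.0))
    return np.concatenate([[p.max(), p.min(), p.mean(), np.median(p), p.std()], counts / p.size])

def slide_vector(block_probs, classes_used, n_bins=10):
    """block_probs: (n_blocks, k) Softmax outputs of one slide.
    classes_used: indices of the classes whose features are concatenated."""
    probs = np.asarray(block_probs, dtype=float)
    return np.concatenate([class_features(probs[:, c], n_bins) for c in classes_used])

binary = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]                # a two-class task (1st, 3rd, 4th or 5th)
three = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3]]  # 2nd task, columns assumed FA, FTC, TT-UMP
v_bin = slide_vector(binary, classes_used=[1])       # one class   -> length 15
v_three = slide_vector(three, classes_used=[0, 1])   # two classes -> length 30
print(v_bin.shape, v_three.shape)                    # (15,) (30,)
```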

An embodiment of the present invention further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the computer program, when executed by the processor, causes the processor to execute the steps of the above method for identifying a type of a lesion and a genetic mutation in a thyroid follicular tumor histopathology image.

An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, causes the processor to perform the steps of the above method for identifying a type of a lesion and a genetic mutation in a thyroid follicular tumor tissue pathology image.

The implementation may be distributed as circuitry among multiple system components, such as among multiple processors and memories, optionally including multiple distributed processing systems. Parameters, databases, and other data structures may be stored and managed separately, may be incorporated into a single memory or database, may be logically and physically organized in many different ways, and may be implemented in many different ways, including as data structures such as linked lists, hash tables, arrays, records, objects, or implicit storage mechanisms. A program can be a part of a single program (e.g., a subroutine), a stand-alone program, distributed over several memories and processors, or implemented in many different ways, such as in a library, e.g., a shared library (e.g., a Dynamic Link Library (DLL)). For example, a DLL may store instructions that, when executed by a circuit, perform any of the processes described above or shown in the figures.

The methods and apparatus of embodiments of the present invention can be implemented using standard programming techniques, with rule-based logic or other logic to accomplish the various method steps. It should also be noted that the words "means" and "module," as used herein and in the claims, are intended to encompass implementations using one or more lines of software code, and/or hardware implementations, and/or equipment for receiving inputs.

Any of the steps, operations, or procedures described herein may be performed or implemented using one or more hardware or software modules, alone or in combination with other devices. In one embodiment, the software modules are implemented using a computer program product comprising a computer readable medium containing computer program code, which is executable by a computer processor for performing any or all of the described steps, operations, or procedures.

The foregoing description of the implementation of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. The embodiments were chosen and described in order to explain the principles of the invention and its practical application to enable one skilled in the art to utilize the invention in various embodiments and with various modifications as are suited to the particular use contemplated.

Claims

1. A method for identifying a type of lesion and a genetic mutation in a pathologic image of follicular thyroid tumor tissue, the method being performed by a computer device and comprising the steps of:

3) performing the 2nd and 3rd classification tasks on the positive histopathology images: the 2nd classification task determines the type of the abnormal region and sets the result as a follicular thyroid adenoma (FA) label, a follicular thyroid carcinoma (FTC) label, or a thyroid tumor of uncertain malignant potential (TT-UMP) label; the 3rd classification task determines the genotype and sets the result as a gene-mutant label or a wild-type label. If the wild-type label is set, the method ends and the identification result of the histopathology image is wild type; if the gene-mutant label is set, the method proceeds to the next step;

4) performing the 4th and 5th classification tasks on the histopathology images with the gene-mutant label: the 4th classification task determines gene mutation classification I and sets the result as a label for a gene mutation type carrying RAS or a label for a gene mutation type not carrying RAS; the 5th classification task determines gene mutation classification II and sets the result as a label for a gene mutation type carrying a non-RAS mutation or a label for a gene mutation type not carrying a non-RAS mutation. When the 4th classification task yields the label carrying RAS and the 5th classification task yields the label carrying a non-RAS mutation, the 6th classification task is performed and the gene mutation type is set as the RAS+ gene mutation type; when the 4th classification task yields the label not carrying RAS or the 5th classification task yields the label not carrying a non-RAS mutation, the gene mutation type is the non-RAS+ gene mutation type.
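The decision flow of steps 3) and 4) can be sketched as a small function over the slide-level outputs of tasks 1 to 5. Steps 1) and 2) are not reproduced in this text, so treating task 1 as the positive/negative screen that gates step 3) is an assumption, as are the argument names and the returned dictionary keys. The genotype strings follow the wording of the claim.

```python
def identify(task1_positive, task2_histology, task3_mutant, task4_ras, task5_non_ras):
    """Combine slide-level results of tasks 1-5 following steps 3) and 4) of claim 1.
    task2_histology is one of "FA", "FTC", "TT-UMP"; the other arguments are booleans."""
    if not task1_positive:
        return {"result": "negative"}          # assumed: only positive images reach step 3)
    result = {"histology": task2_histology}
    if not task3_mutant:
        result["genotype"] = "wild type"       # the method ends here
        return result
    if task4_ras and task5_non_ras:
        result["genotype"] = "RAS+"            # RAS carried together with a non-RAS mutation
    else:
        result["genotype"] = "non-RAS+"        # wording of claim 1
    return result

print(identify(True, "FTC", True, True, True))  # {'histology': 'FTC', 'genotype': 'RAS+'}
```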

2. The method of claim 1, wherein the data preprocessing comprises: using an adaptive threshold algorithm to retrieve the tissue region on the low-resolution Level-5 of the pyramid-structured digital slide image of the histopathology image, and locating the tissue region on the high-resolution Level-0.
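A NumPy sketch of this preprocessing step. The claim does not name the adaptive threshold algorithm, so Otsu's method (a global, histogram-based threshold) is swapped in here; the Level-5 downsample factor of 32 assumes a pyramid that halves the resolution at each level and should in practice be read from the slide file. Returning a single bounding box is a simplification of "locating the tissue region".

```python
import numpy as np

def otsu_threshold(gray):
    """Otsu's method on a uint8 image: the threshold t maximizing between-class variance,
    where class 0 is the set of pixels with value <= t."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                  # weight of class 0 for each t
    mu = np.cumsum(prob * np.arange(256))    # first moment of class 0 for each t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    return int(np.nanargmax(between))

def locate_tissue(thumb_level5, downsample=32):
    """Tissue mask on the Level-5 thumbnail and its bounding box (r0, c0, r1, c1) in Level-0 pixels."""
    t = otsu_threshold(thumb_level5)
    mask = thumb_level5 <= t                 # tissue is darker than the bright glass background
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return mask, None
    box = (int(rows.min()) * downsample, int(cols.min()) * downsample,
           (int(rows.max()) + 1) * downsample, (int(cols.max()) + 1) * downsample)
    return mask, box

thumb = np.full((20, 30), 230, dtype=np.uint8)  # synthetic bright background
thumb[5:10, 8:20] = 120                         # synthetic darker tissue patch
mask, box = locate_tissue(thumb)
print(box)                                      # (160, 256, 320, 640)
```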

3. The method of claim 1, wherein the data normalization comprises: obtaining the microns-per-pixel (mpp) parameter of the histopathology image by reading the metadata of the histopathology image, and enlarging or reducing the histopathology image by bilinear interpolation so that the mpp of the histopathology images to be identified is normalized to 0.5, wherein the target number of pixels per row (column) of the image is given by:

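$$N_{\text{target}} \times 0.5 = N_{\text{original}} \times \mathrm{mpp}, \quad \text{i.e.} \quad N_{\text{target}} = \frac{N_{\text{original}} \times \mathrm{mpp}}{0.5}$$

where N is the number of pixels in a row (column) and mpp is the original microns per pixel.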
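The normalization can be sketched with NumPy as below, reading the relation as N_target = N_original x mpp / 0.5 (a fixed physical length is kept while the pixel size changes to 0.5 micron). `bilinear_resize` is a plain align-corners bilinear interpolation standing in for whatever resampling library an implementation uses; the function names are hypothetical, and in practice whole-slide images are resampled tile by tile rather than as one array.

```python
import numpy as np

TARGET_MPP = 0.5

def target_size(n_pixels, mpp):
    """Pixels per row (or column) after normalization: N_target * 0.5 = N_original * mpp."""
    return int(round(n_pixels * mpp / TARGET_MPP))

def bilinear_resize(img, out_h, out_w):
    """Align-corners bilinear interpolation of an (H, W) or (H, W, C) array."""
    h, w = img.shape[:2]
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]                   # vertical interpolation weights
    wx = (xs - x0)[None, :]                   # horizontal interpolation weights
    if img.ndim == 3:                         # broadcast weights over the channel axis
        wy, wx = wy[..., None], wx[..., None]
    img = img.astype(float)
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy

def normalize_mpp(img, mpp):
    h, w = img.shape[:2]
    return bilinear_resize(img, target_size(h, mpp), target_size(w, mpp))

small = normalize_mpp(np.arange(16.0).reshape(4, 4), mpp=0.25)  # 0.25 mpp -> half the pixels
print(small)
```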

4. The method of claim 1, wherein the detection process of the 1 st, 2 nd, 3 rd, 4 th and 5 th classification tasks comprises the following steps:

firstly, finding a tissue region in the histopathological image through preprocessing;

secondly, dividing the whole tissue region into a plurality of square image blocks with a dividing window, invoking the deep convolutional neural network detection module, and performing the classification task on the square image blocks one by one;

then, mapping the coordinates and bounding boxes of the detected abnormal regions back onto the whole slide to obtain a global abnormality detection result;

and finally, extracting features from the global classification detection probabilities, and obtaining the final result of the classification task with an XGBoost classifier.
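A sketch of the tiling and mapping steps, assuming non-overlapping square blocks of a hypothetical 256-pixel size and a caller-supplied `classify` function that stands in for the deep convolutional neural network detection module and returns the abnormal-class probability of one block. Block coordinates are computed relative to the tissue region and shifted by the region's origin, which is the mapping back to whole-slide coordinates; the returned probabilities are what the slide-level feature extraction consumes.

```python
import numpy as np

def detect_slide(slide, box, classify, patch=256, threshold=0.5):
    """Divide the tissue box (r0, c0, r1, c1) into non-overlapping square blocks, classify
    each block, and map block coordinates back to whole-slide coordinates."""
    r0, c0, r1, c1 = box
    region = slide[r0:r1, c0:c1]
    probs, abnormal = [], []
    for i in range(0, region.shape[0] - patch + 1, patch):
        for j in range(0, region.shape[1] - patch + 1, patch):
            p = float(classify(region[i:i + patch, j:j + patch]))
            frame = (r0 + i, c0 + j, r0 + i + patch, c0 + j + patch)  # block frame in slide coordinates
            probs.append(p)
            if p >= threshold:
                abnormal.append(frame)
    return np.array(probs), abnormal

slide = np.zeros((1024, 1024))
slide[512:768, 256:512] = 1.0                 # synthetic "abnormal" area
toy_classifier = lambda block: block.mean()   # hypothetical stand-in for the CNN detection module
probs, abnormal = detect_slide(slide, (256, 0, 1024, 1024), toy_classifier)
print(len(probs), abnormal)                   # 12 [(512, 256, 768, 512)]
```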

5. The method of claim 4, wherein, to expand the training data set,

the detection processes of the 1st, 2nd, 3rd, 4th, and 5th classification tasks further comprise image rotation and image mirroring operations on the histopathology images to generate a new training data set;

the detection process of the 5th classification task additionally comprises homologous packing and heterologous packing of the histopathology images to generate a multi-instance training data set.

6. The method of claim 4, wherein the step of invoking the deep convolutional neural network detection module to sequentially perform the classification task on the square image blocks one by one comprises the following steps:

firstly, a feature extraction network is constructed by the deep neural network, a large amount of image data and object class labels contained in the image are used for pre-training, abstract features of the image are summarized and extracted, and a high-dimensional feature tensor of the image is output;

secondly, classifying the high-dimensional feature tensor with a classification network, namely classifying, by a global average pooling layer and a fully connected output layer, the high-dimensional feature tensors corresponding to the image-block positions contained in the output of the region selection network, wherein the fully connected output layer processes the output with a Softmax activation function defined as follows:

$$\mathrm{Softmax}(x_i) = \frac{e^{x_i}}{\sum_{j=1}^{k} e^{x_j}}$$

wherein e is the base of the natural logarithm, x_i is the i-th input of the Softmax activation function, and k is the total number of inputs.

7. The method for identifying the type of lesion and genetic mutation in a follicular thyroid tumor histopathology image according to claim 4, wherein extracting features based on the global classification detection probabilities and obtaining the final result of the classification task with an XGBoost classifier comprises the following steps:

the 1st to 5th classification tasks are run through the deep convolutional neural network, and for each thyroid follicular tumor pathological image a prediction label and a prediction probability are obtained for each of its image blocks, whose number varies from image to image;

the prediction probabilities are used for feature extraction of the corresponding classes, wherein the histogram of the prediction probabilities divides the interval from 0 to 1 evenly into n equal-width bins that serve as features; a feature of preset length is classified by the XGBoost model, and the slice-level result is obtained by the corresponding XGBoost classifier.

8. The method of claim 7, wherein, when the prediction probabilities are used for feature extraction of the corresponding classes,

for the 1st, 3rd, 4th, and 5th classification tasks, the features of any one class are used to obtain the slice-level result; for the 2nd classification task, the features of any two classes are used jointly to produce the slice-level result.

9. A computer device comprising a memory and a processor, the memory having stored therein a computer program which, when executed by the processor, causes the processor to carry out the steps of the method of identifying a type of lesion and a genetic mutation in a follicular thyroid tumor histopathology image according to any one of claims 1 to 8.

10. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, causes the processor to carry out the steps of the method for identifying a type of lesion and a genetic mutation in a follicular thyroid tumor histopathological image according to any one of claims 1 to 8.