CN114863149A - Method, system, device and storage medium for predicting relative survival risk of breast cancer - Google Patents


Info

Publication number
CN114863149A
Authority
CN
China
Prior art keywords
image
features
patients
data
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111179543.0A
Other languages
Chinese (zh)
Inventor
余维川
刘少军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hong Kong University of Science and Technology HKUST
Original Assignee
Hong Kong University of Science and Technology HKUST
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hong Kong University of Science and Technology HKUST filed Critical Hong Kong University of Science and Technology HKUST
Publication of CN114863149A publication Critical patent/CN114863149A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0012 Biomedical image inspection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30068 Mammography; Breast
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30004 Biomedical image processing
    • G06T2207/30096 Tumor; Lesion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Genetics & Genomics (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • Chemical & Material Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of biotechnology, and discloses a method, a system, a computer device and a storage medium for predicting the relative survival risk of breast cancer by combining histological whole-slide images and a gene mutation signature. The method comprises the following steps: acquiring tumor histological whole-slide image data and gene mutation data for each of a pair of patients; extracting image features from the acquired image data; selecting genes with a significant influence on survival from the gene mutation data to obtain genomic features; processing the image features and the genomic features through a twin network, comprising processing the image features with a recurrent neural network and processing the genomic features with a fully connected network; concatenating the processed image features and genomic features to obtain each patient's fused features; and, for the pair of patients, predicting their relative survival risk with an output linear layer based on the difference of their fused features.

Description

Method, system, device and storage medium for predicting relative survival risk of breast cancer
Technical Field
The present invention relates to the field of biomedicine, and more particularly to a method, system, computer device and non-transitory computer-readable storage medium for predicting the relative survival risk of breast cancer in combination with histological whole-slice images and genetic mutation signatures.
Background
According to World Health Organization cancer data (https://www.who.int/news-room/fact-sheets/detail/cancer) [1], breast cancer is one of the leading causes of female mortality. Breast cancer is a very complex disease, and outcomes often vary greatly between patients. Currently, standard breast cancer treatment regimens include surgery (mastectomy), chemotherapy, radiation therapy, and possibly hormonal or targeted therapy. Existing treatment regimens aim to remove the tumor and kill any remaining tumor cells, and often need to be adjusted according to the patient's tumor grade and overall health. Thus, if a patient's survival risk can be predicted more accurately, physicians can be helped to better adjust the treatment.
The existing breast cancer survival analysis methods can be divided into three types according to the data used: methods using only image data, such as "Assessing risk of breast cancer recurrence" (U.S. Patent No. 10489904) [2] and Seker, Huseyin et al., "Assessment of nodal involvement and survival analysis in breast cancer patients using image cytometric data" (Anticancer Research 22.1A (2002): 433-438) [3]; methods using only genomic data, such as Huang, Zhi et al., "SALMON: Survival Analysis Learning With Multi-Omics Neural Networks on Breast Cancer" (Frontiers in Genetics 10 (2019): 166) [4], a multi-omics deep learning method for survival prediction (BMC Medical Informatics and Decision Making (2020) 20:225) [5], a prognostic index based on 12 genes (U.S. Patent No. 10876767) [6], a method using CXXC5 mRNA expression levels (a Chinese patent application) [7], and a method selecting 190 genes from RNA sequencing data and using a support vector machine (a Chinese patent application) [8]; and methods using only clinical data, such as Chao, Cheng-Min et al., "Construction of the model on the breast cancer survival analysis use support vector machine, logistic regression and decision tree" (Journal of Medical Systems 38.10 (2014): 1-7) [9] and "A method for predicting the prognostic survival rate of breast cancer based on a dynamic Cox model" (Chinese Patent Application Publication No. CN 108922628A) [10].
Specifically, document [2] uses histopathological whole-slide images to predict the recurrence risk (high/low) of breast cancer patients. Document [3] uses cell-count data from images to predict the 5-year survival status of breast cancer patients. Document [4] uses a deep learning approach to predict the survival risk of breast cancer patients based on multi-omics data (mRNA sequencing data, miRNA sequencing data, copy number burden, tumor mutation burden, estrogen and progesterone receptor status). Document [5] also uses multi-omics data (gene expression, DNA methylation, miRNA expression, copy number variation) and a deep learning approach to predict the survival risk of breast cancer patients. Document [6] proposes a prognostic index based on 12 genes. Document [7] uses the expression level of CXXC5 mRNA to predict the survival risk of breast cancer patients and to monitor the effectiveness of breast cancer therapy. Document [8] extracts 190 genes based on expression-level analysis of RNA sequencing data and uses a support vector machine to predict whether a breast cancer patient will relapse. Document [9] uses support vector machines, logistic regression or decision trees to predict the survival of breast cancer patients based on clinical data (pathological grade, whether chemotherapy was received, whether radiation therapy was received, age, tumor size, number of lymph nodes examined, number of lymph nodes involved). Document [10] predicts the survival risk of breast cancer patients using a dynamic Cox model based on clinical data (tumor size and location, number of lymph nodes examined, number of lymph nodes involved).
All of these techniques use only a single data source, which contains a limited amount of information.
Disclosure of Invention
The invention provides a method for predicting the relative survival risk of breast cancer patients by combining histopathological whole-slide images and a gene mutation signature. The method uses a twin network to predict a patient's relative survival risk: first, image features and genomic features are extracted separately; then the twin network fuses the image features and genomic features; finally, the fused features are used to predict the relative risk.
Specifically, according to a first aspect of the present invention, there is provided a method for predicting the relative survival risk of breast cancer in combination with a histological whole-section image and a genetic mutation signature, the method comprising the steps of:
(a) acquiring tumor histology full-section image data, and gene mutation data, for each of a pair of patients;
(b) acquiring image features from the acquired image data, preferably comprising: cutting the histology full-slice image into image blocks, screening out non-tumor image blocks and clustering the remaining image blocks, and taking the sorted class centers as the image features;
(c) selecting genes with significant influence on the survival from the gene mutation data to obtain genome characteristics;
(d) processing the image features and the genome features through a twin network comprising a Recurrent Neural Network (RNN) for processing image features, a Fully Connected Network (FCN) for processing genome features, and an output linear layer for outputting results, comprising: processing the image features using a Recurrent Neural Network (RNN), the genomic features using a fully-connected network (FCN);
(e) splicing the processed image features and genome features to obtain fusion features of the patient; and
(f) for the pair of patients, the output linear layer is used to predict the relative survival risk of the pair of patients based on the difference (e.g., direct subtraction of corresponding elements) of the fused features of the pair of patients.
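Steps (e) and (f) above can be sketched as follows. This is a minimal numpy illustration with hypothetical dimensions and random weights; it shows why a bias-free linear layer applied to the feature difference makes the prediction antisymmetric when the pair is swapped:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_relative_risk(fused_i, fused_j, w):
    """Score the fused-feature difference with a bias-free linear layer.

    Returns the probability that patient i has the higher survival risk.
    Illustrative sketch only; dimensions and weights are hypothetical.
    """
    diff = fused_i - fused_j               # step (f): element-wise subtraction
    logit = float(diff @ w)                # output linear layer, no bias term
    return 1.0 / (1.0 + np.exp(-logit))    # sigmoid -> probability

# Hypothetical 1280-dimensional fused features for a pair of patients.
fused_i = rng.normal(size=1280)
fused_j = rng.normal(size=1280)
w = rng.normal(size=1280) * 0.01

p = predict_relative_risk(fused_i, fused_j, w)
# Swapping the pair flips the prediction around 0.5, a consequence of
# the bias-free linear layer acting on the difference.
```

Because there is no bias parameter, predict_relative_risk(a, b, w) + predict_relative_risk(b, a, w) = 1 exactly, which is why the output layer is specified without a bias.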
In one embodiment, in step (b), segmenting the histological whole-slide image into the image blocks comprises: tiling the magnified histological whole-slide image side by side to obtain the image blocks.
In one embodiment, step (b) further comprises: after the histological whole-slide image is segmented into image blocks, performing color normalization on the image blocks.
In one embodiment, step (b) further comprises: performing feature extraction on the image blocks using a pre-trained neural network, preferably a PNASNet neural network pre-trained on an image classification database such as ImageNet.
In one embodiment, screening out the non-tumor image blocks in step (b) comprises: scoring the extracted features with a Gaussian Mixture Model (GMM) trained on features of non-tumor-region image blocks and filtering out non-tumor image blocks accordingly, preferably also using the resulting GMM for the class-center ordering in step (b).
In one embodiment, the gene selection in step (c) is achieved by a log-rank test of the survival information between the mutated and non-mutated patient groups for each gene, preferably with a p-value threshold of 0.05.
In one embodiment, the RNN in step (d) is an independently recurrent (IndRNN) two-layer network with 1024 hidden nodes per layer.
In one embodiment, the FCN in step (d) is a three-layer network in which the layers have 1024, 512 and 256 nodes, respectively.
In one embodiment, the output linear layer in step (f) has no bias parameters.
In one embodiment, the twin network is trained by using data of a plurality of patients with breast cancer survival risk data as training set data, preferably the training step comprises:
(a) acquiring tumor histology full-section image data, and gene mutation data, for each of the plurality of patients;
(b) acquiring image features from the acquired image data, comprising: cutting the histology full-slice image into image blocks, performing feature extraction on the image blocks, screening out non-tumor image blocks and clustering the remaining image blocks, and taking the sorted class centers as the image features;
(c) selecting genes with significant influence on the survival from the gene mutation data to obtain genome characteristics;
(d) processing the image features and the genome features through a twin network, comprising: processing the image features using a Recurrent Neural Network (RNN), the genomic features using a Fully Connected Network (FCN);
(e) splicing the processed image features and genome features to obtain fusion features of the patient;
(f) for each pair of the plurality of patients, training the twin network based on the difference in the fused features of the pair of patients and their relative survival risk, comprising: training network parameters of the twin network using a cross-entropy loss function as a supervision according to the predicted relative survival risk for the pair of patients and the actual survival data for the pair of patients;
(g) the training process of the twin network is divided into three stages: in the first stage, only the image-feature part is used to train the RNN part; in the second stage, the genomic features are added, the RNN parameters are frozen, and the FCN part is trained; in the third stage, the RNN parameters are unfrozen, and the RNN and FCN parameters are jointly optimized.
In one embodiment, the performance metric used to evaluate prediction accuracy is the concordance index (c-index).
In one embodiment, the method is applicable to a cancer having the histological whole-section image and the gene mutation signature.
In a second aspect, the invention provides a system for survival risk prediction combining a histological whole-slide image and a gene mutation signature, comprising a processor configured to execute computer instructions to cause the method of the first aspect of the invention to be performed.
In a third aspect, the invention provides a computer device comprising a memory and a processor, the memory having stored thereon computer instructions which, when executed by the processor, cause the method of the first aspect of the invention to be performed.
In a fourth aspect, the invention provides a non-transitory computer readable storage medium having stored thereon computer instructions which, when executed by a processor, cause the method of the first aspect of the invention to be performed.
By utilizing the scheme of the invention, the histology full-section image and the gene mutation label can be combined to predict the survival risk, which is beneficial to improving the accuracy of predicting the survival risk, thereby helping doctors to better adjust the treatment scheme.
Drawings
The invention will now be described by way of non-limiting example only with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates a process for acquiring image features from a full histological slice of a patient according to an embodiment of the present invention.
Figure 2 schematically illustrates a twin network designed to predict relative survival risk according to an embodiment of the invention.
FIG. 3 schematically shows the concordance index (c-index) of a 5-fold cross-validation experiment on the TCGA-BRCA dataset using three inputs: image features only, genome features only, and both image and genome features, according to an embodiment of the present invention.
Detailed Description
Through experiments, the inventors have demonstrated that combining image data and genomic data helps improve the accuracy of breast cancer survival risk prediction. The information contained in image data and genomic data differs, and the combination of the two had not previously been studied.
For the selection of image data features, FIG. 1 illustrates a process for acquiring image features from a full histological slice of a patient according to an embodiment of the present invention. The specific details are described below.
A patient often has multiple histological whole-slide images; suppose patient p_i has N_i such images. Typically, whole-slide images are very large, and training a network directly on them as input is impractical. In addition, the tumor area in an image is small compared with the normal area, and this data imbalance makes training more difficult. To alleviate these difficulties, the inventors segment the images into image blocks, extract features from the blocks, cluster the blocks into several categories, and use the mean feature of each category as patient p_i's image features, as shown in FIG. 1. Specifically, the image blocks have a size of 256 × 256 × 3 and are tiled side by side over the whole-slide image at 20× physical magnification; patient p_i has M_i image blocks in total. When generating image blocks, blocks with a particularly large mean luminance (background blocks) are discarded directly. Next, the colors of the image blocks are normalized (see Marc Macenko et al., "A method for normalizing histology slides for quantitative analysis", 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI) [11]) to reduce the color differences between blocks segmented from different whole slides. Then a pre-trained neural network named PNASNet (pre-trained on the large generic image classification database ImageNet; see Chenxi Liu et al., "Progressive neural architecture search", Proceedings of the European Conference on Computer Vision (ECCV), 2018 [12]) is used to extract features from the color-normalized image blocks. In feature extraction, the inventors apply mean pooling and max pooling separately and concatenate the resulting features to improve robustness. Thus, patient p_i's image feature matrix is of size M_i × 8640.
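The mean-and-max pooling used to build each block's 8640-dimensional feature can be sketched as follows. This is a numpy stand-in; the 4320-channel backbone output is an assumed size, since the text only fixes the concatenated 8640 dimension:

```python
import numpy as np

def pooled_patch_features(backbone_maps):
    """Combine mean pooling and max pooling of a backbone feature map.

    backbone_maps: array of shape (C, H, W), spatial feature maps for one
    image block (random stand-ins here for PNASNet activations).
    Returns a 2C-dimensional vector: mean-pooled then max-pooled features
    concatenated, for robustness as described in the text.
    """
    mean_pool = backbone_maps.mean(axis=(1, 2))
    max_pool = backbone_maps.max(axis=(1, 2))
    return np.concatenate([mean_pool, max_pool])

rng = np.random.default_rng(1)
# Hypothetical 4320-channel feature maps -> 8640-dimensional patch feature,
# matching the M_i x 8640 patient feature matrix in the text.
fmap = rng.normal(size=(4320, 8, 8))
feat = pooled_patch_features(fmap)
```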
Next, the inventors score the extracted features with a Gaussian Mixture Model (GMM) trained on features of non-tumor-region image blocks (roughly delineated by a physician; data from about 10 whole slides is typically sufficient for training), and discard the third of the blocks with the highest scores so as to filter out non-tumor blocks. Finally, the remaining features are clustered into 128 classes with the K-means method, and the 128 class centers are sorted in ascending order of the GMM score described above. The 128 sorted class centers are used as the final image features, with dimension 128 × 8640.
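The filtering-and-clustering step can be sketched as follows. This is a simplified stand-in: a single diagonal Gaussian replaces the GMM, a tiny feature dimension and 4 clusters replace the 8640-dimensional features and 128 classes, and the K-means loop is minimal:

```python
import numpy as np

rng = np.random.default_rng(2)

def gaussian_score(x, mu, var):
    """Log-likelihood under a diagonal Gaussian (a one-component stand-in
    for the GMM; higher score = more non-tumor-like)."""
    return -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var), axis=1)

def kmeans(x, k, iters=20):
    """Minimal K-means: returns (centers, labels)."""
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(0)
    return centers, labels

# Hypothetical data: 300 patch features (8-dim for brevity, 8640 in the
# text), plus reference features from delineated non-tumor regions.
patches = rng.normal(size=(300, 8))
non_tumor_ref = rng.normal(loc=2.0, size=(100, 8))

mu, var = non_tumor_ref.mean(0), non_tumor_ref.var(0) + 1e-6
scores = gaussian_score(patches, mu, var)

# Discard the highest-scoring third (most non-tumor-like blocks).
keep = np.argsort(scores)[: 2 * len(patches) // 3]
kept = patches[keep]

# Cluster survivors (128 classes in the text; 4 here) and sort the class
# centers by ascending non-tumor score to get the final image feature.
centers, _ = kmeans(kept, k=4)
order = np.argsort(gaussian_score(centers, mu, var))
image_feature = centers[order]
```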
Genomic feature extraction is described below. Breast cancer patients carry numerous gene mutations; for example, the TCGA-BRCA dataset contains 21,057 gene mutation records in total (see Lingle W et al., "Radiology Data from The Cancer Genome Atlas Breast Invasive Carcinoma [TCGA-BRCA] collection", The Cancer Imaging Archive, 2016 [13]). These gene mutations are too numerous and some are redundant, so the inventors select the important gene mutations from these candidates. During gene selection, the inventors use only the training set data to avoid information leakage. Specifically, for each candidate gene, the training set is divided into two parts according to whether that gene is mutated. The inventors then use the log-rank test to calculate the significance level of the difference in survival time between the two subsets. Finally, the candidate genes with a p-value less than 0.05 are selected as genomic features. Typically, only a few hundred genes remain after screening. The screening improves the discriminative power of the features while reducing the feature dimension.
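The log-rank screening of candidate genes can be sketched as follows. This is a simplified implementation of the standard two-sample log-rank test on synthetic data; the cohort and effect size are invented purely for illustration:

```python
import math
import numpy as np

def logrank_p(time, event, group):
    """Two-sample log-rank test (chi-square with 1 degree of freedom).

    time: survival/censoring times; event: 1 if death observed, 0 if
    censored; group: 1 if the gene is mutated in this patient, else 0.
    A simplified sketch of the test used for gene screening.
    """
    time, event, group = map(np.asarray, (time, event, group))
    o1 = e1 = var = 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n, n1 = at_risk.sum(), (at_risk & (group == 1)).sum()
        d = ((time == t) & (event == 1)).sum()
        d1 = ((time == t) & (event == 1) & (group == 1)).sum()
        o1 += d1
        e1 += d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    stat = (o1 - e1) ** 2 / var
    return math.erfc(math.sqrt(stat / 2))  # survival function of chi2(1)

# Hypothetical cohort in which mutated patients die markedly earlier.
rng = np.random.default_rng(3)
mutated = rng.integers(0, 2, size=200)
time = np.where(mutated == 1,
                rng.exponential(2.0, 200), rng.exponential(10.0, 200))
event = np.ones(200, dtype=int)

p = logrank_p(time, event, mutated)
selected = p < 0.05   # the gene passes the screening threshold
```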
Figure 2 illustrates a twin network designed to predict relative survival risk according to an embodiment of the present invention.
For survival risk prediction, the absolute value of the risk is usually meaningless; only the relative magnitude of two patients' survival risks is meaningful. To this end, the solution of the present invention designs a twin network (see "Learning a similarity metric discriminatively, with application to face verification", In Computer Vision and Pattern Recognition, 539-546, IEEE, 2005 [14]) to meet this requirement, as shown in FIG. 2.
Image features are processed by an independently recurrent neural network (IndRNN) (see "Independently recurrent neural network (IndRNN): Building a longer and deeper RNN", Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018 [15]), while genomic features are processed by a fully connected network (FCN). The IndRNN used has 2 layers with 1024 hidden nodes each. The FCN used consists of 3 layers with 1024, 512 and 256 nodes. Next, the processed features (1024- and 256-dimensional, respectively) are concatenated into a joint feature (1280-dimensional). For a pair of patients (p_i, p_j), their features are processed separately by networks with shared weights, and the difference of their joint features (direct element-wise subtraction of p_i's feature and p_j's feature) is fed into the output linear layer to predict the relative risk. If patient p_i has the higher survival risk, the pair is labeled 1, otherwise 0. The prediction problem thus becomes a binary classification problem, so a cross-entropy loss function can be used to guide the training process.
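The distinguishing feature of the IndRNN is that its recurrent weight is a vector applied element-wise rather than a full matrix, so each hidden unit has an independent recurrence. A minimal numpy sketch of one such layer, with hypothetical small sizes (the patent uses 2 layers of 1024 units over the sequence of 128 sorted class-center features):

```python
import numpy as np

rng = np.random.default_rng(4)

def indrnn_layer(inputs, w_in, u, b):
    """One IndRNN layer: h_t = relu(W x_t + u * h_{t-1} + b).

    Unlike a vanilla RNN, the recurrent weight `u` is a vector applied
    element-wise (u * h), so hidden units do not mix through recurrence.
    inputs: (T, D) sequence; returns the final hidden state (H,).
    """
    h = np.zeros(u.shape)
    for x in inputs:
        h = np.maximum(0.0, w_in @ x + u * h + b)
    return h

# Sequence of 128 class centers (8-dim here instead of 8640).
seq = rng.normal(size=(128, 8))
w_in = rng.normal(size=(16, 8)) * 0.1
b = np.zeros(16)
u = rng.uniform(0.0, 0.9, size=16)   # |u| < 1 keeps the recurrence stable

h_final = indrnn_layer(seq, w_in, u, b)
```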
Current methods typically estimate a patient's absolute survival risk directly, such as P. Mobadersany et al., "Predicting cancer outcomes from histology and genomics using convolutional networks" (Proceedings of the National Academy of Sciences, vol. 115 no. 13, pp. E2970-E2979, 2018) [16] and Yao, Jiawen et al., "Whole slide images based cancer survival prediction using attention guided deep multiple instance learning networks" (Medical Image Analysis 65 (2020): 101789) [17], and use the negative partial log-likelihood as the loss function. Compared with the existing methods, the twin network has the following advantages: (a) Only the relative survival risk is meaningful, and the twin network of the present invention directly addresses the relative survival risk. (b) A patient pair, rather than a single patient, is the network input. Thus, if the number of patients is P, the number of training samples is of order O(P^2); using patient pairs as input greatly increases the number of training samples, making training easier. (c) The proposed method is not affected by the training batch size. When the batch size is limited by the graphics card's memory, not all training samples can be held in memory at once. With existing methods, a limited batch size means that not all patient combinations are seen within one training epoch, so training may be unstable. In the method of the present invention, all patient pairs are traversed within one epoch regardless of the batch size, so the training process is stable.
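Pair construction and the cross-entropy supervision can be sketched as follows. This shows one plausible way to enumerate comparable pairs under right censoring (the patent does not spell out the pairing rule, so the comparability criterion here is an assumption):

```python
import numpy as np
from itertools import combinations

def comparable_pairs(time, event):
    """Enumerate ordered patient pairs with a known relative risk.

    A pair (i, j) is usable when patient i's event is observed and occurs
    before patient j's follow-up time: i then has the higher risk
    (label 1). With P patients this yields O(P^2) training samples.
    """
    pairs = []
    for i, j in combinations(range(len(time)), 2):
        if event[i] == 1 and time[i] < time[j]:
            pairs.append((i, j, 1))
        elif event[j] == 1 and time[j] < time[i]:
            pairs.append((i, j, 0))
    return pairs

def cross_entropy(p, label):
    """Binary cross-entropy used to supervise the twin network."""
    p = min(max(p, 1e-12), 1 - 1e-12)
    return -(label * np.log(p) + (1 - label) * np.log(1 - p))

time = np.array([2.0, 5.0, 3.0, 8.0])
event = np.array([1, 0, 1, 1])   # patient 1 is censored
pairs = comparable_pairs(time, event)
# Average loss for a (hypothetical) constant prediction of 0.7.
loss = sum(cross_entropy(0.7, lab) for _, _, lab in pairs) / len(pairs)
```

Note that the censored patient (index 1) still contributes pairs against patients whose observed death precedes the censoring time, which is how pairwise training uses censored data.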
FIG. 3 shows the concordance index (c-index) of a 5-fold cross-validation experiment on the TCGA-BRCA dataset using three inputs: image features only, genome features only, and both image and genome features, according to an embodiment of the present invention. The specific details are described below.
In the TCGA-BRCA dataset, a total of 1026 patients have both histological whole-slide images and a gene mutation signature. Of these, 882 patients are right-censored and 144 patients had an event (i.e., died of breast cancer). Considering this data imbalance, the scheme of the invention uses stratified sampling when dividing the training and test sets. The ratio of training set to test set is 4/1. In the experiment, the scheme of the invention adopts 5-fold cross-validation, and the evaluation metric used is the concordance index. The scheme of the invention tests the prediction accuracy under different input conditions: image features only, genomic features only, and both features together. The concordance indices on the test set are shown in FIG. 3; using both features yields higher prediction accuracy than using a single feature.
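The concordance index used as the evaluation metric can be computed as follows. This is a minimal sketch of the standard c-index with right censoring, not the authors' exact evaluation code:

```python
import numpy as np

def concordance_index(time, event, risk):
    """Concordance index (c-index) with right censoring.

    Counts comparable pairs (the earlier time must be an observed event)
    and the fraction where the higher predicted risk dies first; ties in
    predicted risk count one half. 0.5 = random, 1.0 = perfect ranking.
    """
    usable = concordant = 0.0
    n = len(time)
    for i in range(n):
        if event[i] != 1:
            continue
        for j in range(n):
            if time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable

# Toy check: a risk score that exactly reverses survival time should
# rank every comparable pair correctly.
time = np.array([1.0, 4.0, 3.0, 6.0])
event = np.array([1, 1, 1, 0])
risk = -time
cindex = concordance_index(time, event, risk)
```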
It will be appreciated by those of ordinary skill in the art that the schematic diagram of the twin network shown in the figures is merely an illustrative block diagram of a portion of the structure associated with aspects of the present invention and does not constitute a limitation of the computer device, processor or computer program embodying aspects of the present invention. A particular computer device, processor or computer program may include more or fewer components or modules than shown in the figures, or may combine or split certain components or modules, or may have a different arrangement of components or modules.
It should be understood that the various elements of the system of the present invention may be implemented in whole or in part in software, hardware, firmware, or a combination thereof. The units may be embedded in a processor of the computer device in a hardware or firmware form or independent of the processor, or may be stored in a memory of the computer device in a software form for being called by the processor to execute operations of the units. Each of the units may be implemented as a separate component or module, or two or more units may be implemented as a single component or module.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored thereon computer instructions executable by the processor, the computer instructions, when executed by the processor, instructing the processor to perform the steps of the method of the invention. The computer device may broadly be a server, a vehicle-mounted terminal, or any other electronic device having the necessary computing and/or processing capabilities. In one embodiment, the computer device may include a processor, memory, a network interface, a communication interface, etc., connected by a system bus. The processor of the computer device may be used to provide the necessary computing, processing and/or control capabilities. The memory of the computer device may include a non-volatile storage medium and internal memory. An operating system, a computer program, and the like may be stored in or on the non-volatile storage medium. The internal memory may provide an environment for running the operating system and the computer programs in the non-volatile storage medium. The network interface and the communication interface of the computer device may be used to connect and communicate with external devices through a network. The computer program, when executed by a processor, carries out the steps of the method of the invention.
The invention may be implemented as a computer readable storage medium having stored thereon a computer program which, when executed by a processor, causes the steps of the method of the invention to be performed. In one embodiment, the computer program is distributed across a plurality of computer devices or processors coupled by a network such that the computer program is stored, accessed, and executed by one or more computer devices or processors in a distributed fashion. A single method step/operation, or two or more method steps/operations, may be performed by a single computer device or processor or by two or more computer devices or processors. One or more method steps/operations may be performed by one or more computer devices or processors, and one or more other method steps/operations may be performed by one or more other computer devices or processors. One or more computer devices or processors may perform a single method step/operation, or perform two or more method steps/operations.
It will be understood by those of ordinary skill in the art that all or part of the steps of the method of the present invention may be directed to associated hardware, such as a computer device or a processor, for performing the steps of the method of the present invention by a computer program, which may be stored in a non-transitory computer readable storage medium and executed to cause the steps of the method of the present invention to be performed. Any reference herein to memory, storage, databases, or other media may include non-volatile and/or volatile memory, as appropriate. Examples of non-volatile memory include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), flash memory, magnetic tape, floppy disk, magneto-optical data storage device, hard disk, solid state disk, and the like. Examples of volatile memory include Random Access Memory (RAM), external cache memory, and the like.
The technical features described above may be combined in any manner. Although not every possible combination of features is described, any combination of these features should be considered covered by this specification as long as the combination is not contradictory.
While the present invention has been described in connection with the embodiments, those skilled in the art will understand that the foregoing description and drawings are merely illustrative and not restrictive, and that the invention is not limited to the disclosed embodiments. Various modifications and variations are possible without departing from the spirit of the invention.

Claims (17)

1. A method of predicting the relative survival risk of breast cancer by combining histological whole-slide images and gene mutation signatures, the method comprising the steps of:
(a) acquiring tumor histology whole-slide image data and gene mutation data for each patient of a pair of patients;
(b) obtaining image features from the acquired image data, comprising: cutting the histological whole-slide image into image blocks, performing feature extraction on the image blocks, screening out non-tumor image blocks, clustering the remaining image blocks, and taking the sorted class centers as the image features;
(c) selecting, from the gene mutation data, genes having a significant influence on survival, to obtain genomic features;
(d) processing the image features and the genomic features through a twin network comprising a recurrent neural network for processing the image features, a fully connected network for processing the genomic features, and an output linear layer for outputting results, comprising: processing the image features using the recurrent neural network and processing the genomic features using the fully connected network;
(e) concatenating the processed image features and genomic features to obtain a fused feature for each patient; and
(f) predicting, for the pair of patients, their relative survival risk using the output linear layer based on the difference between the fused features of the pair of patients.
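For illustration only (the following sketch is not part of the claims), the pairwise comparison of steps (d)-(f) of claim 1 can be outlined as below. The vanilla RNN cell, all layer sizes, and the random weights are assumptions standing in for the claimed networks; note that, because the output layer acts on the *difference* of the fused features with no bias, swapping the two patients flips the prediction symmetrically.

```python
import numpy as np

rng = np.random.default_rng(0)

def rnn_encode(seq, Wx, Wh, b):
    """Run a vanilla RNN over the sequence of sorted class-center
    features and return the final hidden state as the image embedding."""
    h = np.zeros(Wh.shape[0])
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h + b)
    return h

def mlp_encode(g, W1, b1, W2, b2):
    """Two-layer fully connected encoder for the genomic features."""
    return np.tanh(W2 @ np.maximum(W1 @ g + b1, 0.0) + b2)

def fuse(img_seq, genes, params):
    """Concatenate the two encoded modalities into one fused feature."""
    Wx, Wh, b, W1, b1, W2, b2 = params
    return np.concatenate([rnn_encode(img_seq, Wx, Wh, b),
                           mlp_encode(genes, W1, b1, W2, b2)])

def relative_risk(fused_a, fused_b, w):
    """Bias-free output linear layer on the difference of the fused
    features, squashed to the probability that patient A is at
    higher risk than patient B."""
    z = w @ (fused_a - fused_b)
    return 1.0 / (1.0 + np.exp(-z))

# toy dimensions: 8 class centers of dimension 16, 32 selected genes
d_img, d_h, d_g = 16, 8, 32
params = (rng.normal(size=(d_h, d_img)), rng.normal(size=(d_h, d_h)),
          np.zeros(d_h),
          rng.normal(size=(d_h, d_g)), np.zeros(d_h),
          rng.normal(size=(d_h, d_h)), np.zeros(d_h))
w = rng.normal(size=2 * d_h)   # no bias term, as in claim 11

pa = fuse(rng.normal(size=(8, d_img)), rng.normal(size=d_g), params)
pb = fuse(rng.normal(size=(8, d_img)), rng.normal(size=d_g), params)
p = relative_risk(pa, pb, w)
```

Because the layer is bias-free, `relative_risk(pa, pb, w) + relative_risk(pb, pa, w)` equals 1 exactly, so the prediction does not depend on the order in which the pair is presented.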
2. The method of claim 1, wherein cutting the histological whole-slide image into the image blocks in step (b) comprises: segmenting the magnified whole-slide image side by side to obtain the image blocks.
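For illustration only, the side-by-side segmentation of claim 2 amounts to enumerating non-overlapping tile coordinates over the slide; the tile size and the choice to drop edge remainders are assumptions, not specified by the claim.

```python
def tile_coords(width, height, tile=256):
    """Top-left coordinates of non-overlapping, side-by-side tiles
    covering a whole-slide image of the given pixel dimensions.
    Edge remainders too small to hold a full tile are dropped
    (an illustrative assumption)."""
    return [(x, y)
            for y in range(0, height - tile + 1, tile)
            for x in range(0, width - tile + 1, tile)]
```

For example, a 1000x600 slide with 256-pixel tiles yields a 3x2 grid of six tiles.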
3. The method of claim 1, wherein step (b) further comprises: after the histological whole-slide image is cut into the image blocks, performing color normalization on the image blocks.
4. The method of claim 1, wherein step (b) further comprises: performing the feature extraction on the image blocks using a pre-trained neural network.
5. The method of claim 4, wherein the pre-trained neural network is a PNASNet neural network pre-trained on an image classification database, such as ImageNet.
6. The method of claim 1, wherein screening out the non-tumor image blocks in step (b) comprises: filtering the extracted features using a Gaussian mixture model trained on features of non-tumor-area image blocks, so as to screen out the non-tumor-area image blocks.
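For illustration only, the filtering of claim 6 can be sketched with a single diagonal Gaussian standing in for the claimed Gaussian mixture model (an assumption made to keep the sketch self-contained): blocks that the non-tumor model explains well, i.e. with high log-likelihood, are discarded, and the rest are kept as candidate tumor blocks. The threshold value is illustrative.

```python
import numpy as np

def fit_diag_gaussian(feats):
    """Fit a diagonal Gaussian to features of known non-tumor blocks
    (a single-component stand-in for the claimed mixture model)."""
    mu = feats.mean(axis=0)
    var = feats.var(axis=0) + 1e-6   # small floor for numerical safety
    return mu, var

def loglik(x, mu, var):
    """Per-row log-likelihood under the diagonal Gaussian."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var,
                         axis=-1)

def keep_tumor_patches(patch_feats, mu, var, threshold):
    """Discard blocks the non-tumor model scores above `threshold`
    (they look non-tumor); keep the rest as candidate tumor blocks."""
    return patch_feats[loglik(patch_feats, mu, var) < threshold]

# fit the non-tumor model on synthetic 4-dimensional block features
rng = np.random.default_rng(1)
non_tumor_feats = rng.normal(size=(200, 4))
mu, var = fit_diag_gaussian(non_tumor_feats)
```

In a real pipeline the model would be fit on features of annotated non-tumor regions and the threshold tuned on held-out data; both are hypothetical here.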
7. The method of claim 6, wherein the class centers in step (b) are sorted using the trained Gaussian mixture model.
8. The method of claim 1, wherein the gene selection in step (c) is performed, for each gene, by a log-rank test of the survival information of the mutated and non-mutated patient groups, using a p-value threshold of 0.05.
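For illustration only, the per-gene log-rank test of claim 8 can be implemented as follows; the function assumes two groups with at least one observed event and a non-degenerate split, and uses the standard chi-square approximation with one degree of freedom (the p-value equals `erfc(sqrt(stat/2))` in that case). The example data are synthetic.

```python
import math

def logrank_p(times, events, mutated):
    """Two-sample log-rank test comparing the survival curves of
    mutated vs. non-mutated patients; returns the p-value.
    `events[i]` is 1 for an observed death, 0 for censoring."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    obs1 = exp1 = var = 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if mutated[i])
        deaths = [i for i, (ti, e) in enumerate(zip(times, events))
                  if ti == t and e]
        d = len(deaths)
        obs1 += sum(1 for i in deaths if mutated[i])
        exp1 += d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = (obs1 - exp1) ** 2 / var
    return math.erfc(math.sqrt(stat / 2))   # chi-square sf, df = 1

# a gene is retained when its mutation status separates survival
times = [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
events = [1] * 10
mutated = [True] * 5 + [False] * 5   # mutated patients all die earlier
significant = logrank_p(times, events, mutated) < 0.05
```

Here every mutated patient dies before any non-mutated one, so the gene passes the 0.05 threshold; an interleaved split would not.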
9. The method of claim 1, wherein the recurrent neural network in step (d) is an independent two-layer network.
10. The method of claim 1, wherein the fully connected network in step (d) is a three-layer network.
11. The method of claim 1, wherein the output linear layer in step (f) has no bias parameter.
12. The method of claim 1, wherein the twin network is trained using data from a plurality of patients having breast cancer survival risk data as training set data, the training comprising:
(a) acquiring tumor histology whole-slide image data and gene mutation data for each of the plurality of patients;
(b) obtaining image features from the acquired image data, comprising: cutting the histological whole-slide image into image blocks, performing feature extraction on the image blocks, screening out non-tumor image blocks, clustering the remaining image blocks, and taking the sorted class centers as the image features;
(c) selecting, from the gene mutation data, genes having a significant influence on survival, to obtain genomic features;
(d) processing the image features and the genomic features through the twin network, comprising: processing the image features using the recurrent neural network and processing the genomic features using the fully connected network;
(e) concatenating the processed image features and genomic features to obtain a fused feature for each patient; and
(f) for each pair of the plurality of patients, training the twin network based on the difference between the fused features of the pair of patients and their relative survival risk, comprising: training the network parameters of the twin network using a cross-entropy loss function as supervision, based on the predicted relative survival risk for the pair of patients and the actual survival data of the pair of patients.
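For illustration only, the cross-entropy supervision of step (f) of claim 12 reduces, for one training pair, to a binary cross-entropy between the predicted probability that patient A is at higher risk and the observed ordering of the two deaths; the function below is a self-contained sketch, not the claimed implementation.

```python
import math

def pairwise_ce_loss(risk_a, risk_b, a_died_first):
    """Cross-entropy loss for one training pair: the sigmoid of the
    risk-score difference is the predicted probability that patient A
    is at higher risk, and the label is whether A actually died first."""
    p = 1.0 / (1.0 + math.exp(-(risk_a - risk_b)))
    y = 1.0 if a_died_first else 0.0
    eps = 1e-12   # numerical guard against log(0)
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

loss_correct = pairwise_ce_loss(2.0, -1.0, True)    # confident and right: small loss
loss_wrong = pairwise_ce_loss(-1.0, 2.0, True)      # confident and wrong: large loss
```

Averaging this loss over all comparable patient pairs in a batch and backpropagating through both branch networks is what trains the twin network's shared parameters.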
13. The method of claim 12, wherein the performance evaluation index used to evaluate prediction accuracy is the concordance index.
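For illustration only, the concordance index (C-index) of claim 13 is the fraction of comparable patient pairs whose predicted risks are ordered consistently with their survival times; a sketch under the usual convention that a pair is comparable when the earlier time corresponds to an observed event (not a censoring):

```python
def concordance_index(times, events, risks):
    """Concordance index for survival predictions. A pair (i, j) with
    times[i] < times[j] is comparable when events[i] == 1; it is
    concordant when the earlier death has the higher predicted risk,
    and tied risks count as half."""
    num = den = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                den += 1
                if risks[i] > risks[j]:
                    num += 1
                elif risks[i] == risks[j]:
                    num += 0.5
    return num / den

# perfectly ordered predictions give a C-index of 1.0
cindex = concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [4.0, 3.0, 2.0, 1.0])
```

A C-index of 0.5 corresponds to random ordering, and 1.0 to perfect ranking, which is why it is the natural accuracy measure for a model trained on pairwise relative risk.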
14. The method of claim 1, wherein the method is applicable to any cancer for which the histological whole-slide images and gene mutation signatures are available.
15. A system for survival risk prediction combining histological whole-slide images and gene mutation signatures, comprising a processor configured to execute computer instructions to cause the method according to any one of claims 1-14 to be performed.
16. A computer device comprising a memory and a processor, the memory having stored thereon computer instructions that, when executed by the processor, cause the method of any of claims 1-14 to be performed.
17. A non-transitory computer readable storage medium having stored thereon computer instructions which, when executed by a processor, cause the method according to any one of claims 1-14 to be performed.
CN202111179543.0A 2021-02-05 2021-10-09 Method, system, device and storage medium for predicting relative survival risk of breast cancer Pending CN114863149A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202163145998P 2021-02-05 2021-02-05
US63/145,998 2021-02-05

Publications (1)

Publication Number Publication Date
CN114863149A 2022-08-05

Family

ID=82627362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111179543.0A Pending CN114863149A (en) 2021-02-05 2021-10-09 Method, system, device and storage medium for predicting relative survival risk of breast cancer

Country Status (1)

Country Link
CN (1) CN114863149A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117333485A (en) * 2023-11-30 2024-01-02 华南理工大学 WSI survival prediction method based on weak supervision depth ordinal regression network
CN117333485B (en) * 2023-11-30 2024-04-05 华南理工大学 WSI survival prediction method based on weak supervision depth ordinal regression network

Similar Documents

Publication Publication Date Title
Hong et al. Predicting endometrial cancer subtypes and molecular features from histopathology images using multi-resolution deep learning models
US11348661B2 (en) Predicting total nucleic acid yield and dissection boundaries for histology slides
US11741604B2 (en) Systems and methods for processing electronic images to infer biomarkers
JP2022527264A (en) Method for determining biomarkers from pathological tissue slide images
US11348239B2 (en) Predicting total nucleic acid yield and dissection boundaries for histology slides
Li et al. Machine learning for lung cancer diagnosis, treatment, and prognosis
CA3108632A1 (en) A multi-modal approach to predicting immune infiltration based on integrated rna expression and imaging features
US11348240B2 (en) Predicting total nucleic acid yield and dissection boundaries for histology slides
El-Bendary et al. A feature-fusion framework of clinical, genomics, and histopathological data for METABRIC breast cancer subtype classification
CN112687327A (en) Cancer survival analysis system based on multitask and multi-mode
Xu et al. Using transfer learning on whole slide images to predict tumor mutational burden in bladder cancer patients
Zormpas-Petridis et al. Superpixel-based conditional random fields (SuperCRF): incorporating global and local context for enhanced deep learning in melanoma histopathology
Wetteland et al. Automatic diagnostic tool for predicting cancer grade in bladder cancer patients using deep learning
US20230056839A1 (en) Cancer prognosis
Levy et al. Mixed effects machine learning models for colon cancer metastasis prediction using spatially localized immuno-oncology markers
CN114863149A (en) Method, system, device and storage medium for predicting relative survival risk of breast cancer
EP4271838A1 (en) Predicting total nucleic acid yield and dissection boundaries for histology slides
Liu et al. Pathological prognosis classification of patients with neuroblastoma using computational pathology analysis
CN113591791B (en) Lung cancer automatic identification system based on self-learning artificial intelligence
Yazici et al. New Approach for Risk Estimation Algorithms of BRCA1/2 Negativeness Detection with Modelling Supervised Machine Learning Techniques
WO2022029492A1 (en) Methods of assessing breast cancer using machine learning systems
Baheti et al. Prognostic stratification of glioblastoma patients by unsupervised clustering of morphology patterns on whole slide images furthering our disease understanding
Qiu et al. Predicting microsatellite instability in colorectal cancer based on a novel multimodal fusion deep learning model integrating both histopathological images and clinical information
He et al. Development of a Multimodal Deep Learning Model for Predicting Microsatellite Instability in Colorectal Cancer by Integrating Histopathological Images and Clinical Data
Xu et al. Fraction and Spatial Organization of Tumor-Infiltrating Lymphocytes Correlating with Bladder Cancer Patient Survival

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination