CN114530222A - Cancer patient classification system based on multiomics and image data fusion - Google Patents

Cancer patient classification system based on multiomics and image data fusion

Info

Publication number
CN114530222A
CN114530222A (application CN202210034741.6A)
Authority
CN
China
Prior art keywords
data, module, fusion, omics, training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210034741.6A
Other languages
Chinese (zh)
Inventor
董守斌
黄薇娴
谭凯文
胡金龙
张子烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202210034741.6A priority Critical patent/CN114530222A/en
Publication of CN114530222A publication Critical patent/CN114530222A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 - ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/2431 - Multiple classes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B - BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 - ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H30/00 - ICT specially adapted for the handling or processing of medical images
    • G16H30/20 - ICT specially adapted for the handling or processing of medical images for handling medical images, e.g. DICOM, HL7 or PACS
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A - TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 - Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 - Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation


Abstract

The invention discloses a cancer patient classification system based on multi-omics and image data fusion. The system performs end-to-end loading and preprocessing of multi-omics data and image data, introduces additional feature information from an external knowledge database to carry out feature dimension reduction and information aggregation on specific omics data, supplements additional sample information by computing the similarity between cancer patients (i.e. samples), and finally fuses the classification results of the multi-omics data and the image data through a multi-modal cross-fusion method to produce the final classification result. The system comprises the following functional modules: a data loading and preprocessing module, a multi-omics processing module, an image processing module and a fusion module. The invention can effectively fuse multi-omics and image data and can be used to classify various cancer patients accurately.

Description

Cancer patient classification system based on multiomics and image data fusion
Technical Field
The invention relates to the technical field of cancer patient classification, in particular to a cancer patient classification system based on multiomics and image data fusion.
Background
Cancer is a disease with complex underlying molecular mechanisms and contributing factors, and large amounts of data are required to describe, diagnose and treat a patient accurately. Omics data are the main data used by researchers to study the mechanisms of cancer. In recent years, advances in gene sequencing technology have greatly shortened sequencing time, reduced sequencing cost and lowered the manual effort involved, promoting the rapid development of genomics, proteomics and other omics fields. Meanwhile, with the development of modern computing and medical imaging, images have become an effective means of studying cancer, and pathological images are increasingly regarded as the "gold standard" for diagnosis. Multi-omics and image data provide disease information about a patient from different levels: genomics, transcriptomics and proteomics provide molecular-level analyses of cancer patients at the gene, transcript and protein expression levels respectively, while image data directly represent the patient's current physical condition. More and more research is devoted to fusing omics data and image data in order to diagnose and treat cancer patients more comprehensively, but such fusion faces challenges including the curse of dimensionality, data heterogeneity and data imbalance.
Over the years, many methods have been proposed to fuse omics data and image data for various problems. However, most existing work focuses on unsupervised fusion of multi-omics and image data, or simply obtains additional information from either features or samples. With the development of public and personalized medicine, more and more organizations and institutions provide literature and data sets related to cancer, encouraging research into supervised fusion methods for multi-omics and image data, which can identify disease-related biomarkers and predict labels for new samples. Early attempts at this type of approach include feature-concatenation-based methods and ensemble-based methods. On the one hand, concatenation-based methods integrate multi-omics and image data by directly concatenating the features of the input data before learning a classification model. On the other hand, ensemble-based methods integrate the predictions of different classifiers, each trained on one type of input data. However, these methods do not take into account the correlations between different input data types and may be biased towards certain input data types.
In summary, by obtaining additional information from both features and samples, a new multi-modal data fusion method is needed to realize information interaction between different input data and complete the fusion of multi-omics data and image data.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art, and provides a cancer patient classification system based on the fusion of multi-omics and image data, which can realize information fusion between multi-omics data and image data and accurately classify cancer patients using the fused multi-omics and image data.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: a cancer patient classification system based on multi-omics and image data fusion, which performs end-to-end loading and preprocessing of multi-omics data and image data, introduces additional feature information from an external knowledge database to carry out feature dimension reduction and information aggregation on specific omics data, supplements additional sample information by computing the similarity between cancer patients (i.e. samples), and finally fuses the classification results of the multi-omics data and the image data through a multi-modal cross-fusion method. The system specifically comprises the following functional modules:
the data loading and preprocessing module, which is used for importing multi-omics data and image data and preprocessing the imported data;
the multi-omics processing module, which comprises an integrated gene network module and a supervised graph convolution module, wherein the integrated gene network module uses the gene-gene interaction information provided by the external database HINT to construct an adjacency matrix between genes and uses a graph convolutional network (GCN) together with this gene adjacency matrix to characterize genes, and the supervised graph convolution module uses cosine similarity to construct an adjacency matrix between samples and uses a GCN together with this sample adjacency matrix to characterize samples, obtaining a preliminary prediction classification result with the omics data as input;
the image processing module, which performs representation learning on the image data with a convolutional neural network to obtain a preliminary prediction classification result with the image data as input;
and the fusion module, which comprises a preliminary cross-fusion module and a network fusion module, wherein the preliminary cross-fusion module constructs a multi-modal data cross-fusion vector and inputs a reconstructed vector of this vector into the network fusion module, realizing the fusion of the classification results of the omics and image data and producing the final classification result.
Further, the data loading and preprocessing module comprises a data loading module and a data preprocessing module. The data loading module is used for loading multi-omics data and image data, wherein the multi-omics data comprise genomics data, transcriptomics data and epigenomics data, each row of the multi-omics data represents the expression values of all samples on the corresponding feature, each column represents the expression values of one sample on the corresponding features, and the image data are cancer patient pathology image data. The preprocessing performed by the data preprocessing module comprises sample alignment and feature alignment: features of the multi-omics data whose null-value proportion exceeds a% of all samples are deleted, null values whose proportion is lower than b% of all samples are filled with the software IMPUTE2, features whose variance is lower than a threshold are removed, feature alignment of specific omics data is carried out with the software TCGA-Assembler, the pathology images are analysed with the software HistomicsTK, and the pathology images are cropped with the OpenSlide tool, so that each sample obtains z regions of interest (ROI), z ≥ 1, each ROI having a pixel size of r1 × r2, where r1 and r2 correspond to the length and width in pixels of each ROI; finally, the omics and image data are divided according to a specified proportion to obtain a training set and a test set. The data output by the data preprocessing module consist of a plurality of samples, each sample comprising multiple omics data and a number of pathology images.
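By way of illustration, the omics-side filtering and filling described above might be sketched as follows; the concrete thresholds and the median fill (standing in for IMPUTE2) are placeholder assumptions, and pandas is used only for convenience.

```python
import pandas as pd

def preprocess_omics(df: pd.DataFrame, max_null_frac: float = 0.2,
                     var_threshold: float = 1e-3) -> pd.DataFrame:
    """Rough sketch of the omics feature filtering described above.
    df: features x samples matrix (rows = features, columns = samples)."""
    null_frac = df.isna().mean(axis=1)
    df = df.loc[null_frac <= max_null_frac]          # drop features missing in more than a% of samples
    df = df.apply(lambda row: row.fillna(row.median()), axis=1)  # stand-in for IMPUTE2 filling
    return df.loc[df.var(axis=1) >= var_threshold]   # remove near-constant features

def align_samples(tables: list) -> list:
    """Keep only the samples (columns) present in every omics table."""
    common = sorted(set.intersection(*(set(t.columns) for t in tables)))
    return [t[common] for t in tables]
```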
Further, the omics processing module comprises an integrated gene network module and a supervised graph convolution module;
the integrated gene Network module utilizes interaction information between introduced genes of an external database HINT to realize information aggregation and feature screening of a plurality of omics data feature levels through a Graph convolution neural Network (GCN), and comprises the following steps:
a1) construction of an adjacency matrix A between genes Using Chiense-binary-physical-interaction datasets provided by the external database HINT(g)∈R(p×p)R is a real number set, and p is a characteristic number;
a2) using the adjacency matrix A obtained in step a1)(g)Constructing a Graph convolution neural Network (GCN) to obtain neighbor information of a feature space:
Figure BDA0003467867200000041
wherein, omics U is {1, 2., U }, and U is a plurality of groups of mathematical numbers,
Figure BDA0003467867200000042
respectively pretreated groupsLearning the training set and the test set of u, respectively, in the training phase and the test phase into the formula of step a2),
Figure BDA0003467867200000043
for the implicit layer characterization of omics u, σ (·) is an activation function ReLU (·) ═ max (0, ·), max (0, ·) indicates a larger number of 0 and · which is an hadamard product,
Figure BDA0003467867200000044
for parameters needing to be learned in the training process of a Graph Convolutional neural Network (GCN) in omics u, the integrated gene Network module learns the parameters of the Graph Convolutional neural Network (GCN) only in the training stage;
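By way of illustration only, a minimal sketch of the feature-level layer of step a2) is given below (in PyTorch, which the patent does not specify), under the assumption stated above that the HINT-derived gene adjacency masks a learnable p × p weight matrix; the class name and initialization are hypothetical.

```python
import torch
import torch.nn as nn

class GeneNetworkLayer(nn.Module):
    """Feature-level aggregation: H = ReLU(X @ (A_gene * W)).

    A_gene is the fixed p x p gene adjacency built from HINT interactions;
    the Hadamard mask keeps only weights between interacting genes.
    """
    def __init__(self, adjacency: torch.Tensor):
        super().__init__()
        p = adjacency.shape[0]
        self.register_buffer("adjacency", adjacency)       # fixed gene graph
        self.weight = nn.Parameter(torch.empty(p, p))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (n_samples, p)
        return torch.relu(x @ (self.adjacency * self.weight))
```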
the supervised image convolution module constructs a sample adjacency matrix A according to cosine similarity between samples(s)The method for obtaining the preliminary prediction classification result of each omics through the Graph convolution neural Network (GCN) comprises the following steps:
b1) construction of adjacency matrix A according to similarity between samples(s)
b1.1) in the training stage, calculating the cosine similarity between the samples in the training set to obtain the adjacency matrix of the training samples
Figure BDA0003467867200000045
Figure BDA0003467867200000046
Wherein the content of the first and second substances,
Figure BDA0003467867200000047
an adjacency matrix representing samples i and j,
Figure BDA0003467867200000051
denotes the cosine similarity between sample i and sample j, xiAnd xiExpression of sample i and sample j in omics, respectivelyValue, | | · | luminance2Representing a 2-norm operation on a,
Figure BDA0003467867200000052
is a contiguous matrix
Figure BDA0003467867200000053
I denotes the identity matrix, ∈ being determined by a given parameter k, k denotes the average number of edges retained by each node, including self-join, whose formula is as follows:
Figure BDA0003467867200000054
wherein I (·) is an indicator function, when sim (x)i,xj) When the k is equal to 1, each node is only connected, and a Graph Convolutional neural Network (GCN) at the moment is equal to a full connection layer;
b1.2) in the testing stage, calculating cosine similarity between the training sample and the testing sample and between the testing sample and the testing sample, and replacing the training integrated test set according to the formula in the step b1.1) to obtain an adjacency matrix of the testing sample
Figure BDA0003467867200000055
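The sample-graph construction of step b1) could be sketched as follows; choosing ε from the average-degree parameter k by a top-k threshold, and zeroing the diagonal before adding the identity, are implementation assumptions.

```python
import torch
import torch.nn.functional as F

def cosine_adjacency(x: torch.Tensor, k: float = 5.0) -> torch.Tensor:
    """Sample adjacency used by the supervised GCN: threshold cosine similarities
    so that each node keeps about k edges on average, then add self-connections."""
    x_norm = F.normalize(x, dim=1)
    sim = x_norm @ x_norm.t()                     # (n, n) cosine similarities
    n = sim.shape[0]
    sim.fill_diagonal_(0.0)                       # self-connections handled separately
    num_keep = min(int(k * n), n * n)
    eps = torch.topk(sim.flatten(), num_keep).values[-1]
    adj = sim * (sim >= eps).float()              # keep only sufficiently similar pairs
    return adj + torch.eye(n)                     # A_hat = A + I
```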
b2) constructing the graph convolutional network (GCN) of the supervised graph convolution module:
b2.1) in the training stage, the GCN of the supervised graph convolution module is constructed as follows:

Z^(u)_1 = σ( Â^(s)_tr H^(u)_tr W^(u)_1 )
Z^(u)_2 = σ( Â^(s)_tr Z^(u)_1 W^(u)_2 )

where H^(u)_tr is the characterization of omics u after the integrated gene network module, Â^(s)_tr is the adjacency matrix between samples obtained in step b1.1), W^(u)_1 and W^(u)_2 are the parameters to be learned during training of the supervised graph convolution module's GCN for omics u, and Z^(u)_1 and Z^(u)_2 are the hidden characterizations of omics u in the supervised graph convolution module;
b2.2) in the testing stage, the adjacency matrix of the test samples obtained in step b1.2) and the test set that has passed through the integrated gene network module are input into the constructed GCN for sample information aggregation, giving the omics characterization of the test set;
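A sketch of the two-layer sample-level GCN of step b2.1); the hidden widths and the nn.Linear realisation of W_1 and W_2 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SampleGCN(nn.Module):
    """Two GCN layers over the sample graph: Z1 = ReLU(A H W1), Z2 = ReLU(A Z1 W2)."""
    def __init__(self, in_dim: int, hidden_dim: int = 256, out_dim: int = 64):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, out_dim, bias=False)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        z1 = torch.relu(adj @ self.w1(h))   # aggregate neighbours, then transform
        z2 = torch.relu(adj @ self.w2(z1))
        return z2
```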
b3) obtaining the preliminary prediction classification result of each omics data set:

Ŷ^(u) = softmax( Z^(u)_2 W^(u)_c )

where Ŷ^(u) represents the predicted labels of the training set or test set after the supervised graph convolution module, n_tr and n_te are the numbers of samples in the training set and the test set respectively, c is the number of classes in the classification task, and W^(u)_c contains the parameters to be learned when building the softmax classifier; the softmax classifier is

softmax(h)_t = exp(h_t) / Σ_{m=1}^{c} exp(h_m)

where the classification task has classes t ∈ {1, 2, ..., c} and m ∈ {1, 2, ..., c}, h = [h_1, h_2, ..., h_c]^T is the vector input to the softmax classifier, and h_t and h_m are the t-th and m-th elements of the input vector h;
the loss function of the supervised graph convolution module, L^(u)_pre, is

L^(u)_pre = Σ_j L_CE( ŷ^(u)_j, y_j ) = - Σ_j log ŷ^(u)_{j, y_j}

where L_CE(·) is the cross-entropy loss function, ŷ^(u)_j is the predicted class-probability vector of sample j for omics u, ŷ^(u)_{j,m} denotes its m-th element, and y_j is the true label of sample j in the data set.
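Combining step b3) with the loss above, a hedged sketch of a per-omics classification head follows; the small epsilon added inside the logarithm is a numerical-stability convenience, not part of the patent.

```python
import torch
import torch.nn as nn

class OmicsClassifierHead(nn.Module):
    """Linear + softmax head producing the preliminary per-omics prediction."""
    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, num_classes)

    def forward(self, z2: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.linear(z2), dim=1)   # class probabilities per sample

def pretrain_loss(probs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Cross-entropy over the preliminary predictions: -sum_j log p_{j, y_j}."""
    return nn.functional.nll_loss(torch.log(probs + 1e-12), labels, reduction="sum")
```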
Further, the image processing module extracts the depth features of the pathology images with a convolutional neural network (CNN). The CNN consists of l convolutional layers, pooling layers and a fully connected layer, l ≥ 1, where the kernel size of the convolutional layers is s1 × s2, each convolutional layer has q feature maps, the pooling size is s3 × s4, and the last layer is a fully connected layer that outputs the preliminary classification result for the sample's image data; the module works as follows:
c1) in the training stage, the preprocessed pathology images His_tr of size r1 × r2 in the training set are input into the CNN; the convolutional layers extract features from His_tr layer by layer, the pooling layers reduce the dimensionality of the data, and the fully connected layer outputs the result Ŷ^(his)_tr; the network structure parameters are adjusted by back propagation, the optimal network parameters are obtained through continued training, and a dropout mechanism is adopted during training to avoid overfitting;
c2) in the testing stage, the preprocessed pathology images His_te of size r1 × r2 in the test set are input into the trained CNN, which outputs the preliminary prediction classification result Ŷ^(his)_te of the image processing module.
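A CNN of the kind described in steps c1)-c2) might look like the sketch below; the channel count, kernel size, pooling size and dropout rate are placeholders guided by the embodiment (6 layers, 3×3 kernels, 64 feature maps, 2×2 pooling), and the LazyLinear output layer is an implementation convenience.

```python
import torch
import torch.nn as nn

class PathologyCNN(nn.Module):
    """Conv/pool feature extractor plus a fully connected softmax head."""
    def __init__(self, num_classes: int, num_blocks: int = 6, channels: int = 64):
        super().__init__()
        layers, in_ch = [], 3
        for _ in range(num_blocks):
            layers += [nn.Conv2d(in_ch, channels, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(kernel_size=2)]
            in_ch = channels
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),              # dropout to limit overfitting
            nn.LazyLinear(num_classes),     # fully connected output layer
        )

    def forward(self, roi: torch.Tensor) -> torch.Tensor:   # roi: (n, 3, 224, 224)
        return torch.softmax(self.classifier(self.features(roi)), dim=1)
```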
Further, the fusion module comprises a preliminary cross-fusion module and a network fusion module. The preliminary cross-fusion module first constructs the multi-modal data cross-fusion vectors, then reshapes them to obtain the reconstructed vectors, and finally the network fusion module outputs the fused classification result.
The preliminary cross-fusion module specifically performs the following operations:
d1) constructing the multi-modal data cross-fusion vector for each sample from the preliminary predictions of all modalities:

V_j = Ŷ^(1)_j ⊗ Ŷ^(2)_j ⊗ ... ⊗ Ŷ^(U)_j ⊗ Ŷ^(his)_j

where V is the multi-modal data cross-fusion tensor of the training set or the test set, R is the set of real numbers, n_tr and n_te are the numbers of samples in the training set and the test set respectively, c is the number of classes in the classification task, U is the number of omics, Ŷ^(u)_j is the preliminary prediction classification result for sample j of the training set or test set obtained with omics u as input through the supervised graph convolution module, and Ŷ^(his)_j is the preliminary prediction classification result for sample j of the training set or test set obtained with the pathology image as input through the image processing module;
d2) reshaping the multi-modal data cross-fusion vector obtained in step d1) to obtain the reconstructed vector v of the training set or the test set.
The network fusion module consists of a fully connected layer and comprises the following steps:
e1) inputting the reconstructed vector v obtained in step d2) into the network fusion module and outputting the final classification result:

Ŷ^(final) = softmax( v W_f )

where W_f contains the network parameters to be trained in the training stage; inputting the reconstructed vectors corresponding to the training set and the test set gives the final classification results of the training set and the test set, Ŷ^(final)_tr and Ŷ^(final)_te respectively; the softmax classifier is

softmax(h)_t = exp(h_t) / Σ_{m=1}^{c} exp(h_m)

where the classification task has classes t ∈ {1, 2, ..., c} and m ∈ {1, 2, ..., c}, h = [h_1, h_2, ..., h_c]^T is the vector input to the softmax classifier, and h_t and h_m are the t-th and m-th elements of the input vector h;
e2) calculating the loss function L of the network fusion module for back propagation:

L = Σ_j L_CE( ŷ^(final)_j, y_j ) = - Σ_j log ŷ^(final)_{j, y_j}

where L_CE(·) is the cross-entropy loss function, y_j is the true label of sample j, ŷ^(final)_j is the final prediction result of training sample j, and ŷ^(final)_{j,m} denotes its m-th element. In the training stage, the loss function after the network fusion module must be calculated, and the parameters of the network fusion module are then trained by back propagation; this step is not needed in the testing stage, where the final classification result is already output in step e1).
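A hedged sketch of steps d1)-e1), assuming, as in the reconstruction above, that the per-modality class-probability vectors are combined by an outer product that is flattened into the reconstructed vector and classified by a single fully connected layer; the einsum-based construction and the function names are assumptions, not taken verbatim from the patent.

```python
import torch
import torch.nn as nn

def cross_fusion(preds: list) -> torch.Tensor:
    """Combine per-modality class probabilities (each (n, c)) into a flattened
    cross-fusion vector of shape (n, c ** num_modalities)."""
    fused = preds[0]
    for p in preds[1:]:
        fused = torch.einsum("ni,nj->nij", fused, p).flatten(start_dim=1)
    return fused

class NetworkFusion(nn.Module):
    """Single fully connected layer mapping the reconstructed vector to classes."""
    def __init__(self, num_modalities: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(num_classes ** num_modalities, num_classes)

    def forward(self, preds: list) -> torch.Tensor:
        return torch.softmax(self.fc(cross_fusion(preds)), dim=1)
```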
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Additional gene-gene interaction information is introduced with the help of an external knowledge database, information aggregation between genes is realized with a graph convolutional network (GCN), and the implicit characteristics of the multi-omics data are fully mined.
2. By calculating the similarities between training samples and test samples and between pairs of test samples, both the supervised and the unsupervised information of the samples is fully utilized, and information aggregation at the sample level is then completed through a GCN, which helps improve the prediction accuracy of cancer patient classification.
3. The information of cancer patients at different levels is fused by a multi-modal cross-fusion method, and the predictive classification of cancer patients is then completed by fusion with a network method.
4. Information aggregation is carried out at both the feature level and the sample level, the related information among the multi-omics data, the image data and the different data sets is fully mined, and the prediction accuracy of cancer patient classification is improved.
Drawings
FIG. 1 is an architectural diagram of the system of the present invention.
FIG. 2 is a schematic diagram of the function of the integrated gene network module.
Fig. 3 is a functional diagram of the supervised graph convolution module.
Fig. 4 is a schematic structural diagram of the fusion module.
Detailed Description
The present invention will be described in further detail with reference to specific examples, but the embodiments of the present invention are not limited thereto.
This embodiment discloses a cancer patient classification system based on the fusion of multi-omics and image data, which aims to improve the classification prediction accuracy for cancer patients by fusing the multi-omics data and image data of cancer patients. The system performs end-to-end loading and preprocessing of multi-omics data and image data, introduces additional feature information from an external knowledge database to carry out feature dimension reduction and information aggregation on specific omics data, supplements additional sample information by computing the similarity between cancer patients (i.e. samples), and finally fuses the classification results of the multi-omics data and the image data through a multi-modal cross-fusion method. As shown in fig. 1, the system specifically comprises the following functional modules:
and the data loading and preprocessing module comprises a data loading module and a data preprocessing module.
The data loading module is used for loading a plurality of groups of chemical data and image data, wherein the plurality of groups of chemical data comprise genomics data, transcriptomics data and epigenomics data, each row of the plurality of groups of chemical data represents the expression value of each sample on the corresponding characteristic, each column represents the expression value of one sample in the corresponding characteristic, and the image data is cancer patient pathological diagram data; in this example, the effect of the inventive system was evaluated using the breast cancer item (BRCA) data in the public cancer data set TCGA, and mRNA expression data, DNA methylation data, miRNA expression data, and pathology picture data were loaded from a storage device into 578 breast cancer patients.
The preprocessing of the data preprocessing module comprises sample alignment, feature alignment, deleting features of which the null proportion of a plurality of groups of mathematical data exceeds a% of all samples, filling values of which the null proportion is lower than b% of all samples by IMPUTE2, removing features of which the variance is lower than a threshold value, performing feature alignment on specific omic data by TCGA-Assembler, analyzing a pathological diagram by HistomicsTK, cutting the pathological diagram by an OpenSlide tool, obtaining a z-block region of interest (ROI) of each sample, wherein z is greater than or equal to 1, and the pixel size of each ROI is r1×r2Wherein r is1And r2The method comprises the steps of respectively corresponding to the length and width pixel values of each region of interest (ROI), finally dividing omics and image data according to a specified proportion to obtain a training set and a testing set, wherein data output after passing through a data preprocessing module are composed of a plurality of samples, and each sample comprises a plurality of omics data and a plurality of pathological pictures. In fact, the above preprocessing can be summarized as: preprocessing of multinomial dataThe image processing and preprocessing module and the data set division are as follows:
the preprocessing of multigroup chemical data comprises:
sample alignment: only the sample containing four kinds of data is reserved, and other samples are deleted;
characteristic deletion: deleting the characteristics of which the median value of the multiple groups of mathematical data exceeds 20%, and simultaneously removing the characteristics of which the variance is lower than a threshold value;
data filling: null values with feature loss lower than 20% are filled in with the software IMPUTE 2;
gene mapping: in order that gene-gene interaction information can be used in DNA methylation data, features in DNA methylation are genetically mapped using a TCGA-Assembler, preserving the successfully mapped features in DNA methylation data.
The image data preprocessing comprises the following steps:
analysing whether the pathology images are abnormal using the software HistomicsTK;
cropping the pathology images using the OpenSlide tool, resulting in a number of ROIs per sample, each with a pixel size of 224 × 224.
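The ROI cropping step might be sketched as follows; the non-overlapping level-0 tiling and the brightness-based tissue filter are assumptions, since the embodiment only states that OpenSlide is used to cut 224 × 224 ROIs.

```python
import numpy as np
import openslide

def crop_rois(slide_path: str, tile_size: int = 224, max_rois: int = 64):
    """Cut non-overlapping tile_size x tile_size patches from a whole-slide image,
    keeping tiles that contain enough tissue (simple brightness heuristic)."""
    slide = openslide.OpenSlide(slide_path)
    width, height = slide.dimensions
    rois = []
    for y in range(0, height - tile_size, tile_size):
        for x in range(0, width - tile_size, tile_size):
            tile = np.asarray(
                slide.read_region((x, y), 0, (tile_size, tile_size)).convert("RGB"))
            if tile.mean() < 220:            # skip mostly-background (near-white) tiles
                rois.append(tile)
            if len(rois) >= max_rois:
                return rois
    return rois
```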
The data set partitioning step comprises: 80% of the samples are used as the training set to train the model parameters, and 20% of the samples are used as the test set to evaluate the performance of the trained model. After the data set partitioning module, the three omics data sets are output with the training set (462 samples) and the test set (116 samples) in the same matrix, where the first 462 rows are the training samples and rows 463-578 are the test samples, the training and test sets together containing 578 samples; meanwhile, the training and test samples in the image data correspond to the respective training and test samples in the multi-omics data.
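A minimal sketch of the 80/20 split that keeps the omics rows and pathology images of the same patient on the same side of the split; the use of scikit-learn and the stratification by label are illustrative assumptions.

```python
from sklearn.model_selection import train_test_split

def split_dataset(sample_ids, labels, train_frac=0.8, seed=0):
    """Split aligned sample IDs 80/20 so omics rows and pathology images for a
    patient always fall on the same side of the split."""
    train_ids, test_ids = train_test_split(
        sample_ids, train_size=train_frac, stratify=labels, random_state=seed)
    return train_ids, test_ids
```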
The omics processing module comprises an integrated gene network module and a supervised graph convolution module.
The integrated gene network module uses gene-gene interaction information introduced from the external database HINT to realize information aggregation and feature screening at the feature level of each omics data set through a graph convolutional network (GCN). Its operating principle is shown in fig. 2, and the steps are as follows:
1) constructing an adjacency matrix between genes, A^(g) ∈ R^(p×p), using the binary physical interaction data set provided by the external database HINT, where R is the set of real numbers and p = 2000 is the number of genes;
2) using the adjacency matrix A^(g), constructing a GCN to obtain neighbour information in the feature space:

H^(u) = σ( X^(u) ( A^(g) ⊙ W^(u)_g ) )

where u ∈ {1, 2, 3}, and X^(1), X^(2) and X^(3) respectively represent the mRNA expression data, DNA methylation data and miRNA expression data; X^(u) is the preprocessed training data or test data of omics u, and in this module the adjacency matrices of the training data set and the test data set are identical; H^(u) is the characterization of omics u; σ(·) is the activation function ReLU(·) = max(0, ·), i.e. the larger of 0 and its argument; ⊙ is the Hadamard product; and W^(u)_g contains the parameters to be learned during GCN training for omics u, the integrated gene network module learning the GCN parameters only in the training stage. Since the miRNA expression data cannot be mapped to genes by TCGA-Assembler, the miRNA expression data pass through the integrated gene network module unchanged.
The supervised graph convolution module, whose operating principle is shown in fig. 3, uses the cosine similarity between samples to construct a sample adjacency matrix A^(s) and trains the parameters of the supervised graph convolution module to obtain the preliminary prediction classification result of each omics; the module performs information aggregation over the samples, and the steps are as follows:
1) constructing the adjacency matrix A^(s) from the similarity between samples:
1.1) in the training stage, computing the cosine similarity between the samples of the training set to obtain the adjacency matrix of the training samples, A^(s)_tr ∈ R^(n_tr×n_tr), where R is the set of real numbers and n_tr = 462 is the number of training samples:

A^(s)_ij = sim(x_i, x_j) · δ( sim(x_i, x_j) ≥ ε ),  sim(x_i, x_j) = (x_i · x_j) / ( ||x_i||_2 ||x_j||_2 )

where A^(s)_ij is the adjacency entry for samples i and j, sim(x_i, x_j) denotes the cosine similarity between sample i and sample j, x_i and x_j are the expression values of sample i and sample j in the omics, and ||·||_2 denotes the 2-norm; the adjacency matrix actually used is Â^(s)_tr = A^(s)_tr + I, where I denotes the identity matrix; ε is determined by a given parameter k, which denotes the average number of edges retained by each node, including the self-connection, and satisfies

k = ( Σ_{i,j} δ( sim(x_i, x_j) ≥ ε ) ) / n

where δ(·) is an indicator function, δ(·) = 1 when sim(x_i, x_j) ≥ ε and δ(·) = 0 otherwise, and n is the number of samples, i.e. the number of nodes; the same k value is used in all experiments on the same data set, and in this example k = 5 for the breast cancer data set;
1.2) in the testing stage, computing the cosine similarity between training samples and test samples and between pairs of test samples, and substituting the combined training-plus-test set into the formula of step 1.1) to obtain the adjacency matrix of the test set, Â^(s)_te ∈ R^(n×n), where R is the set of real numbers and n = 578 is the total number of samples;
2) constructing the GCN of the supervised graph convolution module:
2.1) in the training stage, the GCN of the supervised graph convolution module is constructed as follows:

Z^(u)_1 = σ( Â^(s)_tr H^(u)_tr W^(u)_1 )
Z^(u)_2 = σ( Â^(s)_tr Z^(u)_1 W^(u)_2 )

where H^(u)_tr is the characterization of omics u after the integrated gene network module, Â^(s)_tr is the adjacency matrix between samples obtained in step 1.1), W^(u)_1 and W^(u)_2 are the parameters to be learned during GCN training of the supervised graph convolution module for omics u, and Z^(u)_1 and Z^(u)_2 are the hidden characterizations of omics u in the supervised graph convolution module;
2.2) in the testing stage, the test-sample adjacency matrix obtained in step 1.2) and the test set that has passed through the integrated gene network module are input into the constructed GCN for sample information aggregation, giving the omics characterization of the test set;
3) obtaining the preliminary prediction classification result of each omics data set:

Ŷ^(u) = softmax( Z^(u)_2 W^(u)_c )

where Ŷ^(u) represents the predicted labels of the training set or test set after the supervised graph convolution module, n_tr = 462 and n_te = 116 are the numbers of samples in the training set and the test set respectively, c = 5 is the number of classes in the classification task, and W^(u)_c contains the parameters to be learned when building the softmax classifier; the softmax classifier is

softmax(h)_t = exp(h_t) / Σ_{m=1}^{c} exp(h_m)

where the classification task has classes t ∈ {1, 2, ..., c} and m ∈ {1, 2, ..., c}, h = [h_1, h_2, ..., h_c]^T is the vector input to the softmax classifier, and h_t and h_m are the t-th and m-th elements of the input vector h;
the loss function of the supervised graph convolution module, L^(u)_pre, is

L^(u)_pre = Σ_j L_CE( ŷ^(u)_j, y_j ) = - Σ_j log ŷ^(u)_{j, y_j}

where L_CE(·) is the cross-entropy loss function, ŷ^(u)_j is the predicted class-probability vector of sample j for omics u, ŷ^(u)_{j,m} denotes its m-th element, and y_j is the true label of sample j in the data set.
The image processing module extracts the depth features of the pathology images with a CNN to obtain the preliminary classification result of the image data for each cancer patient. The CNN consists of 6 convolutional layers, pooling layers and a fully connected layer, where the kernel size of the convolutional layers is 3 × 3, each convolutional layer has 64 feature maps, the pooling size is 2 × 2, and the last layer is a fully connected layer that outputs the preliminary classification result for the sample's image data; the steps are as follows:
1) in the training stage, the preprocessed pathology images His_tr of size 224 × 224 in the training set are input into the CNN; the convolutional layers extract features from His_tr layer by layer, the pooling layers reduce the dimensionality of the data, and the fully connected layer outputs the result Ŷ^(his)_tr; the network structure parameters are adjusted by back propagation, the optimal network parameters are obtained through continued training, and a dropout mechanism is adopted during training to avoid overfitting;
2) in the testing stage, the preprocessed pathology images His_te of size 224 × 224 in the test set are input into the trained CNN model, which outputs the preliminary prediction classification result Ŷ^(his)_te of the image processing module.
The structure of the fusion module is shown in fig. 4; it comprises a preliminary cross-fusion module and a network fusion module. The preliminary cross-fusion module first constructs the multi-modal data cross-fusion vectors, then reshapes them to obtain the reconstructed vectors, and finally the network fusion module outputs the fused classification result.
The preliminary cross-fusion module specifically performs the following operations:
1) constructing the multi-modal data cross-fusion vector for each sample from the preliminary predictions of all modalities:

V_j = Ŷ^(1)_j ⊗ Ŷ^(2)_j ⊗ Ŷ^(3)_j ⊗ Ŷ^(his)_j

where V is the multi-modal data cross-fusion tensor of the training set or the test set, R is the set of real numbers, n_tr = 462 and n_te = 116 are the numbers of samples in the training set and the test set respectively, c = 5 is the number of classes in the classification task, U = 3 is the number of omics, Ŷ^(u)_j is the preliminary prediction classification result for sample j of the training set or test set obtained with omics u as input through the supervised graph convolution module, and Ŷ^(his)_j is the preliminary prediction classification result for sample j of the training set or test set obtained with the pathology image as input through the image processing module;
2) reshaping the multi-modal data cross-fusion vector obtained in step 1) to obtain the reconstructed vector v of the training set or the test set.
The network fusion module consists of a fully connected layer and comprises the following steps:
1) inputting the reconstructed vector v obtained by the preliminary cross-fusion module into the network fusion module and outputting the final classification result:

Ŷ^(final) = softmax( v W_f )

where W_f contains the network parameters to be trained in the training stage; inputting the reconstructed vectors v corresponding to the training set and the test set gives the final classification results of the training set and the test set, Ŷ^(final)_tr and Ŷ^(final)_te respectively; the softmax classifier is

softmax(h)_t = exp(h_t) / Σ_{m=1}^{c} exp(h_m)

where the classification task has classes t ∈ {1, 2, ..., c} and m ∈ {1, 2, ..., c}, h = [h_1, h_2, ..., h_c]^T is the vector input to the softmax classifier, and h_t and h_m are the t-th and m-th elements of the input vector h;
2) calculating the loss function of the network fusion module for back propagation:

L = Σ_j L_CE( ŷ^(final)_j, y_j ) = - Σ_j log ŷ^(final)_{j, y_j}

where L_CE(·) is the cross-entropy loss function, y_j is the true label of sample j, ŷ^(final)_j is the final prediction result of training sample j, and ŷ^(final)_{j,m} denotes its m-th element. In the training stage, the loss function after the network fusion module must be calculated, and the parameters of the network fusion module are then trained by back propagation; this step is not needed in the testing stage, where the final classification result can already be output in step 1).
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (5)

1. A cancer patient classification system based on fusion of multi-omics and image data, characterized in that: the system performs end-to-end loading and preprocessing of multi-omics data and image data, introduces additional feature information from an external knowledge database to carry out feature dimension reduction and information aggregation on specific omics data, supplements additional sample information by computing the similarity between cancer patients (i.e. samples), and finally fuses the classification results of the multi-omics data and the image data through a multi-modal cross-fusion method, the system specifically comprising the following functional modules:
the data loading and preprocessing module, which is used for importing multi-omics data and image data and preprocessing the imported data;
the multi-omics processing module, which comprises an integrated gene network module and a supervised graph convolution module, wherein the integrated gene network module uses the gene-gene interaction information provided by the external database HINT to construct an adjacency matrix between genes and uses a graph convolutional network GCN to characterize genes, and the supervised graph convolution module uses cosine similarity to construct an adjacency matrix between samples and uses the graph convolutional network GCN together with the sample adjacency matrix to characterize samples, obtaining a preliminary prediction classification result with the omics data as input;
the image processing module, which performs representation learning on the image data with a convolutional neural network to obtain a preliminary prediction classification result with the image data as input;
and the fusion module, which comprises a preliminary cross-fusion module and a network fusion module, wherein the preliminary cross-fusion module constructs a multi-modal data cross-fusion vector and inputs a reconstructed vector of this vector into the network fusion module, realizing the fusion of the classification results of the omics and image data and producing the final classification result.
2. The cancer patient classification system based on fusion of multi-omics and image data of claim 1, characterized in that: the data loading and preprocessing module comprises a data loading module and a data preprocessing module; the data loading module is used for loading multi-omics data and image data, wherein the multi-omics data comprise genomics data, transcriptomics data and epigenomics data, each row of the multi-omics data represents the expression values of all samples on the corresponding feature, each column represents the expression values of one sample on the corresponding features, and the image data are cancer patient pathology image data; the preprocessing performed by the data preprocessing module comprises sample alignment and feature alignment, wherein features of the multi-omics data whose null-value proportion exceeds a% of all samples are deleted, null values whose proportion is lower than b% of all samples are filled with IMPUTE2, features whose variance is lower than a threshold are removed, feature alignment of specific omics data is carried out with TCGA-Assembler, the pathology images are analysed with HistomicsTK, and the pathology images are cropped with the OpenSlide tool, each sample obtaining z regions of interest ROI, z ≥ 1, each region of interest having a pixel size of r1 × r2, where r1 and r2 correspond to the length and width in pixels of each ROI; finally, the omics and image data are divided according to a specified proportion to obtain a training set and a test set; the data output after the data preprocessing module consist of a plurality of samples, each sample comprising multiple omics data and a number of pathology images.
3. The cancer patient classification system based on fusion of multi-omics and image data of claim 1, characterized in that: the multi-omics processing module comprises an integrated gene network module and a supervised graph convolution module;
the integrated gene network module uses gene-gene interaction information introduced from the external database HINT to realize information aggregation and feature screening at the feature level of each omics data set through a graph convolutional network GCN, and comprises the following steps:
a1) constructing an adjacency matrix between genes, A^(g) ∈ R^(p×p), using the binary physical interaction data set provided by the external database HINT, where R is the set of real numbers and p is the number of features;
a2) using the adjacency matrix A^(g) obtained in step a1), constructing a graph convolutional network GCN to obtain neighbour information in the feature space:

H^(u) = σ( X^(u) ( A^(g) ⊙ W^(u)_g ) )

where omics index u ∈ {1, 2, ..., U} and U is the number of omics; X^(u) is the preprocessed training set or test set of omics u, input into the formula of step a2) in the training phase and the test phase respectively; H^(u) is the hidden-layer characterization of omics u; σ(·) is the activation function ReLU(·) = max(0, ·), i.e. the larger of 0 and its argument; ⊙ is the Hadamard product; and W^(u)_g contains the parameters to be learned during GCN training for omics u, the integrated gene network module learning the parameters of the graph convolutional network GCN only in the training stage;
the supervised image convolution module constructs a sample adjacency matrix A according to cosine similarity between samples(s)The method for obtaining the preliminary prediction classification result of each omic through the graph convolution neural network GCN comprises the following steps:
b1) construction of adjacency matrix A according to similarity between samples(s)
b1.1) in the training stage, calculating the cosine similarity between the samples in the training set to obtain the adjacency matrix of the training samples
Figure FDA0003467867190000034
Figure FDA0003467867190000035
Wherein the content of the first and second substances,
Figure FDA0003467867190000036
an adjacency matrix representing samples i and j,
Figure FDA0003467867190000037
denotes the cosine similarity between sample i and sample j, xiAnd xiRespectively is the expression values of the sample i and the sample j in the omics, | · | | luminous flux2Representing a 2-norm operation on a,
Figure FDA0003467867190000038
is a contiguous matrix
Figure FDA0003467867190000039
I denotes the identity matrix, ∈ being determined by a given parameter k, k denotes the average number of edges retained by each node, including self-join, whose formula is as follows:
Figure FDA00034678671900000310
wherein, delta (·) is an indicator function, when sim (x)i,xj) When the k is equal to 1, each node is only connected, and the graph convolution neural network GCN is equal to a full connection layer at the moment;
b1.2) in the testing stage, calculating cosine similarity between the training sample and the testing sample and between the testing sample and the testing sample, and replacing the training integrated test set according to the formula in the step b1.1) to obtain an adjacency matrix of the testing sample
Figure FDA0003467867190000041
b2) constructing the graph convolutional network GCN of the supervised graph convolution module:
b2.1) in the training stage, the GCN of the supervised graph convolution module is constructed as follows:

Z^(u)_1 = σ( Â^(s)_tr H^(u)_tr W^(u)_1 )
Z^(u)_2 = σ( Â^(s)_tr Z^(u)_1 W^(u)_2 )

where H^(u)_tr is the characterization of omics u after the integrated gene network module, Â^(s)_tr is the adjacency matrix between samples obtained in step b1.1), W^(u)_1 and W^(u)_2 are the parameters to be learned during GCN training of the supervised graph convolution module for omics u, and Z^(u)_1 and Z^(u)_2 are the hidden characterizations of omics u in the supervised graph convolution module;
b2.2) in the testing stage, the adjacency matrix of the test samples obtained in step b1.2) and the test set that has passed through the integrated gene network module are input into the constructed graph convolutional network GCN for sample information aggregation, giving the omics characterization of the test set;
b3) obtaining the preliminary prediction classification result of each omics data set:

Ŷ^(u) = softmax( Z^(u)_2 W^(u)_c )

where Ŷ^(u) represents the predicted labels of the training set or test set after the supervised graph convolution module, n_tr and n_te are the numbers of samples in the training set and the test set respectively, c is the number of classes in the classification task, and W^(u)_c contains the parameters to be learned when building the softmax classifier; the softmax classifier is

softmax(h)_t = exp(h_t) / Σ_{m=1}^{c} exp(h_m)

where the classification task has classes t ∈ {1, 2, ..., c} and m ∈ {1, 2, ..., c}, h = [h_1, h_2, ..., h_c]^T is the vector input to the softmax classifier, and h_t and h_m are the t-th and m-th elements of the input vector h;
the loss function of the supervised graph convolution module, L^(u)_pre, is

L^(u)_pre = Σ_j L_CE( ŷ^(u)_j, y_j ) = - Σ_j log ŷ^(u)_{j, y_j}

where L_CE(·) is the cross-entropy loss function, ŷ^(u)_j is the predicted class-probability vector of sample j for omics u, ŷ^(u)_{j,m} denotes its m-th element, and y_j is the true label in the data set.
4. The cancer patient classification system based on fusion of multi-omics and image data of claim 1, characterized in that: the image processing module extracts the depth features of the pathology images with a convolutional neural network CNN, the CNN consisting of l convolutional layers, pooling layers and a fully connected layer, l ≥ 1, where the kernel size of the convolutional layers is s1 × s2, each convolutional layer has q feature maps, the pooling size is s3 × s4, and the last layer is a fully connected layer that outputs the preliminary classification result for the sample's image data; the module comprises the following steps:
c1) in the training stage, the preprocessed pathology images His_tr of size r1 × r2 in the training set are input into the convolutional neural network CNN; the convolutional layers extract features from His_tr layer by layer, the pooling layers reduce the dimensionality of the data, and the fully connected layer outputs the result Ŷ^(his)_tr; the network structure parameters are adjusted by back propagation, the optimal network parameters are obtained through continued training, and a dropout mechanism is adopted during training to avoid overfitting;
c2) in the testing stage, the preprocessed pathology images His_te of size r1 × r2 in the test set are input into the trained convolutional neural network CNN, which outputs the preliminary prediction classification result Ŷ^(his)_te of the image processing module.
5. The system of claim 1 for classifying cancer patients based on fusion of omics and image data, wherein: the fusion module comprises a preliminary cross fusion module and a network fusion module, the preliminary cross fusion module firstly constructs multi-modal data cross fusion vectors, then reconstructs the multi-modal data cross fusion vectors to obtain reconstructed vectors, and finally the network fusion module outputs the classification results after fusion;
The preliminary cross-fusion module specifically performs the following operations:

d1) constructing the multi-modal data cross-fusion vector of the training set or the test set from the preliminary prediction classification results of the individual modalities, namely, for each omics u, the preliminary prediction classification result of the training set or the test set obtained by taking omics u as input and passing it through the supervised graph convolution module, and the preliminary prediction classification result of the training set or the test set obtained by taking the pathological picture as input and passing it through the image processing module; wherein R is the set of real numbers, n_tr and n_te are respectively the numbers of samples in the training set and the test set, c is the number of categories in the classification task, and U is the number of omics;

d2) reconstructing the multi-modal data cross-fusion vector obtained in step d1) to obtain the reconstructed vector of the training set or the test set.
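The exact construction of the cross-fusion vector is given only as an image in the claim, so the sketch below shows one plausible, VCDN-style construction as an assumption: the per-modality class-probability predictions (U omics plus the pathology image) are combined by an iterated outer product and flattened into a single vector per sample. Whether the patent uses this outer-product form or, for example, a simple concatenation cannot be determined from the text.

```python
import torch

def cross_fusion(pred_list):
    """Combine the preliminary class-probability predictions of all
    modalities (U omics + 1 image modality) into one cross-fusion
    vector per sample via an iterated outer product (an assumed,
    VCDN-style construction), then flatten it.

    pred_list: list of tensors, each of shape (n, c).
    Returns a tensor of shape (n, c ** len(pred_list))."""
    fused = pred_list[0]                  # (n, c)
    for p in pred_list[1:]:
        # Outer product between the running fusion and the next modality,
        # then flatten the trailing dimensions back into one.
        fused = torch.einsum('ni,nj->nij', fused, p).flatten(1)
    return fused

# Example: U = 3 omics + 1 pathology-image prediction, n = 4 samples, c = 5 classes.
preds = [torch.softmax(torch.randn(4, 5), dim=-1) for _ in range(4)]
x_fused = cross_fusion(preds)
print(x_fused.shape)  # (4, 625) = (n, c ** 4)
```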
The network fusion module is composed of a fully connected layer and comprises the following steps:

e1) inputting the reconstructed vector obtained in step d2) into the network fusion module and outputting the final classification result, wherein the weights of the fully connected layer are the network parameters to be trained in the training stage; inputting the reconstructed vectors corresponding to the training set and the test set yields the final classification results of the training set and of the test set respectively; the softmax classifier is given by

$$\mathrm{softmax}(h)_t = \frac{\exp(h_t)}{\sum_{m=1}^{c} \exp(h_m)}$$

wherein the classification task comprises the classes t ∈ {1, 2, ..., c} and m ∈ {1, 2, ..., c}, h = [h_1, h_2, ..., h_c]^T is the vector input to the softmax classifier, and h_t and h_m are the t-th and m-th elements of the input vector h;

e2) calculating the loss function L of the network fusion module and back-propagating it:

$$L = \sum_{j} L_{CE}(\hat{y}_j, y_j)$$

wherein L_CE(·) is the cross-entropy loss function, y_j is the true label of sample j, ŷ_j is the final prediction result for training sample j, and ŷ_{j,m} denotes the m-th element of ŷ_j; in the training stage, the loss function after the network fusion module needs to be calculated and the parameters of the network fusion module are trained through back propagation; in the testing stage this step is not required, and the final classification result is output in step e1).
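A hedged sketch of the network fusion module of steps e1) and e2): a single fully connected layer mapping the reconstructed vector to final class probabilities, trained by back-propagating the cross-entropy loss L. The input dimension (taken here as c^(U+1), matching the outer-product assumption in the previous sketch), the optimizer, the learning rate and the number of iterations are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class NetworkFusionModule(nn.Module):
    """Fully connected layer + softmax producing the final classification."""

    def __init__(self, in_dim: int, num_classes: int):
        super().__init__()
        self.fc = nn.Linear(in_dim, num_classes)

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.fc(v), dim=-1)

# Training stage (step e2): back-propagate the cross-entropy loss L.
n, c, num_modalities = 16, 5, 4
in_dim = c ** num_modalities                  # assumed reconstructed-vector size
v_train = torch.randn(n, in_dim)              # reconstructed vectors (placeholder data)
y_train = torch.randint(0, c, (n,))           # true labels y_j

fusion = NetworkFusionModule(in_dim=in_dim, num_classes=c)
optimizer = torch.optim.Adam(fusion.parameters(), lr=1e-3)

for _ in range(5):                            # a few illustrative training iterations
    probs = fusion(v_train)
    # L = sum_j L_CE(y_hat_j, y_j); NLL on log-probabilities equals cross-entropy here.
    loss = nn.functional.nll_loss(torch.log(probs.clamp_min(1e-12)), y_train,
                                  reduction='sum')
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Testing stage (step e1 only): the forward pass gives the final classification.
with torch.no_grad():
    final_probs = fusion(torch.randn(3, in_dim))
print(final_probs.argmax(dim=-1))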
CN202210034741.6A 2022-01-13 2022-01-13 Cancer patient classification system based on multiomics and image data fusion Pending CN114530222A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210034741.6A CN114530222A (en) 2022-01-13 2022-01-13 Cancer patient classification system based on multiomics and image data fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210034741.6A CN114530222A (en) 2022-01-13 2022-01-13 Cancer patient classification system based on multiomics and image data fusion

Publications (1)

Publication Number Publication Date
CN114530222A (en) 2022-05-24

Family

ID=81620847

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210034741.6A Pending CN114530222A (en) 2022-01-13 2022-01-13 Cancer patient classification system based on multiomics and image data fusion

Country Status (1)

Country Link
CN (1) CN114530222A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020113673A1 (en) * 2018-12-07 2020-06-11 深圳先进技术研究院 Cancer subtype classification method employing multiomics integration
CN112131402A (en) * 2020-09-14 2020-12-25 刘容恺 PPI knowledge graph representation learning method based on protein family clustering
CN112687327A (en) * 2020-12-28 2021-04-20 中山依数科技有限公司 Cancer survival analysis system based on multitask and multi-mode

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035988A (en) * 2022-08-15 2022-09-09 武汉明炀大数据科技有限公司 Medical image processing method, system, equipment and medium based on cloud computing
CN115631847A (en) * 2022-10-19 2023-01-20 哈尔滨工业大学 Early lung cancer diagnosis system based on multiple mathematical characteristics, storage medium and equipment
CN115985513A (en) * 2023-01-05 2023-04-18 徐州医科大学科技园发展有限公司 Data processing method, device and equipment based on multi-omics cancer typing
CN115985513B (en) * 2023-01-05 2023-11-03 徐州医科大学科技园发展有限公司 Data processing method, device and equipment based on multi-omics cancer typing

Similar Documents

Publication Publication Date Title
CN108446730B (en) CT pulmonary nodule detection device based on deep learning
CN108492297B (en) MRI brain tumor positioning and intratumoral segmentation method based on deep cascade convolution network
CN111191660B (en) Colon cancer pathology image classification method based on multi-channel collaborative capsule network
CN112116605B (en) Pancreas CT image segmentation method based on integrated depth convolution neural network
CN114530222A (en) Cancer patient classification system based on multiomics and image data fusion
CN114730463A (en) Multi-instance learner for tissue image classification
CN112687327B (en) Cancer survival analysis system based on multitasking and multi-mode
CN111369565B (en) Digital pathological image segmentation and classification method based on graph convolution network
CN114998220B (en) Tongue image detection and positioning method based on improved Tiny-YOLO v4 natural environment
CN113378796B (en) Cervical cell full-section classification method based on context modeling
CN111276240A (en) Multi-label multi-mode holographic pulse condition identification method based on graph convolution network
CN113469958A (en) Method, system, equipment and storage medium for predicting development potential of embryo
CN111524140A (en) Medical image semantic segmentation method based on CNN and random forest method
CN114492620A (en) Credible multi-view classification method based on evidence deep learning
CN114037699B (en) Pathological image classification method, equipment, system and storage medium
CN116128855A (en) Algorithm for detecting tumor protein marker expression level based on pathological image characteristics
CN114580501A (en) Bone marrow cell classification method, system, computer device and storage medium
CN112733859B (en) Depth migration semi-supervised domain self-adaptive classification method for histopathological image
CN116486156A (en) Full-view digital slice image classification method integrating multi-scale feature context
CN116188428A (en) Bridging multi-source domain self-adaptive cross-domain histopathological image recognition method
Yan et al. Two and multiple categorization of breast pathological images by transfer learning
CN113762262B (en) Image data screening and image segmentation model training method, device and storage medium
CN114998647A (en) Breast cancer full-size pathological image classification method based on attention multi-instance learning
CN114496099A (en) Cell function annotation method, device, equipment and medium
Kong et al. Toward large-scale histopathological image analysis via deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination