CN113537358B - Cancer subtype identification method and system based on multi-omics data sets
- Publication number: CN113537358B (application CN202110813430.5A)
- Authority: CN (China)
- Prior art keywords: similarity, subspaces, representing, data, span
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a cancer subtype identification method and system based on multi-omics data. The method comprises the following steps: acquiring sample data of each patient; performing dimensionality reduction on the sample data by principal component analysis; constructing a similarity graph based on the dimension-reduced data, the similarity graph representing the similarity between patients; projecting each similarity graph into a low-dimensional subspace; merging the subspaces on a Grassmann manifold; and, based on the merged subspaces, identifying the cancer subtypes by a k-means clustering algorithm. The invention combines several types of molecular data (mRNA, microRNA and methylation), clinical data and pathway information to identify patient groups with different biological characteristics and different prognoses, thereby enabling rapid and accurate identification of cancer subtypes.
Description
Technical Field
The invention relates to the technical field of cancer subtype identification, and in particular to a method and a system for identifying cancer subtypes based on multi-omics data sets.
Background
Most previous studies identified cancer subtypes from a single type of data and rarely relied on integrative analysis. Integrative analysis is defined as the use of data sets from multiple sources to better understand a system. Although there is a great deal of research based on single-source omics data, most of the etiology of complex traits remains unexplained. Single-source omics data do not allow comprehensive observation of biological systems and perform poorly in identifying new subtypes.
Disclosure of Invention
The invention aims to provide a method and a system for identifying cancer subtypes based on multi-omics data sets, so that cancer subtypes can be identified quickly and accurately.
In order to achieve the above object, the present invention provides the following solutions:
A method of cancer subtype identification based on multi-omics data sets, comprising:
acquiring sample data of each patient;
performing dimensionality reduction on the sample data by principal component analysis;
constructing a similarity graph based on the dimension-reduced data, the similarity graph representing the similarity between patients;
projecting each similarity graph into a low-dimensional subspace;
merging the subspaces on a Grassmann manifold;
based on the merged subspaces, identifying the cancer subtypes by a k-means clustering algorithm.
Optionally, the sample data comprises gene expression, miRNA expression, and DNA methylation.
Optionally, the similarity graph is expressed as follows:
$G^{(m)} = \{V^{(m)}, E^{(m)}\}$
where $G^{(m)}$ denotes the m-th similarity graph, the nodes $V^{(m)}$ represent the patients, and the edges $E^{(m)}$ represent the connections between patients.
Optionally, after constructing a similarity graph based on the dimension-reduced data, the method further comprises:
calculating a similarity matrix of the similarity graph;
and, according to the similarity matrix, using a k-nearest neighbor algorithm to preserve the local structure of each similarity graph.
The invention also provides a cancer subtype identification system based on multi-omics data sets, comprising:
a sample acquisition module for acquiring sample data of each patient;
a dimensionality reduction module for performing dimensionality reduction on the sample data by principal component analysis;
a similarity graph construction module for constructing a similarity graph based on the dimension-reduced data, the similarity graph representing the similarity between patients;
a projection module for projecting each similarity graph into a low-dimensional subspace;
a merging module for merging the subspaces on the Grassmann manifold;
and an identification module for identifying the cancer subtypes by a k-means clustering algorithm based on the merged subspaces.
Optionally, the sample data comprises gene expression, miRNA expression, and DNA methylation.
Optionally, the similarity graph is expressed as follows:
$G^{(m)} = \{V^{(m)}, E^{(m)}\}$
where $G^{(m)}$ denotes the m-th similarity graph, the nodes $V^{(m)}$ represent the patients, and the edges $E^{(m)}$ represent the connections between patients.
Optionally, the system further comprises:
a calculation module for calculating a similarity matrix of the similarity graph;
and a preservation module for preserving the local structure of each similarity graph by a k-nearest neighbor algorithm according to the similarity matrix.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
The invention provides a cancer subtype identification method based on multi-omics data sets, comprising: acquiring sample data of each patient; performing dimensionality reduction on the sample data by principal component analysis; constructing a similarity graph based on the dimension-reduced data, the similarity graph representing the similarity between patients; projecting each similarity graph into a low-dimensional subspace; merging the subspaces on a Grassmann manifold; and, based on the merged subspaces, identifying the cancer subtypes by a k-means clustering algorithm. The invention combines several types of molecular data (mRNA, microRNA and methylation), clinical data and pathway information to identify patient groups with different biological characteristics and different prognoses, thereby enabling rapid and accurate identification of cancer subtypes.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a method for identifying cancer subtypes based on multi-omics data sets according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a cancer subtype identification method based on multi-omics data sets according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a method and a system for identifying cancer subtypes based on multi-omics data sets, so that cancer subtypes can be identified quickly and accurately.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
As shown in FIGS. 1-2, the invention discloses a cancer subtype identification method based on multi-omics data sets, comprising the following steps:
Step 101: sample data of each patient are acquired. The sample data include gene expression, miRNA expression, and DNA methylation.
Step 102: dimensionality reduction is performed on the sample data by principal component analysis.
Step 103: a similarity graph is constructed based on the dimension-reduced data; the similarity graph represents the similarity between patients.
The similarity graph is expressed as follows:
$G^{(m)} = \{V^{(m)}, E^{(m)}\}$
where $G^{(m)}$ denotes the m-th similarity graph, the nodes $V^{(m)}$ represent the patients, and the edges $E^{(m)}$ represent the connections between patients.
Step 104: each similarity graph is projected into a low-dimensional subspace.
Step 105: the subspaces are merged on a Grassmann manifold.
Step 106: based on the merged subspaces, the cancer subtypes are identified by a k-means clustering algorithm.
After step 103, the method further comprises:
calculating a similarity matrix of the similarity graph;
and, according to the similarity matrix, using a k-nearest neighbor algorithm to preserve the local structure of each similarity graph.
Specific examples are as follows:
(1) The data used by the invention are downloaded from the TCGA website and cover five cancers: BIC (breast invasive carcinoma), COAD (colon adenocarcinoma), KRCC (renal clear cell carcinoma), GBM (glioblastoma multiforme) and LSCC (lung squamous cell carcinoma). Each cancer contains three data types (DNA methylation, gene expression, and miRNA expression).
(2) The invention uses the widely used principal component analysis (PCA) technique for dimensionality reduction. PCA is performed on each data type separately, represented as a matrix $Z^{(m)}$; its goal is to find the projection that maximizes the variance of all samples, which can be expressed as:
$$\max_{W} \; \mathrm{tr}\left(W^{T} Z^{(m)} Z^{(m)T} W\right) \quad \text{s.t.} \quad W^{T} W = I$$
The matrix $W = [w_1, w_2, \ldots, w_k]$ is an orthonormal basis of the low-dimensional space. The solution of this problem is given by the top k eigenvectors of $Z^{(m)} Z^{(m)T}$. Let $\lambda_1 \ge \lambda_2 \ge \ldots \ge 0$ be the eigenvalues of $Z^{(m)} Z^{(m)T}$, with $w_i$ the eigenvector corresponding to $\lambda_i$. The final PCA result is then computed as $H^{(m)T} = W^{T} Z^{(m)}$.
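The PCA step can be illustrated with a minimal NumPy sketch. It assumes $Z^{(m)}$ is stored as a features-by-patients array and that each feature is centred first (the centring is an assumption, not stated in the text); the function name `pca_embed` and the random demo data are hypothetical.

```python
import numpy as np

def pca_embed(Z, k):
    """Project a (features x patients) matrix onto its top-k principal axes.

    Returns H (patients x k) and W (features x k), so that H.T == W.T @ Zc.
    """
    # Centre each feature across patients (assumption: not stated in the text).
    Zc = Z - Z.mean(axis=1, keepdims=True)
    # Eigen-decomposition of the symmetric matrix Z Z^T.
    eigvals, eigvecs = np.linalg.eigh(Zc @ Zc.T)
    # eigh returns eigenvalues in ascending order; keep the k largest.
    order = np.argsort(eigvals)[::-1][:k]
    W = eigvecs[:, order]
    H = (W.T @ Zc).T
    return H, W

# Hypothetical demo data: 6 features measured on 20 patients.
rng = np.random.default_rng(0)
Z = rng.normal(size=(6, 20))
H, W = pca_embed(Z, k=2)
```

Each row of `H` is one patient's coordinates in the reduced space; these coordinates are what the later graph-construction step would consume.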
(3) The invention builds a patient-to-patient graph in the PCA space to model the specific structure within each view. For the m-th graph $G^{(m)} = \{V^{(m)}, E^{(m)}\}$, the nodes $V^{(m)}$ represent the patients in that space and the edges $E^{(m)}$ represent the connections between these patients. The invention first calculates the similarity matrix $W^{(m)}$ of the graph $G^{(m)}$. Each element $W^{(m)}_{ij}$ measures the similarity between patients i and j [similarity formula missing from this text].
The parameter t in that formula is a normalization factor. The higher the value of $W^{(m)}_{ij}$, the more similar the two patients are.
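The exact similarity formula is not available here, so the sketch below substitutes a common choice: a heat (Gaussian) kernel on squared Euclidean distances with normalization factor `t`. This kernel is an assumption, not the patent's confirmed formula; the function name and demo points are hypothetical.

```python
import numpy as np

def heat_kernel_similarity(H, t=1.0):
    """Pairwise similarity exp(-||h_i - h_j||^2 / t) for rows h_i of H.

    Assumption: a heat kernel stands in for the unspecified formula.
    """
    sq = np.sum(H ** 2, axis=1)
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b
    d2 = sq[:, None] + sq[None, :] - 2.0 * (H @ H.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative rounding errors
    return np.exp(-d2 / t)

# Hypothetical demo: patients 0 and 1 are close, patient 2 is far away.
H_demo = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]])
W_sim = heat_kernel_similarity(H_demo, t=2.0)
```

With this kernel, larger entries mean more similar patients, which matches the description of the similarity matrix.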
Next, the invention keeps only the k nearest neighbors (k-NN) of each patient in order to preserve the local structure of each graph:
$$\tilde{W}^{(m)}_{ij} = \begin{cases} W^{(m)}_{ij}, & j \in N_i \\ 0, & \text{otherwise} \end{cases}$$
where $N_i$ consists of the k nearest neighbors of patient i. The parameter k depends on the sample size. Since different omics have different structures, the k-NN graph $\tilde{W}^{(m)}$ is more representative than the original $W^{(m)}$.
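A minimal sketch of the k-NN sparsification follows. It keeps, for each patient, the k most similar other patients and zeroes the rest; the final symmetrisation step is an assumption added so that the result is a valid undirected graph. The function name and the toy matrix are hypothetical.

```python
import numpy as np

def knn_sparsify(W, k):
    """Keep the k largest off-diagonal similarities in each row of W."""
    n = W.shape[0]
    W_knn = np.zeros_like(W)
    for i in range(n):
        sims = W[i].copy()
        sims[i] = -np.inf               # a patient is not its own neighbor
        nbrs = np.argsort(sims)[::-1][:k]
        W_knn[i, nbrs] = W[i, nbrs]
    # Symmetrise (assumption): keep an edge if either endpoint selected it.
    return np.maximum(W_knn, W_knn.T)

# Hypothetical toy similarity matrix for four patients.
W_full = np.array([
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
])
W_sparse = knn_sparsify(W_full, k=1)
```

In this toy case only the two strongest pairs (patients 0 and 1, patients 2 and 3) remain connected.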
(4) To further extract the key features of each omics type, the invention projects all the graphs into low-dimensional subspaces and obtains their embeddings in these spaces.
The invention first calculates the normalized graph Laplacian matrix $L^{(m)}$, defined as $L^{(m)} = D^{(m)-1/2} \left(D^{(m)} - \tilde{W}^{(m)}\right) D^{(m)-1/2}$, where $D^{(m)}$ is the degree matrix of $\tilde{W}^{(m)}$, computed by $D^{(m)}_{ii} = \sum_j \tilde{W}^{(m)}_{ij}$. Using this Laplacian matrix, the embedding $U^{(m)}$ can be calculated, following the spectral clustering method, by solving the associated eigenvalue problem:
$$\min_{U^{(m)}} \; \mathrm{tr}\left(U^{(m)T} L^{(m)} U^{(m)}\right) \quad \text{s.t.} \quad U^{(m)T} U^{(m)} = I \qquad (4)$$
The solution of Eq. (4) is given by the k eigenvectors of the normalized Laplacian matrix $L^{(m)}$ with the smallest eigenvalues. Since each embedding is a basis of its space, the omics are more comparable in this form than as the original graphs.
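The spectral embedding step can be sketched as follows, assuming a symmetric, non-negative similarity matrix. The function name `spectral_embedding` and the two-triangle demo graph are hypothetical, and the small floor on the degrees is an added safeguard against isolated nodes.

```python
import numpy as np

def spectral_embedding(W_knn, k):
    """Return the k eigenvectors of the normalized Laplacian with the
    smallest eigenvalues, together with the Laplacian itself."""
    deg = W_knn.sum(axis=1)
    # Floor the degrees to avoid division by zero for isolated nodes.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    n = W_knn.shape[0]
    # L = I - D^{-1/2} W D^{-1/2}, equal to D^{-1/2} (D - W) D^{-1/2}.
    L = np.eye(n) - d_inv_sqrt[:, None] * W_knn * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)   # ascending eigenvalues
    return eigvecs[:, :k], L

# Hypothetical demo: two disconnected triangles (patients 0-2 and 3-5).
tri = np.ones((3, 3)) - np.eye(3)
zero = np.zeros((3, 3))
W_demo = np.block([[tri, zero], [zero, tri]])
U_demo, L_demo = spectral_embedding(W_demo, k=2)
```

For a graph with two connected components the Laplacian has two zero eigenvalues, so the two returned eigenvectors attain an objective value of (numerically) zero in Eq. (4).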
(5) Given the M omics embeddings, minimizing the Euclidean distance between the integrated embedding and each of them is a natural way to obtain a fused representation:
$$\min_{U} \; \sum_{m=1}^{M} \left\| U - U^{(m)} \right\|_F^2$$
However, this approach assumes that similar patients are close in Euclidean space, which is often not the case. Multi-omics data are complex and heterogeneous, so it is more suitable to measure their distance on a manifold, such as the Grassmann manifold, than in Euclidean space.
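The Grassmann manifold G(k, n) is the set of k-dimensional linear subspaces of $\mathbb{R}^n$. Mathematically, each point of G(k, n) can be represented by an orthonormal basis Y, which spans a k-dimensional subspace span(Y). Thus, the distance between the subspaces span(Y) and span($\hat{Y}$) can be defined through the principal angles of all basis pairs:
$$d^2(Y, \hat{Y}) = \sum_{i=1}^{k} \sin^2 \theta_i = k - \sum_{i=1}^{k} \cos^2 \theta_i = k - \mathrm{tr}\left(Y Y^{T} \hat{Y} \hat{Y}^{T}\right)$$
where $\theta_i$ is the principal angle between the basis vector $Y_i$ and the basis vector $\hat{Y}_i$.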
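As a small numerical check of this kind of distance, the sketch below computes the squared projection distance $k - \mathrm{tr}(Y Y^T \hat{Y} \hat{Y}^T)$ and compares it with the sum of squared sines of the principal angles obtained from a singular value decomposition. The function names and the toy bases are hypothetical, and the projection-distance form is the assumption being illustrated.

```python
import numpy as np

def projection_distance_sq(Y1, Y2):
    """Squared projection distance between span(Y1) and span(Y2),
    for orthonormal bases Y1, Y2 of shape (n, k)."""
    k = Y1.shape[1]
    return k - np.trace(Y1 @ Y1.T @ Y2 @ Y2.T)

def principal_angles(Y1, Y2):
    """Principal angles: arccos of the singular values of Y1^T Y2."""
    s = np.linalg.svd(Y1.T @ Y2, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

# Hypothetical demo in R^3: the planes span(e1, e2) and span(e1, e3)
# share one direction and are orthogonal in the other.
Y_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
Y_b = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])
d2 = projection_distance_sq(Y_a, Y_b)
angles = principal_angles(Y_a, Y_b)
```

Here the principal angles are 0 and π/2, so both expressions give a squared distance of 1.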
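Based on this measure, the distance between the integrated embedding U and the individual embeddings can be expressed as:
$$d^2\left(U, \{U^{(m)}\}_{m=1}^{M}\right) = \sum_{m=1}^{M} d^2\left(U, U^{(m)}\right) = kM - \sum_{m=1}^{M} \mathrm{tr}\left(U U^{T} U^{(m)} U^{(m)T}\right)$$
Thus, the objective function is
$$\min_{U} \; \sum_{m=1}^{M} \left[ k - \mathrm{tr}\left(U U^{T} U^{(m)} U^{(m)T}\right) \right] \quad \text{s.t.} \quad U^{T} U = I \qquad (8)$$
Equation (8) forces the integrated representation U to approach all the embeddings $U^{(m)}$ in terms of projection distance on the Grassmann manifold. Its solution is given by the top k eigenvectors of the modified Laplacian matrix $L_{mod}$.
Finally, the cluster labels are obtained by applying the k-means algorithm to the representation obtained from $L_{mod}$.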
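The merging and clustering steps can be sketched under a simplifying assumption: that the merged basis maximizes $\sum_m \mathrm{tr}(U^T U^{(m)} U^{(m)T} U)$ subject to $U^T U = I$, whose maximizer is the set of top k eigenvectors of $\sum_m U^{(m)} U^{(m)T}$. Whether that matrix coincides with the patent's $L_{mod}$ is not confirmed by the text. The k-means routine is a plain Lloyd iteration written for illustration; all names and the demo data are hypothetical.

```python
import numpy as np

def merge_subspaces(embeddings, k):
    """Top-k eigenvectors of sum_m U_m U_m^T (assumed merged matrix)."""
    S = sum(U @ U.T for U in embeddings)
    eigvals, eigvecs = np.linalg.eigh(S)    # ascending eigenvalues
    return eigvecs[:, ::-1][:, :k]          # keep the k largest

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's k-means on the rows of X; returns cluster labels."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Hypothetical demo: two views that agree on two groups of three patients.
U1 = np.zeros((6, 2))
U1[:3, 0] = 1.0 / np.sqrt(3.0)
U1[3:, 1] = 1.0 / np.sqrt(3.0)
c, s = np.cos(0.3), np.sin(0.3)
U2 = U1 @ np.array([[c, -s], [s, c]])       # same subspace, rotated basis
U_merged = merge_subspaces([U1, U2], k=2)
labels = kmeans(U_merged, k=2)
```

Because both views span the same subspace, the merged basis recovers it and k-means separates patients 0-2 from patients 3-5. In practice a library k-means with multiple restarts would be a more robust choice than this single-start version.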
To verify the effectiveness of the method, the invention compares it with Similarity Network Fusion (SNF) and Grassmann clustering using Cox survival p-values; the results are shown in Table 1. For a fair comparison, the same number of subtypes is used for SNF and Grassmann clustering for each cancer. The method yields subtypes with significant differences in survival time, and in three of the five cancers (BIC, LSCC and COAD) it attains a smaller p-value than SNF.
Table 1. Log-rank test analysis of survival for five cancers (number of subtypes in parentheses)
Type of cancer | Grassmann clustering | SNF | The method of the invention |
---|---|---|---|
BIC (5 subtypes) | 2.0×10⁻⁴ | 1.1×10⁻³ | 4.3×10⁻⁵ |
GBM (3 subtypes) | 4.3×10⁻³ | 2.0×10⁻⁴ | 2.3×10⁻⁴ |
KRCC (3 subtypes) | 2.8×10⁻² | 2.9×10⁻² | 1.4×10⁻¹ |
LSCC (4 subtypes) | 1.6×10⁻² | 2.0×10⁻² | 2.7×10⁻³ |
COAD (3 subtypes) | 4.2×10⁻² | 2.0×10⁻² | 2.7×10⁻³ |
The invention also provides a cancer subtype identification system based on multi-omics data sets, comprising:
a sample acquisition module for acquiring sample data of each patient;
a dimensionality reduction module for performing dimensionality reduction on the sample data by principal component analysis;
a similarity graph construction module for constructing a similarity graph based on the dimension-reduced data, the similarity graph representing the similarity between patients;
a projection module for projecting each similarity graph into a low-dimensional subspace;
a merging module for merging the subspaces on the Grassmann manifold;
and an identification module for identifying the cancer subtypes by a k-means clustering algorithm based on the merged subspaces.
In this specification, the embodiments are described progressively: each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.
Claims (8)
1. A method of identifying cancer subtypes based on multi-omics data sets, comprising:
acquiring sample data of each patient;
performing dimensionality reduction on the sample data by principal component analysis;
constructing a similarity graph based on the dimension-reduced data, the similarity graph representing the similarity between patients;
projecting each similarity graph into a low-dimensional subspace;
merging the subspaces on a Grassmann manifold;
based on the merged subspaces, identifying cancer subtypes by a k-means clustering algorithm;
wherein merging the subspaces on the Grassmann manifold specifically comprises:
defining the distance between the subspaces span(Y) and span($\hat{Y}$) through the principal angles of all basis pairs:
$$d^2(Y, \hat{Y}) = \sum_{i=1}^{k} \sin^2 \theta_i = k - \sum_{i=1}^{k} \cos^2 \theta_i$$
wherein $\theta_i$ is the principal angle between the basis vector $Y_i$ and the basis vector $\hat{Y}_i$, i denotes the index of the i-th basis pair between span(Y) and span($\hat{Y}$), k denotes the total number of basis pairs between span(Y) and span($\hat{Y}$), and $\sum_{i=1}^{k} \cos^2 \theta_i$ represents the sum of the squared cosines of the principal angles of all basis pairs;
based on this measure, the distance between embeddings is expressed as:
$$d^2\left(U, \{U^{(m)}\}_{m=1}^{M}\right) = \sum_{m=1}^{M} d^2\left(U, U^{(m)}\right) = kM - \sum_{m=1}^{M} \mathrm{tr}\left(U U^{T} U^{(m)} U^{(m)T}\right)$$
wherein M represents the total number of omics types, $d^2(U, U^{(m)})$ represents the Grassmann manifold distance between the common subspace span(U) and the subspace span($U^{(m)}$), U denotes the basis of the subspace common to all omics, $U^{(m)}$ denotes the basis of the m-th omics-specific subspace, and $\mathrm{tr}(U U^{T} U^{(m)} U^{(m)T})$ represents the sum of the squared cosines of the principal angles of all basis pairs between the subspace span($U^{(m)}$) and the common subspace span(U);
thus, the objective function is:
$$\min_{U} \; \sum_{m=1}^{M} \left[ k - \mathrm{tr}\left(U U^{T} U^{(m)} U^{(m)T}\right) \right] \quad \text{s.t.} \quad U^{T} U = I$$
wherein I represents the identity matrix;
the objective function forces the integrated representation U to approach all the embeddings $U^{(m)}$ in terms of projection distance on the Grassmann manifold.
2. The method of claim 1, wherein the sample data comprises gene expression, miRNA expression, and DNA methylation.
3. The method for identifying cancer subtypes based on multi-omics data sets according to claim 1, characterized in that the similarity graph is expressed as follows:
$G^{(m)} = \{V^{(m)}, E^{(m)}\}$
wherein $G^{(m)}$ denotes the m-th similarity graph, the nodes $V^{(m)}$ represent the patients, and the edges $E^{(m)}$ represent the connections between patients.
4. The method for identifying cancer subtypes based on multi-omics data sets according to claim 1, characterized in that, after constructing a similarity graph based on the dimension-reduced data, the method further comprises:
calculating a similarity matrix of the similarity graph;
and, according to the similarity matrix, using a k-nearest neighbor algorithm to preserve the local structure of each similarity graph.
5. A cancer subtype identification system based on multi-omics data sets, comprising:
a sample acquisition module for acquiring sample data of each patient;
a dimensionality reduction module for performing dimensionality reduction on the sample data by principal component analysis;
a similarity graph construction module for constructing a similarity graph based on the dimension-reduced data, the similarity graph representing the similarity between patients;
a projection module for projecting each similarity graph into a low-dimensional subspace;
a merging module for merging the subspaces on the Grassmann manifold;
an identification module for identifying the cancer subtypes by a k-means clustering algorithm based on the merged subspaces;
wherein merging the subspaces on the Grassmann manifold specifically comprises:
defining the distance between the subspaces span(Y) and span($\hat{Y}$) through the principal angles of all basis pairs:
$$d^2(Y, \hat{Y}) = \sum_{i=1}^{k} \sin^2 \theta_i = k - \sum_{i=1}^{k} \cos^2 \theta_i$$
wherein $\theta_i$ is the principal angle between the basis vector $Y_i$ and the basis vector $\hat{Y}_i$, i denotes the index of the i-th basis pair between span(Y) and span($\hat{Y}$), k denotes the total number of basis pairs between span(Y) and span($\hat{Y}$), and $\sum_{i=1}^{k} \cos^2 \theta_i$ represents the sum of the squared cosines of the principal angles of all basis pairs;
based on this measure, the distance between embeddings is expressed as:
$$d^2\left(U, \{U^{(m)}\}_{m=1}^{M}\right) = \sum_{m=1}^{M} d^2\left(U, U^{(m)}\right) = kM - \sum_{m=1}^{M} \mathrm{tr}\left(U U^{T} U^{(m)} U^{(m)T}\right)$$
wherein M represents the total number of omics types, $d^2(U, U^{(m)})$ represents the Grassmann manifold distance between the common subspace span(U) and the subspace span($U^{(m)}$), U denotes the basis of the subspace common to all omics, $U^{(m)}$ denotes the basis of the m-th omics-specific subspace, and $\mathrm{tr}(U U^{T} U^{(m)} U^{(m)T})$ represents the sum of the squared cosines of the principal angles of all basis pairs between the subspace span($U^{(m)}$) and the common subspace span(U);
thus, the objective function is:
$$\min_{U} \; \sum_{m=1}^{M} \left[ k - \mathrm{tr}\left(U U^{T} U^{(m)} U^{(m)T}\right) \right] \quad \text{s.t.} \quad U^{T} U = I$$
wherein I represents the identity matrix;
the objective function forces the integrated representation U to approach all the embeddings $U^{(m)}$ in terms of projection distance on the Grassmann manifold.
6. The cancer subtype identification system based on multi-omics data sets of claim 5, wherein the sample data includes gene expression, miRNA expression, and DNA methylation.
7. The cancer subtype identification system based on multi-omics data sets of claim 5, wherein the similarity graph is expressed as follows:
$G^{(m)} = \{V^{(m)}, E^{(m)}\}$
wherein $G^{(m)}$ denotes the m-th similarity graph, the nodes $V^{(m)}$ represent the patients, and the edges $E^{(m)}$ represent the connections between patients.
8. The cancer subtype identification system based on multi-omics data sets of claim 5, further comprising:
a calculation module for calculating a similarity matrix of the similarity graph;
and a preservation module for preserving the local structure of each similarity graph by a k-nearest neighbor algorithm according to the similarity matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110813430.5A CN113537358B (en) | 2021-07-19 | 2021-07-19 | Cancer subtype identification method and system based on multi-omics data sets |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110813430.5A CN113537358B (en) | 2021-07-19 | 2021-07-19 | Cancer subtype identification method and system based on multi-omics data sets |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113537358A CN113537358A (en) | 2021-10-22 |
CN113537358B true CN113537358B (en) | 2023-09-01 |
Family
ID=78100178
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110813430.5A Active CN113537358B (en) | Cancer subtype identification method and system based on multi-omics data sets | 2021-07-19 | 2021-07-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113537358B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114025320A (en) * | 2021-11-08 | 2022-02-08 | 易枭零部件科技(襄阳)有限公司 | Indoor positioning method based on 5G signal |
CN117437973B (en) * | 2023-12-21 | 2024-03-08 | 齐鲁工业大学(山东省科学院) | Single cell transcriptome sequencing data interpolation method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101123983A (en) * | 2004-10-27 | 2008-02-13 | 米迪缪尼股份有限公司 | Modulation of antibody specificity by tailoring the affinity to cognate antigens |
CN101395472A (en) * | 2006-01-17 | 2009-03-25 | 协乐民公司 | Method for predicting biological systems responses |
CN101473031A (en) * | 2006-04-03 | 2009-07-01 | 普罗美加公司 | Permuted and nonpermuted luciferase biosensors |
CN106529165A (en) * | 2016-10-28 | 2017-03-22 | 合肥工业大学 | Method for identifying cancer molecular subtype based on spectral clustering algorithm of sparse similar matrix |
CN110334748A (en) * | 2019-06-14 | 2019-10-15 | 大连理工大学 | The cancer subtypes classification method of multiple groups data integration is carried out based on D-S evidence theory |
CN111291777A (en) * | 2018-12-07 | 2020-06-16 | 深圳先进技术研究院 | Cancer subtype classification method based on multigroup chemical integration |
Also Published As
Publication number | Publication date |
---|---|
CN113537358A (en) | 2021-10-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||