CN113537358B - Cancer subtype identification method and system based on multi-omics data sets - Google Patents

Cancer subtype identification method and system based on multi-omics data sets

Info

Publication number
CN113537358B
Authority
CN
China
Prior art keywords
similarity
subspaces
representing
data
span
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110813430.5A
Other languages
Chinese (zh)
Other versions
CN113537358A (en)
Inventor
蔡宏民
阿里
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202110813430.5A
Publication of CN113537358A
Application granted
Publication of CN113537358B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a cancer subtype identification method and system based on multi-omics data. The method comprises the following steps: acquiring sample data of each patient; performing dimensionality reduction on the sample data using principal component analysis; constructing similarity graphs based on the dimensionality-reduced data, each similarity graph representing the similarity between patients; projecting each similarity graph into a low-dimensional subspace; merging the subspaces on a Grassmann manifold; and identifying cancer subtypes from the merged subspaces with a k-means clustering algorithm. The invention combines multi-omics molecular data (mRNA, microRNA and methylation), clinical data and pathway information to identify patient groups with different biological characteristics and different prognoses, thereby enabling rapid and accurate identification of cancer subtypes.

Description

Cancer subtype identification method and system based on multi-omics data sets
Technical Field
The invention relates to the technical field of cancer subtype identification, and in particular to a method and a system for identifying cancer subtypes based on multi-omics data sets.
Background
Most previous studies identified cancer subtypes from a single data type and rarely relied on integrative analysis. Integrative analysis is defined as the use of datasets from multiple sources to better understand a system. Although there is a great deal of research based on single-omics data, most of the etiology of complex traits remains unexplained. Single-omics data do not allow comprehensive observation of biological systems and perform poorly in identifying new subtypes.
Disclosure of Invention
The invention aims to provide a method and a system for identifying cancer subtypes based on multi-omics data sets, so as to identify cancer subtypes quickly and accurately.
In order to achieve the above object, the present invention provides the following solutions:
a method for identifying cancer subtypes based on multi-omics data sets, comprising:
acquiring sample data of each patient;
performing dimensionality reduction on the sample data using principal component analysis;
constructing a similarity graph based on the dimensionality-reduced data, the similarity graph representing the similarity between patients;
projecting each similarity graph into a low-dimensional subspace;
merging the subspaces on a Grassmann manifold;
identifying cancer subtypes with a k-means clustering algorithm based on the merged subspaces.
Optionally, the sample data comprises gene expression, miRNA expression, and DNA methylation data.
Optionally, the expression of the similarity graph is as follows:
$G^{(m)} = \{V^{(m)}, E^{(m)}\}$
where $G^{(m)}$ denotes the m-th similarity graph, the nodes $V^{(m)}$ represent the patients, and the edges $E^{(m)}$ represent the connections between patients.
Optionally, after constructing the similarity graph based on the dimensionality-reduced data, the method further comprises:
calculating a similarity matrix of the similarity graph;
preserving, according to the similarity matrix, the local structure of each similarity graph using a k-nearest-neighbor algorithm.
The invention also provides a cancer subtype identification system based on multi-omics data sets, comprising:
a sample acquisition module for acquiring sample data of each patient;
a dimensionality reduction module for performing dimensionality reduction on the sample data using principal component analysis;
a similarity graph construction module for constructing a similarity graph based on the dimensionality-reduced data, the similarity graph representing the similarity between patients;
a projection module for projecting each similarity graph into a low-dimensional subspace;
a merging module for merging the subspaces on a Grassmann manifold;
an identification module for identifying cancer subtypes with a k-means clustering algorithm based on the merged subspaces.
Optionally, the sample data comprises gene expression, miRNA expression, and DNA methylation data.
Optionally, the expression of the similarity graph is as follows:
$G^{(m)} = \{V^{(m)}, E^{(m)}\}$
where $G^{(m)}$ denotes the m-th similarity graph, the nodes $V^{(m)}$ represent the patients, and the edges $E^{(m)}$ represent the connections between patients.
Optionally, the system further comprises:
a calculation module for calculating a similarity matrix of the similarity graph;
a preservation module for preserving, according to the similarity matrix, the local structure of each similarity graph using a k-nearest-neighbor algorithm.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects:
the invention provides a cancer subtype identification method based on multi-omics data sets, comprising: acquiring sample data of each patient; performing dimensionality reduction on the sample data using principal component analysis; constructing similarity graphs based on the dimensionality-reduced data, each similarity graph representing the similarity between patients; projecting each similarity graph into a low-dimensional subspace; merging the subspaces on a Grassmann manifold; and identifying cancer subtypes from the merged subspaces with a k-means clustering algorithm. The invention combines multi-omics molecular data (mRNA, microRNA and methylation), clinical data and pathway information to identify patient groups with different biological characteristics and different prognoses, thereby enabling rapid and accurate identification of cancer subtypes.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for identifying cancer subtypes based on multi-omics data sets according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a method for identifying cancer subtypes based on multi-omics data sets according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the invention.
The invention aims to provide a method and a system for identifying cancer subtypes based on multi-omics data sets, so as to identify cancer subtypes quickly and accurately.
To make the above objects, features and advantages of the present invention more readily apparent, the invention is described in further detail below with reference to the drawings and specific embodiments.
As shown in FIGS. 1-2, the invention discloses a method for identifying cancer subtypes based on multi-omics data sets, comprising the following steps:
Step 101: acquire sample data of each patient. The sample data includes gene expression, miRNA expression, and DNA methylation data.
Step 102: perform dimensionality reduction on the sample data using principal component analysis.
Step 103: construct a similarity graph based on the dimensionality-reduced data; the similarity graph represents the similarity between patients.
The expression of the similarity graph is as follows:
$G^{(m)} = \{V^{(m)}, E^{(m)}\}$
where $G^{(m)}$ denotes the m-th similarity graph, the nodes $V^{(m)}$ represent the patients, and the edges $E^{(m)}$ represent the connections between patients.
Step 104: project each similarity graph into a low-dimensional subspace.
Step 105: merge the subspaces on a Grassmann manifold.
Step 106: identify cancer subtypes with a k-means clustering algorithm based on the merged subspaces.
After step 103, the method further comprises:
calculating a similarity matrix of the similarity graph;
preserving, according to the similarity matrix, the local structure of each similarity graph using a k-nearest-neighbor algorithm.
Specific examples are as follows:
(1) The data sets used in the present invention are downloaded from the TCGA website and cover BIC (breast invasive carcinoma), COAD (colon adenocarcinoma), KRCCC (renal clear cell carcinoma), GBM (glioblastoma multiforme) and LSCC (lung squamous cell carcinoma). Each cancer contains three data types (DNA methylation, gene expression, and miRNA expression).
(2) The present invention uses the widely used principal component analysis (PCA) technique for dimensionality reduction. PCA is performed on each data type separately, treated as a matrix $Z^{(m)}$; its goal is to find the projection that maximizes the variance of all samples, which can be expressed as:
$$\max_{W} \operatorname{tr}\left(W^{T} Z^{(m)} Z^{(m)T} W\right) \quad \text{s.t. } W^{T} W = I \tag{2}$$
The matrix $W = [w_1, w_2, \dots, w_k]$ is an orthonormal basis of the low-dimensional space. The solution of Eq. (2) is given by the top k eigenvectors of $Z^{(m)} Z^{(m)T}$: let $\lambda_1 \ge \lambda_2 \ge \dots \ge 0$ be the eigenvalues of $Z^{(m)} Z^{(m)T}$; the eigenvector corresponding to $\lambda_i$ is $w_i$. Thus, the final result of PCA is calculated as $H^{(m)T} = W^{T} Z^{(m)}$.
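The PCA step can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: the function name `pca_embed`, the toy matrix sizes, and the explicit mean-centering of each feature are assumptions (the text does not say how $Z^{(m)}$ is preprocessed).

```python
import numpy as np

def pca_embed(Z, k):
    """Project a (features x patients) matrix Z onto its top-k principal axes.

    Returns H with shape (patients, k), so that H.T == W.T @ Zc,
    together with the orthonormal basis W of shape (features, k).
    """
    Zc = Z - Z.mean(axis=1, keepdims=True)   # assumption: center each feature
    scatter = Zc @ Zc.T                      # Z Z^T, shape (features, features)
    eigvals, eigvecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    W = eigvecs[:, order]                    # columns w_1, ..., w_k
    return (W.T @ Zc).T, W

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 20))                # toy data: 50 features, 20 patients
H, W = pca_embed(Z, k=5)
```

For omics matrices with tens of thousands of features, a thin SVD of the centered matrix gives the same basis without forming the large scatter matrix.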
(3) The present invention builds a patient-to-patient graph in the PCA space to model the specific structure within each view. For the m-th graph $G^{(m)} = \{V^{(m)}, E^{(m)}\}$, the nodes $V^{(m)}$ represent the patients in that space and the edges $E^{(m)}$ represent the connections between these patients. The invention therefore first calculates the similarity matrix $W^{(m)}$ of the graph $G^{(m)}$. Each element $W^{(m)}_{ij}$ measures the similarity between patients i and j.
The parameter t in the similarity formula is a normalization factor. The higher the value of $W^{(m)}_{ij}$, the more similar the two patients are.
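The similarity formula itself is not legible in this text. As a hedged illustration only, the sketch below assumes a heat-kernel similarity $\exp(-\lVert h_i - h_j \rVert^2 / t)$, a common choice in which t plays the role of a normalization factor; the function name and the toy embedding are hypothetical.

```python
import numpy as np

def heat_kernel_similarity(H, t):
    """Pairwise similarity exp(-||h_i - h_j||^2 / t) between the rows of H.

    The kernel form is an assumption; the source only states that t is a
    normalization factor and that larger values mean more similar patients.
    """
    sq_norms = np.sum(H ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (H @ H.T)
    sq_dists = np.maximum(sq_dists, 0.0)     # clip tiny negatives from round-off
    return np.exp(-sq_dists / t)

H = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 4.0]])   # toy embedding of 3 patients
W = heat_kernel_similarity(H, t=1.0)
```

In this toy case patients 0 and 1 are close in the embedding and receive a similarity near 1, while patient 2 is far from both and receives a similarity near 0.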
Next, the present invention keeps only the k nearest neighbors (k-NN) of each patient in order to preserve the local structure of each graph,
where $N_i$ consists of the k nearest neighbors of patient i. The parameter k depends on the sample size. Since different omics have different structures, the k-NN graph is more representative than the original $W^{(m)}$.
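A minimal sketch of the k-NN sparsification, assuming that an entry is kept when j belongs to $N_i$ and set to zero otherwise. Excluding self-similarity and symmetrising the result are additional assumptions made so that the output is a valid undirected graph; `knn_sparsify` is a hypothetical name.

```python
import numpy as np

def knn_sparsify(W, k):
    """Keep, for each patient i, only the similarities to its k most similar patients."""
    n = W.shape[0]
    S = W.astype(float).copy()
    np.fill_diagonal(S, -np.inf)                  # a patient is not its own neighbour
    neighbours = np.argsort(-S, axis=1)[:, :k]    # N_i: indices of the k largest entries
    rows = np.repeat(np.arange(n), k)
    cols = neighbours.ravel()
    W_knn = np.zeros((n, n))
    W_knn[rows, cols] = W[rows, cols]
    return np.maximum(W_knn, W_knn.T)             # assumption: symmetrise the graph

W = np.array([[1.0, 0.9, 0.1, 0.2],
              [0.9, 1.0, 0.3, 0.1],
              [0.1, 0.3, 1.0, 0.8],
              [0.2, 0.1, 0.8, 1.0]])
W_knn = knn_sparsify(W, k=1)
```

With k = 1 the toy graph splits into two pairs of mutually nearest patients, (0, 1) and (2, 3).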
(4) To further extract the key features of each omics, the present invention projects all the graphs into low-dimensional subspaces and obtains their embeddings in these spaces.
The invention first computes the normalized graph Laplacian matrix $L^{(m)}$, which is defined from the k-NN similarity matrix and its degree matrix $D^{(m)}$; the degree matrix is diagonal, and each diagonal entry is the corresponding row sum of the similarity matrix. Given the learned Laplacian matrix, the embedding $U^{(m)}$ can be computed, following the spectral clustering method, by solving the associated eigenvalue problem:
$$\min_{U^{(m)}} \operatorname{tr}\left(U^{(m)T} L^{(m)} U^{(m)}\right) \quad \text{s.t. } U^{(m)T} U^{(m)} = I \tag{4}$$
The solution of Equation (4) consists of the k eigenvectors of the normalized Laplacian matrix $L^{(m)}$ with the smallest eigenvalues. Since each embedding is a basis of its subspace, the omics are more comparable in this form than as the original graphs.
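The embedding step can be sketched as below, assuming the symmetric normalized Laplacian $I - D^{-1/2} W D^{-1/2}$ (the exact definition is not legible in this text, so this form is an assumption). The function name and the toy graph are hypothetical.

```python
import numpy as np

def spectral_embedding(W_knn, k):
    """Return the k eigenvectors of the normalized Laplacian with smallest eigenvalues."""
    deg = W_knn.sum(axis=1)                       # diagonal of the degree matrix
    safe = np.where(deg > 0, deg, 1.0)            # avoid division by zero
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(safe), 0.0)
    # Assumed form: L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(deg)) - d_inv_sqrt[:, None] * W_knn * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return eigvecs[:, :k], L

# Toy k-NN graph with two disconnected pairs of patients.
W_knn = np.array([[0.0, 0.9, 0.0, 0.0],
                  [0.9, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 0.0, 0.8],
                  [0.0, 0.0, 0.8, 0.0]])
U, L = spectral_embedding(W_knn, k=2)
```

Because the toy graph has two connected components, the two smallest eigenvalues are zero and the columns of `U` span the indicator vectors of the components (up to scaling), which is the property spectral clustering relies on.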
(5) For the M omics embeddings, minimizing the Euclidean distance between the integrated embedding and each of them is a natural way to obtain a fused representation. However, this approach assumes that similar patients are close in Euclidean space, which is often not the case. Multi-omics data are complex and heterogeneous, so it is more appropriate to measure their distances on a manifold, such as the Grassmann manifold, than in Euclidean space.
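The Grassmann manifold $G(k, n)$ is the set of k-dimensional linear subspaces of $\mathbb{R}^n$. Mathematically, each point of $G(k, n)$ can be represented by an orthonormal basis $Y$, which spans a k-dimensional subspace $\operatorname{span}(Y)$. Thus, the distance between $\operatorname{span}(Y)$ and $\operatorname{span}(\hat{Y})$ can be defined through the principal angles of all basis pairs:
$$d_{\mathrm{proj}}^{2}(Y, \hat{Y}) = \sum_{i=1}^{k} \sin^{2}\theta_i = k - \operatorname{tr}\left(Y Y^{T} \hat{Y} \hat{Y}^{T}\right)$$
where $\theta_i$ is the principal angle between basis vector $Y_i$ and basis vector $\hat{Y}_i$.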
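The relation between principal angles and the trace expression can be checked numerically. The sketch below assumes the squared projection distance, that is, the sum of squared sines of the principal angles; the function names and random test subspaces are hypothetical.

```python
import numpy as np

def projection_distance_sq(Y1, Y2):
    """Squared projection distance k - tr(Y1 Y1^T Y2 Y2^T) between two subspaces.

    Uses tr(Y1 Y1^T Y2 Y2^T) == ||Y1^T Y2||_F^2 to avoid forming n x n matrices.
    """
    k = Y1.shape[1]
    return k - np.sum((Y1.T @ Y2) ** 2)

def principal_angles(Y1, Y2):
    """Principal angles between span(Y1) and span(Y2), both with orthonormal columns."""
    cosines = np.linalg.svd(Y1.T @ Y2, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))

rng = np.random.default_rng(1)
Y1, _ = np.linalg.qr(rng.normal(size=(10, 3)))   # random orthonormal basis
Y2, _ = np.linalg.qr(rng.normal(size=(10, 3)))
theta = principal_angles(Y1, Y2)
d2 = projection_distance_sq(Y1, Y2)
```

The cosines of the principal angles are the singular values of `Y1.T @ Y2`, so the trace term equals the sum of squared cosines and `d2` equals the sum of squared sines.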
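Based on this measure, the distance between the integrated representation $U$ and the embeddings $U^{(m)}$ can be expressed as:
$$d_{\mathrm{proj}}^{2}\left(U, \{U^{(m)}\}_{m=1}^{M}\right) = \sum_{m=1}^{M} d_{\mathrm{proj}}^{2}\left(U, U^{(m)}\right) = kM - \sum_{m=1}^{M} \operatorname{tr}\left(U U^{T} U^{(m)} U^{(m)T}\right)$$
Thus, the objective function, Equation (8), is minimized over $U$ subject to $U^{T} U = I$.
Equation (8) forces the integrated representation $U$ to approach all embeddings $U^{(m)}$ in terms of projection distance on the Grassmann manifold. Its solution is given by the top k eigenvectors of the modified Laplacian matrix $L_{\mathrm{mod}}$.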
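The exact form of Equation (8) and of $L_{\mathrm{mod}}$ is not legible in this text. As a simplified sketch, the code below assumes the objective consists only of the summed projection distance; under that assumption the minimizer is the set of k leading eigenvectors of $\sum_m U^{(m)} U^{(m)T}$. The patented objective may contain further terms, and all names here are hypothetical.

```python
import numpy as np

def merge_subspaces(embeddings, k):
    """Orthonormal U minimizing sum_m [k - tr(U U^T U_m U_m^T)] (assumed objective)."""
    n = embeddings[0].shape[0]
    P = np.zeros((n, n))
    for U_m in embeddings:
        P += U_m @ U_m.T                     # sum of the projection matrices
    eigvals, eigvecs = np.linalg.eigh(P)     # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]           # k eigenvectors with largest eigenvalues

def total_projection_distance_sq(U, embeddings):
    """Summed squared projection distance from span(U) to every span(U_m)."""
    k = U.shape[1]
    return sum(k - np.sum((U.T @ U_m) ** 2) for U_m in embeddings)

rng = np.random.default_rng(2)
embeddings = [np.linalg.qr(rng.normal(size=(12, 3)))[0] for _ in range(3)]
U = merge_subspaces(embeddings, k=3)
```

Minimizing $kM - \operatorname{tr}(U^{T} P U)$ over orthonormal $U$ is equivalent to maximizing $\operatorname{tr}(U^{T} P U)$, which is why the leading eigenvectors of the summed projection matrix $P$ are used.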
Finally, the cluster labels are obtained by applying the k-means algorithm to the embedding obtained from $L_{\mathrm{mod}}$.
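The final clustering step can be illustrated with a plain Lloyd-style k-means in NumPy, where each row of the input is one patient in the merged embedding. The deterministic farthest-first initialization is a choice made here for reproducibility, not something stated in the source, and the toy points are hypothetical.

```python
import numpy as np

def farthest_first_init(X, n_clusters):
    """Start from X[0], then repeatedly add the point farthest from the chosen centers."""
    centers = [X[0]]
    for _ in range(1, n_clusters):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    return np.array(centers)

def kmeans(X, n_clusters, n_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and center update."""
    centers = farthest_first_init(X, n_clusters)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Toy rows standing in for patients in the merged embedding.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels, centers = kmeans(X, n_clusters=2)
```

In the pipeline described above, `X` would be the n × k matrix of the merged representation and `n_clusters` the chosen number of subtypes.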
To verify the effectiveness of this method, the present invention compares it with Similarity Network Fusion (SNF) and Grassmann clustering using Cox survival p-values; the results are shown in Table 1. For a fair comparison, the same number of subtypes is used for SNF and Grassmann clustering for each cancer. The method of the present invention shows significant differences in survival time between the different subtypes. It obtains smaller p-values than SNF on three of the five cancers (BIC, LSCC and COAD).
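Table 1. Log-rank test analysis of survival for the five cancers (p-values)

| Cancer type | Grassmann clustering | SNF | Method of the invention |
| --- | --- | --- | --- |
| BIC (5 subtypes) | $2.0\times10^{-4}$ | $1.1\times10^{-3}$ | $4.3\times10^{-5}$ |
| GBM (3 subtypes) | $4.3\times10^{-3}$ | $2.0\times10^{-4}$ | $2.3\times10^{-4}$ |
| KRCCC (3 subtypes) | $2.8\times10^{-2}$ | $2.9\times10^{-2}$ | $1.4\times10^{-1}$ |
| LSCC (4 subtypes) | $1.6\times10^{-2}$ | $2.0\times10^{-2}$ | $2.7\times10^{-3}$ |
| COAD (3 subtypes) | $4.2\times10^{-2}$ | $2.0\times10^{-2}$ | $2.7\times10^{-3}$ |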
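To illustrate how a survival p-value of this kind can be computed, the sketch below implements a two-group log-rank test. It is a simplification: Table 1 compares three to five subtypes, which requires the multi-group form of the test, and the survival times used here are made-up toy values, not TCGA data.

```python
import math
import numpy as np

def logrank_two_groups(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    time: follow-up times; event: 1 = death observed, 0 = censored;
    group: 0/1 subtype labels. Assumes at least one informative event time.
    """
    time, event, group = map(np.asarray, (time, event, group))
    observed = expected = variance = 0.0
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        n = at_risk.sum()                          # patients still at risk at t
        n1 = (at_risk & (group == 1)).sum()        # of which in group 1
        died = (time == t) & (event == 1)
        d = died.sum()                             # deaths at t
        observed += (died & (group == 1)).sum()
        expected += d * n1 / n
        if n > 1:                                  # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    stat = (observed - expected) ** 2 / variance
    p_value = math.erfc(math.sqrt(stat / 2.0))     # chi-square tail, 1 degree of freedom
    return stat, p_value

# Hypothetical toy cohort: group 1 dies early, group 0 late, no censoring.
time = [1, 2, 3, 4, 10, 11, 12, 13]
event = [1, 1, 1, 1, 1, 1, 1, 1]
group = [1, 1, 1, 1, 0, 0, 0, 0]
stat, p = logrank_two_groups(time, event, group)
```

A small p-value indicates that the survival curves of the two groups differ, which is the sense in which smaller values in Table 1 indicate better-separated subtypes.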
The invention also provides a cancer subtype identification system based on multi-omics data sets, comprising:
a sample acquisition module for acquiring sample data of each patient;
a dimensionality reduction module for performing dimensionality reduction on the sample data using principal component analysis;
a similarity graph construction module for constructing a similarity graph based on the dimensionality-reduced data, the similarity graph representing the similarity between patients;
a projection module for projecting each similarity graph into a low-dimensional subspace;
a merging module for merging the subspaces on a Grassmann manifold;
an identification module for identifying cancer subtypes with a k-means clustering algorithm based on the merged subspaces.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the others, and for identical or similar parts the embodiments may be referred to one another.
The principles and embodiments of the present invention have been described herein with reference to specific examples; the description of these examples is intended only to help in understanding the method of the present invention and its core idea. A person of ordinary skill in the art may, in light of the idea of the present invention, make changes to the specific embodiments and the scope of application. In view of the foregoing, the content of this description should not be construed as limiting the invention.

Claims (8)

1. A method for identifying cancer subtypes based on multi-omics data sets, comprising:
acquiring sample data of each patient;
performing dimensionality reduction on the sample data using principal component analysis;
constructing a similarity graph based on the dimensionality-reduced data, the similarity graph representing the similarity between patients;
projecting each similarity graph into a low-dimensional subspace;
merging the subspaces on a Grassmann manifold;
identifying cancer subtypes with a k-means clustering algorithm based on the merged subspaces;
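wherein merging the subspaces on the Grassmann manifold specifically comprises:
defining the distance between the subspaces $\operatorname{span}(Y)$ and $\operatorname{span}(\hat{Y})$ through the principal angles of all basis pairs:
$$d_{\mathrm{proj}}^{2}(Y, \hat{Y}) = \sum_{i=1}^{k} \sin^{2}\theta_i = k - \operatorname{tr}\left(Y Y^{T} \hat{Y} \hat{Y}^{T}\right)$$
where $\theta_i$ is the principal angle between basis vector $Y_i$ and basis vector $\hat{Y}_i$, i is the index of the i-th basis pair between $\operatorname{span}(Y)$ and $\operatorname{span}(\hat{Y})$, k is the total number of basis pairs between $\operatorname{span}(Y)$ and $\operatorname{span}(\hat{Y})$, and $\operatorname{tr}\left(Y Y^{T} \hat{Y} \hat{Y}^{T}\right)$ is the sum of the squared cosines of the principal angles of all basis pairs;
based on this measure, the distance between embeddings is expressed as:
$$d_{\mathrm{proj}}^{2}\left(U, \{U^{(m)}\}_{m=1}^{M}\right) = \sum_{m=1}^{M} d_{\mathrm{proj}}^{2}\left(U, U^{(m)}\right) = kM - \sum_{m=1}^{M} \operatorname{tr}\left(U U^{T} U^{(m)} U^{(m)T}\right)$$
where M is the total number of omics, $d_{\mathrm{proj}}^{2}\left(U, U^{(m)}\right)$ is the Grassmann manifold distance between the common subspace $\operatorname{span}(U)$ and the subspace $\operatorname{span}(U^{(m)})$, $U$ is the basis of the common subspace of all omics, $U^{(m)}$ is the basis of the m-th omics-specific subspace, and $\operatorname{tr}\left(U U^{T} U^{(m)} U^{(m)T}\right)$ is the sum of the squared cosines of the principal angles of all basis pairs between the subspace $\operatorname{span}(U^{(m)})$ and the common subspace $\operatorname{span}(U)$;
thus, the objective function is:
s.t. $U^{T} U = I$
where $I$ denotes the identity matrix;
the objective function forces the integrated representation $U$ to approach all embeddings $U^{(m)}$ in terms of projection distance on the Grassmann manifold.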
2. The method of claim 1, wherein the sample data comprises gene expression, miRNA expression, and DNA methylation.
3. The method for identifying cancer subtypes based on multi-omics data sets according to claim 1, wherein the expression of the similarity graph is as follows:
$G^{(m)} = \{V^{(m)}, E^{(m)}\}$
where $G^{(m)}$ denotes the m-th similarity graph, the nodes $V^{(m)}$ represent the patients, and the edges $E^{(m)}$ represent the connections between patients.
4. The method for identifying cancer subtypes based on multi-omics data sets according to claim 1, wherein, after constructing the similarity graph based on the dimensionality-reduced data, the method further comprises:
calculating a similarity matrix of the similarity graph;
preserving, according to the similarity matrix, the local structure of each similarity graph using a k-nearest-neighbor algorithm.
5. A cancer subtype identification system based on multi-omics data sets, comprising:
a sample acquisition module for acquiring sample data of each patient;
a dimensionality reduction module for performing dimensionality reduction on the sample data using principal component analysis;
a similarity graph construction module for constructing a similarity graph based on the dimensionality-reduced data, the similarity graph representing the similarity between patients;
a projection module for projecting each similarity graph into a low-dimensional subspace;
a merging module for merging the subspaces on a Grassmann manifold;
an identification module for identifying cancer subtypes with a k-means clustering algorithm based on the merged subspaces;
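wherein merging the subspaces on the Grassmann manifold specifically comprises:
defining the distance between the subspaces $\operatorname{span}(Y)$ and $\operatorname{span}(\hat{Y})$ through the principal angles of all basis pairs:
$$d_{\mathrm{proj}}^{2}(Y, \hat{Y}) = \sum_{i=1}^{k} \sin^{2}\theta_i = k - \operatorname{tr}\left(Y Y^{T} \hat{Y} \hat{Y}^{T}\right)$$
where $\theta_i$ is the principal angle between basis vector $Y_i$ and basis vector $\hat{Y}_i$, i is the index of the i-th basis pair between $\operatorname{span}(Y)$ and $\operatorname{span}(\hat{Y})$, k is the total number of basis pairs between $\operatorname{span}(Y)$ and $\operatorname{span}(\hat{Y})$, and $\operatorname{tr}\left(Y Y^{T} \hat{Y} \hat{Y}^{T}\right)$ is the sum of the squared cosines of the principal angles of all basis pairs;
based on this measure, the distance between embeddings is expressed as:
$$d_{\mathrm{proj}}^{2}\left(U, \{U^{(m)}\}_{m=1}^{M}\right) = \sum_{m=1}^{M} d_{\mathrm{proj}}^{2}\left(U, U^{(m)}\right) = kM - \sum_{m=1}^{M} \operatorname{tr}\left(U U^{T} U^{(m)} U^{(m)T}\right)$$
where M is the total number of omics, $d_{\mathrm{proj}}^{2}\left(U, U^{(m)}\right)$ is the Grassmann manifold distance between the common subspace $\operatorname{span}(U)$ and the subspace $\operatorname{span}(U^{(m)})$, $U$ is the basis of the common subspace of all omics, $U^{(m)}$ is the basis of the m-th omics-specific subspace, and $\operatorname{tr}\left(U U^{T} U^{(m)} U^{(m)T}\right)$ is the sum of the squared cosines of the principal angles of all basis pairs between the subspace $\operatorname{span}(U^{(m)})$ and the common subspace $\operatorname{span}(U)$;
thus, the objective function is:
s.t. $U^{T} U = I$
where $I$ denotes the identity matrix;
the objective function forces the integrated representation $U$ to approach all embeddings $U^{(m)}$ in terms of projection distance on the Grassmann manifold.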
6. The cancer subtype identification system based on multi-omics data sets according to claim 5, wherein the sample data comprises gene expression, miRNA expression, and DNA methylation data.
7. The cancer subtype identification system based on multi-omics data sets according to claim 5, wherein the expression of the similarity graph is as follows:
$G^{(m)} = \{V^{(m)}, E^{(m)}\}$
where $G^{(m)}$ denotes the m-th similarity graph, the nodes $V^{(m)}$ represent the patients, and the edges $E^{(m)}$ represent the connections between patients.
8. The cancer subtype identification system based on multi-omics data sets according to claim 5, further comprising:
a calculation module for calculating a similarity matrix of the similarity graph;
a preservation module for preserving, according to the similarity matrix, the local structure of each similarity graph using a k-nearest-neighbor algorithm.
CN202110813430.5A 2021-07-19 2021-07-19 Cancer subtype identification method and system based on multi-omics data sets Active CN113537358B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110813430.5A CN113537358B (en) 2021-07-19 2021-07-19 Cancer subtype identification method and system based on multi-omics data sets

Publications (2)

Publication Number Publication Date
CN113537358A (en) 2021-10-22
CN113537358B (en) 2023-09-01

Family

ID=78100178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110813430.5A Active CN113537358B (en) 2021-07-19 2021-07-19 Cancer subtype identification method and system based on multi-omics data sets

Country Status (1)

Country Link
CN (1) CN113537358B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114025320A (en) * 2021-11-08 2022-02-08 易枭零部件科技(襄阳)有限公司 Indoor positioning method based on 5G signal
CN117437973B (en) * 2023-12-21 2024-03-08 齐鲁工业大学(山东省科学院) Single cell transcriptome sequencing data interpolation method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101123983A (en) * 2004-10-27 2008-02-13 米迪缪尼股份有限公司 Modulation of antibody specificity by tailoring the affinity to cognate antigens
CN101395472A (en) * 2006-01-17 2009-03-25 协乐民公司 Method for predicting biological systems responses
CN101473031A (en) * 2006-04-03 2009-07-01 普罗美加公司 Permuted and nonpermuted luciferase biosensors
CN106529165A (en) * 2016-10-28 2017-03-22 合肥工业大学 Method for identifying cancer molecular subtype based on spectral clustering algorithm of sparse similar matrix
CN110334748A (en) * 2019-06-14 2019-10-15 大连理工大学 The cancer subtypes classification method of multiple groups data integration is carried out based on D-S evidence theory
CN111291777A (en) * 2018-12-07 2020-06-16 深圳先进技术研究院 Cancer subtype classification method based on multigroup chemical integration

Also Published As

Publication number Publication date
CN113537358A (en) 2021-10-22

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant