CN116312794A - Methylation sample clustering method fused with single cell analysis method - Google Patents

Publication number
CN116312794A
CN116312794A CN202310027339.XA CN202310027339A CN116312794A CN 116312794 A CN116312794 A CN 116312794A CN 202310027339 A CN202310027339 A CN 202310027339A CN 116312794 A CN116312794 A CN 116312794A
Authority
CN
China
Prior art keywords
methylation
data
clustering
samples
ultra
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310027339.XA
Other languages
Chinese (zh)
Other versions
CN116312794B (en)
Inventor
关荣伟
王利强
孟祥宁
蔡梦迪
关雪
梁诗寅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Medical University
Original Assignee
Harbin Medical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Medical University
Priority to CN202310027339.XA
Priority to LU503668A
Publication of CN116312794A
Application granted
Publication of CN116312794B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics

Abstract

The invention discloses a methylation sample clustering method fused with a single cell analysis method and relates to the field of bioinformatic methylation data analysis. It aims to solve the technical problem that a good clustering effect cannot be achieved for methylation data with a small sample size. The method divides each sample into methylation regions according to the methylation type of the methylation sites and their sequence position information, and then re-clusters and visualizes the data with a single-cell data clustering method; it is mainly used for clustering methylation data from a small number of disease samples. The invention is applied to the field of methylation data analysis.

Description

Methylation sample clustering method fused with single cell analysis method
Technical Field
The invention belongs to the field of biological information methylation data analysis, and particularly relates to a methylation sample clustering method fused with a single cell analysis method.
Background
DNA methylation is one of the most common epigenetic modifications. It mainly regulates gene expression: in general, the higher the methylation level of the CpG island in a gene's promoter region, the lower the expression level of that gene. Methylation levels are closely related to the occurrence and development of disease, so cluster analysis of methylation levels can be used to predict patient prognosis.
A methylation chip detects the hybridization signal of DNA sequences after bisulfite treatment. Bisulfite converts unmethylated cytosine into uracil while leaving methylated cytosine unchanged; uracil is then converted into thymine, and chip hybridization is carried out last. At present, the Illumina 450K chip is mainly used. At an unmethylated CpG site, the U-type bead, whose probe ends in A, matches the unmethylated site, so single-nucleotide extension succeeds and a signal is detected (the U-type bead emits light), whereas the M-type bead, whose probe ends in G, cannot match the unmethylated site and produces no signal. At a methylated CpG site, the M-type bead matches the methylated site, single-nucleotide extension occurs and a signal is produced, whereas the U-type bead does not match and produces no signal. The two mainstream DNA methylation chips are 450K and 850K, where the figures refer to the number of probes on the chip; probes are counted per methylation site, each probe detecting one site. The DNA methylation level (β value) is determined from the intensity ratio between the methylated (signal A) and unmethylated (signal B) alleles. Specifically, with M the methylated intensity (signal A) and U the unmethylated intensity (signal B), β = Max(M, 0) / [Max(M, 0) + Max(U, 0) + 100]. The β value therefore ranges from 0 (completely unmethylated) to 1 (completely methylated).
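The β-value formula above can be illustrated with a minimal Python sketch. The function name `beta_value` and the example intensities are hypothetical and chosen only for illustration; the constant 100 is the offset given in the formula.

```python
def beta_value(methylated: float, unmethylated: float, offset: float = 100.0) -> float:
    """Beta value from methylated (M) and unmethylated (U) probe intensities.

    Implements beta = Max(M, 0) / [Max(M, 0) + Max(U, 0) + offset].
    """
    m = max(methylated, 0.0)    # negative intensities are clipped to zero
    u = max(unmethylated, 0.0)
    return m / (m + u + offset)


# Hypothetical intensities, for illustration only.
print(beta_value(9000.0, 900.0))  # strongly methylated site
print(beta_value(0.0, 0.0))       # no signal: the offset keeps the ratio defined
```

Because of the offset in the denominator, the computed value stays strictly below 1 for any finite intensities and is well defined even when both signals are zero.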
Traditional methylation cluster analysis methods lack universality and specificity and cannot cluster methylation data with a small sample size. To address this limitation, a method integrating single-cell data clustering is provided. Methylation types are classified and analyzed based on methylation site information in the genome; the data of the multiple methylation types within one sample (a sample being one column of the data matrix) are split into multiple columns; data quality control is then performed; and the methylation data matrix is further analyzed with a single-cell clustering method. This methylation cluster analysis method is a new way of classifying data: according to their characteristics, data objects are assigned to different classes, or similar data are grouped into one class, so that small-sample methylation data can be clustered and disease-related methylation patterns can be identified.
Disclosure of Invention
The invention aims to solve the technical problem that a good clustering effect cannot be achieved for methylation data with a small sample size, and provides a methylation sample clustering method fused with a single cell analysis method. In the method, the methylation data of each sample are segmented according to the methylation type of the methylation sites and their sequence position information, and the data are then re-clustered and visualized with a single-cell data clustering method.
The invention discloses a methylation sample clustering method fusing a single cell analysis method, which comprises the following steps:
Step one, data segmentation
(1) Reintegrating the methylation data, extracting information, and dividing the methylation level of each sample into ultra-high methylation and ultra-low methylation according to the extracted information;
(2) Further dividing the ultra-high and ultra-low methylation data of step (1) into CpG, CHG and CHH according to the three types of DNA methylation sites;
(3) Aligning the CpG, CHG and CHH sequences of step (2) with a genome reference sequence to obtain an aligned methylation position information profile;
(4) According to the aligned methylation position information of step (3), dividing the ultra-high and ultra-low methylation CpG, CHG and CHH data of step (2) into a plurality of methylation type data with position information according to the positions of the methylation sequences, thereby obtaining the sample data;
Step two, data analysis
(5) Data quality control:
filtering the data matrix of the samples obtained in step (4) using the Seurat package to remove columns with fewer methylated genes than the other columns;
(6) Principal component analysis:
performing preliminary clustering on the quality-controlled data of step (5) by principal component analysis;
(7) Performing cluster analysis on the preliminarily clustered data of step (6) and the positions of its sequences by a single-cell sequencing analysis method, so as to obtain the epigenetic heterogeneity among samples.
Further, the information extracted in step (1) comprises the sample name, gene name, gene methylation site, methylation pattern and methylation value in the methylation data.
Further, the methylation position information profile obtained in step (3) refers to the distribution of methylation regions in the genome reference sequence and the positional relationship between the methylation regions and specific functional sequences.
Further, after the data are filtered in step (5), quality control is performed using the number of genes detected in each single cell (nFeature_RNA) and the total count measured in each single cell (nCount_RNA); here these correspond to the number of methylated genes and the sum of the methylation values measured in each sample.
Further, the single-cell sequencing analysis method employs the t-SNE method or the UMAP method.
Further, the ultra-high methylation and ultra-low methylation are determined by calculating the methylation level with the formula ML = mC / (mC + umC), where ML is the methylation level, mC is the number of methylated C and umC is the number of unmethylated C; a result less than or equal to 0.3 is ultra-low methylation, and a result greater than or equal to 0.7 is ultra-high methylation.
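The methylation-level rule can be sketched in Python as follows. The function names are hypothetical. The text does not say how sites with a level strictly between 0.3 and 0.7, or sites with no counted cytosines, are handled; this sketch assumes they are simply left unassigned.

```python
from typing import Optional


def methylation_level(mc: int, umc: int) -> Optional[float]:
    """ML = mC / (mC + umC); returns None when no cytosines were counted."""
    total = mc + umc
    if total == 0:
        return None
    return mc / total


def methylation_class(ml: Optional[float],
                      low: float = 0.3,
                      high: float = 0.7) -> Optional[str]:
    """Label a methylation level as ultra-low (<= low) or ultra-high (>= high).

    Assumption: intermediate or undefined levels are returned as None.
    """
    if ml is None:
        return None
    if ml <= low:
        return "ultra-low"
    if ml >= high:
        return "ultra-high"
    return None


print(methylation_class(methylation_level(9, 1)))  # level 0.9
print(methylation_class(methylation_level(1, 9)))  # level 0.1
print(methylation_class(methylation_level(5, 5)))  # level 0.5, unassigned here
```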
Further, removing the columns with fewer methylated genes than the other columns in step (5) refers to removing columns in which 10 or fewer genes are methylated.
Further, in the data matrix of the samples obtained in step (4), the rows are genes, the columns are the samples divided into different methylation regions, and the elements of the matrix are methylation values.
Further, the preliminary clustering in step (6) refers to using the Seurat package to take the intersection of the principal components whose cumulative contribution exceeds 85% of the total contribution and the principal components whose P value is less than 0.05; the data of this intersection constitute the preliminary clustering.
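One possible reading of this selection rule is sketched below in plain Python rather than with the package named in the text. It assumes the per-component variances are already sorted in descending order and that a P value is available for each component, however it was obtained; the function name and the example numbers are hypothetical.

```python
from typing import List, Sequence


def select_components(variances: Sequence[float],
                      p_values: Sequence[float],
                      min_cumulative: float = 0.85,
                      alpha: float = 0.05) -> List[int]:
    """Return indices of components that satisfy both criteria.

    Criterion 1: the leading components whose cumulative contribution
    first exceeds `min_cumulative` of the total variance.
    Criterion 2: components whose P value is below `alpha`.
    """
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")

    by_contribution = set()
    running = 0.0
    for idx, var in enumerate(variances):
        by_contribution.add(idx)
        running += var
        if running / total > min_cumulative:
            break

    significant = {i for i, p in enumerate(p_values) if p < alpha}
    return sorted(by_contribution & significant)


# Hypothetical values: the first three components explain 90% of the variance.
variances = [50.0, 25.0, 15.0, 6.0, 4.0]
p_values = [0.001, 0.2, 0.01, 0.03, 0.5]
print(select_components(variances, p_values))  # [0, 2]
```

In this example component 1 is dropped because its P value is not below 0.05, and component 3 is dropped because it lies outside the leading set that reaches the 85% threshold.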
The invention is based on genome-wide methylation microarray data. In such data, a row of the data matrix generally represents a methylation site and a column represents a sample, and clustering groups data objects with high similarity together. With a small sample size, however, a good clustering effect cannot be achieved. The method of the invention therefore divides each sample into methylation regions according to the methylation type of the methylation sites and their sequence position information, and then re-clusters and visualizes the data with a single-cell data clustering method. It belongs to the technical field of bioinformatics and is mainly used for cluster analysis of methylation data from a small number of disease samples.
The invention has the following beneficial effects:
According to the method, dividing each sample by methylation type increases the number of data units, which improves the accuracy of methylation data clustering. The method is mainly directed at methylation data produced by the 450K chip.
The method solves the problem that existing methylation cluster analysis cannot be carried out when the sample size is too small. By borrowing single-cell sequencing analysis techniques it increases the number of data units, so that clustering can be performed on a small number of samples. Based on methylation degree, methylation position, methylation motif and other methylation-related features, it classifies the samples finely, and by summarizing methylation features from this analysis, the resulting groups are superior to those of the traditional principal component analysis method in terms of features, pertinence and accuracy.
Drawings
FIG. 1 is a flow chart of early-stage sample data processing in the methylation analysis, wherein A is the original methylation sample matrix; B is the ultra-high and ultra-low methylation matrix; C is the CpG, CHG and CHH matrix; D is the matrix of methylated genomic sequence positions;
FIG. 2 is a quality control plot of the methylation data samples, wherein A is the sample gene number plot; B is the sample methylation level plot; C is the plot of genes with highly variable methylation levels;
FIG. 3 is a principal component analysis plot of the methylation data samples, wherein A is the principal component cluster plot; B is the principal component P value distribution plot of the samples; C is the principal component ranking plot;
FIG. 4 is a UMAP analysis plot of the methylation data samples, wherein 0 is the first class; 1 is the second class; 2 is the third class; 3 is the fourth class; 4 is the fifth class;
FIG. 5 is a t-SNE analysis plot of the methylation data samples, wherein 0 is the first class; 1 is the second class; 2 is the third class; 3 is the fourth class; 4 is the fifth class.
Detailed Description
To make the purposes, technical solutions and advantages of the embodiments of the present invention clear, the spirit of the present disclosure is described in detail below. After understanding the embodiments of the present disclosure, any person skilled in the art may make changes and modifications to the techniques taught herein without departing from the spirit and scope of the present disclosure.
The exemplary embodiments of the present invention and the descriptions thereof are intended to illustrate the present invention, but not to limit the present invention.
Examples
The methylation sample clustering method fused with a single cell analysis method in this embodiment is as follows:
The technique is divided into two major parts: data segmentation and data analysis.
a. Data segmentation. Data segmentation converts a small amount of data into a large amount of data and is performed as follows (FIG. 1 illustrates the splitting of methylation data into single-cell-sequencing-type data):
(1) For methylation data with a small sample size (FIG. 1A), the methylation data are re-integrated, and the sample names, gene methylation sites, methylation patterns and methylation values are extracted. The methylation pattern of each sample is first divided into two types, ultra-high methylation and ultra-low methylation, so that one sample yields two data types, as shown in FIG. 1B;
The ultra-high methylation and ultra-low methylation are determined by calculating the methylation level with the formula ML = mC / (mC + umC), where ML is the methylation level, mC is the number of methylated C and umC is the number of unmethylated C; a result less than or equal to 0.3 is ultra-low methylation, and a result greater than or equal to 0.7 is ultra-high methylation;
(2) Methylated C is classified into three classes according to its sequence context: CpG (abbreviated as CG), CHG and CHH. Here p denotes the phosphodiester bond, and CpG means that the base immediately downstream of the methylated C is G. H denotes any base other than G, i.e., A, C or T; CHG means that the two bases downstream of the methylated C are H followed by G, and CHH means that both bases downstream of the methylated C are H. In the human genome, more than 90% of CpG sites are methylated, but the methylation degree of CpG islands is usually low. In that case proteins can still bind to the promoter region of the DNA, transcription starts and the gene is expressed; if the CpG island is methylated, the proteins cannot bind to the DNA, transcription is silenced and the gene is not expressed. Each of the ultra-high and ultra-low methylation data types from the previous step is further divided into three data types, CpG, CHG and CHH, as shown in FIG. 1C;
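The three sequence contexts can be illustrated with a short Python sketch. The function name `cytosine_context` and the example sequence are hypothetical, and the sketch only looks at the given 5'-to-3' strand; cytosines on the opposite strand would need the reverse complement.

```python
from typing import Optional


def cytosine_context(seq: str, pos: int) -> Optional[str]:
    """Classify the cytosine at index `pos` as 'CpG', 'CHG' or 'CHH'.

    Returns None if the base is not C or if the downstream bases are
    missing or ambiguous (N), so the context cannot be determined.
    """
    seq = seq.upper()
    if pos < 0 or pos >= len(seq) or seq[pos] != "C":
        return None
    downstream = seq[pos + 1:pos + 3]   # up to two bases after the C
    if len(downstream) >= 1 and downstream[0] == "G":
        return "CpG"                    # C immediately followed by G
    if len(downstream) < 2 or "N" in downstream:
        return None
    if downstream[1] == "G":
        return "CHG"                    # C, then a non-G base, then G
    return "CHH"                        # C followed by two non-G bases


# Hypothetical sequence: indices 1, 4, 7 and 10 are cytosines.
example = "ACGTCAGCTTC"
for index in (1, 4, 7, 10):
    print(index, cytosine_context(example, index))
```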
(3) The sequences are aligned to a reference sequence to obtain an aligned methylation position information profile, that is, the distribution of methylation regions in the genome and the positional relationship between methylation regions and specific functional sequences (Promoter, TSS, TES, 5'UTR, 3'UTR, Exon, Intron, etc.). Each of the three classes of data (CpG, CHG and CHH) from the second step is further subdivided, according to the position of the methylated sequence, into a plurality of methylation type data with positional information, as shown in FIG. 1D.
This subdivision is based on the distribution of methylation regions in the genome and the positional relationship between methylation regions and specific functional sequences (e.g., CGI, CGI_Shore, exon, intron, promoter, repeat, TES, TSS, utr3, utr5, etc.).
At this point the samples have been processed into single-cell-sequencing-type samples and the number of samples has been expanded considerably. Taking the 3 samples shown in FIG. 1 as an example, this processing expands the 3 samples to 126 samples.
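The expansion from 3 to 126 samples is consistent with 3 samples × 2 methylation levels × 3 sequence contexts × 7 positional categories. The factor of 7 is an inference from the stated total; it matches the seven categories named in step (3), but the text does not state which region set was used for FIG. 1, so the `REGIONS` tuple below is an assumption. The sample names and the `|` separator are likewise hypothetical.

```python
from itertools import product
from typing import List, Sequence

LEVELS = ("ultra-high", "ultra-low")
CONTEXTS = ("CpG", "CHG", "CHH")
# Assumed region set: the seven categories listed in step (3).
REGIONS = ("Promoter", "TSS", "TES", "5'UTR", "3'UTR", "Exon", "Intron")


def pseudo_sample_names(samples: Sequence[str]) -> List[str]:
    """Build one column name per (sample, level, context, region) combination."""
    return ["|".join(parts)
            for parts in product(samples, LEVELS, CONTEXTS, REGIONS)]


names = pseudo_sample_names(["S1", "S2", "S3"])
print(len(names))  # 126
print(names[0])    # S1|ultra-high|CpG|Promoter
```

Each generated name corresponds to one column of the expanded data matrix, which is what allows single-cell clustering tools to treat the columns as if they were individual cells.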
b. Data analysis. Once the samples have been processed into single-cell-sequencing-type data, quality control, principal component analysis and single-cell sequencing analysis (t-SNE/UMAP) are performed on them.
(1) Data quality control. Starting from the segmented methylation data types obtained above, the data matrix obtained in the third step is filtered with the data preprocessing functions included in Seurat, a single-cell analysis tool, and low-quality rows and columns (columns with few genes, and rows whose gene is methylated in few samples) are removed using parameters such as min. The data are then quality controlled again using nFeature_RNA, which represents the number of methylated genes measured in each sample, and nCount_RNA, which represents the sum of the methylation values of all genes measured in each sample, as shown in FIG. 2.
Removing the columns with few genes and the rows whose gene is methylated in few samples refers to removing columns and rows with 10 or fewer methylated genes. This step ensures that at least ten genes in a column of the matrix have methylation values, otherwise the column is removed, and that at least ten samples in a row have methylation values, otherwise the row is removed. Next, the number of genes measured in each methylation sample is filtered to remove extreme values (FIG. 2A), and the sum of the methylation values of all genes measured in each sample is then filtered to remove extreme values (FIG. 2B). The mean and variance of each gene are calculated, and genes whose methylation values vary highly between samples are selected; subsequent analysis focuses on these genes, and 2000 genes are returned by default (see FIG. 2C).
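The filtering and gene-selection steps can be sketched in plain Python as below. This is not the Seurat implementation; it is a simplified stand-in under stated assumptions. The text gives the cutoff both as "10 or fewer removed" and as "at least ten kept"; the sketch follows the "at least `min_count`" reading. It filters columns first and then rows in a single pass, treats a value greater than 0 as "methylated", and ranks genes by plain population variance, whereas single-cell toolkits generally use a mean-adjusted measure. All names and the toy matrix are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def filter_matrix(genes: Sequence[str], columns: Sequence[str], values: Matrix,
                  min_count: int = 10) -> Tuple[List[str], List[str], Matrix]:
    """Drop columns, then rows, with fewer than `min_count` methylated entries.

    Rows are genes, columns are segmented samples, entries are methylation values.
    """
    keep_cols = [j for j in range(len(columns))
                 if sum(1 for row in values if row[j] > 0) >= min_count]
    reduced = [[row[j] for j in keep_cols] for row in values]
    keep_rows = [i for i, row in enumerate(reduced)
                 if sum(1 for v in row if v > 0) >= min_count]
    return ([genes[i] for i in keep_rows],
            [columns[j] for j in keep_cols],
            [reduced[i] for i in keep_rows])


def top_variable_genes(genes: Sequence[str], values: Matrix,
                       n_top: int = 2000) -> List[str]:
    """Return up to `n_top` gene names ranked by variance across columns."""
    ranked = sorted(zip(genes, values),
                    key=lambda gv: pvariance(gv[1]) if gv[1] else 0.0,
                    reverse=True)
    return [gene for gene, _ in ranked[:n_top]]


# Toy matrix with a lowered cutoff so the effect is visible.
genes = ["g1", "g2", "g3"]
columns = ["c1", "c2", "c3"]
values = [[0.9, 0.8, 0.0],
          [0.1, 0.7, 0.0],
          [0.0, 0.0, 0.2]]
kept_genes, kept_columns, kept_values = filter_matrix(genes, columns, values, min_count=2)
print(kept_genes, kept_columns)
print(top_variable_genes(kept_genes, kept_values, n_top=1))
```

In the toy example, column c3 and gene g3 are removed because each has fewer than two methylated entries, and g2 is ranked as the most variable of the remaining genes.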
(2) Principal component analysis. Preliminary clustering is performed on the data by principal component analysis in order to select the important principal components, as shown in FIG. 3.
(3) Single-cell sequencing analysis (t-SNE/UMAP). Based on the segmented and quality-controlled methylation data, a nonlinear dimensionality reduction method is used: t-SNE (which focuses on capturing local similarity at the cost of discarding global structure) or UMAP. The method performs cluster analysis on the different types of methylation data and their sequence positions so as to accurately reflect the epigenetic heterogeneity among samples, as shown in FIG. 4 and FIG. 5.

Claims (9)

1. A methylation sample clustering method fused with a single cell analysis method is characterized by comprising the following steps:
Step one, data segmentation
(1) Reintegrating the methylation data, extracting information, and dividing the methylation level of each sample into ultra-high methylation and ultra-low methylation according to the extracted information;
(2) Further dividing the ultra-high and ultra-low methylation data of step (1) into CpG, CHG and CHH according to the three types of DNA methylation sites;
(3) Aligning the CpG, CHG and CHH sequences of step (2) with a genome reference sequence to obtain an aligned methylation position information profile;
(4) According to the aligned methylation position information of step (3), dividing the ultra-high and ultra-low methylation CpG, CHG and CHH data of step (2) into a plurality of methylation type data with position information according to the positions of the methylation sequences, thereby obtaining the sample data;
Step two, data analysis
(5) Data quality control:
filtering the data matrix of the samples obtained in step (4) using the Seurat package to remove columns with fewer methylated genes than the other columns;
(6) Principal component analysis:
performing preliminary clustering on the quality-controlled data of step (5) by principal component analysis;
(7) Performing cluster analysis on the preliminarily clustered data of step (6) and the positions of its sequences by a single-cell sequencing analysis method, so as to obtain the epigenetic heterogeneity among samples.
2. The methylation sample clustering method fused with a single cell analysis method according to claim 1, wherein the information extracted in step (1) comprises the sample name, gene name, gene methylation site, methylation pattern and methylation value in the methylation data.
3. The methylation sample clustering method fused with a single cell analysis method according to claim 1, wherein the aligned methylation position information profile in step (3) refers to the distribution of methylation regions in the genome reference sequence and the positional relationship between the methylation regions and specific functional sequences.
4. The methylation sample clustering method fused with a single cell analysis method according to claim 1, wherein after the data are filtered in step (5), quality control is performed using the number of genes detected in each single cell and the total count measured in each single cell.
5. The methylation sample clustering method fused with a single cell analysis method according to claim 1, wherein the single-cell sequencing analysis method employs t-SNE or UMAP.
6. The methylation sample clustering method fused with a single cell analysis method according to claim 1, wherein the ultra-high methylation and ultra-low methylation are determined by calculating the methylation level with the formula ML = mC / (mC + umC), where ML is the methylation level, mC is the number of methylated C and umC is the number of unmethylated C; a result less than or equal to 0.3 is ultra-low methylation, and a result greater than or equal to 0.7 is ultra-high methylation.
7. The methylation sample clustering method fused with a single cell analysis method according to claim 1, wherein removing the columns with fewer methylated genes than the other columns in step (5) refers to removing columns in which 10 or fewer genes are methylated.
8. The methylation sample clustering method fused with a single cell analysis method according to claim 1, wherein in the data matrix of the samples obtained in step (4) and used in step (5), the rows are genes, the columns are the samples divided into different methylation regions, and the elements of the matrix are methylation values.
9. The methylation sample clustering method fused with a single cell analysis method according to claim 1, wherein the preliminary clustering in step (6) refers to using the Seurat package to take the intersection of the principal components whose cumulative contribution exceeds 85% of the total contribution and the principal components whose P value is less than 0.05; the data of this intersection constitute the preliminary clustering.
CN202310027339.XA 2023-01-09 2023-01-09 Methylation sample clustering method fused with single cell analysis method Active CN116312794B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202310027339.XA CN116312794B (en) 2023-01-09 2023-01-09 Methylation sample clustering method fused with single cell analysis method
LU503668A LU503668B1 (en) 2023-01-09 2023-03-16 Clustering Method of Methylation Samples Integrated with Single-cell Sequencing Analysis Method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310027339.XA CN116312794B (en) 2023-01-09 2023-01-09 Methylation sample clustering method fused with single cell analysis method

Publications (2)

Publication Number Publication Date
CN116312794A (en) 2023-06-23
CN116312794B (en) 2023-11-14

Family

ID=86778724

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310027339.XA Active CN116312794B (en) 2023-01-09 2023-01-09 Methylation sample clustering method fused with single cell analysis method

Country Status (2)

Country Link
CN (1) CN116312794B (en)
LU (1) LU503668B1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2012162660A2 (en) * 2011-05-25 2012-11-29 Brown University Methods using dna methylation for identifying a cell or a mixture of cells for prognosis and diagnosis of diseases, and for cell remediation therapies
CN107301330A (en) * 2017-06-02 2017-10-27 西安电子科技大学 A kind of method of utilization full-length genome data mining methylation patterns
CN111261229A (en) * 2020-01-17 2020-06-09 广州基迪奥生物科技有限公司 Biological analysis process of MeRIP-seq high-throughput sequencing data
CN112802545A (en) * 2021-01-28 2021-05-14 哈尔滨医科大学 Cardiovascular disease patient DNA methylation data processing platform and method
CN114203259A (en) * 2021-12-17 2022-03-18 暨南大学 Multi-group chemical data integration analysis method and online interactive comprehensive analysis platform
CN114298201A (en) * 2021-12-23 2022-04-08 电子科技大学(深圳)高等研究院 Single cell methylation data clustering method based on multi-distance spectrum embedding fusion
CN114722918A (en) * 2022-03-18 2022-07-08 马杰 Tumor classification method based on DNA methylation
US20220396838A1 (en) * 2021-04-08 2022-12-15 The Chinese University Of Hong Kong Cell-free dna methylation and nuclease-mediated fragmentation

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANNALAURA MANCIA ET AL.: "Skin distress associated with xenobiotics exposure: An epigenetic study in the Mediterranean fin whale (Balaenoptera physalus)", 《MARINE GENOMICS》, vol. 57 *
刘照洋: "基于RNA m6A修饰高通数据的共甲基化模式聚类解析模型", 《中国博士学位论文全文数据库基础科学辑》, no. 04 *

Also Published As

Publication number Publication date
CN116312794B (en) 2023-11-14
LU503668B1 (en) 2023-09-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant