CN111755071A - Single cell chromatin accessibility sequencing data analysis method and system based on peak clustering - Google Patents

Info

Publication number
CN111755071A
Authority
CN
China
Prior art keywords
clustering
cell
peaks
peak
accesson
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910256667.0A
Other languages
Chinese (zh)
Other versions
CN111755071B (en)
Inventor
瞿昆
方靖文
黎斌
李杨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qu Kun
University of Science and Technology of China USTC
Original Assignee
University of Science and Technology of China USTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Science and Technology of China USTC
Priority to CN201910256667.0A
Publication of CN111755071A
Application granted
Publication of CN111755071B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs

Abstract

A method and system for analyzing single-cell chromatin accessibility sequencing data based on peak clustering. The method comprises: aligning the single-cell chromatin accessibility sequencing data to the genome of the corresponding biological sample to obtain an alignment result, calling peaks on the basis of the alignment result, and counting the reads in each peak to obtain a cell-by-peak read-count matrix; and calculating the mathematical distance between peaks in the cell-by-peak read-count matrix, clustering the peaks, and merging the cell-by-peak read-count matrix into a cell-by-accesson read-count matrix, wherein an accesson is a cluster of peaks. The invention provides the first method and system covering the full analysis from fastq files to clustering, visualization and developmental pathway remodeling, and markedly improves the clustering performance.

Description

Single cell chromatin accessibility sequencing data analysis method and system based on peak clustering
Technical Field
The invention belongs to the technical field of biological sequencing data analysis, and particularly relates to a single cell chromatin accessibility sequencing data analysis method and system based on peak clustering.
Background
Since its invention in 2012, ATAC-seq has been widely adopted in biological research because it is simple, inexpensive and requires few cells, and it has contributed to breakthroughs in the study of embryonic development, stem cell differentiation, cancer mechanisms and cancer subtyping. For example, a 2017 article in Cancer Cell (IF = 24) reported that ATAC-seq can be used to explain the pathogenesis and precise subtyping of T cell lymphoma, and ATAC-seq data were incorporated into the TCGA database in 2018. To further investigate cellular heterogeneity, the scATAC-seq sequencing technology was proposed in 2015, and a number of different technical solutions have been implemented in the years since. With them came the need to analyze and interpret scATAC-seq sequencing results.
The primary purpose of scATAC-seq data analysis is to recover, from the sequencing results, the major cell populations or the developmental differentiation pathways in a mixed biological sample. However, scATAC-seq is still a cutting-edge technique, and the signal-to-noise ratio of its data is relatively low. scATAC-seq data analysis therefore requires an easy-to-use analysis method that recovers cell heterogeneity information as fully as possible. On the one hand, existing scATAC-seq data analysis methods offer no complete, easy-to-use workflow that starts from fastq files and proceeds through clustering, visualization and development path reconstruction. On the other hand, when evaluated by ARI on gold-standard test data sets, that is, test data sets in which the subpopulation of each cell, or its position in the developmental differentiation pathway, is known, the existing methods still recover information poorly and need to be improved. As a result, the field currently has no unified approach to scATAC-seq analysis.
The prior art has the following three analysis methods: ChromVAR, LSI and Cicero.
In the ChromVAR method, the input data are a cell-by-peak read-count matrix and the sequence information of each peak. From these, a cell-by-transcription-factor preference score matrix is constructed, and this matrix is used for information recovery.
In the LSI method, the input data is a cell-by-peak read-count matrix. The matrix is transformed with the TF-IDF algorithm (TF stands for term frequency and IDF for inverse document frequency), and information recovery is then performed on the new matrix.
In the Cicero method, the input data are a cell-by-peak read-count matrix and the positions of the peaks on the chromosomes. The matrix is then used for downstream information recovery.
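The TF-IDF transformation mentioned for the LSI method can be illustrated with a small sketch. It assumes one common variant (TF is a peak's count divided by the cell's total count, IDF is log(1 + number of cells / number of cells in which the peak is open)); the weighting used by any particular LSI tool may differ, and the function name `tf_idf` and the toy matrix are hypothetical.

```python
import math

def tf_idf(counts):
    """counts: list of rows (cells), each a list of per-peak read counts."""
    n_cells = len(counts)
    n_peaks = len(counts[0])
    # Document frequency: number of cells in which each peak has a nonzero count.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_peaks)]
    weighted = []
    for row in counts:
        total = sum(row)
        new_row = []
        for j, c in enumerate(row):
            if total == 0 or df[j] == 0:
                new_row.append(0.0)  # empty cell or never-open peak
            else:
                tf = c / total                        # term frequency within the cell
                idf = math.log(1 + n_cells / df[j])   # assumed IDF variant
                new_row.append(tf * idf)
        weighted.append(new_row)
    return weighted

# Toy cell-by-peak matrix: 3 cells x 3 peaks (illustrative values only).
toy = [[2, 0, 1], [0, 3, 1], [1, 0, 1]]
weighted = tf_idf(toy)
```

Peaks that are open in every cell receive the smallest IDF weight, so they contribute less to the transformed matrix than peaks open in only a few cells.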
Disclosure of Invention
In view of the above, the present invention provides a complete, easy-to-use and efficient method and system for analyzing scATAC-seq data of biological samples, capable of recovering cell heterogeneity information.
To achieve the above object, in one aspect, the present invention provides a single-cell chromatin accessibility sequencing data analysis method based on peak clustering, comprising:
aligning the single-cell chromatin accessibility sequencing data to the genome of the corresponding biological sample to obtain an alignment result, calling peaks on the basis of the alignment result, and counting the reads in each peak to obtain a cell-by-peak read-count matrix;
calculating the mathematical distance between peaks in the cell-by-peak read-count matrix, clustering the peaks, and merging the cell-by-peak read-count matrix into a cell-by-accesson read-count matrix, wherein an accesson is a cluster of peaks.
In some embodiments, the method further comprises reducing the cell-by-accesson read-count matrix to a two-dimensional visualization matrix; preferably, the dimension reduction method comprises PCA, t-SNE or UMAP.
In some embodiments, the method further comprises clustering cells according to the cell-by-accesson read-count matrix; preferably, the clustering algorithm comprises KNN clustering, kernel clustering or Louvain clustering.
In some embodiments, the method further comprises constructing a pseudotime trajectory of the cell development pathway using the cell-by-accesson read-count matrix; preferably, the algorithm used in constructing the pseudotime trajectory comprises SPRING or monocle.
In another aspect, the invention provides a single-cell chromatin accessibility sequencing data analysis system based on peak clustering, which comprises a preprocessing module and an accesson construction module;
the preprocessing module comprises a) an alignment unit for aligning the single-cell chromatin accessibility sequencing data to the genome of the corresponding biological sample to obtain an alignment result; b) a peak finding unit for merging the alignment results of all the single cells and then calling peaks; c) a reading calculation unit for counting the reads in each peak to obtain a cell-by-peak read-count matrix;
the accesson construction module comprises a) a peak distance calculation unit for calculating the mathematical distance between peaks in the cell-by-peak read-count matrix; b) a peak clustering unit for clustering peaks according to the mathematical distance between the peaks; c) a matrix conversion unit for merging the cell-by-peak read-count matrix into a cell-by-accesson read-count matrix, wherein an accesson is a cluster of peaks.
In some embodiments, the system further comprises a visualization module for reducing the cell-by-accesson read-count matrix to a two-dimensional visualization matrix; preferably, the dimension reduction method comprises PCA, t-SNE or UMAP.
In some embodiments, the system further comprises a cell clustering module for clustering cells according to the cell-by-accesson read-count matrix; preferably, the clustering algorithm comprises KNN clustering, kernel clustering or Louvain clustering.
In some embodiments, the system further comprises a cell development pathway remodeling module for constructing a pseudotime trajectory of the cell development pathway using the cell-by-accesson read-count matrix; preferably, the algorithm used in constructing the pseudotime trajectory comprises SPRING or monocle.
In some embodiments, the mathematical distance comprises a Euclidean distance, a Pearson correlation coefficient, or a cityblock distance.
In some embodiments, the method of peak clustering comprises KNN, DBSCAN, or K-means.
In some embodiments, the method of merging the cell-by-peak read-count matrix into the cell-by-accesson read-count matrix comprises taking the sum, the mean, the median, or the variance of the readings of the peaks in each accesson.
In another aspect, the present invention further provides a single-cell chromatin accessibility sequencing data analysis apparatus based on peak clustering, including:
a processor;
a memory having instructions stored thereon that, when executed by the processor, cause the processor to perform the analysis method.
In yet another aspect, the present invention also proposes a computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the analysis method.
Compared with the prior art, the invention has the following beneficial effects:
the invention provides the first method and system for analyzing scATAC-seq data that covers the full workflow from fastq files to clustering, visualization and developmental pathway remodeling;
the invention provides an accesson construction method based on peak clustering, which serves as the key module for scATAC-seq data analysis. The transformed cell-by-accesson matrix is used for subsequent clustering, visualization and cell development pathway remodeling. In tests on gold-standard data sets, the clustering performance, measured by ARI, is statistically significantly higher than that of the existing methods.
Drawings
FIG. 1 is a schematic diagram of accesson construction based on peak clustering and of the downstream analysis in an embodiment of the present invention;
FIG. 2 shows the relationship between the number of accessons and the ARI (gold-standard test data set 1) in an embodiment of the present invention;
FIG. 3 shows scATAC-seq data of human leukemia cells and related lineage cells in an embodiment of the present invention: A. data clustering (hierarchical clustering) and B.
FIG. 4 shows scATAC-seq data related to the developmental differentiation lineage of human hematopoietic stem cells in an embodiment of the present invention: developmental pathway remodeling (monocle);
FIG. 5 shows scATAC-seq data of mouse forebrain neural cells in an embodiment of the present invention: data clustering (KNN) and visualization (t-SNE);
FIGS. 6A-6D show scATAC-seq data of mouse thymic T cells in an embodiment of the present invention: data clustering (Louvain, hierarchical clustering), visualization (t-SNE), and developmental pathway remodeling (monocle);
FIG. 7 shows a comparison of clustering performance and run time between the embodiment of the present invention and the existing methods (gold-standard test data set 1);
FIG. 8 shows a comparison of clustering performance and run time between the embodiment of the present invention and the existing methods (gold-standard test data set 2).
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
For ease of understanding and to avoid repetition, the technical terms used herein are explained below.
Cell: the basic unit of life activity in mammals (such as human and mouse), and also the basis of the pathogenesis of various diseases; examples include nerve cells, epithelial cells and tumor cells.
Cellular heterogeneity: biological tissue samples (e.g., tumor tissue, brain tissue) are composed of a large number of cells, and the constituent cells have different physiological functions. Cellular heterogeneity commonly takes one of two forms: 1) the constituent cells form several well-defined cell populations (discrete); 2) the constituent cells lie along a continuous cellular differentiation pathway (continuous).
Genome: the complete DNA sequence of an organism, an ordered arrangement of the four bases A, T, C and G. The genomes of major mammals such as human and mouse have been completely sequenced.
Gene: a gene (genetic element) is the entire DNA sequence required to produce a polypeptide chain or functional RNA. A gene is typically one or more DNA segments of a genome.
Transcription factor: a protein that binds to DNA and initiates or regulates the expression of a gene. It usually binds to DNA by recognizing specific DNA sequence patterns (motifs).
Chromatin: a linear complex in the cell nucleus consisting of DNA, histones, non-histone proteins and a small amount of RNA. Its basic unit is the nucleosome, formed by DNA wound around histones.
Chromatin accessibility: a measure of whether a stretch of DNA is wound around histones. In general, chromatin is in one of two states: 1) DNA tightly wound around nucleosomes, termed closed DNA; 2) DNA not wound around nucleosomes and in a naked state, termed open DNA.
Chromatin accessibility sequencing (ATAC-seq): a sequencing technique developed at Stanford University in 2012 for detecting chromatin accessibility in biological samples (> 500 cells).
TCGA: The Cancer Genome Atlas project. It contains sequencing data of different omics types from cancer tissues and normal tissues, covering 33 different cancers and 11,000 patients.
Single cell chromatin accessibility sequencing (scATAC-seq): a family of sequencing methods used to detect chromatin accessibility in individual cells, including single-nucleus chromatin accessibility sequencing (snATAC-seq), single-cell combinatorial indexing chromatin accessibility sequencing (sciATAC-seq) and flow-cytometry-based single-cell chromatin accessibility sequencing (FACS scATAC-seq).
Short sequences (sequence reads): the DNA fragment sequences obtained in biological sequencing.
Alignment (mapping): the short sequences are compared to known genomic information to find the location of each short sequence on the genome.
Peak finding (peak calling): locating the positions of open DNA from the alignment results; each such position is called a peak and is assigned a number.
Reading: the number of short sequences (reads) in each peak for each sample.
Accesson: the result of peak clustering proposed by the invention, that is, a cluster of peaks. For example, accesson 1 = {peak 2, peak 3, peak 5}; accesson 2 = {peak 1, peak 4}.
ARI (Adjusted Rand Index): a commonly used evaluation index for clustering algorithms, which measures the agreement between the clustering result of an algorithm and the actual grouping.
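As an illustration of the last term, the Adjusted Rand Index can be computed from the contingency table of two labelings using the standard pair-counting formula. The following is a minimal standard-library sketch; the function name `adjusted_rand_index` and the toy labels are illustrative and are not taken from the patent.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting ARI: 1.0 for identical partitions, about 0 for random ones."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    # Contingency counts and the marginal cluster sizes of each labeling.
    joint = Counter(zip(labels_true, labels_pred))
    sizes_true = Counter(labels_true)
    sizes_pred = Counter(labels_pred)
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_true = sum(comb(c, 2) for c in sizes_true.values())
    sum_pred = sum(comb(c, 2) for c in sizes_pred.values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells together
    return (sum_joint - expected) / (max_index - expected)

known = [0, 0, 1, 1]                                        # known cell populations
perfect = adjusted_rand_index(known, ["b", "b", "a", "a"])  # same grouping, renamed
crossed = adjusted_rand_index(known, [0, 1, 0, 1])          # grouping cuts across truth
```

The index depends only on which cells are grouped together, not on the cluster names, which is why it suits comparisons against data sets whose true cell populations are known.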
One embodiment of the present invention provides a single-cell chromatin accessibility sequencing (scATAC-seq) data analysis system based on peak clustering (hereinafter abbreviated as APEC). The system comprises the following modules:
1) A preprocessing module, comprising: a) an alignment unit for aligning the fastq files (the single-cell chromatin accessibility sequencing data) to the genome sequence to produce bam files; b) a peak finding unit for merging the bam files of all single-cell alignment results into one merge_bam file and calling peaks on it; c) a reading calculation unit for counting the reads in each peak and outputting the cell-by-peak read-count matrix.
2) An accesson construction module, comprising: a) a peak distance calculation unit for calculating the mathematical distance between peaks (including but not limited to the Euclidean distance, the Pearson correlation coefficient and the cityblock distance) from the cell-by-peak read-count matrix; b) a peak clustering unit for clustering peaks according to the mathematical distances between them, the clustered peaks being called accessons, using clustering methods including but not limited to KNN and DBSCAN; c) a matrix conversion unit for merging the cell-by-peak read-count matrix into a cell-by-accesson matrix according to the accesson information, by methods including but not limited to taking the sum, mean, median or variance of the readings of the peaks in each accesson.
3) A visualization module: reduces the cell-by-accesson read-count matrix to a two-dimensional visualization matrix, using dimension reduction methods including but not limited to PCA, t-SNE and UMAP.
4) A cell clustering module: clusters the cells using the cell-by-accesson read-count matrix, with clustering algorithms including but not limited to KNN clustering, kernel clustering and Louvain clustering.
5) A cell development pathway remodeling module: constructs a pseudotime trajectory of the cell development pathway from the cell-by-accesson read-count matrix, using algorithms including but not limited to SPRING and monocle.
The following use cases apply APEC to four different gold-standard test data sets to illustrate its general applicability to scATAC-seq data from different biological samples. The data sets are: 1) human leukemia cell and related lineage cell scATAC-seq data; 2) human hematopoietic stem cell developmental differentiation lineage associated scATAC-seq data; 3) mouse forebrain nerve cell scATAC-seq data; 4) mouse thymic T cell scATAC-seq data.
The analysis process of the scATAC-seq analysis system (APEC) based on peak clustering according to the present invention comprises the following steps:
1) Data input:
The input data are fastq files, which may take either of the following forms: a) one fastq file per cell; b) a single fastq file in which all cells are mixed together, which can be split into per-cell data by a splitting rule given by the data provider, for example an index sequence (cells are distinguished by the first 5-10 bases of each fastq read).
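Splitting a mixed fastq file by an index sequence can be sketched as follows. This is only an illustration under stated assumptions: the index is taken to be the first `barcode_len` bases of each read, records are plain 4-line FASTQ, and the function name `split_fastq_by_prefix` and the toy reads are hypothetical. In practice the splitting rule is whatever the data provider specifies.

```python
from collections import defaultdict
from io import StringIO

def split_fastq_by_prefix(handle, barcode_len=8):
    """Group 4-line FASTQ records by the first `barcode_len` bases of each read."""
    cells = defaultdict(list)
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of input (a blank line also stops parsing in this sketch)
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        barcode = seq[:barcode_len]
        # Trim the index so that only the remaining sequence goes on to alignment.
        cells[barcode].append((header, seq[barcode_len:], plus, qual[barcode_len:]))
    return dict(cells)

# Three toy reads from two cells, with a 4-base index at the start of each read.
mixed = StringIO(
    "@read1\nAAAACCGTACGT\n+\nIIIIIIIIIIII\n"
    "@read2\nTTTTGGCATGCA\n+\nIIIIIIIIIIII\n"
    "@read3\nAAAATTGACCAG\n+\nIIIIIIIIIIII\n"
)
per_cell = split_fastq_by_prefix(mixed, barcode_len=4)
```

Each key of `per_cell` then corresponds to one cell, which reduces case b) to case a), one set of reads per cell.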
2) Data preprocessing:
The input data can be aligned by the alignment unit to the genome of the corresponding biological sample; for example, data sets 1 and 2 are aligned to the human genome and data sets 3 and 4 to the mouse genome. A genome specified by the data provider may also be used. The alignment produces a bam file that records where each read in the fastq file aligns to the genome. The bam file is processed by the peak finding unit to define the open chromatin sites in the biological sample, and the reading calculation unit then produces the matrix (m × n) of readings for each peak (n peaks) in each cell (m cells).
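The reading calculation step can be illustrated with a small sketch that builds the m × n matrix from aligned read positions. The data layout is an assumption made for illustration: each cell's reads are given as (chromosome, position) pairs already extracted from the alignment, and each peak is a half-open interval (chromosome, start, end). The function name `count_matrix` is hypothetical.

```python
def count_matrix(reads_per_cell, peaks):
    """Return an m x n matrix: one row per cell (sorted by name), one column per peak."""
    matrix = []
    for cell in sorted(reads_per_cell):
        row = [0] * len(peaks)
        for chrom, pos in reads_per_cell[cell]:
            # Linear scan over peaks, kept simple for clarity.
            for j, (p_chrom, start, end) in enumerate(peaks):
                if chrom == p_chrom and start <= pos < end:
                    row[j] += 1
        matrix.append(row)
    return matrix

peaks = [("chr1", 100, 200), ("chr1", 500, 650), ("chr2", 50, 120)]
reads_per_cell = {
    "cell_a": [("chr1", 150), ("chr1", 160), ("chr2", 60)],
    "cell_b": [("chr1", 510), ("chr1", 700)],  # the read at 700 falls in no peak
}
matrix = count_matrix(reads_per_cell, peaks)
```

For real data sets with very many peaks, an interval index would replace the inner loop; the resulting matrix has the same shape either way.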
3) Accesson construction:
FIG. 1 is a schematic diagram of accesson construction based on peak clustering and of the downstream analysis in the embodiment of the present invention. For accesson construction, the m × n reading matrix is first passed into the accesson construction module.
In the peak distance calculation unit, the relative distance between peaks may be calculated using the Euclidean distance (as for data sets 1, 2, 3 and 4), or using other common vector distance measures such as the Pearson correlation coefficient or the cityblock distance.
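The three distance measures named above can be written out for two peaks, each treated as a column vector of readings across the m cells. This is a minimal sketch; turning the Pearson correlation coefficient r into a distance as 1 - r, and returning 1.0 for a constant vector, are assumed conventions rather than something the text specifies.

```python
import math

def column(matrix, j):
    """Readings of peak j across all cells."""
    return [row[j] for row in matrix]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cityblock(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def pearson_distance(u, v):
    """1 - Pearson r (assumed convention); 1.0 if either vector is constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - cov / (norm_u * norm_v)

# Toy reading matrix: 4 cells x 3 peaks; peak 1 is exactly twice peak 0.
readings = [[1, 2, 0], [2, 4, 0], [0, 0, 3], [3, 6, 1]]
p0, p1, p2 = (column(readings, j) for j in range(3))
d_euclid = euclidean(p0, p1)
d_city = cityblock(p0, p1)
d_pearson = pearson_distance(p0, p1)
```

In the toy matrix, peaks 0 and 1 differ in magnitude but rise and fall together across cells, so their correlation-based distance is essentially zero while their Euclidean and cityblock distances are not.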
In the peak clustering unit, peaks can be clustered into a specified number of accessons using the KNN algorithm (as for data sets 1, 2, 3 and 4). Other common vector clustering algorithms, such as DBSCAN or K-means, may also be used. The specified number of accessons does not affect the result over a wide range (FIG. 2); the default is 2000, which can be adjusted for the specific data.
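Clustering peaks into a specified number of accessons can be sketched with K-means, one of the alternative algorithms named above. The sketch below is plain Lloyd's algorithm with a deterministic farthest-point initialisation; it is not the KNN-based procedure of the embodiment, and the function name `kmeans`, the initialisation scheme and the toy data are assumptions made for illustration.

```python
import math

def kmeans(vectors, k, n_iter=20):
    """Lloyd's algorithm; returns one cluster label per input vector."""
    # Deterministic initialisation: start from the first vector, then repeatedly
    # add the vector that is farthest from its nearest existing center.
    centers = [list(vectors[0])]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(math.dist(v, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels

# Each row is one peak described by its readings in 4 cells (toy values).
peak_vectors = [[5, 4, 0, 0], [4, 5, 0, 1], [0, 0, 6, 5], [0, 1, 5, 6], [5, 5, 1, 0]]
labels = kmeans(peak_vectors, k=2)

# Group peak indices by label: each group is one accesson.
accessons = {}
for peak_index, lab in enumerate(labels):
    accessons.setdefault(lab, []).append(peak_index)
```

Here `k` plays the role of the specified number of accessons; with real data it would be in the thousands rather than 2.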
In the matrix conversion unit, the accessons are first screened according to their basic properties; for example, accessons containing fewer peaks than a specified value, or accessons whose internal damping coefficient is below a specified value, are removed. The cell-by-peak readings are then merged into a cell-by-accesson matrix according to the accesson information by summing the readings of the peaks in each accesson (as for data sets 1, 2, 3 and 4). Other simple vector statistics, such as the mean, median or variance of the readings, may also be used.
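The matrix conversion can be illustrated with a short sketch that screens accessons by peak count and then merges peak readings. The accesson membership mirrors the example in the term definitions (accesson 1 = peaks 2, 3, 5; accesson 2 = peaks 1, 4), written with 0-based column indices, plus a hypothetical single-peak accesson 3 to show the screening. The function name `merge_peaks` and the `min_peaks` parameter are illustrative, and screening by any other accesson property is not shown.

```python
from statistics import median

def merge_peaks(matrix, accessons, how=sum, min_peaks=1):
    """Collapse an m x n cell-by-peak matrix into an m x k cell-by-accesson matrix."""
    # Screening step: drop accessons with fewer than `min_peaks` peaks.
    kept = [peaks for _, peaks in sorted(accessons.items()) if len(peaks) >= min_peaks]
    # Merging step: apply `how` to the readings of each accesson's peaks, per cell.
    return [[how([row[j] for j in peaks]) for peaks in kept] for row in matrix]

# Toy cell-by-peak readings: 2 cells x 6 peaks.
matrix = [[2, 0, 1, 0, 3, 7],
          [0, 4, 0, 1, 0, 2]]
accessons = {1: [1, 2, 4], 2: [0, 3], 3: [5]}

summed_all = merge_peaks(matrix, accessons)                    # keep every accesson
summed = merge_peaks(matrix, accessons, min_peaks=2)           # screen out accesson 3
medians = merge_peaks(matrix, accessons, how=median, min_peaks=2)
```

Passing a different function as `how` (for example `statistics.mean` or `statistics.pvariance`) gives the other merging options listed above without changing the screening logic.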
4) Data clustering and visualization
In this step, the visualization module may be used to reduce the cells-accesson reading matrix into a two-dimensional visualization matrix, and/or the cell clustering module may be used to cluster the cells, and/or the cell development pathway remodeling module may be used to construct the cell development pathway pseudo-time.
FIG. 3 is data for human leukemia cells and related lineage cells, scATAC-seq: A. data clustering (hierarchical clustering) and b.
FIG. 4 is data of human hematopoietic stem cell developmental differentiation lineage associated scATAC-seq: data development pathway remodeling (monocle);
FIG. 5 shows data of mouse forebrain nerve cells scATAC-seq: data clustering (KNN) and visualization (tSNE);
FIGS. 6A-6D show mouse thymic T cell scATAC-seq data, in which FIG. 6A is Louvain clustering, FIG. 6B is hierarchical clustering, FIG. 6C is visualization (t-SNE), and FIG. 6D is developmental pathway remodeling (monocle).
The method therefore covers the full workflow from fastq files to clustering, visualization and developmental pathway remodeling. Moreover, in tests on the gold-standard data sets, its clustering performance (ARI) is statistically significantly higher than that of the existing methods, as shown in FIGS. 7 and 8. Cell heterogeneity information can be recovered efficiently because the accesson construction proposed here acts as a filtering process that reduces noise and amplifies signal, as follows: 1) compared with LSI and ChromVAR, the method converts the originally sparse cell-by-peak matrix into a more compact cell-by-accesson matrix, which reduces the noise in subsequent analysis; 2) compared with the Cicero method, which merges peaks based on their chromatin positions, the invention merges peaks that have been clustered by mathematical distance and a clustering algorithm. Peaks clustered together in this way have similar patterns across cells, so the accessons are more biologically meaningful; for example, the peaks within an accesson may be regulated by the same transcription factor or lie closer together in the three-dimensional structure of chromatin. Transforming the data into the cell-by-accesson matrix thus further amplifies the cell heterogeneity signal.
The invention also provides a single cell chromatin accessibility sequencing data analysis device based on peak clustering, which comprises:
a processor;
a memory having instructions stored thereon that, when executed by the processor, cause the processor to perform the analysis method.
The invention also proposes a computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the analysis method.
It should be noted that each functional module/unit in the present invention may be implemented in hardware, for example as a circuit, including a digital circuit, an analog circuit, and the like. Physical implementations of the hardware structures include, but are not limited to, physical devices such as transistors and memristors. The data processing module may be any suitable hardware processor such as a CPU, GPU, FPGA, DSP, ASIC, etc. The memory unit may be any suitable magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, HMC, etc.
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A single cell chromatin accessibility sequencing data analysis method based on peak clustering, comprising:
aligning the single-cell chromatin accessibility sequencing data to the genome of the corresponding biological sample to obtain an alignment result, calling peaks on the basis of the alignment result, and counting the reads in each peak to obtain a cell-by-peak read-count matrix;
calculating the mathematical distance between peaks in the cell-by-peak read-count matrix, clustering the peaks, and merging the cell-by-peak read-count matrix into a cell-by-accesson read-count matrix, wherein an accesson is a cluster of peaks.
2. The analysis method according to claim 1, wherein the method further comprises reducing the cell-by-accesson read-count matrix to a two-dimensional visualization matrix; preferably, the dimension reduction method comprises PCA, t-SNE or UMAP.
3. The analysis method according to claim 1, wherein the method further comprises clustering cells according to the cell-by-accesson read-count matrix; preferably, the clustering algorithm comprises KNN clustering, kernel clustering or Louvain clustering.
4. The analysis method according to claim 1, wherein the method further comprises constructing a pseudotime trajectory of the cell development pathway using the cell-by-accesson read-count matrix; preferably, the algorithm used in constructing the pseudotime trajectory comprises SPRING or monocle.
5. The analysis method of any of claims 1-4, wherein the mathematical distance comprises a Euclidean distance, a Pearson correlation coefficient, or a cityblock distance;
preferably, the method of peak clustering comprises KNN, DBSCAN or K-means;
preferably, the method of merging the cell-by-peak read-count matrix into the cell-by-accesson read-count matrix comprises taking the sum, the mean, the median or the variance of the readings of the peaks in each accesson.
6. A single cell chromatin accessibility sequencing data analysis system based on peak clustering, comprising a preprocessing module and an accesson construction module;
the preprocessing module comprises a) an alignment unit for aligning the single-cell chromatin accessibility sequencing data to the genome of the corresponding biological sample to obtain an alignment result; b) a peak finding unit for merging the alignment results of all the single cells and then calling peaks; c) a reading calculation unit for counting the reads in each peak to obtain a cell-by-peak read-count matrix;
the accesson construction module comprises a) a peak distance calculation unit for calculating the mathematical distance between peaks in the cell-by-peak read-count matrix; b) a peak clustering unit for clustering peaks according to the mathematical distance between the peaks; c) a matrix conversion unit for merging the cell-by-peak read-count matrix into a cell-by-accesson read-count matrix, wherein an accesson is a cluster of peaks.
7. The analysis system according to claim 6, wherein the system further comprises a visualization module for reducing the cell-by-accesson read-count matrix to a two-dimensional visualization matrix; preferably, the dimension reduction method comprises PCA, t-SNE or UMAP;
preferably, the system further comprises a cell clustering module for clustering cells according to the cell-by-accesson read-count matrix; preferably, the clustering algorithm comprises KNN clustering, kernel clustering or Louvain clustering;
preferably, the system further comprises a cell development pathway remodeling module for constructing a pseudotime trajectory of the cell development pathway using the cell-by-accesson read-count matrix; preferably, the algorithm used in constructing the pseudotime trajectory comprises SPRING or monocle.
8. The analysis system of claim 6 or 7, wherein the mathematical distance comprises a Euclidean distance, a Pearson correlation coefficient, or a cityblock distance;
preferably, the method of peak clustering comprises KNN, DBSCAN or K-means;
preferably, the method of merging the cell-by-peak read-count matrix into the cell-by-accesson read-count matrix comprises taking the sum, the mean, the median or the variance of the readings of the peaks in each accesson.
9. A single cell chromatin accessibility sequencing data analysis apparatus based on peak clustering, comprising:
a processor;
a memory having instructions stored thereon that, when executed by the processor, cause the processor to perform the analysis method of any of claims 1-5.
10. A computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the analysis method of any one of claims 1-5.
CN201910256667.0A 2019-03-29 2019-03-29 Single-cell chromatin accessibility sequencing data analysis method and system based on peak clustering Active CN111755071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910256667.0A CN111755071B (en) 2019-03-29 2019-03-29 Single-cell chromatin accessibility sequencing data analysis method and system based on peak clustering

Publications (2)

Publication Number Publication Date
CN111755071A (en) 2020-10-09
CN111755071B (en) 2023-04-21

Family

ID=72672727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910256667.0A Active CN111755071B (en) 2019-03-29 2019-03-29 Single-cell chromatin accessibility sequencing data analysis method and system based on peak clustering

Country Status (1)

Country Link
CN (1) CN111755071B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030162219A1 (en) * 2000-12-29 2003-08-28 Sem Daniel S. Methods for predicting functional and structural properties of polypeptides using sequence models
CN103955629A * 2014-02-18 2014-07-30 Jilin University Micro genome segment clustering method based on fuzzy k-mean
CN105339503A * 2013-05-23 2016-02-17 The Board of Trustees of the Leland Stanford Junior University Transposition into native chromatin for personal epigenomics
US20160097088A1 * 2013-03-15 2016-04-07 Carnegie Institution Of Washington Methods of Genome Sequencing and Epigenetic Analysis
CN105930862A * 2016-04-13 2016-09-07 Jiangnan University Density peak clustering algorithm based on density adaptive distance
CN107368701A * 2017-07-31 2017-11-21 Zhejiang Shaoxing Qianxun Biotechnology Co., Ltd. High-volume single-cell ATAC-seq data quality control and analysis method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Tao et al.: "Identification, classification and phylogenetic analysis of SET domain gene in barley", 2010 4th International Conference on Bioinformatics and Biomedical Engineering *
Gao Shenghan et al.: "Research progress in sequencing technologies for complex genomes", Hereditas (Beijing) *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112270953A (en) * 2020-10-29 2021-01-26 Harbin Yinji Technology Co., Ltd. Analysis method, device and equipment based on BD single cell transcriptome sequencing data
WO2022188785A1 (en) * 2021-03-08 2022-09-15 Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences Single cell transcriptome computation and analysis method and system incorporating deep learning model
CN112992267A (en) * 2021-04-13 2021-06-18 Academy of Military Medical Sciences, Academy of Military Sciences of the Chinese People's Liberation Army Single-cell transcription factor regulation network prediction method and device
CN112992267B (en) * 2021-04-13 2024-02-09 Academy of Military Medical Sciences, Academy of Military Sciences of the Chinese People's Liberation Army Single-cell transcription factor regulation network prediction method and device
CN113178233A (en) * 2021-04-27 2021-07-27 Xidian University Efficient clustering method for large-scale single-cell transcriptome data
CN113178233B (en) * 2021-04-27 2023-04-28 Xidian University Large-scale single-cell transcriptome data efficient clustering method
CN116981779A (en) * 2022-02-08 2023-10-31 Chromatin (Beijing) Technology Co., Ltd. Method for identifying chromatin structural features from a Hi-C matrix, non-transitory computer readable medium storing a program for identifying chromatin structural features from a Hi-C matrix

Also Published As

Publication number Publication date
CN111755071B (en) 2023-04-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240130

Address after: No. 96 Jinzhai Road, Baohe District, Hefei, Anhui Province, 230026

Patentee after: University of Science and Technology of China

Country or region after: China

Patentee after: Qu Kun

Address before: No. 96 Jinzhai Road, Baohe District, Hefei, Anhui Province, 230026

Patentee before: University of Science and Technology of China

Country or region before: China