CN114944190B - TAD (topologically associating domain) identification method and system based on Hi-C sequencing data - Google Patents

TAD (topologically associating domain) identification method and system based on Hi-C sequencing data

Info

Publication number
CN114944190B
CN114944190B CN202210512716.4A CN202210512716A CN114944190B CN 114944190 B CN114944190 B CN 114944190B CN 202210512716 A CN202210512716 A CN 202210512716A CN 114944190 B CN114944190 B CN 114944190B
Authority
CN
China
Prior art keywords
tad
chromosome
sequencing data
bin
segment
Legal status
Active
Application number
CN202210512716.4A
Other languages
Chinese (zh)
Other versions
CN114944190A (en)
Inventor
刘健
李平静
陈娇
Current Assignee
Nankai University
Original Assignee
Nankai University
Priority date
2022-05-12
Filing date
2022-05-12
Publication date
2024-04-19
Application filed by Nankai University
Priority to CN202210512716.4A
Publication of CN114944190A
Application granted
Publication of CN114944190B

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B15/00 ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
    • G16B15/20 Protein or domain folding
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biotechnology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Medical Informatics (AREA)
  • Crystallography & Structural Chemistry (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a TAD identification method and system based on Hi-C sequencing data. The method comprises: obtaining Hi-C sequencing data of a single chromosome; segmenting the Hi-C sequencing data of the single chromosome to generate a plurality of chromosome fragments; performing TAD structure identification on each chromosome fragment; and identifying false positive results among the identified TAD structures. The method makes full use of the Hi-C sequencing data of the whole chromosome, which improves accuracy; it also introduces a random walk with restart algorithm and a penalty operation, whose penalty coefficient effectively limits the influence of genetic variation.

Description

TAD (topologically associating domain) identification method and system based on Hi-C sequencing data
Technical Field
The invention relates to the technical field of gene sequencing, in particular to a TAD identification method and system based on Hi-C sequencing data.
Background
The statements in this section merely relate to the background of the present disclosure and may not necessarily constitute prior art.
Research on the three-dimensional structure of chromosomes in space has made considerable progress. Chromatin conformation capture (3C) is a technique developed by biologists to study one-to-one interactions between sites on chromosomal fragments; one-to-many (4C), many-to-many (5C) and all-to-all techniques were subsequently developed on this basis. The all-to-all technique is called high-throughput chromosome conformation capture, i.e., Hi-C sequencing. Using Hi-C, scientists have successively discovered spatial structures formed by chromosomes in cells, such as topologically associating domains (TADs), A/B compartments and chromatin loops.
Topologically associating domains (TADs) are regions of a chromosome that fold tightly so that loci within the region interact frequently with one another; they are highly relevant to genetics, development, disease and evolution. Identifying TAD structures is therefore needed in a wide range of applications, such as studying the spatial conformation and function of chromosomes.
The inventors found that existing algorithms for identifying TADs in Hi-C sequencing data generally require several input parameters, which is inconvenient for biological researchers. Their results also tend to be sensitive to those parameters: a small change in a parameter can lead to very different results.
Chinese patent CN113178230A, a method and system for detecting nested TAD structures in three-dimensional genome Hi-C data, does detect nested TAD structures. That patent uses deep learning to enhance the original Hi-C data, which avoids the large amount of resources needed to acquire high-precision data, but it thereby introduces simulated data into TAD identification and does not provide a TAD identification method based on the original data.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a TAD identification method and system based on Hi-C sequencing data. It reduces sensitivity to input parameters, which helps researchers and clinicians without specialized bioinformatics knowledge. It can also use the Hi-C data of the whole chromosome to obtain a more accurate identification result.
In a first aspect, the invention provides a TAD identification method based on Hi-C sequencing data.
A TAD identification method based on Hi-C sequencing data, comprising:
obtaining Hi-C sequencing data of a single chromosome; segmenting the Hi-C sequencing data of the single chromosome to generate a plurality of chromosome fragments;
performing TAD structure identification on each chromosome fragment;
identifying false positive results among the identified TAD structures.
In a second aspect, the invention provides a TAD identification system based on Hi-C sequencing data.
A TAD identification system based on Hi-C sequencing data, comprising:
an acquisition module configured to: obtain Hi-C sequencing data of a single chromosome; segment the Hi-C sequencing data of the single chromosome to generate a plurality of chromosome fragments;
a TAD structure identification module configured to: perform TAD structure identification on each chromosome fragment;
a false positive identification module configured to: identify false positive results among the identified TAD structures.
In a third aspect, the invention also provides an electronic device, comprising:
a memory for non-transitory storage of computer-readable instructions; and
a processor for executing the computer-readable instructions,
wherein the computer-readable instructions, when executed by the processor, perform the method of the first aspect.
In a fourth aspect, the invention also provides a storage medium that non-transitorily stores computer-readable instructions, wherein the method of the first aspect is performed when the computer-readable instructions are executed by a computer.
In a fifth aspect, the invention also provides a computer program product comprising a computer program that, when run on one or more processors, implements the method of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
The scheme of the disclosure draws an analogy between TADs in chromosome Hi-C data and communities in an association graph, which offers a new idea for subsequent research. It makes full use of the Hi-C sequencing data of the whole chromosome, which improves accuracy. It also introduces a random walk with restart algorithm and a penalty operation, whose penalty coefficient effectively limits the influence of genetic variation. In addition, unlike other published algorithms, the disclosure greatly reduces the sensitivity of the results to input parameters. In summary, the disclosure introduces a new approach to TAD identification, reduces the dependence of the identification result on parameters, improves the utilization of chromosome Hi-C sequencing data, and improves the accuracy of the identification result. For researchers and clinicians without specialized bioinformatics knowledge, the scheme reduces the number of parameters to choose and the difficulty of choosing their values, and can provide accurate analysis results.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a flow chart of a method according to a first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms also are intended to include the plural forms, and furthermore, it is to be understood that the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions, such as, for example, processes, methods, systems, products or devices that comprise a series of steps or units, are not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
All data used in the embodiments were obtained and applied lawfully, in compliance with laws and regulations and with the consent of users.
Example 1
This example provides a TAD identification method based on Hi-C sequencing data.
As shown in Fig. 1, the TAD identification method based on Hi-C sequencing data includes:
S101: obtaining Hi-C sequencing data of a single chromosome; segmenting the Hi-C sequencing data of the single chromosome to generate a plurality of chromosome fragments;
S102: performing TAD structure identification on each chromosome fragment;
S103: identifying false positive results among the identified TAD structures.
A false positive is an incorrect TAD region reported because of experimental or computational error.
Further, S101 (obtaining Hi-C sequencing data of a single chromosome and segmenting it to generate a plurality of chromosome fragments) specifically includes:
obtaining Hi-C sequencing data of a single chromosome, wherein the Hi-C sequencing data of the single chromosome has a matrix structure;
calculating the local contact frequency of each bin in the single chromosome;
after a local contact frequency value has been calculated for each bin, screening out the bins at which the value is a local minimum;
starting from each bin at a local minimum, finding, to the left and to the right respectively, the farthest bin up to which the local contact frequency rises strictly monotonically; the difference between the left and right bounds found for each minimum is referred to as its maximum rising distance;
sorting the maximum rising distances in descending order and taking the bins of the top-ranked values as TAD boundaries;
dividing the whole chromosome at the TAD boundaries to obtain a plurality of chromosome fragments.
Further, the local contact frequency of each bin in the chromosome is calculated by formula (1) (a bin is a fragment of the chromosome whose length is called the resolution; it is the unit labelled "bin" in the Hi-C matrix):
[Formula (1): local contact frequency of a bin; the equation is not reproduced in this text.]
where w is 2 Mb divided by the resolution entered by the user, i.e., the number of bins spanning 2 Mb; cont.freq is the contact frequency between two bins, taken from the corresponding entry of the matrix formed by the Hi-C sequencing data; and U and D denote the upstream and downstream regions, respectively. The local contact frequency value (local density) is the sum of the contacts between a bin and the regions within distance w upstream and downstream of it; it reaches a maximum at the center of a TAD and a minimum at a TAD boundary.
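Because the formula itself is not reproduced in this text, the following is only a minimal sketch of one reading of the description above: for each bin, sum its contacts with the bins up to w bins upstream (U) and downstream (D). The function name `local_density`, the exclusion of the bin's own diagonal entry, and the toy matrix are assumptions made for illustration and are not taken from the patent.

```python
import numpy as np

def local_density(hic: np.ndarray, w: int) -> np.ndarray:
    """Sum of contacts between each bin and the bins within w bins up- and downstream."""
    n = hic.shape[0]
    density = np.zeros(n, dtype=float)
    for i in range(n):
        lo = max(0, i - w)           # first bin of the upstream region U
        hi = min(n, i + w + 1)       # one past the last bin of the downstream region D
        upstream = hic[i, lo:i].sum()
        downstream = hic[i, i + 1:hi].sum()
        density[i] = upstream + downstream
    return density

# Toy matrix (assumed data): two domains of 4 bins, strong contacts inside, weak across.
toy = np.ones((8, 8))
toy[:4, :4] = 10.0
toy[4:, 4:] = 10.0
ld = local_density(toy, w=2)
print(ld)
```

In this toy case the values dip at bins 3 and 4, where the two domains meet, and peak inside each domain; the low values at bins 0 and 7 are an edge effect of the truncated window.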
According to formula (1), a local contact frequency value is calculated for each bin, and the bins at which the value is a local minimum are screened out. Mathematically, a local minimum is a value smaller than the values in its neighborhood on both the left and the right.
Starting from each bin at a local minimum, the farthest bin up to which the local contact frequency rises strictly monotonically is found to the left and to the right, respectively; the difference between the left and right bounds found for each minimum is referred to as its maximum rising distance.
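The two steps above can be sketched as follows. This is an illustrative reading only: the helper names `local_minima` and `rising_distance` and the sample profile are hypothetical, and a minimum is taken here to mean a value strictly smaller than its immediate left and right neighbours.

```python
from typing import List, Tuple

def local_minima(values: List[float]) -> List[int]:
    """Indices whose value is strictly smaller than both immediate neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]

def rising_distance(values: List[float], i: int) -> Tuple[int, int, int]:
    """Walk left and right from minimum i while the values keep rising strictly.

    Returns (left bound, right bound, right bound - left bound).
    """
    left = i
    while left > 0 and values[left - 1] > values[left]:
        left -= 1
    right = i
    while right < len(values) - 1 and values[right + 1] > values[right]:
        right += 1
    return left, right, right - left

# Assumed local contact frequency profile for ten bins.
profile = [5.0, 3.0, 4.0, 6.0, 2.0, 7.0, 8.0, 9.0, 1.0, 2.0]
minima = local_minima(profile)
distances = {i: rising_distance(profile, i)[2] for i in minima}
print(minima, distances)
```

Here the minimum at bin 4 has the longest strictly rising flanks, so it would rank first as a boundary candidate.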
The maximum rising distances are sorted in descending order and the top-ranked values are taken (the proportion taken may be entered by the user; the preset value is 45%); the bins corresponding to these values are determined to be TAD boundaries.
The whole chromosome is divided at these boundaries to obtain a plurality of chromosome fragments.
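A minimal sketch of the selection and splitting steps, assuming the 45% preset is a proportion of the candidate minima and that a fractional count is rounded up (the text does not specify the rounding). The names `select_boundaries` and `split_chromosome` and the sample distances are hypothetical.

```python
import math
from typing import Dict, List, Tuple

def select_boundaries(distances: Dict[int, int], fraction: float = 0.45) -> List[int]:
    """Keep the bins whose maximum rising distance ranks in the top `fraction`."""
    ranked = sorted(distances, key=lambda b: distances[b], reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction)) if ranked else 0
    return sorted(ranked[:keep])

def split_chromosome(n_bins: int, boundaries: List[int]) -> List[Tuple[int, int]]:
    """Cut [0, n_bins) at each boundary bin; returns half-open (start, end) pairs."""
    cuts = [0] + [b for b in boundaries if 0 < b < n_bins] + [n_bins]
    return [(s, e) for s, e in zip(cuts[:-1], cuts[1:]) if e > s]

# Assumed input: bin index of each local minimum -> its maximum rising distance.
distances = {12: 9, 30: 2, 47: 7, 61: 3, 80: 5}
boundaries = select_boundaries(distances)
fragments = split_chromosome(100, boundaries)
print(boundaries, fragments)
```

With five candidates and the 45% preset, three boundaries are kept and the 100-bin chromosome is cut into four fragments.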
It should be noted that TAD structures exist within a single chromosome and not between chromosomes; accordingly, the input of the algorithm should be the Hi-C matrix of a single chromosome.
Further, S102 (performing TAD structure identification on each chromosome fragment) specifically includes:
for each chromosome fragment, using the random walk with restart (RWR) algorithm to obtain the similarity between every pair of bins in the current chromosome fragment;
dividing the similarity between two bins by the distance between the two bins (the penalty operation) to obtain the penalized result;
using the penalized result as the input of a label propagation algorithm and running the label propagation process on it, wherein the output of the label propagation process is a community structure, and a community structure corresponds in meaning to a TAD structure.
A community structure, as defined by the label propagation algorithm, is a region whose internal correlation is higher than its correlation with other regions.
It should be understood that the random walk with restart algorithm makes full use of global data to quickly find the degree of association between every pair of bins.
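As an illustration of how RWR yields a similarity for every pair of bins, the sketch below uses the standard closed-form steady state of the recurrence p = (1 - c) W p + c e. The restart probability of 0.5, the column normalisation, and the function name `rwr_similarity` are assumptions; the patent does not state these details.

```python
import numpy as np

def rwr_similarity(contacts: np.ndarray, restart: float = 0.5) -> np.ndarray:
    """Steady-state random-walk-with-restart scores between all pairs of bins.

    Column j of the result holds the scores of every bin for a walk restarting at bin j.
    """
    a = np.asarray(contacts, dtype=float)
    col_sums = a.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # isolated bins: avoid division by zero
    transition = a / col_sums          # column-stochastic transition matrix W
    n = a.shape[0]
    # Closed form of p = (1 - c) * W p + c * e, solved for every start bin at once.
    return restart * np.linalg.inv(np.eye(n) - (1.0 - restart) * transition)

# Toy fragment (assumed data): two groups of 3 bins with strong internal contacts.
toy = np.ones((6, 6))
toy[:3, :3] = 10.0
toy[3:, 3:] = 10.0
sim = rwr_similarity(toy)
print(np.round(sim, 3))
```

Bins in the same group receive a higher score than bins in different groups, which is the property the later community detection step relies on.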
It should be understood that dividing the similarity between two bins by the distance between them (the penalty operation) prevents super nodes from spreading their labels too quickly during the subsequent label propagation and reduces errors caused by factors such as chromosomal variation. This improves the accuracy of the subsequent unsupervised learning and limits the influence of factors such as copy number variations (CNVs) and chromosomal translocations. An unsupervised learning algorithm is then run to complete the community discovery process; the algorithm used by the method is the label propagation algorithm.
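The penalty and label propagation steps can be sketched as below. This is a simplified, deterministic variant written for illustration: distance is taken as the difference in bin index, the diagonal is set to zero, bins are visited in order, and ties go to the smallest label, whereas label propagation is usually run with random visiting order and random tie-breaking. The function names and the toy similarity matrix are hypothetical.

```python
import numpy as np

def penalise_by_distance(similarity: np.ndarray) -> np.ndarray:
    """Divide each pairwise similarity by the bin distance |i - j| (diagonal set to 0)."""
    n = similarity.shape[0]
    idx = np.arange(n)
    distance = np.abs(idx[:, None] - idx[None, :]).astype(float)
    penalised = np.zeros_like(similarity, dtype=float)
    off_diag = distance > 0
    penalised[off_diag] = similarity[off_diag] / distance[off_diag]
    return penalised

def label_propagation(weights: np.ndarray, max_iter: int = 100) -> np.ndarray:
    """Each bin repeatedly adopts the label carrying the largest total edge weight."""
    n = weights.shape[0]
    labels = np.arange(n)                      # every bin starts in its own community
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            totals = np.bincount(labels, weights=weights[i], minlength=n)
            best = int(np.argmax(totals))      # ties go to the smallest label
            if totals[best] > totals[labels[i]]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels

# Toy similarity (assumed data): high within two groups of 3 bins, low across them.
toy_similarity = np.full((6, 6), 0.1)
toy_similarity[:3, :3] = 1.0
toy_similarity[3:, 3:] = 1.0
penalised = penalise_by_distance(toy_similarity)
communities = label_propagation(penalised)
print(communities)
```

Each distinct label in the output marks one community, i.e., one candidate TAD within the fragment.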
Further, S103 (identifying false positive results among the identified TAD structures) specifically includes:
according to established biological findings, the size of a TAD structure normally lies between 180 kb and 2 Mb;
the parameter entered by the user is the resolution of the Hi-C matrix, and the two endpoints of the standard range (180 kb and 2 Mb) are divided by the resolution to obtain the range of the number of bins that a topologically associating domain contains at that resolution;
identified TAD structures whose number of bins falls outside this range are filtered out as false positive results.
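As a worked example of this arithmetic, at a resolution of 40 kb the range becomes 180 kb / 40 kb = 4.5 bins to 2 Mb / 40 kb = 50 bins. The sketch below applies that filter; the names `bin_range` and `filter_by_size`, the half-open (start, end) domain representation, and the choice not to round the limits are assumptions.

```python
from typing import List, Tuple

MIN_TAD_BP = 180_000     # lower end of the standard TAD size range (180 kb)
MAX_TAD_BP = 2_000_000   # upper end of the standard TAD size range (2 Mb)

def bin_range(resolution_bp: int) -> Tuple[float, float]:
    """Convert the standard size range into a range of bin counts."""
    return MIN_TAD_BP / resolution_bp, MAX_TAD_BP / resolution_bp

def filter_by_size(domains: List[Tuple[int, int]], resolution_bp: int) -> List[Tuple[int, int]]:
    """Keep domains, given as half-open (start_bin, end_bin), whose size is in range."""
    lo, hi = bin_range(resolution_bp)
    return [(s, e) for s, e in domains if lo <= (e - s) <= hi]

# Assumed candidate domains of 3, 17, 70 and 10 bins at 40 kb resolution.
candidates = [(0, 3), (3, 20), (20, 90), (90, 100)]
kept = filter_by_size(candidates, resolution_bp=40_000)
print(bin_range(40_000), kept)
```

The 3-bin and 70-bin candidates fall outside the 4.5 to 50 bin range and are dropped.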
Further, the method also includes: further filtering false positives according to two quality indexes, namely the difference in contact frequency between the interior of a topologically associating domain and its adjacent topologically associating domains (the difference of average interaction frequency between intra-domain and the corresponding inter-domain, DIFF) and the Pearson correlation coefficient (PCC).
The Pearson correlation coefficient is a statistical index that measures the correlation within data, and it is equally suitable for describing the correlation within a topologically associating domain. Because the correlation inside a TAD is extremely high, results with a Pearson correlation coefficient below 0.6 are regarded as false positives.
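The text does not say how a single coefficient is obtained for a domain. One plausible reading, assumed here purely for illustration, is the mean pairwise Pearson correlation between the contact profiles (matrix rows) of the bins in the domain; the function name `mean_intra_domain_pcc` and the toy matrix are hypothetical, while the 0.6 threshold comes from the text.

```python
import numpy as np

PCC_THRESHOLD = 0.6  # results below this value are treated as false positives

def mean_intra_domain_pcc(hic: np.ndarray, start: int, end: int) -> float:
    """Mean pairwise Pearson correlation between row profiles of bins in [start, end)."""
    if end - start < 2:
        return float("nan")                   # undefined for a single bin
    rows = hic[start:end]                     # contact profile of each bin in the domain
    corr = np.corrcoef(rows)                  # pairwise correlation between profiles
    upper = corr[np.triu_indices(end - start, k=1)]
    return float(upper.mean())

# Toy matrix (assumed data): two domains of 3 bins each.
toy = np.ones((6, 6))
toy[:3, :3] = 10.0
toy[3:, 3:] = 10.0
good = mean_intra_domain_pcc(toy, 0, 3)       # a real domain
bad = mean_intra_domain_pcc(toy, 1, 5)        # a call straddling the true boundary
print(good >= PCC_THRESHOLD, bad >= PCC_THRESHOLD)
```

Under this reading the call that straddles the boundary mixes two kinds of profile and falls below the threshold.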
The difference in contact frequency (DIFF) between the interior of a topologically associating domain and its adjacent topologically associating domains is a quality evaluation index used in TAD identification studies. The index calculates the sum of the contacts among all bins within one TAD and the sum of the contacts between bins that fall in adjacent TADs.
The contact frequency between bins within a TAD is extremely high, whereas the contact frequency between bins of different TADs is extremely low. DIFF measures the difference between the two; results with a DIFF below 20 are regarded as false positives.
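A minimal sketch of the DIFF check, assuming DIFF is the mean contact frequency inside a domain minus the mean contact frequency between that domain and one adjacent domain, as the expanded name of the index suggests. The exact normalisation is not given in the text, so the function name `diff_score` and the toy matrices are illustrative only; the threshold of 20 comes from the text.

```python
import numpy as np
from typing import Tuple

DIFF_THRESHOLD = 20.0  # results below this value are treated as false positives

def diff_score(hic: np.ndarray, domain: Tuple[int, int], neighbour: Tuple[int, int]) -> float:
    """Mean intra-domain contact minus mean contact between domain and neighbour."""
    s, e = domain
    ns, ne = neighbour
    intra = hic[s:e, s:e].mean()      # contacts among bins of the same domain
    inter = hic[s:e, ns:ne].mean()    # contacts between the domain and its neighbour
    return float(intra - inter)

# Assumed data: a well separated pair of domains and a weakly separated pair.
strong = np.full((8, 8), 2.0)
strong[:4, :4] = 30.0
strong[4:, 4:] = 30.0
weak = np.full((8, 8), 1.0)
weak[:4, :4] = 10.0
weak[4:, 4:] = 10.0

strong_diff = diff_score(strong, (0, 4), (4, 8))
weak_diff = diff_score(weak, (0, 4), (4, 8))
print(strong_diff >= DIFF_THRESHOLD, weak_diff >= DIFF_THRESHOLD)
```

The absolute value of DIFF depends on how the Hi-C matrix is normalised, so a fixed threshold such as 20 only makes sense for matrices on a comparable scale.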
In addition, DIFF and PCC are often used as TAD evaluation indexes in related studies. The method includes the calculation of both indexes and also uses them to filter false positive results; the communities that remain are finally taken as the TAD identification result.
The scheme of the disclosure provides a user-friendly identification algorithm that makes bioinformatics analysis easier for researchers and clinicians without specialized bioinformatics knowledge and provides more accurate results.
Example 2
This example provides a TAD identification system based on Hi-C sequencing data.
A TAD identification system based on Hi-C sequencing data, comprising:
an acquisition module configured to: obtain Hi-C sequencing data of a single chromosome; segment the Hi-C sequencing data of the single chromosome to generate a plurality of chromosome fragments;
a TAD structure identification module configured to: perform TAD structure identification on each chromosome fragment;
a false positive identification module configured to: identify false positive results among the identified TAD structures.
It should be noted that the acquisition module, the TAD structure identification module and the false positive identification module correspond to steps S101 to S103 of Example 1; the examples and application scenarios implemented by the modules are the same as those of the corresponding steps, but are not limited to what is disclosed in Example 1. The modules may be executed as part of a system in a computer system, for example as a set of computer-executable instructions.
Each of the foregoing embodiments has its own emphasis; for details not described in one embodiment, refer to the related description of another embodiment.
The proposed system may be implemented in other ways. The system embodiment described above is merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in practice; for example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
Example 3
This example also provides an electronic device, including one or more processors, one or more memories, and one or more computer programs, wherein the processor is coupled to the memory, the one or more computer programs are stored in the memory, and, when the electronic device is running, the processor executes the one or more computer programs stored in the memory so that the electronic device performs the method of Example 1.
It should be understood that in this example the processor may be a central processing unit (CPU) or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may include read-only memory and random access memory and provides instructions and data to the processor; a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information about the device type.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software.
The method of Example 1 may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may reside in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory; the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided here.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Example 4
This example also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, perform the method of Example 1.
The above describes only preferred embodiments of the invention and is not intended to limit it; those skilled in the art may make various modifications and variations to the invention. Any modification, equivalent replacement or improvement made within the spirit and principle of the invention shall fall within its scope of protection.

Claims (4)

1. A TAD identification method based on Hi-C sequencing data, comprising:
obtaining Hi-C sequencing data of a single chromosome; segmenting Hi-C sequencing data of a single chromosome to generate a plurality of chromosome fragments;
performing TAD structure identification on each chromosome fragment;
identifying false positive results among the identified TAD structures;
wherein obtaining the Hi-C sequencing data of a single chromosome and segmenting it to generate a plurality of chromosome fragments specifically comprises:
obtaining Hi-C sequencing data of a single chromosome, wherein the Hi-C sequencing data of the single chromosome has a matrix structure; calculating the local contact frequency of each bin in the single chromosome; after a local contact frequency value has been calculated for each bin, screening out the bins at which the value is a local minimum; starting from each bin at a local minimum, finding, to the left and to the right respectively, the farthest bin up to which the local contact frequency rises strictly monotonically; the difference between the left and right bounds found for each minimum being referred to as its maximum rising distance; sorting the maximum rising distances in descending order and taking the bins of the top-ranked values as TAD boundaries; dividing the whole chromosome at the TAD boundaries to obtain a plurality of chromosome fragments;
wherein the local contact frequency of each bin in the single chromosome is calculated as follows:
[Formula: local contact frequency of a bin; the equation is not reproduced in this text.]
where w is 2 Mb divided by the resolution entered by the user; cont.freq is the contact frequency between two bins, taken from the corresponding entry of the matrix formed by the Hi-C sequencing data; U and D denote the upstream and downstream regions, respectively; the local contact frequency value (local density) is the sum of the contacts between a bin and the regions within distance w upstream and downstream of it, and it reaches a maximum at the center of a TAD and a minimum at a TAD boundary;
wherein performing TAD structure identification on each chromosome fragment specifically comprises:
for each chromosome fragment, using the random walk with restart algorithm to obtain the similarity between every pair of bins in the current chromosome fragment; dividing the similarity between two bins by the distance between the two bins to obtain the penalized result; using the penalized result as the input of a label propagation algorithm and running the label propagation process on it, wherein the output of the label propagation process is a community structure, and a community structure corresponds in meaning to a TAD structure;
a community structure, as defined by the label propagation algorithm, being a region whose internal correlation is higher than its correlation with other regions;
wherein identifying false positive results among the identified TAD structures specifically comprises:
according to established biological findings, the size of a TAD structure normally lies between 180 kb and 2 Mb;
the parameter entered by the user is the resolution of the Hi-C matrix, and the two endpoints of the standard range are divided by the resolution to obtain the range of the number of bins that a topologically associating domain contains at that resolution; identified TAD structures whose number of bins falls outside this range are filtered out as false positive results; false positives are further filtered according to two quality indexes, namely the difference in contact frequency between the interior of a topologically associating domain and its adjacent topologically associating domains, and the Pearson correlation coefficient; and the communities that remain are finally taken as the TAD identification result.
2. A TAD identification system based on Hi-C sequencing data, using the TAD identification method based on Hi-C sequencing data according to claim 1, characterized by comprising:
an acquisition module configured to: obtain Hi-C sequencing data of a single chromosome; segment the Hi-C sequencing data of the single chromosome to generate a plurality of chromosome fragments;
a TAD structure identification module configured to: perform TAD structure identification on each chromosome fragment;
a false positive identification module configured to: identify false positive results among the identified TAD structures.
3. An electronic device, comprising:
a memory for non-transitory storage of computer-readable instructions; and
a processor for executing the computer-readable instructions,
wherein the computer-readable instructions, when executed by the processor, perform the method of claim 1.
4. A storage medium that non-transitorily stores computer-readable instructions, wherein the method of claim 1 is performed when the computer-readable instructions are executed by a computer.
CN202210512716.4A 2022-05-12 2022-05-12 TAD (topologically associating domain) identification method and system based on Hi-C sequencing data Active CN114944190B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210512716.4A CN114944190B (en) 2022-05-12 2022-05-12 TAD (topologically associating domain) identification method and system based on Hi-C sequencing data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210512716.4A CN114944190B (en) 2022-05-12 2022-05-12 TAD (topologically associating domain) identification method and system based on Hi-C sequencing data

Publications (2)

Publication Number Publication Date
CN114944190A CN114944190A (en) 2022-08-26
CN114944190B CN114944190B (en) 2024-04-19

Family

ID=82907880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210512716.4A Active CN114944190B (en) 2022-05-12 2022-05-12 TAD (topologically associating domain) identification method and system based on Hi-C sequencing data

Country Status (1)

Country Link
CN (1) CN114944190B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113946730A (en) * 2021-10-19 2022-01-18 四川大学 Gene data-based visual method for analyzing chromatin hierarchical structure
CN114444286A (en) * 2022-01-19 2022-05-06 四川大学 Chromatin topological association domain prediction method based on spectral clustering and electronic device
CN114446384A (en) * 2022-03-14 2022-05-06 中南大学 Prediction method and prediction system of chromosome topological correlation structure domain

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
US8843356B2 (en) * 2002-12-27 2014-09-23 Merck Sharp & Dohme Corp. Computer systems and methods for associating genes with traits using cross species data
WO2019094636A1 (en) * 2017-11-09 2019-05-16 Dovetail Genomics, Llc Structural variant analysis
US11074991B2 (en) * 2017-12-27 2021-07-27 The Jackson Laboratory Methods for multiplex chromatin interaction analysis by droplet sequencing with single molecule precision
US20190295684A1 (en) * 2018-03-22 2019-09-26 The Regents Of The University Of Michigan Method and apparatus for analysis of chromatin interaction data

Non-Patent Citations (2)

Title
LPAD: using network construction and label propagation to detect topologically associating domains from Hi-C data; Jian Liu et al; 《Briefings in Bioinformatics》; 20230503; Vol. 3 (No. 24); pp. 1-11 *
Topological domains in mammalian genomes identified by analysis of chromatin interactions; Jesse R. Dixon et al; 《LETTER》; 20120517; pp. 376-379 *

Also Published As

Publication number Publication date
CN114944190A (en) 2022-08-26

Similar Documents

Publication Publication Date Title
Bader et al. An automated method for finding molecular complexes in large protein interaction networks
Li et al. Interaction graph mining for protein complexes using local clique merging
CN110517729B (en) Method for excavating protein compound from dynamic and static protein interaction network
Zhang et al. Computational methods for analysing multiscale 3D genome organization
CN109545275B (en) Uncertain PPI network function module mining method based on fuzzy spectral clustering
CN114550906B (en) Cancer subtype identification system based on multi-view robust representation
Wang et al. An ensemble learning framework for detecting protein complexes from PPI networks
CN113889181A (en) Medical event analysis method and device, computer equipment and storage medium
CN114944190B (en) TAD (transcription activator) identification method and system based on Hi-C sequencing data
Liu et al. DeepChIA-PET: Accurately predicting ChIA-PET from Hi-C and ChIP-seq with deep dilated networks
Pasupuleti Detection of protein complexes in protein interaction networks using n-clubs
CN113539479A (en) Similarity constraint-based miRNA-disease association prediction method and system
CN116564418B (en) Cell group correlation network construction method, device, equipment and storage medium
Stilianoudakis et al. preciseTAD: a transfer learning framework for 3D domain boundary prediction at base-pair resolution
CN111177190B (en) Data processing method, device, electronic equipment and readable storage medium
Zhan et al. Conformational analysis of chromosome structures reveals vital role of chromosome morphology in gene function
Murua et al. Biclustering via semiparametric Bayesian inference
CN116052762A (en) Method and server for matching drug molecules with target proteins
Dazard et al. ROCS: a reproducibility index and confidence score for interaction proteomics studies
CN114446384A (en) Prediction method and prediction system of chromosome topological correlation structure domain
CN111383717B (en) Method and system for constructing biological information analysis reference data set
CN114550832A (en) Methods, systems, and media for holistic screening of proteomic clinical biomarkers
US20170024514A1 (en) Distance maps using multiple alignment consensus construction
CN118038990B (en) Multi-level chromatin topological structure domain identification method and system based on community discovery
Yu et al. A hybrid clustering algorithm for identifying modules in Protein-Protein Interaction networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant