CN114944190B - TAD (topologically associating domain) identification method and system based on Hi-C sequencing data - Google Patents
- Publication number
- CN114944190B
- Authority
- CN
- China
- Prior art keywords
- tad
- chromosome
- sequencing data
- bin
- segment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B15/20—Protein or domain folding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
Abstract
The invention discloses a TAD identification method and system based on Hi-C sequencing data. The method comprises: obtaining Hi-C sequencing data of a single chromosome; segmenting the Hi-C sequencing data of the single chromosome to generate a plurality of chromosome fragments; performing TAD structure identification on each chromosome segment; and identifying false positive results among the identified TAD structures. The Hi-C sequencing data of the whole chromosome is fully utilized, which improves precision. In addition, a random walk with restart algorithm and a penalty operation are introduced, and the influence of genetic variation is effectively limited through a penalty coefficient.
Description
Technical Field
The invention relates to the technical field of gene sequencing, in particular to a TAD identification method and system based on Hi-C sequencing data.
Background
The statements in this section merely relate to the background of the present disclosure and may not necessarily constitute prior art.
Research on the three-dimensional structure of chromosomes has made considerable progress. Chromatin conformation capture (3C) is a technique developed by biologists to study one-to-one interactions between sites on chromosomal fragments. Building on 3C, one-to-many (4C), many-to-many (5C), and all-to-all techniques were developed in turn. The all-to-all technique is called high-throughput chromosome conformation capture, i.e., Hi-C sequencing technology. Using Hi-C, scientists have successively discovered spatial structures formed by cellular chromosomes, such as topologically associating domains (TADs), A/B compartments, and chromatin loops.
Topologically associating domains (TADs) are regions of a chromosome that fold tightly, so that loci within the region interact frequently with one another. They are highly relevant to genetics, development, disease, and evolution. Identifying TAD structures is therefore required in a wide range of applications, such as studying the spatial conformation and function of chromosomes.
The inventors found that existing algorithms for identifying TADs in Hi-C sequencing data generally require several input parameters, which is inconvenient for biological researchers. Moreover, their results tend to be sensitive to these parameters: a small change in a parameter can lead to very different results.
Chinese patent CN 113178230A, "Detection method and system for TAD nested structures in three-dimensional genome Hi-C data", detects nested TAD structures and uses deep learning to enhance the original Hi-C data, which avoids the large amount of resources needed to acquire high-precision data. However, that patent introduces simulated data into TAD identification and does not propose a TAD identification method based on the original data.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a TAD identification method and system based on Hi-C sequencing data. The method reduces sensitivity to input parameters, which helps researchers and clinicians without specialized bioinformatics knowledge, and it uses the Hi-C data of the whole chromosome to obtain a more accurate identification result.
In a first aspect, the invention provides a TAD identification method based on Hi-C sequencing data.
A TAD identification method based on Hi-C sequencing data, comprising:
obtaining Hi-C sequencing data of a single chromosome; segmenting the Hi-C sequencing data of the single chromosome to generate a plurality of chromosome fragments;
performing TAD structure identification on each chromosome segment;
identifying false positive results based on the identified TAD structures.
In a second aspect, the invention provides a TAD identification system based on Hi-C sequencing data;
A TAD identification system based on Hi-C sequencing data, comprising:
An acquisition module configured to: obtaining Hi-C sequencing data of a single chromosome; segmenting Hi-C sequencing data of a single chromosome to generate a plurality of chromosome fragments;
A TAD structure identification module configured to: performing TAD structure identification on each chromosome segment;
a false positive identification module configured to: based on the identified TAD structure, false positive results are identified.
In a third aspect, the present invention also provides an electronic device, including:
a memory for non-transitory storage of computer readable instructions; and
A processor for executing the computer-readable instructions,
Wherein the computer readable instructions, when executed by the processor, perform the method of the first aspect described above.
In a fourth aspect, the invention also provides a storage medium storing non-transitory computer readable instructions, wherein the instructions of the method of the first aspect are executed when the non-transitory computer readable instructions are executed by a computer.
In a fifth aspect, the invention also provides a computer program product comprising a computer program for implementing the method of the first aspect described above when run on one or more processors.
Compared with the prior art, the invention has the beneficial effects that:
The scheme of the disclosure draws an analogy between TADs in chromosome Hi-C data and communities in a relationship graph, which provides a new idea for subsequent research. The Hi-C sequencing data of the whole chromosome is fully utilized, which improves precision. A random walk with restart algorithm and a penalty operation are introduced, and the influence of genetic variation is effectively limited through a penalty coefficient. In addition, unlike other published algorithms, the disclosure greatly reduces the sensitivity of the results to input parameters. In summary, the scheme introduces a new approach to TAD identification, reduces the dependence of the identification result on parameters, improves the utilization of chromosome Hi-C sequencing data, and improves the accuracy of the identification result. For researchers and clinicians without specialized bioinformatics knowledge, the scheme reduces the number of parameters to choose and the difficulty of choosing their values, and can provide accurate analysis results.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a flow chart of a method according to a first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, unless the context clearly indicates otherwise, the singular forms also are intended to include the plural forms, and furthermore, it is to be understood that the terms "comprises" and "comprising" and any variations thereof are intended to cover non-exclusive inclusions, such as, for example, processes, methods, systems, products or devices that comprise a series of steps or units, are not necessarily limited to those steps or units that are expressly listed, but may include other steps or units that are not expressly listed or inherent to such processes, methods, products or devices.
Embodiments of the invention and features of the embodiments may be combined with each other without conflict.
All data in the embodiments is acquired and used lawfully, in compliance with applicable laws and regulations and with user consent.
Example 1
The present example provides a TAD identification method based on Hi-C sequencing data.
As shown in Fig. 1, the TAD identification method based on Hi-C sequencing data includes:
S101: obtaining Hi-C sequencing data of a single chromosome; segmenting the Hi-C sequencing data of the single chromosome to generate a plurality of chromosome fragments;
S102: performing TAD structure identification on each chromosome segment;
S103: identifying false positive results based on the identified TAD structures.
False positives are understood to mean incorrect TAD regions calculated due to experimental or computational errors.
Further, S101 (obtaining Hi-C sequencing data of a single chromosome and segmenting it to generate a plurality of chromosome fragments) specifically comprises:
obtaining Hi-C sequencing data of a single chromosome, wherein the Hi-C sequencing data of the single chromosome has a matrix structure;
calculating the local contact frequency of each bin of the single chromosome;
after a local contact frequency value has been calculated for each bin, selecting the bins that are local minimum points;
starting from each bin that is a local minimum point, extending to the left and to the right as far as the local contact frequency rises strictly monotonically, to find the farthest boundary on each side; the distance between the left and right boundaries of each minimum is called its maximum rising distance;
sorting the maximum rising distances in descending order, and taking the bins of the top-ranked values as TAD boundaries;
dividing the whole chromosome at the TAD boundaries to obtain a plurality of chromosome fragments.
Further, the local contact frequency of each bin in the chromosome is calculated according to formula (1). A bin is one fixed-length segment of the chromosome as represented in the Hi-C matrix, and its length is called the resolution.
In formula (1), w is the resolution input by the user divided by 2 Mb, and cont.freq is the contact frequency between two bins, whose value is the corresponding entry of the matrix formed by the Hi-C sequencing data. U and D refer to the upstream (up) and downstream (down) regions, respectively. The local contact frequency value (local density) describes the sum of the contacts between a bin and the regions within distance w upstream and downstream of it; it reaches a maximum at a TAD center and a minimum at a TAD boundary.
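Formula (1) itself is not reproduced in this text, so the following Python sketch only follows the verbal description above: for each bin, it sums the contacts with the bins up to w positions upstream (U) and downstream (D). Excluding the diagonal entry, and treating w as a number of bins obtained as 2 Mb divided by the resolution, are assumptions made for illustration; the function names `local_density` and `window_in_bins` are hypothetical.

```python
import numpy as np


def window_in_bins(resolution_bp: int, span_bp: int = 2_000_000) -> int:
    """Assumed conversion of the 2 Mb span into a window size counted in bins."""
    return max(1, span_bp // resolution_bp)


def local_density(hic: np.ndarray, w: int) -> np.ndarray:
    """Sum of contacts between each bin and its w upstream and w downstream bins.

    hic is a square, single-chromosome Hi-C contact matrix.
    The diagonal (self-contact) is left out, which is an assumption.
    """
    n = hic.shape[0]
    density = np.zeros(n, dtype=float)
    for i in range(n):
        upstream = hic[i, max(0, i - w):i].sum()             # region U
        downstream = hic[i, i + 1:min(n, i + w + 1)].sum()   # region D
        density[i] = upstream + downstream
    return density


if __name__ == "__main__":
    # Toy matrix: two 4-bin blocks with strong internal contact.
    toy = np.ones((8, 8))
    toy[:4, :4] = 10
    toy[4:, 4:] = 10
    print(local_density(toy, w=2))
```

On the toy matrix, bins next to the block border receive a lower value than bins in the block interior, which matches the statement that boundaries sit at minima. Bins at the ends of the chromosome have truncated windows, so their values are not directly comparable.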
According to formula (1), a local contact frequency value is calculated for each bin, and the bins that are local minimum points are selected. Mathematically, a local minimum is a value smaller than the values on both its left and right sides (its neighborhood).
Starting from each bin that is a local minimum point, the search extends to the left and to the right as far as the local contact frequency rises strictly monotonically, which gives the farthest boundary on each side. The distance between the left and right boundaries of each minimum is called its maximum rising distance.
The maximum rising distances are sorted in descending order and the top-ranked values are taken (the proportion taken may be set by the user; the preset value is 45%). The bins corresponding to these values are determined to be TAD boundaries.
The whole chromosome is divided at these boundaries to obtain a plurality of chromosome fragments.
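The boundary selection described above can be sketched in Python as follows. This is an illustration under assumptions, not the patented implementation: the 45% preset is read as "keep the top 45% of minima ranked by maximum rising distance", at least one boundary is always kept, and all function names are hypothetical.

```python
import numpy as np


def local_minima(density):
    """Indices whose value is strictly smaller than both neighbours."""
    d = np.asarray(density, dtype=float)
    return [i for i in range(1, len(d) - 1) if d[i] < d[i - 1] and d[i] < d[i + 1]]


def max_rising_distance(density, i):
    """Width of the strictly rising stretch around minimum i (right edge minus left edge)."""
    d = np.asarray(density, dtype=float)
    left = i
    while left > 0 and d[left - 1] > d[left]:
        left -= 1
    right = i
    while right < len(d) - 1 and d[right + 1] > d[right]:
        right += 1
    return right - left


def select_boundaries(density, top_fraction=0.45):
    """Rank minima by rising distance (descending) and keep the top fraction."""
    minima = local_minima(density)
    if not minima:
        return []
    ranked = sorted(minima, key=lambda i: max_rising_distance(density, i), reverse=True)
    keep = max(1, round(top_fraction * len(ranked)))
    return sorted(ranked[:keep])


def split_segments(n_bins, boundaries):
    """Cut [0, n_bins) at the boundary bins; returns half-open (start, end) pairs."""
    cuts = [0] + [b for b in boundaries if 0 < b < n_bins] + [n_bins]
    return [(s, e) for s, e in zip(cuts[:-1], cuts[1:]) if e > s]


if __name__ == "__main__":
    density = [5, 4, 3, 6, 8, 7, 9, 2, 4, 6, 8, 10]
    bounds = select_boundaries(density, top_fraction=0.67)
    print(bounds, split_segments(len(density), bounds))
```

Ranking by rising distance favors deep, wide valleys over shallow dips, so small fluctuations in local contact frequency are less likely to be chosen as fragment boundaries.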
It should be noted that TAD structures exist within a single chromosome and not between chromosomes. For this reason, the input of the algorithm should be the Hi-C matrix of a single chromosome.
Further, step S102 (performing TAD structure identification on each chromosome segment) specifically comprises:
for each chromosome segment, using the random walk with restart (RWR) algorithm to obtain the similarity between every pair of bins in the current chromosome segment;
dividing the similarity between two bins by the distance between the two bins (the penalty operation) to obtain a penalized result;
taking the penalized result as the input data of a label propagation algorithm and running the label propagation process on it, wherein the output of the label propagation process is a community structure, and a community structure corresponds in meaning to a TAD structure.
The label propagation algorithm defines a community structure as a region whose internal correlation is higher than its correlation with other regions.
It should be understood that the random walk with restart algorithm can make full use of global data to quickly find the degree of association between every pair of bins.
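As an illustration of how random walk with restart yields a pairwise association score, the sketch below iterates the standard RWR update on a column-normalized contact matrix. The restart probability of 0.5, the convergence tolerance, and the function name `rwr_similarity` are assumptions; the source does not state the values it uses.

```python
import numpy as np


def rwr_similarity(contacts, restart=0.5, tol=1e-8, max_iter=1000):
    """Random walk with restart from every bin at once.

    Column j of the result is the stationary visiting distribution of a walker
    that restarts at bin j with probability `restart` at each step.
    """
    a = np.asarray(contacts, dtype=float)
    col_sums = a.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for empty bins
    p = a / col_sums                       # column-stochastic transition matrix
    identity = np.eye(a.shape[0])
    s = identity.copy()
    for _ in range(max_iter):
        nxt = (1.0 - restart) * (p @ s) + restart * identity
        converged = np.abs(nxt - s).max() < tol
        s = nxt
        if converged:
            break
    return s


if __name__ == "__main__":
    toy = np.array([[0, 2, 1], [2, 0, 3], [1, 3, 0]], dtype=float)
    print(rwr_similarity(toy).round(3))
```

The iteration converges to restart * (I - (1 - restart) * P)^-1, so for small segments the closed form could be used instead. The resulting matrix is generally not symmetric; whether and how it is symmetrized before the penalty step is not specified in the source.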
It should be understood that the penalty operation divides the similarity between two bins by the distance between them. This step prevents super nodes from spreading their labels too quickly during the subsequent label propagation and reduces errors caused by factors such as chromosomal variation. It improves the accuracy of the subsequent unsupervised learning and limits the influence of factors such as chromosome copy number variation (CNV) and chromosome translocation. An unsupervised learning algorithm is then run to complete the community discovery process; the algorithm used by the method is the label propagation algorithm.
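The penalty operation and the label propagation step can be sketched as below. Several details are assumptions made so that the example is reproducible: distance is measured in bins, the diagonal is set to zero, nodes are visited in index order, and ties go to the smallest label (standard label propagation usually randomizes both). The function names are hypothetical.

```python
import numpy as np


def penalize(similarity):
    """Divide each pairwise similarity by the bin distance |i - j| (diagonal set to 0)."""
    s = np.asarray(similarity, dtype=float)
    idx = np.arange(s.shape[0])
    dist = np.abs(idx[:, None] - idx[None, :]).astype(float)
    out = np.zeros_like(s)
    mask = dist > 0
    out[mask] = s[mask] / dist[mask]
    return out


def label_propagation(weights, max_iter=100):
    """Weighted label propagation: each bin adopts the label with the largest total weight."""
    w = np.asarray(weights, dtype=float)
    n = w.shape[0]
    labels = np.arange(n)                  # every bin starts in its own community
    for _ in range(max_iter):
        changed = False
        for i in range(n):
            scores = np.bincount(labels, weights=w[i], minlength=n)
            if scores.max() <= 0:
                continue                   # isolated bin keeps its label
            best = int(np.argmax(scores))  # ties resolved toward the smallest label
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    sim = np.full((6, 6), 0.01)
    sim[:3, :3] = 1.0
    sim[3:, 3:] = 1.0
    print(label_propagation(penalize(sim)))
```

Dividing by distance down-weights links between far-apart bins, so a single bin with unusually high similarity to many distant bins has less influence on which labels spread.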
Further, S103 (identifying false positive results according to the identified TAD structures) specifically comprises:
according to established biological findings, the standard size range of a TAD structure is 180 kb to 2 Mb;
the user input parameter is the Hi-C matrix resolution, and the two end points of the standard range (180 kb and 2 Mb) are each divided by the resolution to obtain the range of the number of bins that a topologically associating domain contains at that Hi-C matrix resolution;
according to this range of bin counts, identified TAD structures that fall outside the range are filtered out as false positives.
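The size filter amounts to a small piece of arithmetic, sketched here in Python. TADs are represented as half-open (start, end) bin intervals, which is an assumption of this example, as are the function names.

```python
def tad_bin_range(resolution_bp, min_bp=180_000, max_bp=2_000_000):
    """Divide both end points of the standard TAD size range by the resolution."""
    return min_bp / resolution_bp, max_bp / resolution_bp


def filter_by_size(tads, resolution_bp):
    """Keep candidate TADs whose bin count lies inside the allowed range."""
    lo, hi = tad_bin_range(resolution_bp)
    return [(start, end) for start, end in tads if lo <= (end - start) <= hi]


if __name__ == "__main__":
    candidates = [(0, 3), (3, 20), (20, 90)]
    print(tad_bin_range(40_000))               # (4.5, 50.0) bins at 40 kb resolution
    print(filter_by_size(candidates, 40_000))  # only the 17-bin candidate remains
```

At 40 kb resolution the allowed range is 4.5 to 50 bins, so a 3-bin candidate is too small and a 70-bin candidate is too large.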
Further, the method also comprises filtering false positives according to two quality indexes: the contact frequency difference between the inside of a topologically associating domain and the adjacent domains (the difference of average interaction frequency between intra-domain and the corresponding inter-domain, DIFF), and the Pearson correlation coefficient (PCC).
The Pearson correlation coefficient is a statistical indicator that measures correlation within data, and it is equally applicable to describing correlation within a topologically associating domain. Because correlation inside a TAD is very high, results with a Pearson correlation coefficient below 0.6 are regarded as false positives.
The contact frequency difference (DIFF) between the inside of a topologically associating domain and the adjacent domains is a quality evaluation index used in TAD identification studies. The index is calculated from the sum of the contacts among all bins within one TAD and the sum of the contacts between bins that fall in adjacent TADs.
The contact frequency between bins within one TAD is very high, and the contact frequency between bins of different TADs is very low. DIFF measures the difference in contact frequency between the inside of a topologically associating domain and the adjacent domains; a result whose DIFF is below 20 is regarded as a false positive.
In addition, DIFF and PCC values are often used as TAD evaluation indexes in related research. The method includes the calculation of these indexes and also uses them to filter false positive results; the communities that remain are taken as the TAD identification result.
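The source gives the thresholds (DIFF below 20, PCC below 0.6) but not the exact formulas, so the Python sketch below fills them in with assumptions that are labeled as such: DIFF is taken as the mean intra-TAD contact minus the mean contact with the immediately adjacent TADs, and PCC as the mean pairwise Pearson correlation between the contact rows of the bins inside the TAD. TADs are half-open (start, end) bin intervals assumed to tile the chromosome, and all function names are hypothetical.

```python
import numpy as np


def diff_score(hic, tad, neighbours):
    """Assumed DIFF: mean intra-TAD contact minus mean contact with adjacent TADs."""
    s, e = tad
    upper = np.triu_indices(e - s, k=1)            # off-diagonal pairs inside the TAD
    intra = hic[s:e, s:e][upper].mean()
    blocks = [hic[s:e, ns:ne].ravel() for ns, ne in neighbours]
    inter = np.concatenate(blocks).mean() if blocks else 0.0
    return intra - inter


def pcc_score(hic, tad):
    """Assumed PCC: mean pairwise Pearson correlation between rows of bins in the TAD."""
    s, e = tad
    corr = np.corrcoef(hic[s:e])
    return corr[np.triu_indices(e - s, k=1)].mean()


def quality_filter(hic, tads, min_diff=20.0, min_pcc=0.6):
    """Drop candidates whose DIFF or PCC falls below the thresholds from the text."""
    tads = sorted(tads)
    kept = []
    for k, tad in enumerate(tads):
        if tad[1] - tad[0] < 2:
            continue                               # scores undefined for a single bin
        neighbours = tads[max(0, k - 1):k] + tads[k + 1:k + 2]
        if diff_score(hic, tad, neighbours) >= min_diff and pcc_score(hic, tad) >= min_pcc:
            kept.append(tad)
    return kept


if __name__ == "__main__":
    toy = np.full((12, 12), 5.0)
    toy[0:4, 0:4] = 100
    toy[4:8, 4:8] = 10                             # weak block, expected to be filtered
    toy[8:12, 8:12] = 100
    print(quality_filter(toy, [(0, 4), (4, 8), (8, 12)]))
```

A fixed DIFF threshold such as 20 depends on the scale of the contact values, so it only makes sense for matrices normalized the same way as those the threshold was chosen for.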
The scheme of the disclosure provides a user-friendly recognition algorithm, simplifies the analysis difficulty of bioinformatics for researchers and clinicians without professional bioinformatics knowledge, and provides more accurate calculation results.
Example 2
The present embodiment provides a TAD identification system based on Hi-C sequencing data;
A TAD identification system based on Hi-C sequencing data, comprising:
An acquisition module configured to: obtaining Hi-C sequencing data of a single chromosome; segmenting Hi-C sequencing data of a single chromosome to generate a plurality of chromosome fragments;
A TAD structure identification module configured to: performing TAD structure identification on each chromosome segment;
a false positive identification module configured to: based on the identified TAD structure, false positive results are identified.
It should be noted that the acquisition module, the TAD structure identification module, and the false positive identification module correspond to steps S101 to S103 in the first embodiment. The modules share the examples and application scenarios of the corresponding steps, but are not limited to what is disclosed in the first embodiment. It should also be noted that the modules described above may be implemented as part of a system in a computer system, such as a set of computer-executable instructions.
Each of the foregoing embodiments is described with its own emphasis; for details not covered in one embodiment, refer to the related description of another embodiment.
The proposed system may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in an actual implementation. For instance, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
Example 3
The embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein the processor is coupled to the memory, the one or more computer programs being stored in the memory, the processor executing the one or more computer programs stored in the memory when the electronic device is running, to cause the electronic device to perform the method of the first embodiment.
It should be understood that in this embodiment, the processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software.
The method in the first embodiment may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor. The software modules may be located in a storage medium well known in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided here.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Example 4
The present embodiment also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, perform the method of embodiment one.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (4)
1. A TAD identification method based on Hi-C sequencing data, comprising:
obtaining Hi-C sequencing data of a single chromosome; segmenting Hi-C sequencing data of a single chromosome to generate a plurality of chromosome fragments;
Performing TAD structure identification on each chromosome segment;
identifying a false positive result according to the identified TAD structure;
The Hi-C sequencing data of a single chromosome are obtained; segmenting Hi-C sequencing data of a single chromosome to generate a plurality of chromosome fragments; the method specifically comprises the following steps:
obtaining Hi-C sequencing data of a single chromosome, wherein the Hi-C sequencing data of the single chromosome has a matrix structure; calculating the local contact frequency of each bin of the single chromosome; after a local contact frequency value has been calculated for each bin, selecting the bins that are local minimum points; starting from each bin that is a local minimum point, extending to the left and to the right as far as the local contact frequency rises strictly monotonically, to find the farthest boundary on each side; the distance between the left and right boundaries of each minimum is called its maximum rising distance; sorting the maximum rising distances in descending order, and taking the bins of the top-ranked values as TAD boundaries; dividing the whole chromosome at the TAD boundaries to obtain a plurality of chromosome fragments;
calculating the local contact frequency of each segment bin in a single chromosome; the method comprises the following steps:
where w is the resolution input by the user divided by 2 Mb, and cont.freq is the contact frequency between two bins, whose value is the corresponding entry of the matrix formed by the Hi-C sequencing data; U and D refer to the upstream (up) and downstream (down) regions, respectively; the local contact frequency value (local density) describes the sum of the contacts between a bin and the regions within distance w upstream and downstream of it, and it reaches a maximum at a TAD center and a minimum at a TAD boundary;
Performing TAD structure identification on each chromosome segment; the method specifically comprises the following steps:
for each chromosome segment, obtaining the similarity between every pair of bins in the current chromosome segment by using a random walk with restart algorithm; dividing the similarity between two bins by the distance between the two bins to obtain a penalized result; taking the penalized result as the input data of a label propagation algorithm and running the label propagation process on it, wherein the output of the label propagation process is a community structure, and a community structure corresponds in meaning to a TAD structure;
the label propagation algorithm defines a community structure as a region whose internal correlation is higher than its correlation with other regions;
identifying a false positive result according to the identified TAD structure; the method specifically comprises the following steps:
according to established biological findings, the standard size range of a TAD structure is 180 kb to 2 Mb;
the user input parameter is the Hi-C matrix resolution, and the two end points of the standard range are each divided by the resolution to obtain the range of the number of bins that a topologically associating domain contains at that Hi-C matrix resolution; filtering out, as false positives, identified TAD structures that fall outside this range of bin counts; further filtering false positives according to two quality indexes, namely the contact frequency difference between the inside of a topologically associating domain and the adjacent domains, and the Pearson correlation coefficient; and finally, taking the remaining communities as the TAD identification result.
2. TAD identification system based on Hi-C sequencing data, based on a TAD identification method based on Hi-C sequencing data according to claim 1, characterized by comprising:
An acquisition module configured to: obtaining Hi-C sequencing data of a single chromosome; segmenting Hi-C sequencing data of a single chromosome to generate a plurality of chromosome fragments;
A TAD structure identification module configured to: performing TAD structure identification on each chromosome segment;
a false positive identification module configured to: based on the identified TAD structure, false positive results are identified.
3. An electronic device, comprising:
a memory for non-transitory storage of computer readable instructions; and
a processor for executing the computer readable instructions,
wherein the computer readable instructions, when executed by the processor, perform the method of claim 1.
4. A storage medium non-transitorily storing computer readable instructions, wherein the method of claim 1 is performed when the computer readable instructions are executed by a computer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210512716.4A CN114944190B (en) | 2022-05-12 | 2022-05-12 | TAD (transcription activator) identification method and system based on Hi-C sequencing data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114944190A CN114944190A (en) | 2022-08-26 |
CN114944190B CN114944190B (en) | 2024-04-19 |
Family
ID=82907880
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210512716.4A Active CN114944190B (en) | 2022-05-12 | 2022-05-12 | TAD (transcription activator) identification method and system based on Hi-C sequencing data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114944190B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113946730A (en) * | 2021-10-19 | 2022-01-18 | 四川大学 | Gene data-based visual method for analyzing chromatin hierarchical structure |
CN114444286A (en) * | 2022-01-19 | 2022-05-06 | 四川大学 | Chromatin topological association domain prediction method based on spectral clustering and electronic device |
CN114446384A (en) * | 2022-03-14 | 2022-05-06 | 中南大学 | Prediction method and prediction system of chromosome topological correlation structure domain |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8843356B2 (en) * | 2002-12-27 | 2014-09-23 | Merck Sharp & Dohme Corp. | Computer systems and methods for associating genes with traits using cross species data |
WO2019094636A1 (en) * | 2017-11-09 | 2019-05-16 | Dovetail Genomics, Llc | Structural variant analysis |
US11074991B2 (en) * | 2017-12-27 | 2021-07-27 | The Jackson Laboratory | Methods for multiplex chromatin interaction analysis by droplet sequencing with single molecule precision |
US20190295684A1 (en) * | 2018-03-22 | 2019-09-26 | The Regents Of The University Of Michigan | Method and apparatus for analysis of chromatin interaction data |
Non-Patent Citations (2)
Title |
---|
LPAD: using network construction and label propagation to detect topologically associating domains from Hi-C data; Jian Liu et al; Briefings in Bioinformatics; 20230503; Vol. 3 (No. 24); pp. 1-11 * |
Topological domains in mammalian genomes identified by analysis of chromatin interactions; Jesse R. Dixon et al; LETTER; 20120517; pp. 376-379 * |
Also Published As
Publication number | Publication date |
---|---|
CN114944190A (en) | 2022-08-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bader et al. | An automated method for finding molecular complexes in large protein interaction networks | |
Li et al. | Interaction graph mining for protein complexes using local clique merging | |
CN110517729B (en) | Method for excavating protein compound from dynamic and static protein interaction network | |
Zhang et al. | Computational methods for analysing multiscale 3D genome organization | |
CN109545275B (en) | Uncertain PPI network function module mining method based on fuzzy spectral clustering | |
CN114550906B (en) | Cancer subtype identification system based on multi-view robust representation | |
Wang et al. | An ensemble learning framework for detecting protein complexes from PPI networks | |
CN113889181A (en) | Medical event analysis method and device, computer equipment and storage medium | |
CN114944190B (en) | TAD (transcription activator) identification method and system based on Hi-C sequencing data | |
Liu et al. | DeepChIA-PET: Accurately predicting ChIA-PET from Hi-C and ChIP-seq with deep dilated networks | |
Pasupuleti | Detection of protein complexes in protein interaction networks using n-clubs | |
CN113539479A (en) | Similarity constraint-based miRNA-disease association prediction method and system | |
CN116564418B (en) | Cell group correlation network construction method, device, equipment and storage medium | |
Stilianoudakis et al. | preciseTAD: a transfer learning framework for 3D domain boundary prediction at base-pair resolution | |
CN111177190B (en) | Data processing method, device, electronic equipment and readable storage medium | |
Zhan et al. | Conformational analysis of chromosome structures reveals vital role of chromosome morphology in gene function | |
Murua et al. | Biclustering via semiparametric Bayesian inference | |
CN116052762A (en) | Method and server for matching drug molecules with target proteins | |
Dazard et al. | ROCS: a reproducibility index and confidence score for interaction proteomics studies | |
CN114446384A (en) | Prediction method and prediction system of chromosome topological correlation structure domain | |
CN111383717B (en) | Method and system for constructing biological information analysis reference data set | |
CN114550832A (en) | Methods, systems, and media for holistic screening of proteomic clinical biomarkers | |
US20170024514A1 (en) | Distance maps using multiple alignment consensus construction | |
CN118038990B (en) | Multi-level chromatin topological structure domain identification method and system based on community discovery | |
Yu et al. | A hybrid clustering algorithm for identifying modules in Protein-Protein Interaction networks | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||