CN117737216A - Method for detecting genome information based on restriction enzyme - Google Patents

Method for detecting genome information based on restriction enzyme

Publication number
CN117737216A
Authority
CN
China
Prior art keywords
genome
dna
information
sequencing
long
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410122596.6A
Other languages
Chinese (zh)
Inventor
汤富酬
文路
王艳
陈怡珺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202410122596.6A
Publication of CN117737216A

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for detecting genome information based on restriction enzymes. The method comprises cutting the genome of a sample with restriction enzymes to obtain genomic DNA fragments of different lengths, enriching long fragments from the amplified or unamplified DNA, sequencing the enriched long genomic DNA fragments on a long-read sequencing platform, and finally analyzing the sequencing data by computer. The method markedly increases the probability of detecting both alleles at the same time, markedly reduces the allele dropout rate, detects heterozygous tumor mutations better, and is of great significance for the early diagnosis and treatment of tumors.

Description

Method for detecting genome information based on restriction enzyme
Technical Field
The invention belongs to the technical field of genome detection, and particularly relates to a method for detecting genome information based on restriction enzymes.
Background
Cells are the fundamental building blocks of organisms, and in each cell genetic information is stored in the form of chromosomes. It is generally assumed that all cells of an individual share the same genome, so genomic studies can be performed at the level of the species or the individual. In several situations, however, the genome must be studied at single-cell scale: (1) the cells are precious and few in number, e.g., human oocytes, embryonic cells, and circulating tumor cells; (2) different cells have their own unique genomes, e.g., sperm cells of the same individual carry different genomes because of meiotic homologous recombination; (3) cell lineage tracing, in which the genome of a single cell changes over time and those changes reflect the evolution of the cell; (4) the genomes of single cells are heterogeneous, as in tumors, the nervous system, the immune system, and chimeras. Single-cell genome sequencing techniques were developed to meet these needs.
A single cell contains only about 6 pg of genomic DNA, far less than the amount required for high-throughput sequencing, so the genome must be uniformly amplified before sequencing. The development of whole-genome amplification (WGA) technology has made it possible to amplify enough genomic DNA from single cells for sequencing, enabling study of the genetic heterogeneity of cells, including single-nucleotide variations (SNVs), copy-number variations (CNVs), and structural variations (SVs). A variety of single-cell whole-genome amplification techniques have been developed for next-generation sequencing (NGS) platforms, such as degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), multiple annealing and looping-based amplification cycles (MALBAC), linear amplification via transposon insertion (LIANTI), primary template-directed amplification (PTA), and multiplexed end-tagging amplification of complementary strands (META-CS). META-CS, published by Xie Xiaoliang's group, exploits the complementarity of the two DNA strands to eliminate almost all false positives in SNV detection: only mutation sites supported by both the forward and the reverse strand are called as SNVs, giving the highest accuracy achieved to date.
Because of the high sequencing accuracy of NGS platforms, these techniques are very powerful in the detection of CNVs and SNVs, but suffer from limitations in read length, and therefore have poor performance in the detection of SVs. SVs, including deletions, insertions, repeats, and translocations, are important types of variation for many heritable diseases (e.g., cancer). Therefore, studying SVs at single cell resolution is a critical issue.
Building on long-read sequencing platforms, i.e., third-generation sequencing (TGS) platforms, our group developed SMOOTH-seq (single-molecule real-time sequencing of long fragments amplified through transposon insertion), in which single-cell genomic DNA is randomly fragmented with a low concentration of Tn5 transposase to achieve relatively uniform genome amplification. In addition to CNVs and SNVs, SMOOTH-seq can also detect SVs effectively. However, in diploid cells its simultaneous coverage of both alleles is very limited, so SMOOTH-seq has a high false-negative rate when detecting heterozygous SNPs (hetSNPs).
Allele dropout is an important problem for single-cell whole-genome amplification techniques. When a diploid cell carries a heterozygous mutation, failure to amplify and detect either allele results in allele dropout, which is a major cause of false negatives in SNV detection. Previous single-cell whole-genome methods amplify either with random primers or after random fragmentation by Tn5, and such random fragmentation or amplification does not favor capturing both alleles together. For example, for a pair of alleles A and B, if the genome coverage is n%, i.e., allele A and allele B are each captured with probability n%, then the probability of capturing both alleles is n% × n%, i.e., n²‱ (n² per ten thousand). The probability of capturing both alleles at the same time is therefore very low.
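The arithmetic in the example above can be checked with a short Python sketch. It assumes, as the example does, that each allele is captured independently with probability equal to the genome coverage; the function name is hypothetical and chosen only for illustration.

```python
def both_alleles_probability(coverage_percent):
    """Fraction of loci at which both alleles are captured.

    Assumption (from the example in the text): each allele is captured
    independently with probability equal to the genome coverage n%.
    """
    p = coverage_percent / 100.0
    return p * p


# Illustrative coverage values only; these are not measurements.
for n in (10, 30, 50, 90):
    both = both_alleles_probability(n) * 100
    print(f"coverage {n}% -> both alleles captured at {both:.2f}% of loci")
```

Under this independence assumption the joint probability falls off quadratically, which is why random fragmentation at low coverage rarely recovers both alleles.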
Because the two alleles in a diploid genome usually share identical restriction enzyme recognition sites, the homologous DNA fragments produced by digestion usually have identical lengths and are more readily amplified together than fragments produced by random Tn5 transposition or random-primer amplification. On this basis, the invention develops Refresh-seq (Restriction fragments ligation-based genome amplification and third-generation sequencing), a single-cell long-read whole-genome sequencing technology based on a restriction digestion and ligation strategy, which markedly increases the probability of detecting both alleles simultaneously. Allele dropout causes false negatives in prenatal diagnosis, because one allele may be detected while the other is not: for an embryo carrying a monogenic pathogenic abnormality (a heterozygous mutation), if only the normal allele is detected and the pathogenic mutant allele is lost to dropout, the embryo may be misjudged as normal. The method significantly reduces the allele dropout rate and can therefore reduce such misjudgments, helping to select healthy embryos. In addition, tumor cells carry a high mutation load and the mutations are usually heterozygous, so earlier methods tend to underestimate the mutations in a tumor genome because of allele dropout. The method detects heterozygous tumor mutations better, which is of great significance for the early diagnosis and treatment of tumors.
Disclosure of Invention
The invention aims to provide a method for detecting genome information based on restriction enzymes.
A method for detecting genomic information based on restriction enzymes, comprising the steps of:
(1) Cutting the genome of the sample with a restriction enzyme to obtain genomic DNA fragments of different lengths; at this point, alleles on homologous chromosomes are typically cut into DNA fragments of the same length;
the invention predicts the distribution of genomic fragments after digestion by performing an in silico digestion simulation on the genome of the target species, and selects suitable restriction enzymes accordingly (FIG. 3); cell lysis is carried out in a small-volume reaction system to release genomic DNA;
the restriction enzyme is a restriction enzyme recognizing a specific sequence of 4-10 bp, preferably a restriction enzyme recognizing a specific sequence of 6 bp or 8 bp; more preferably, the restriction enzymes are EcoR I, SacI and AsiS I.
For the human genome, the digestion fragments of restriction enzymes with a 6 bp recognition sequence are mostly distributed in the range of 1-8 kb, while the digestion fragments of restriction enzymes with an 8 bp recognition sequence are mostly distributed in the range of 15 kb-16 Mb (FIG. 3). Thus, when higher coverage is desired, restriction enzymes with a 6 bp recognition sequence are selected, e.g., EcoR I and SacI; when better enrichment is desired, endonucleases with an 8 bp recognition sequence are selected, e.g., AsiS I. When higher coverage is desired, the digestion fragments should be distributed as narrowly as possible, i.e., the cut DNA fragments should have similar lengths concentrated between 1 and 3 kb, which gives better amplification uniformity and allows both high genome coverage and a high detection rate of both alleles.
In the above cell lysis step, the cells may be derived from any of human, animal, plant and microorganism;
(2) Enriching long genomic DNA fragments from the amplified or unamplified genome sample;
(3) Sequencing the enriched long genomic DNA fragments on a sequencing platform;
(4) Analyzing the sequencing data by computer: the long genomic DNA fragments are mapped back to their genomic regions, and the sequence information of the sample in those regions is obtained by alignment and calculation. The sequence information includes genetic and epigenetic information.
The genomic samples are free (cell-free) DNA, DNA released from cells (e.g., embryos or eggs) into culture medium, one or more cells or nuclei, viruses, mitochondria, chloroplasts, and other sample genomes.
The restriction enzyme used in step (1) is chosen by performing an in silico digestion simulation on the genome of the target species and predicting the distribution of genomic fragments after digestion.
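The digestion simulation described above can be sketched in a few lines of Python. This is a minimal illustration, not the inventors' pipeline: the function names and the toy sequence are hypothetical, the sequence is treated as a single linear top strand, and only the three enzymes named in the text are included, with their standard recognition sites (EcoR I G^AATTC, SacI GAGCT^C, AsiS I GCGAT^CGC). In a random sequence with equal base frequencies a k-bp site occurs on average once every 4^k bp (4,096 bp for 6 bp, 65,536 bp for 8 bp), which is a rough guide only; real genomes deviate from it.

```python
import re
from collections import Counter

# Recognition sequence and top-strand cut offset for each enzyme named in the text.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "AsiSI": ("GCGATCGC", 5),  # GCGAT^CGC
}


def digest(sequence, enzyme):
    """Return the fragment lengths from cutting a linear sequence with one enzyme."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper()
    # A lookahead pattern reports every site start, including overlapping ones.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


def length_histogram(lengths, bin_size=1000):
    """Count fragments per length bin (bin labelled by its lower edge, in bp)."""
    return Counter((n // bin_size) * bin_size for n in lengths)


# Toy sequence with two EcoR I sites; a real run would read chromosome FASTA records.
toy = "AAAGAATTCCCCCGAATTCTT"
print(digest(toy, "EcoRI"))
```

Applying `length_histogram` to the fragment lengths of each chromosome would give a distribution comparable in spirit to the one the text attributes to FIG. 3, from which an enzyme can be chosen for coverage or for enrichment.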
Preferably, in step (2) the genomic DNA fragments undergo end repair, A-tailing, and adaptor ligation, followed by PCR amplification, and the long genomic DNA fragments are enriched after amplification. The adaptor used may be an adaptor without a barcode or a barcoded adaptor.
When adaptors without barcodes are used, each PCR tube is processed independently through the subsequent purification and library construction, and both the 5'-end and the 3'-end adaptors are added during PCR amplification. When barcoded adaptors are used (i.e., the 5'-end adaptor is attached at the ligation step), sample tubes with different barcodes are pooled after ligation, purified and amplified in a single tube, and the 3'-end adaptor is added during amplification.
The long genomic DNA fragment in step (2) refers to a fragment longer than 700 base pairs, preferably longer than 1000 base pairs.
The amplification is polymerase chain reaction (PCR); long genomic DNA fragments are enriched by PCR combined with fragment size selection, which is performed by gel electrophoresis or with magnetic beads.
For samples with a large starting amount, genomic fragment size selection is performed directly after restriction digestion. With restriction digestion, a given region yields a fragment of fixed size, so enriching fragments of a particular size by size selection also enriches the corresponding regions. Sequencing is thereby concentrated in genomic regions whose fragments fall in the selected size range: the sequencing depth increases for those regions and decreases, possibly to zero, for regions whose fragments fall outside the range. Allele information in the enriched regions can thus be detected more sensitively.
For samples with a small starting amount and for single-cell samples, adaptors are ligated and the fragments amplified after digestion, followed by gel-based or magnetic-bead-based size selection. Adaptor ligation and PCR amplification themselves act as size filters: ligation efficiency falls for excessively long DNA fragments, and PCR preferentially amplifies short fragments, so overly long fragments are filtered out. Small fragments are then removed by gel or bead size selection. Samples amplified after adaptor ligation are therefore size-selected more stringently, and the final library fragment lengths are mainly distributed between 1 and 3 kb. In addition, because the amplification efficiencies of the two allelic fragments tend to be the same, PCR further enriches the alleles together and increases the probability of detecting both.
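The effect of size selection on allele pairs can be illustrated with a small Python sketch. It idealizes size selection as a hard window of 1-3 kb (the library range quoted above); real gel or bead selection has soft cutoffs, and the function names and example lengths are hypothetical.

```python
def size_select(fragment_lengths, min_len=1000, max_len=3000):
    """Keep fragments inside an idealized hard size window (lengths in bp)."""
    return [n for n in fragment_lengths if min_len <= n <= max_len]


def both_alleles_retained(len_a, len_b, min_len=1000, max_len=3000):
    """True when the fragments carrying both alleles pass the same size window."""
    return min_len <= len_a <= max_len and min_len <= len_b <= max_len


# Restriction digestion: homologous fragments share one length, so they pass or fail together.
print(both_alleles_retained(2000, 2000))
# Random fragmentation: the two alleles can sit on fragments of unrelated lengths.
print(both_alleles_retained(2000, 600))
```

The point of the sketch is only that equal-length homologous fragments share the same fate under any length-based filter, whereas fragments of unrelated length may be separated by it.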
The sequencing platform in step (3) is a long-read sequencing platform; optionally, it is a Nanopore sequencing platform, a PacBio sequencing platform, or another long-read sequencing platform developed later.
NGS sequencing suffers from sequencing-quality problems caused by amplification errors introduced by PCR during library construction, and from limited read length (typically less than 500 bp). These limitations make it difficult for NGS to meet the more demanding requirements of some modern biological problems, such as resolving long repetitive segments in DNA, determining DNA/RNA methylation, and identifying structural variants. Long-read sequencing technology makes up for these shortcomings. Currently there are two types of platforms: single-molecule real-time (SMRT) sequencing from Pacific Biosciences (PacBio) and nanopore sequencing from Oxford Nanopore Technologies (ONT).
PacBio sequencing is SMRT sequencing based on zero-mode waveguides (ZMWs). A ZMW is a nanophotonic confinement structure consisting of circular holes in an aluminum coating on a transparent silica substrate. The ZMW holes are about 70 nm in diameter and about 100 nm in depth. Because the aperture is so small, the optical field decays exponentially as light passes through the ZMW, so the activity of a DNA polymerase incorporating a single nucleotide can be readily detected in the illuminated ZMW. PacBio SMRT sequencing uses topologically circular DNA molecules as the template library (called SMRTbells). A SMRTbell consists of a double-stranded DNA insert whose two ends are each ligated to a hairpin adaptor, forming a closed single-stranded circle. The insert may range from one to more than one hundred thousand bases in length, so long sequencing reads can be generated. After the SMRTbell is assembled, it is bound by DNA polymerase and loaded onto a SMRT Cell, which contains up to 8 million ZMWs. In each ZMW a single polymerase is immobilized at the bottom; it can bind either hairpin adaptor of the SMRTbell and begin replication. During sequencing by synthesis, the polymerase uses the SMRTbell as template and incorporates four fluorescently labeled deoxynucleoside triphosphates with distinct emission spectra into the nascent strand. The fluorescent label on the nucleotide substrate is excited by light at the bottom of the well, and the detection system records the fluorescence signal to obtain the base information. The DNA molecules need not be PCR-amplified at any point in the sequencing process, so each DNA molecule is sequenced individually. PacBio sequencing currently has two common modes: continuous long reads (CLR) and circular consensus sequencing (CCS).
ONT is a sequencing technology based on electrical signals, and its core is the protein nanopore. The basic working principle is as follows. A nanoscale pore connects two electrolyte chambers that are otherwise separated by a water-impermeable membrane; the protein nanopores, numbering from hundreds to thousands, are embedded in a synthetic membrane immersed in electrophysiological solution (the synthetic membrane has very high electrical resistance, while the protein nanopores form channels through it). When a voltage is applied across the electrolyte chambers, a steady-state ionic current flows through the pores. The passage of a macromolecule through a pore causes a transient change in the ion flux, so monitoring the current through the pore enables molecular sensing. These current fluctuations convey many characteristics of the sample, including biomolecule size, concentration, and structure. By controlling the size of the pores, their surface properties, the applied voltage, and the solution conditions, nanopores can be tailored to detect different types of biomolecules. Nanopore sensing also requires no modification, labeling, or surface immobilization of the biomolecule, so the technology can detect a wide range of molecules and complexes. ONT technology uses linear DNA molecules, typically one to several hundred kilobases in length, although some reach several megabases. In ONT sequencing, double-stranded DNA molecules are first ligated to sequencing adaptors preloaded with a motor protein, which unwinds the double-stranded DNA and, together with the current, drives the negatively charged DNA through the pore at a controlled rate. As the DNA passes through the nanopore it causes characteristic disruptions of the current, which are analyzed in real time to determine the base sequence of the DNA strand. Long-read data can currently be generated on any of three standard ONT platforms: MinION, GridION, and PromethION. Another type of read generated by the ONT sequencing platform is the ONT ultra-long read. These reads were first generated by Josh Quick and are typically longer than 100 kb, but can be several megabases long.
Genomic structural variations (SVs) mainly include large-fragment deletions, insertions, and segmental duplications in the genome. Studies have shown that SVs are associated with a variety of complex genetic diseases such as cancer, autism, and neurodevelopmental disorders, and they have received increasing attention in medicine and genetics in recent years. NGS is severely limited in SV detection by its read length. With the progress and spread of long-read genome sequencing, large numbers of structural variations continue to be discovered and studied, and the strong pathogenicity of some of them is increasingly being validated. The method is based on a long-read sequencing platform, detects SVs efficiently, and supports high-precision whole-chromosome haplotype phasing of SVs.
A long-read sequencing platform also detects linkage information among SNPs and other mutations better. Because NGS reads are short, most reads carry at most one mutation, whereas a long read can cover multiple mutations on the same molecule, so long reads can be used to study linkage among mutations such as SNPs and SVs. Linkage information is critical to assessing disease. For example, for a recessive genetic disease in which two different sites of a gene are mutated: if the two mutations lie on the same chromosome, i.e., they are linked, the gene on that chromosome is inactivated while the other chromosome retains a normal copy, so the cell does not show the mutant phenotype; if the two mutations lie on different chromosomes, i.e., both alleles are mutated, the cell may be in a pathogenic state. Linkage information also helps determine whether a mutant disease gene comes from the father or the mother. Genomic imprinting refers to the occurrence of different clinical phenotypes depending on the parental origin (paternal or maternal) of the pathogenic gene: some genes are transcriptionally active only on the paternal copy, with the maternal copy silent, and conversely some genes are transcriptionally active only on the maternal copy, with the paternal copy silent. In such cases, knowing whether the mutant gene comes from the father or the mother makes it possible to judge whether the pathogenic gene is expressed, and thus to judge the health of the embryo.
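The cis/trans distinction described above can be sketched as a simple read-counting rule. This is a hedged illustration, not the analysis used in the invention: reads are modeled as hypothetical dictionaries mapping a genomic position to the observed base, both sites are assumed heterozygous, and sequencing errors are ignored apart from a majority vote.

```python
def classify_phase(reads, site1, site2):
    """Classify two heterozygous variants as 'cis', 'trans', or 'undetermined'.

    reads: list of dicts mapping position -> observed base (hypothetical model
           of long reads; a real pipeline would parse alignments instead).
    site1, site2: (position, ref_base, alt_base) tuples.
    """
    pos1, ref1, alt1 = site1
    pos2, ref2, alt2 = site2
    cis = trans = 0
    for read in reads:
        a1, a2 = read.get(pos1), read.get(pos2)
        if a1 is None or a2 is None:
            continue  # the read does not span both sites, so it is uninformative
        if (a1, a2) in ((alt1, alt2), (ref1, ref2)):
            cis += 1    # both mutations (or neither) on the same molecule
        elif (a1, a2) in ((alt1, ref2), (ref1, alt2)):
            trans += 1  # exactly one mutation on this molecule
    if cis > trans:
        return "cis"
    if trans > cis:
        return "trans"
    return "undetermined"


# Toy reads: two span both sites, one covers only the first site.
reads = [{100: "T", 250: "G"}, {100: "C", 250: "A"}, {100: "T"}]
print(classify_phase(reads, (100, "C", "T"), (250, "A", "G")))
```

Short reads that cover only one site fall into the skipped branch, which is the sense in which the text says NGS reads rarely carry linkage information.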
Long-read sequencing can read DNA/RNA modifications directly (without antibody or chemical treatment), which is of great significance. Modifications alter the efficiency of nucleotide pairing, and SMRT sequencing infers modifications at single-nucleotide resolution from differences in the timing with which fluorescently labeled dNTPs/NTPs are incorporated opposite the template nucleotide. Modifications also alter the electrical signal of a nucleotide, and Nanopore sequencing infers the modifications carried by detecting the electrical signal as the nucleotide passes through the nanopore. On this principle, when the method is applied to samples with a large starting amount, no amplification is needed, so the epigenetic modification information is retained and can be read directly by long-read sequencing. The method can therefore detect and compare the epigenetic modifications of the two alleles in bulk samples, which NGS sequencing cannot achieve, and can explore linkage between different types of genomic variation and epigenetic state.
The fragment information in step (4) comprises one or more of the following: 1) fragment length information; 2) fragment abundance information; 3) heterozygous single nucleotide polymorphism information; 4) genomic structural variation information, including one or more of insertions, deletions, duplications, inversions, and translocations; 5) repeat sequence information, including one or more of short interspersed elements, long terminal repeat elements, DNA repeat elements, simple repeats, satellites, and other repeat elements; 6) genome copy number variation information; 7) allele information; 8) linkage of allele information; 9) epigenetic information, including DNA methylation and DNA hydroxymethylation.
The method can detect genome information from samples as small as a single cell or a single nucleus, with high sensitivity and a high probability of detecting both alleles at the same time. The method is named the restriction enzyme and fragment ligation-based third-generation single-cell whole-genome amplification method (Restriction fragments ligation-based genome amplification and third-generation sequencing, Refresh-seq), hereinafter abbreviated as Refresh-seq. The version that uses barcoded adaptors is referred to as Refresh-seq (multiplexed).
Terms used in the present invention:
Restriction endonuclease: an endonuclease that recognizes and cleaves specific double-stranded DNA sequences in an organism, including type I and type II restriction enzymes. Type I restriction enzymes catalyze both methylation of host DNA and hydrolysis of unmethylated DNA, while type II restriction enzymes only catalyze hydrolysis of unmethylated DNA. The name of a restriction enzyme generally consists of the first letter of the genus name and the first two letters of the species name of the source microorganism, with a fourth letter denoting the strain. For example, the restriction enzyme extracted from Bacillus amyloliquefaciens H is called BamH. Several enzymes of different specificity, recognizing different base sequences, obtained from the same strain are distinguished by numbers, e.g., HindII, HindIII, HpaI, HpaII.
Homologous chromosomes: two chromosomes of the same length and centromere position seen at metaphase of mitosis, or the paired chromosomes seen in meiosis, one from the father and one from the mother; their morphology, size, and structure are generally the same.
Allele: genes located at the same position on a pair of homologous chromosomes and controlling different forms of the same trait.
Allele information: allele information in this patent includes all types of variation at alleles on homologous chromosomes, including SNPs, SVs, repeat information (short interspersed elements, long terminal repeat elements, DNA repeat elements, simple repeats, satellites), epigenetic information, and the like.
Epigenetic information: epigenetic modification refers to the regulation of gene expression through chemical modification of the DNA and proteins of chromosomes. Such modifications can act at multiple levels, including gene transcription, splicing, stability, translation, nucleosome assembly, and chromatin structure, thereby affecting the physiological and pathological processes of the cell as well as the phenotype of the offspring. Common epigenetic modifications include DNA methylation, histone modifications, non-coding RNAs, RNA modifications, and chromatin remodeling.
DNA methylation: the addition of a methyl group to a DNA molecule, which alters the chemical nature and structure of the DNA and affects gene expression. It typically occurs at CpG dinucleotides and can inhibit gene expression. The modification is catalyzed by DNA methyltransferases (DNMTs). There are three major DNMTs in humans: DNMT1, DNMT3A, and DNMT3B. DNMT1 is mainly responsible for maintaining methylation patterns, while DNMT3A and DNMT3B are responsible for de novo methylation.
DNA hydroxymethylation: the oxidation of 5-methylcytosine (5mC) in methylated DNA to 5-hydroxymethylcytosine (5hmC), catalyzed by TET family enzymes. 5hmC has very important biological functions: it participates in chromosome reprogramming and the transcriptional control of gene expression, and it plays an important role in DNA demethylation. Studies have also shown that 5hmC is closely related to the occurrence of tumors.
Ligation: two DNA molecules, or the two ends of one DNA molecule, can pair after enzyme digestion and then be covalently joined by a ligase.
Magnetic bead fragment size selection: under suitable conditions, magnetic beads interact with DNA and adsorb it. In a solution with relatively high concentrations of PEG and NaCl, PEG strips water from the hydration layer surrounding the DNA molecules, so the hydration layer is disrupted, the DNA molecules aggregate and precipitate, and the negatively charged phosphate groups are exposed. Sodium ions then form a salt bridge (also called an electric bridge) between these phosphate groups and the carboxyl groups on the bead surface, so the DNA is adsorbed onto the beads. The longer the DNA, the more negatively charged phosphate groups are exposed and the more negative charge the molecule carries, so it adsorbs to the beads more easily and can be recovered with lower concentrations of PEG and NaCl. The shorter the DNA, the higher the concentrations of PEG and NaCl needed to disrupt the hydration layer thoroughly enough to expose sufficient phosphate groups for adsorption and recovery. Therefore, DNA fragments of different lengths can be selected by controlling the concentrations of PEG and NaCl and the amount of magnetic beads.
The invention has the following beneficial effects. For the first time, the invention combines a restriction digestion and ligation strategy with a third-generation single-cell genome sequencing platform, developing the long-read genome sequencing technology Refresh-seq. Compared with SMOOTH-seq, which is based on random cutting, Refresh-seq increases genome coverage and uniformity. It increases the probability of detecting both alleles of a diploid cell simultaneously and gives a considerable detection probability even at very shallow sequencing depth, so it has great potential for medical applications such as pre-implantation genetic diagnosis. The method can be tuned to different needs by choosing different restriction enzymes. In general, Refresh-seq obtains relatively high genome coverage when cutting with restriction enzymes that recognize 6 bp sequences (e.g., EcoR I and SacI), which are the first choice for single-cell whole-genome sequencing; with restriction enzymes that recognize 8 bp sequences (e.g., AsiS I), Refresh-seq enriches reads in specific genomic regions at an equal sequencing amount (FIG. 1), enabling reduced-representation genome sequencing. Because Refresh-seq is based on a third-generation sequencing platform, it detects structural variations and repetitive elements effectively. Refresh-seq also has limitations. Because of the efficiency of the ligation reaction, the amplicon length is only 2-3 kb, much shorter than the roughly 6 kb amplicons of SMOOTH-seq, so Refresh-seq cannot capture very long insertion events. Library construction for Refresh-seq can be completed in one day, at a cost of 20 yuan/cell for the single-tube version and 12 yuan/cell for the pooled (multiplexed) version.
The present invention successfully uses Refresh-seq to study meiosis in single germ cells of male and female B6D2F1 mice. Sperm, PG oocytes, and PB2 were sequenced at a depth of 0.1-0.3×; the average genome coverage was about 5%, and the average coverage of oocytes and PB2 was 7.7%. This is consistent with the coverage of MALBAC-amplified sperm and oocytes. The inventors obtained high-resolution genetic maps of male and female meiotic recombination at low sequencing depth and revealed differences between females and males. Refresh-seq has high uniformity and a low allele dropout rate, and has good prospects for screening aneuploid sperm and egg cells. Because its reads are longer than those of NGS platforms, it is also advantageous for detecting SVs in highly repetitive or low-complexity genomic regions. Using Refresh-seq data, the inventors successfully performed whole-chromosome hetSV phasing of sperm cells and of female haploid germ cells, respectively, and analyzed the repeat element characteristics of these SVs.
Drawings
FIG. 1 is a flowchart of library construction in Example 1.
FIG. 2 shows the results of testing PCR-based enrichment of long genomic DNA fragments.
FIG. 3 shows simulated restriction digestion fragments;
in the figure, a: simulated distribution of EcoRI digestion fragments; b: simulated distribution of SacI digestion fragments; c: simulated distribution of AsiSI digestion fragments.
FIG. 4 shows the cross-contamination test of Refresh-seq (multiplexed).
FIG. 5 shows the simultaneous detection of both alleles;
in the figure, a: sequencing amount and proportion of heterozygous SNPs detected for each HG002 cell; b: proportion of heterozygous SNPs detected by the three methods at a sequencing depth of 0.25×; c: allele dropout rate calculated over regions covered by 5 or more reads.
FIG. 6 shows the performance of Refresh-seq on different cell lines using different endonucleases;
in the figure, a: sequencing amount and genome coverage of HG001 cells amplified by Refresh-seq (EcoRI/SacI) and Refresh-seq (multiplexed) (EcoRI/SacI/AsiSI), where the SMOOTH-seq data are from the HG002 cell line; b: sequencing amount and heterozygous SNP detection rate of HG001 cells amplified by Refresh-seq (EcoRI/SacI) and Refresh-seq (multiplexed) (EcoRI/SacI/AsiSI), where the SMOOTH-seq data are from the HG002 cell line; c: sequencing amount and genome coverage of HG002 cells amplified by Refresh-seq (EcoRI/SacI), Refresh-seq (multiplexed) (EcoRI/SacI/AsiSI) and SMOOTH-seq; d: sequencing amount and heterozygous SNP detection rate of HG002 cells amplified by Refresh-seq (EcoRI/SacI), Refresh-seq (multiplexed) (EcoRI/SacI/AsiSI) and SMOOTH-seq; e: sequencing depth of HG002 cells processed by SMOOTH-seq and by Refresh-seq and Refresh-seq (multiplexed) with different restriction endonucleases (EcoRI/SacI/AsiSI).
FIG. 7 shows the application of Refresh-seq to sperm;
in the figure, a: schematic diagram of meiosis in hybrid mouse sperm and of single-sperm Refresh-seq; mature sperm that have undergone meiotic homologous recombination are obtained from B6D2F1 mice (B6 × DBA F1 hybrids) and, after flow sorting, each single sperm is subjected to Refresh-seq; b: sequencing data amount and genome coverage of each sperm; sperm with genome coverage above 1% are selected for subsequent analysis, with the boundary marked by a red dashed line; c: sequencing data amount and genome coverage of each sperm passing quality control, with a fitted linear regression and its 95% confidence interval; d: distribution of the average read length of single sperm amplified by Refresh-seq; e: distribution of the number of hetSNPs covered in each sperm; f: identification of diploid cells by the discontinuity score of each sperm (excluding the most frequent autosomes at this step); the discontinuity score of a diploid cell is much higher than that of a haploid sperm, and the red dashed line marks the inflection point beyond which cells are labeled as potential diploid cells; g: numbers of reads on the X and Y chromosomes, used to distinguish X sperm from Y sperm.
FIG. 8 shows the identification of aneuploid sperm by Refresh-seq;
in the figure, a: discontinuity scores of all chromosomes in each sperm, with diploid cells labeled D1-D12 and aneuploid sperm labeled A1-A7; b-h: discontinuity scores of specific chromosomes in each sperm; diploid cells have higher discontinuity scores on most chromosomes, whereas aneuploid sperm have higher discontinuity scores only on individual abnormal chromosomes; i: comparison of the proportion of hetSNPs on the 19 autosomes of the 7 aneuploid sperm with a gold standard; blue dots indicate chromosome loss, red dots indicate chromosome gain, and dot size indicates the deviation from the average ratio; verified aneuploid chromosomes are highlighted with rectangles; sperm A7 is more likely a non-uniformly amplified sample (technical error) than a true aneuploid; j: proportion of sites at which both alleles are covered in sperm cells; the heat map shows the heterozygosity of the 19 autosomes in 12 diploid cells (2N), 7 aneuploid sperm (1N±m) and several haploid sperm (1N).
FIG. 9 shows the identification of structural variations and chromosome-scale haplotyping by Refresh-seq;
in the figure, a: distribution of true-positive structural variations in each sperm; b: length distribution of the identified true-positive SVs, with local peaks of SV length indicated by orange dashed lines; c: accuracy of the SVs (deletions and insertions) detected by Refresh-seq, given as the percentage of true positives among SVs supported by different numbers of cells; d: accuracy of genome-wide, chromosome-scale SV haplotyping; e: recall of correctly haplotyped SVs; f: proportions of different element types among the haplotyped deletion events; g: proportions of different element types among the haplotyped insertion events.
FIG. 10 shows the application of Refresh-seq to egg cells and polar bodies;
in the figure, a: schematic diagram of sampling from hybrid female mice; B6D2F1 MII oocytes that have undergone meiotic homologous recombination are fertilized with sperm from DBA males or parthenogenetically activated to induce PB2 extrusion, yielding haploid PB2 and parthenogenetically activated egg cells as well as diploid PB1, MII oocytes and fertilized eggs, which are separated with capillaries; b: number and ploidy of the different cell types; c: sequencing data amount and genome coverage of each cell; d: distribution of crossover numbers in haploid female germ cells; e: resolution of crossover detection in female haploid cells; f: density of crossover positions on all chromosomes in male and female mice, plotted from centromere to telomere.
Detailed Description
The present invention will be described more fully hereinafter in order to facilitate an understanding of the present invention. This invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Example 1
Fragment lengths were simulated from the human genome sequence and the recognition sequences of the selected restriction enzymes. As shown in FIG. 3, most simulated EcoRI and SacI digestion fragments are 1-3 kb long and are therefore suitable for whole genome amplification and sequencing. The 8 bp recognition sequence of AsiSI is distributed more sparsely across the genome, so building a Refresh-seq library with AsiSI enables reduced-representation (simplified) genome sequencing, with deeper sequencing of specific regions for the same amount of data.
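The fragment-length simulation described above can be illustrated with a minimal sketch. It assumes a complete digest of a single plain-text sequence and uses a random toy sequence instead of hg38; the function names and the 1-3 kb summary are illustrative and are not the inventors' actual script.

```python
import random
import re

# Recognition site and cut offset on the top strand for each enzyme.
# All three sites are palindromic, so scanning one strand is sufficient.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "SacI": ("GAGCTC", 5),     # GAGCT^C
    "AsiSI": ("GCGATCGC", 5),  # GCGAT^CGC
}

def digest(sequence, enzyme):
    """Return the fragment lengths produced by a complete in silico digest."""
    site, offset = ENZYMES[enzyme]
    # A lookahead pattern reports every site, including overlapping ones.
    cuts = [m.start() + offset for m in re.finditer(f"(?={site})", sequence)]
    bounds = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]

def fraction_in_range(lengths, low=1000, high=3000):
    """Fraction of fragments whose length lies in [low, high] bp."""
    if not lengths:
        return 0.0
    return sum(low <= n <= high for n in lengths) / len(lengths)

if __name__ == "__main__":
    random.seed(0)
    # Toy stand-in for a genome; a real run would read chromosome sequences.
    toy = "".join(random.choice("ACGT") for _ in range(200_000))
    for name in ENZYMES:
        fragments = digest(toy, name)
        print(name, len(fragments), round(fraction_in_range(fragments), 3))
```

On a real genome the same counting would show why an 8 bp site such as AsiSI yields far fewer, more widely spaced fragments than a 6 bp site.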
The specific steps of Refresh-seq are shown in FIG. 1. In this example, two human cell lines (HG002 and HG001) were used. After the cells were washed three times with PBS containing 0.1% BSA, single cells were sorted by mouth pipette or flow cytometer into 8-strip PCR tubes containing 2.5 µL of lysis buffer. Proteins (histones) were digested at 50 °C for 3 h, and the protease was then inactivated at 70 °C for 30 min. The lysis buffer consisted of 10 mM Tris-EDTA (1 M Tris + 0.1 M EDTA), 1 mg/mL Qiagen protease, 0.3% Triton X-100, 20 mM KCl and 15 mM DTT.
After single-cell lysis, the single-cell gDNA was digested with 0.5 µL of 10× restriction buffer, 1.9 µL of water and 0.1 µL of restriction enzyme. The reaction program was adjusted according to the restriction enzyme used. For EcoRI (New England BioLabs, cat#R3101L) and SacI (New England BioLabs, cat#R3156S), digestion was carried out at 37 °C for 15 min, followed by inactivation at 65 °C for 20 min. For AsiSI (New England BioLabs, cat#R0630S), digestion was carried out at 37 °C for 1 h, followed by inactivation at 80 °C for 20 min. End repair and A-tailing were then performed (Kapa Biosystems, KAPA HyperPrep Kit, cat#KK8504), dsDNA adaptors (NEBNext Singleplex Oligos for Illumina) were ligated to the A-tailed molecules, and USER enzyme (uracil-specific excision reagent, New England BioLabs, cat#M5505L) was added to cut the loop adaptor into a "Y" adaptor. Each sample was purified with 1× AMPure XP (Beckman Coulter, cat#A63882) and amplified using Barcode-P5 (GCTA-[24 bp P5-barcode 81-96]-TACACTCTTTCCCTACACGACGCTCTTCCGATCT) and Barcode-P3 (ATCG-[24 bp P3-barcode 1-24]-GACTGGAGTTCAGACGTGTGCT). The PCR program was 98 °C for 45 s and 98 °C for 15 s, followed by 20 cycles of 98 °C for 15 s, 65 °C for 30 s and 72 °C for 5 min. The products were then purified twice with 0.7× AMPure XP (0.65× AMPure XP for haploid cells). The purified amplicons were quantified with the Equalbit 1× dsDNA HS Assay Kit. Samples were pooled according to sequencing requirements and then sequenced.
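As a quick consistency check on the digestion volumes, the short sketch below adds up the reaction. It assumes that the digestion mix is added directly to the 2.5 µL of lysate, which the text implies but does not state outright; under that assumption the reaction works out to 5.0 µL with the restriction buffer at 1×.

```python
# Volumes in microlitres, as given in the text.
LYSATE_UL = 2.5
DIGEST_MIX_UL = {
    "10x restriction buffer": 0.5,
    "water": 1.9,
    "restriction enzyme": 0.1,
}

def digestion_summary(lysate=LYSATE_UL, mix=DIGEST_MIX_UL, buffer_stock_x=10.0):
    """Return (total reaction volume in uL, final buffer concentration in x).

    Assumes the mix is pipetted straight into the lysate (an assumption).
    """
    total = lysate + sum(mix.values())
    final_x = buffer_stock_x * mix["10x restriction buffer"] / total
    return total, final_x

total_ul, buffer_x = digestion_summary()
print(f"total = {total_ul:.1f} uL, buffer = {buffer_x:.2f}x")
```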
For Refresh-seq (multiplexed), this example uses barcoded adaptors to increase throughput. Paired single-stranded oligonucleotides were first annealed into "Y" adaptors: NEB Same-A (5'-phosphorylated; GATCGGAAGAGCACACGTCTGAACTCCAGTC) and Barcoded-B (ACACACTCTTTCCCTACACGAC-[24 bp adapter-barcode 31-46]-GCTCTTCCGATC). Stock solutions of the two oligonucleotides, dissolved in water, were mixed 1:1 and annealed by cooling to give barcoded adaptors at a concentration of 50 µM. The restriction-digested genomic DNA was end-repaired and A-tailed and then ligated to a barcoded adaptor. Cells ligated to different barcoded adaptors were pooled, purified with 1× AMPure XP, and amplified using Common-P5 (ACACTCTTTCCCTACACGAC) and Barcode-P3 (ATCG-[24 bp P3-barcode 1-24]-GACTGGAGTTCAGACGTGTGCT). The products were then purified twice with 0.7× AMPure XP. The purified amplicons were quantified with the Equalbit 1× dsDNA HS Assay Kit. Samples were pooled according to sequencing requirements and then sequenced (FIG. 2).
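The way the two oligonucleotides form a "Y" adaptor can be checked with a small sketch. It uses the sequences given above; the 24 nt barcode is a made-up placeholder, because the actual adapter-barcode sequences are listed only in Table 1, and the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

SAME_A = "GATCGGAAGAGCACACGTCTGAACTCCAGTC"  # NEB Same-A, 5'-phosphorylated
BARCODED_B_5P = "ACACACTCTTTCCCTACACGAC"    # Barcoded-B, part before the barcode
BARCODED_B_3P = "GCTCTTCCGATC"              # Barcoded-B, part after the barcode

def build_barcoded_b(barcode):
    """Assemble a Barcoded-B oligo around a barcode."""
    return BARCODED_B_5P + barcode + BARCODED_B_3P

def stem_length(top, bottom):
    """Number of bases at the 3' end of `top` that pair with the 5' end of `bottom`."""
    k = 0
    limit = min(len(top), len(bottom))
    while k < limit and top[-(k + 1):] == reverse_complement(bottom[:k + 1]):
        k += 1
    return k

# Hypothetical barcode, for illustration only (24 nt).
example_barcode = "TGCA" * 6
oligo_b = build_barcoded_b(example_barcode)
print(len(oligo_b), stem_length(oligo_b, SAME_A))
```

With this placeholder barcode the paired stem is the 12 nt that follow the barcode, and the remaining parts of both oligonucleotides stay single-stranded as the two arms of the "Y".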
TABLE 1 primer sequences involved in Refresh-seq
* represents a phosphorothioate modification.
After the Refresh-seq library was sequenced with ONT nanopore sequencing and the raw data were obtained, basic data processing aligned the reads to the reference genome as follows:
The raw data generated by ONT sequencing were converted to fastq format. Following the double-ended barcode structure of the Refresh-seq library, this example used nanoplexer v0.1 in two consecutive rounds to demultiplex the reads of each single cell, and used Cutadapt v3.4 to remove the adaptor sequences at both ends of the reads and to discard reads shorter than 500 bp. The reads were then aligned to the human reference genome hg38 or the mouse reference genome mm10 with minimap2 v2.24. Reads with a mapping quality below 30 were filtered out with samtools v1.14, and PCR duplicates were removed.
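The filtering criteria in this pipeline (read length of at least 500 bp, mapping quality of at least 30, duplicate removal) can be sketched on plain records as follows. This is an illustration only: the actual processing used Cutadapt and samtools, the `Alignment` record is hypothetical, and treating alignments with identical coordinates and strand as PCR duplicates is an assumption made for the sketch.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alignment:
    """Hypothetical minimal alignment record used only in this sketch."""
    read_id: str
    chrom: str
    start: int
    end: int
    strand: str
    length: int  # read length in bp
    mapq: int    # mapping quality

def filter_alignments(alignments, min_length=500, min_mapq=30):
    """Keep reads >= min_length and MAPQ >= min_mapq; drop coordinate duplicates."""
    seen = set()
    kept = []
    for aln in alignments:
        if aln.length < min_length or aln.mapq < min_mapq:
            continue
        key = (aln.chrom, aln.start, aln.end, aln.strand)
        if key in seen:  # same coordinates and strand: assumed PCR duplicate
            continue
        seen.add(key)
        kept.append(aln)
    return kept

reads = [
    Alignment("r1", "chr1", 100, 2100, "+", 2000, 60),
    Alignment("r2", "chr1", 100, 2100, "+", 2000, 60),  # duplicate of r1
    Alignment("r3", "chr2", 500, 900, "-", 400, 60),    # too short
    Alignment("r4", "chr3", 10, 1810, "+", 1800, 12),   # low mapping quality
]
print([aln.read_id for aln in filter_alignments(reads)])
```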
Cross-contamination assessment: to evaluate cross-contamination in Refresh-seq (multiplexed), this example aligned reads to a mixed human-mouse reference genome combining hg38 and mm10; the mixed genome was indexed with minimap2 using the parameter '-I 10G'. The number and proportion of reads aligned to the mm10 and hg38 genomes were calculated for each single cell. The species to which most reads aligned (more than 90%) was taken as the species of that cell. A cell was judged to be cross-contaminated if more than 10% of its reads aligned to the genome of the minor species. The results are shown in FIG. 4: none of the cells passing quality control was classified as a human-mouse mixed cell, indicating that cross-contamination in Refresh-seq (multiplexed) is very low.
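The species assignment rule above can be written as a short sketch. The function name and the labels it returns are illustrative; the 90% and 10% cutoffs are the ones stated in the text, and the handling of cells with no reads or with a split exactly at the cutoff is an assumption.

```python
def classify_cell(human_reads, mouse_reads, major_cutoff=0.90, minor_cutoff=0.10):
    """Assign a species to a cell from its read counts on hg38 and mm10."""
    total = human_reads + mouse_reads
    if total == 0:
        return "no_reads"
    human_frac = human_reads / total
    minor_frac = min(human_frac, 1.0 - human_frac)
    if minor_frac > minor_cutoff:
        return "cross_contaminated"
    if 1.0 - minor_frac > major_cutoff:
        return "human" if human_frac > 0.5 else "mouse"
    # Exactly at the cutoff: left undetermined (assumption of this sketch).
    return "undetermined"

for counts in [(9900, 100), (50, 9950), (7000, 3000)]:
    print(counts, classify_cell(*counts))
```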
SNP site heterozygosity analysis: this example used WhatsHap v1.5 to calculate the likelihoods of all three genotypes (0/0, 0/1, 1/1) at each given heterozygous SNP site and to output them, together with the genotype predictions, into a VCF file. The command was "whatshap genotype --reference ref.fasta -o genotyped.vcf variants.vcf reads.bam". The variants.vcf file is the HG002 or HG001 SNP benchmark set downloaded from GIAB.
The results are shown in FIG. 5. Compared with SMOOTH-seq, which amplifies genomic fragments randomly cleaved by Tn5, EcoRI-based Refresh-seq has better amplification uniformity, higher genome-wide coverage and a higher biallelic detection rate at single nucleotide polymorphism sites. At a sequencing depth of 0.25×, Refresh-seq detected 1.64% of heterozygous SNPs, about 5 times the proportion detected by SMOOTH-seq (0.33%). Among heterozygous SNP sites covered by 5 or more reads, the average biallelic capture rate of Refresh-seq was 62%, significantly higher than the 10% capture rate of SMOOTH-seq.
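One way to compute a biallelic capture rate of the kind reported above is sketched below. It assumes that each heterozygous SNP site is summarized by its reference and alternate read counts and that a site counts as covered at 5 or more reads; this illustrates the metric and is not the inventors' script.

```python
def biallelic_capture_rate(sites, min_reads=5):
    """Fraction of sufficiently covered het SNP sites where both alleles are seen.

    `sites` is an iterable of (ref_read_count, alt_read_count) pairs.
    Returns None when no site reaches `min_reads`.
    """
    covered = [(ref, alt) for ref, alt in sites if ref + alt >= min_reads]
    if not covered:
        return None
    both = sum(1 for ref, alt in covered if ref > 0 and alt > 0)
    return both / len(covered)

def allele_dropout_rate(sites, min_reads=5):
    """Complement of the biallelic capture rate on the same covered sites."""
    rate = biallelic_capture_rate(sites, min_reads)
    return None if rate is None else 1.0 - rate

example_sites = [(3, 2), (5, 0), (0, 6), (1, 1), (4, 4)]
print(biallelic_capture_rate(example_sites), allele_dropout_rate(example_sites))
```

In the toy input, the site (1, 1) is ignored because it has fewer than 5 reads, and two of the four remaining sites show both alleles.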
FIG. 6 shows that Refresh-seq and Refresh-seq (multiplexed) perform consistently on HG001 and HG002 cells. With EcoRI and SacI, Refresh-seq and Refresh-seq (multiplexed) have higher genome coverage than SMOOTH-seq and detect more heterozygous SNPs, whereas Refresh-seq (multiplexed) with AsiSI achieves a greater sequencing depth for the same sequencing amount.
The above experiments demonstrate the general applicability and advantages of Refresh-seq, which provides better genome coverage and a higher probability of detecting both alleles simultaneously. In addition, Refresh-seq with an endonuclease that has a long recognition sequence enriches reads in specific regions, thereby enabling reduced-representation (simplified) genome sequencing.
Example 2 Application of Refresh-seq technology to single-sperm sequencing
In this example, Refresh-seq was performed with EcoRI, and the library construction procedure was the same as in Example 1, except that the library was purified twice with 0.65× AMPure XP. A total of 676 sperm cells were amplified with Refresh-seq (single-tube version) and 152 sperm cells with Refresh-seq (multiplexed). Because the two versions showed no difference in the detection of crossover events, they are not distinguished in what follows.
As shown in FIG. 7, Refresh-seq obtains sufficient genome coverage at low sequencing amounts. Of the 828 sperm, 700 passed quality control with a genome coverage above 1% at a sequencing depth of 0.1-0.3× (FIG. 7b). Genome coverage increased approximately linearly with the amount of sequencing data, with an average coverage of about 5% at sequencing amounts of 0.1-1 Gb (FIG. 7c). The average read length was 1.9 kb (FIG. 7d), and the average number of reads per sperm was 143,914. On average, as many as 250,000 hetSNPs were detected per sperm (FIG. 7e), with a SNP detection accuracy above 98.9%. By defining a discontinuity score (i.e., the frequency with which consecutive SNPs switch between paternal and maternal origin), Refresh-seq can efficiently screen out contaminating diploid cells and accurately distinguish X sperm from Y sperm. Of the 700 sperm cells that passed quality control, 688 were haploid sperm and 12 were contaminating diploid cells (FIG. 7f). The diploid cells were labeled D1 to D12 (FIG. 7f), and the authenticity of these 12 diploid cells was verified using chromosome profiles. X sperm and Y sperm were then distinguished by the number and proportion of reads mapped to the X and Y chromosomes (FIG. 7g). A total of 344 X sperm and 329 Y sperm were identified, while 8 sperm could not be distinguished (sex chromosome gain or loss); the ratio of X sperm to Y sperm was close to 1:1, in accordance with Mendel's law of segregation.
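The discontinuity score defined above can be sketched as the fraction of adjacent hetSNP pairs whose parental origin differs. The input format (one parental label per hetSNP, in genomic order) and the normalization by the number of adjacent pairs are assumptions made for illustration.

```python
def discontinuity_score(origins):
    """Fraction of adjacent hetSNP pairs that switch parental origin.

    `origins` lists the parental label of each detected hetSNP along one
    chromosome, in genomic order (for example "B" for B6 and "D" for DBA).
    """
    if len(origins) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(origins, origins[1:]))
    return switches / (len(origins) - 1)

haploid_like = "BBBBBBBBDDDDDDDD"  # a single switch, as after one crossover
diploid_like = "BDBBDBDDBDBDDBDB"  # both parental alleles sampled at random
print(discontinuity_score(haploid_like), discontinuity_score(diploid_like))
```

A haploid sperm changes origin only at crossovers, so its score stays near zero, whereas a diploid cell alternates between the two parental alleles and scores much higher.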
Example 3 Application of Refresh-seq technology to the identification of aneuploidy
Because Refresh-seq has a greater ability to detect both alleles simultaneously, aneuploidy was first screened by calculating the discontinuity score of each chromosome. The heterozygosity of SNP sites was then used to confirm chromosome gain events: if the copy number of a chromosome increases, both alleles are often detected at the same SNP site on that chromosome in the heterozygous offspring, so a chromosome gain is confirmed by a sharp increase, within 1 Mb intervals, in the number of sites at which both alleles are detected. When a chromosome is lost, the number of detectable SNP sites drops sharply compared with normal. Chromosome gain or loss can therefore be determined from the increase or decrease in the number of SNPs detected per 1 Mb.
As shown in FIG. 8, Refresh-seq allows aneuploidy to be identified by several methods. The genome of a haploid sperm contains only one of the two parental chromosome sets. When a chromosome gain occurs, the affected chromosome carries both parental copies, and ideally both parental genotypes would be detected at every SNP site. However, because allele dropout is ubiquitous, only one genotype is detected at most SNP sites, so within the gained region the detected genotypes alternate randomly between the two parental genotypes. The frequency of this alternation differs between gained regions and haploid regions; that is, the discontinuity score rises markedly. The first method therefore calculates the discontinuity score of each chromosome, and it identified single sperm A1, A2, A3, A4 and A6 as carrying chromosome gains (FIGS. 8a-h). Under random amplification, a chromosome gain means that more DNA fragments are captured and a chromosome loss means that fewer are captured; in the sequencing data, gained regions have more read coverage and more detectable SNPs, whereas lost regions have fewer reads and fewer detectable SNPs. The second method therefore identifies chromosome gains and losses from the deviation of the SNP number of a chromosome from the mean SNP number of all other chromosomes (FIG. 8i). The third method, consistent in principle with the first, exploits the ability of Refresh-seq to capture both alleles simultaneously: more SNP sites with both genotypes are detected on a gained chromosome, that is, heterozygosity increases, whereas heterozygosity decreases when a chromosome is lost (FIG. 8j).
The aneuploid chromosomes found by these three methods corroborate one another and can also be verified by chromosome profiling and CNV analysis. In total, 6 sperm with autosomal aneuploidy were found: A1, A3 and A6 had chromosome gains, A5 had a chromosome loss, and A4 and A6 had a simultaneous gain and loss on one chromosome (chr3). Sperm A7 is more likely a non-uniformly amplified sample (technical error) than a true aneuploid.
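The second method (deviation of a chromosome's SNP number from that of the other chromosomes) might be sketched as follows. The normalization by a reference hetSNP count per chromosome and the gain and loss cutoffs of 1.5 and 0.5 are assumptions chosen for illustration; the source does not state numeric thresholds.

```python
def snp_fold_change(observed, reference):
    """hetSNP detection ratio of each chromosome relative to all other chromosomes.

    `observed`: hetSNPs detected per chromosome in one sperm.
    `reference`: hetSNPs expected per chromosome (for example from a benchmark set).
    """
    ratios = {chrom: observed[chrom] / reference[chrom] for chrom in reference}
    fold = {}
    for chrom, ratio in ratios.items():
        others = [value for name, value in ratios.items() if name != chrom]
        baseline = sum(others) / len(others)
        fold[chrom] = ratio / baseline if baseline else float("nan")
    return fold

def call_events(fold, gain_cutoff=1.5, loss_cutoff=0.5):
    """Label chromosomes as gain, loss or normal (cutoffs are hypothetical)."""
    calls = {}
    for chrom, value in fold.items():
        if value > gain_cutoff:
            calls[chrom] = "gain"
        elif value < loss_cutoff:
            calls[chrom] = "loss"
        else:
            calls[chrom] = "normal"
    return calls

reference = {"chr1": 1000, "chr2": 800, "chr3": 600, "chr4": 500}
observed = {"chr1": 100, "chr2": 80, "chr3": 120, "chr4": 50}
print(call_events(snp_fold_change(observed, reference)))
```

In this toy input only chr3 is detected at about twice the rate of the other chromosomes, so it is the only chromosome labeled as a gain.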
Example 4 application of Refresh-seq technology to the identification of structural variations in sperm
In this example, structural variations (SVs) were detected in the long-read data generated by Nanopore sequencing using cuteSV, a sensitive and fast SV detection tool suited to third-generation sequencing data. The parameters were set to the defaults recommended for Nanopore data, and the minimum number of supporting reads was set to 1 to achieve single-cell, single-molecule resolution. To analyze how detection accuracy depends on the number of supporting cells, the SVs of all cells were first merged with SURVIVOR, and the accuracy of SVs supported by different numbers of cells was calculated as accuracy = true positives / (true positives + false positives); the reference set was built from third-generation Nanopore sequencing data generated from a large number of sperm cells. For SV haplotyping, a 0/1 matrix was first built from the parental genotype of each reference-set SV in each single sperm, where 0 denotes the C57 parental type and 1 denotes the DBA parental type. An SV in a sperm was considered to match a reference-set SV if the two were similar in length and located within ±100 bp of each other; matching SVs were marked with the corresponding genotype, and non-matching SVs were left unmarked. The resulting matrix was filtered with the 'hapiFrameSelection' tool in the R package Hapi: SVs supported by fewer than 5 sperm were removed, and the 100 cells with the most SVs were then selected as the draft framework for subsequent haplotyping. To improve haplotyping accuracy, the draft framework was corrected with an HMM: at each position, if more than half of the cells indicated an error, the genotype was flipped.
This formed the basic framework, and missing genotypes were then filled in iteratively by reference to other cells using the 'imputationFun1' function. Preliminary haplotyping was then performed with the 'hapiPhase' function, and after correction with 'hapiBlockMPR', high-resolution and highly consistent haplotypes were assembled with 'hapiAssemble'.
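The matching rule and the accuracy formula described above can be illustrated with a short sketch. The tuple layout, the requirement that SV types agree, and the length-similarity cutoff of 0.7 are assumptions, since the text says only that matching SVs are "similar in length" and lie within ±100 bp.

```python
def sv_matches(call, ref, max_shift=100, min_length_ratio=0.7):
    """Compare two SVs given as (chrom, position, sv_type, length).

    `min_length_ratio` is a hypothetical similarity cutoff.
    """
    if call[0] != ref[0] or call[2] != ref[2]:
        return False
    if abs(call[1] - ref[1]) > max_shift:
        return False
    shorter, longer = sorted((abs(call[3]), abs(ref[3])))
    return longer > 0 and shorter / longer >= min_length_ratio

def accuracy(calls, reference):
    """accuracy = true positives / (true positives + false positives)."""
    if not calls:
        return None
    true_pos = sum(any(sv_matches(c, r) for r in reference) for c in calls)
    false_pos = len(calls) - true_pos
    return true_pos / (true_pos + false_pos)

reference_set = [("chr1", 1000, "DEL", 180), ("chr2", 5000, "INS", 6500)]
cell_calls = [
    ("chr1", 1050, "DEL", 175),  # matches the first reference SV
    ("chr2", 5400, "INS", 6400), # too far from the reference position
    ("chr1", 990, "DEL", 50),    # length too different
]
print(accuracy(cell_calls, reference_set))
```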
As shown in FIG. 9, Refresh-seq can identify structural variations in sperm, with an average of 973 structural variation events detected per cell (FIG. 9a). In the length distribution of all detected structural variations, two peaks appear at about 180 bp and 6-7 kb, corresponding to B1 elements (the mouse equivalent of human Alu) and LINE1 elements, respectively (FIG. 9b). The accuracy of the structural variation events detected by Refresh-seq reaches 80% for events supported by more than three cells (FIG. 9c), while the accuracy of chromosome-scale haplotyping is as high as 98% (FIG. 9d), and genomic elements were successfully annotated for these haplotyped structural variations (FIGS. 9e-g).
Example 5 Application of Refresh-seq technology to the identification of egg cells and polar bodies
In this example, egg cells and polar bodies were obtained by fertilization and by parthenogenetic activation. Refresh-seq was performed with EcoRI, and the library construction procedure was the same as in Example 1. A total of 185 second polar bodies, 87 parthenogenetically activated egg cells, 132 first polar bodies, 33 metaphase II (MII) cells and 26 zygotes were collected. The second polar bodies and parthenogenetically activated egg cells are haploid cells, in which an average of 14 crossover events was detected.
The experimental results are shown in FIG. 10. In addition to male sperm cells, Refresh-seq also performs well on female germ cells. Refresh-seq again achieves adequate genome coverage at low sequencing amounts; diploid cells have higher genome coverage than haploid cells for an equal sequencing amount, and genome coverage keeps increasing with the amount of sequencing data, indicating that coverage has not reached saturation (FIG. 10c). An average of 14 crossover events was detected in the second polar bodies and parthenogenetically activated egg cells (FIG. 10d), with 6 to 25 crossover events per single cell. The median crossover resolution is 283 kb, so Refresh-seq can also obtain high-resolution crossover data from shallow sequencing of female haploid germ cells. The crossover density plot shows that in female mice crossovers are relatively sparse near the centromeres and more frequent near the subtelomeric regions, and that the subtelomeric enrichment is weaker in females than in males (FIG. 10f).
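Crossover counts and crossover resolution of the kind reported above can be illustrated with a simplified sketch. It assumes that each haploid cell is already reduced to a cleaned list of (position, parental origin) pairs per chromosome and takes the resolution to be the distance between the two hetSNPs that flank a switch; the actual analysis relied on haplotype inference rather than this direct scan.

```python
from statistics import median

def find_crossovers(snps):
    """Return the (left, right) positions flanking each change of parental origin.

    `snps` lists (position, origin) pairs for one chromosome, sorted by position.
    """
    return [
        (left_pos, right_pos)
        for (left_pos, left_origin), (right_pos, right_origin) in zip(snps, snps[1:])
        if left_origin != right_origin
    ]

def crossover_resolution(intervals):
    """Median width in bp of the intervals that bracket the crossovers."""
    widths = [right - left for left, right in intervals]
    return median(widths) if widths else None

chromosome_snps = [
    (100, "B"), (5_000, "B"), (300_000, "D"), (900_000, "D"), (1_500_000, "B"),
]
events = find_crossovers(chromosome_snps)
print(len(events), crossover_resolution(events))
```

Sparser hetSNP coverage widens the flanking intervals, which is why resolution depends on sequencing depth.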
The above examples illustrate only a few embodiments of the invention, which are described in detail and are not to be construed as limiting the scope of the invention. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the invention, which are all within the scope of the invention. Accordingly, the scope of protection of the present invention is to be determined by the appended claims.

Claims (11)

1. A method for detecting genomic information based on restriction enzymes, comprising the steps of:
(1) Cutting the genome of the sample by adopting restriction enzyme to obtain genome DNA fragments with different lengths;
(2) Carrying out long genome DNA fragment enrichment on the genome sample which is amplified or not amplified;
(3) Sequencing the enriched long genomic DNA fragments on a long-read sequencing platform;
(4) Performing computational analysis on the sequencing data, mapping the long genomic DNA fragments back to genomic regions by alignment and computation, and thereby obtaining the sequence information of the sample in those genomic regions.
2. The method for detecting genomic information based on restriction enzymes according to claim 1, wherein the restriction enzyme recognizes a specific sequence of 4-10 bp, preferably 6 bp or 8 bp; more preferably, EcoRI or SacI is selected when higher coverage is desired, and AsiSI is selected when a better enrichment effect is desired; when the goal is higher coverage, the cut DNA fragments have similar lengths concentrated between 1 and 3 kb.
3. The method for detecting genomic information based on restriction enzymes according to claim 1, wherein the genomic sample is cell-free DNA, DNA released by cells in culture, one or more cells or nuclei, viruses, mitochondria or chloroplasts.
4. The method for detecting genomic information based on restriction enzymes according to claim 1, wherein step (2) comprises performing end repair, A-tailing and adaptor ligation on the genomic DNA fragments, performing PCR amplification, and enriching long genomic DNA fragments after amplification.
5. The method for detecting genomic information based on restriction enzymes according to claim 4, wherein the adaptor used in the amplification in step (2) is an adaptor without a barcode or an adaptor with a barcode; when the adaptor without a barcode is used, each PCR tube is processed independently during subsequent purification and library construction, and the 5'-end and 3'-end adaptor sequences are added during PCR amplification; when the adaptor with a barcode is used, sample tubes with different barcodes are pooled and purified after adaptor ligation and amplified in a single tube, and the 3'-end adaptor is added to the amplified product.
6. The method for detecting genomic information based on restriction enzymes according to claim 1, wherein the sequencing platform in step (3) is a long-read sequencing platform, optionally a Nanopore sequencing platform or a PacBio sequencing platform.
7. The method for detecting genomic information based on restriction enzymes according to claim 1, wherein the restriction enzyme used in step (1) is selected by performing an in silico digestion simulation on the genome of the target species and inferring the distribution of genomic fragments after digestion.
8. The method for restriction enzyme based detection of genomic information according to claim 1, wherein the long genomic DNA fragment in step (2) refers to a fragment having a length of more than 700 nucleotide pairs, preferably a fragment having a length of more than 1000 nucleotide pairs.
9. The method for detecting genomic information based on restriction enzymes according to claim 1, wherein the amplification in step (2) is a polymerase chain reaction, and the long genomic DNA fragments are enriched using the polymerase chain reaction and fragment selection, the fragment selection being gel-based fragment selection or magnetic bead-based fragment selection.
10. The method for detecting genomic information based on restriction enzymes according to claim 1, wherein the sequence information in step (4) comprises one or more of the following: 1) fragment length information; 2) fragment abundance information; 3) heterozygous single nucleotide polymorphism information; 4) genomic structural variation information, including insertions, deletions, duplications, inversions and translocations; 5) repeat sequence information, including short interspersed elements, long terminal repeat elements, DNA repeat elements, simple repeats and satellites; 6) genome copy number variation information; 7) allele information; 8) allele linkage information; 9) epigenetic information, including DNA methylation and DNA hydroxymethylation.
11. The method for detecting genomic information based on restriction enzymes according to claim 10, wherein the allele information is the variation type at an allele on homologous chromosomes, including SNP, SV, repeat sequence information and epigenetic information on the allele; the repeat sequence information includes short interspersed elements, long terminal repeat elements, DNA repeat elements, simple repeats and satellites; and the epigenetic information includes DNA methylation and DNA hydroxymethylation.
CN202410122596.6A 2024-01-30 2024-01-30 Method for detecting genome information based on restriction enzyme Pending CN117737216A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410122596.6A CN117737216A (en) 2024-01-30 2024-01-30 Method for detecting genome information based on restriction enzyme

Publications (1)

Publication Number Publication Date
CN117737216A true CN117737216A (en) 2024-03-22

Family

ID=90281614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410122596.6A Pending CN117737216A (en) 2024-01-30 2024-01-30 Method for detecting genome information based on restriction enzyme

Country Status (1)

Country Link
CN (1) CN117737216A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination