CN117965748A - Identification method for screening monozygotic twins based on SNV and INDEL - Google Patents
- Publication number
- CN117965748A (application number CN202410029423.XA)
- Authority
- CN
- China
- Prior art keywords
- snv
- indel
- library
- twins
- sequencing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention belongs to the technical field of biological genetics, and particularly relates to an identification method for screening monozygotic twins based on SNV and INDEL. The method comprises the following steps: S1: extracting DNA from the peripheral blood of the monozygotic twins; S2: preparing a whole genome sequencing pre-library from the sample DNA; S3: performing whole genome sequencing; S4: processing the whole genome sequencing data and screening for SNV and INDEL sites that differ between the monozygotic twins; S5: designing specific primers for the sites from S4 and constructing a sequencing library through PCR amplification and purification; S6: performing multiplex PCR targeted resequencing; S7: analyzing the targeted resequencing data and verifying the authenticity of the sites detected in S4; S8: performing Sanger sequencing on target sites that failed verification in S7 or whose PCR product is larger than 350 bp. The identification method provided by the invention can be widely applied in judicial fields such as criminal case investigation.
Description
Technical Field
The invention belongs to the technical field of biological genetics, and particularly relates to an identification method for screening monozygotic twins based on SNV and INDEL.
Background
Monozygotic twins arise from the division of a single fertilized egg, produced when one egg is fertilized by one sperm, and theoretically carry identical genetic information. Conventional forensic DNA typing technology therefore cannot distinguish individuals in complex situations such as monozygotic twins.
In the prior art, identification of monozygotic twins mainly follows three schemes:
The first scheme: test materials such as peripheral blood, oral swabs and intestinal biopsy samples are collected from monozygotic twins, a genome-wide DNA methylation map is obtained by conventional whole genome bisulfite sequencing, and differentially methylated regions are then identified by bioinformatic data analysis. In addition, in 2018 Spanners found differentially methylated CpG sites in saliva DNA of monozygotic twins using PCR combined with high-resolution melting. This technique detects DNA sequence variation by measuring the melting behavior of double-stranded DNA during heating, and is better suited to forensic laboratories.
The second scheme: copy number variation (CNV) refers to an increase or decrease in the copy number of certain large genomic fragments. It can regulate the plasticity of an organism by changing gene dosage, transcript structure and so on, and is one of the main genetic bases of phenotypic diversity among individuals and of the evolution of population adaptability. CNV is a structural variation of the genome that is highly polymorphic and relatively unstable, with a mutation rate 100 to 10,000 times higher than that of single base substitution. Several research teams have so far used comparative genomic hybridization arrays or SNP array technology, combined with algorithms such as Birdsuite, PennCNV and QuantiSNP, to detect somatic CNV typing differences in DNA from peripheral blood and oral epithelial cells of monozygotic twins.
The third scheme: compared with nuclear DNA, mitochondrial DNA (mtDNA) has a high copy number, a small genome and a high mutation rate caused by the lack of DNA repair mechanisms. Small point-mutation differences have therefore been detected in the mitochondrial genomes of monozygotic twins using the Illumina HiSeq 2000 sequencing platform or long-read single-molecule real-time (SMRT) sequencing.
However, each of the above existing schemes has drawbacks in practical application to monozygotic twin identification. Specifically, conventional whole genome bisulfite sequencing usually requires bisulfite treatment or enzymatic digestion of the DNA sample, which is labor-intensive and carries a risk of DNA damage. Furthermore, there is growing evidence that the methylation signatures of monozygotic twins may change over their lifetime because of environmental or aging factors (i.e., epigenetic drift). For CNV, when array technology is used to detect CNV typing differences between monozygotic twins, the number of differing CNVs is typically very small and only a very small number of fragments pass qPCR verification. Finally, because mtDNA is maternally inherited and does not undergo recombination, the mtDNA of offspring comes essentially entirely from the egg cell; this exclusively maternal inheritance makes it difficult for mtDNA to distinguish monozygotic twins, who share the same maternal line.
Therefore, an identification method for monozygotic twins that can detect and identify DNA samples of low content, and that is more efficient and more economical, is of great significance.
Disclosure of Invention
The invention aims to provide an identification method for screening monozygotic twins based on SNV and INDEL, so as to solve the problems described in the background.
In order to solve the technical problems, the invention provides the following technical scheme:
An identification method for identifying monozygotic twins based on SNV and INDEL comprises the following steps:
S1: extracting DNA from the peripheral blood of the monozygotic twins to obtain sample DNA;
S2: preparing a small fragment library A from the sample DNA, and determining the library concentration and fragment size of the small fragment library A;
S3: performing whole genome sequencing on the small fragment library A obtained in step S2 to obtain sequencing data;
S4: processing the whole genome sequencing data, screening for SNV and INDEL sites that differ between the monozygotic twins, and taking these as target sites;
S5: designing specific primers for the target sites from step S4, and constructing a sequencing library through PCR amplification and purification;
S6: performing multiplex PCR targeted resequencing;
S7: analyzing the targeted resequencing data and verifying the authenticity of the target sites from step S4;
S8: performing Sanger sequencing on target sites that failed verification in step S7 or whose PCR product is larger than 350 bp.
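The routing rule in step S8 can be sketched as a small predicate. This is only an illustration of the stated rule (failed verification in S7, or PCR product larger than 350 bp); the class name, field names and the example amplicon lengths are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass

# Threshold stated in step S8 of the method.
SANGER_AMPLICON_THRESHOLD_BP = 350


@dataclass
class TargetSite:
    """Hypothetical record for one target site after step S7."""
    name: str
    amplicon_bp: int        # length of the PCR product (illustrative values only)
    verified_in_s7: bool    # True if targeted resequencing confirmed the site


def needs_sanger(site: TargetSite) -> bool:
    """Return True if the site should go to Sanger sequencing (step S8)."""
    return (not site.verified_in_s7) or site.amplicon_bp > SANGER_AMPLICON_THRESHOLD_BP


if __name__ == "__main__":
    sites = [
        TargetSite("site_1", amplicon_bp=280, verified_in_s7=True),
        TargetSite("site_2", amplicon_bp=410, verified_in_s7=True),
        TargetSite("site_3", amplicon_bp=300, verified_in_s7=False),
    ]
    for site in sites:
        print(site.name, "-> Sanger" if needs_sanger(site) else "-> done")
```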
Preferably, S2 specifically comprises the following steps:
(I) fragmenting the sample DNA, performing end repair and A-tailing, and performing size selection to obtain blunt-ended DNA fragments;
(II) ligating adapters to obtain adapter-ligated DNA fragments;
(III) performing PCR amplification and purification to obtain the small fragment library, and determining the library concentration and fragment size of the small fragment library.
Preferably, S4 specifically comprises the following steps:
(I) performing quality control on the raw sequencing data, removing low-quality reads, and aligning the reads to the human genome sequence;
(II) marking duplicate reads and performing base quality score recalibration;
(III) detecting somatic SNVs and INDELs, and removing variant sites located in known genomic structures, such as repeat structures, copy number variation regions, long polymorphism-rich homopolymers and DNA sequences with extremely high GC content;
(IV) from the filtered result of the previous step, further screening for SNV and INDEL sites supported by at least 5 reads, and taking these as target sites;
(V) performing functional annotation using ANNOVAR software.
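Step (III) above removes variants that fall inside known problematic genomic regions. A minimal sketch of such a position-in-interval filter is given below, assuming the excluded regions are available as (chromosome, start, end) tuples with 1-based inclusive coordinates. The function names and the example intervals are hypothetical; the patent does not specify how the exclusion is implemented.

```python
from bisect import bisect_right


def build_index(regions):
    """Merge overlapping regions per chromosome and return sorted start/end lists.

    regions: iterable of (chrom, start, end), 1-based inclusive coordinates.
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1] + 1:      # overlapping or adjacent
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_excluded_region(index, chrom, pos):
    """Return True if (chrom, pos) lies inside any excluded region."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1           # last region starting at or before pos
    return i >= 0 and pos <= ends[i]


if __name__ == "__main__":
    # Illustrative intervals only, not real annotation.
    idx = build_index([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 10, 20)])
    variants = [("chr1", 250), ("chr1", 301), ("chr2", 15), ("chr3", 5)]
    kept = [v for v in variants if not in_excluded_region(idx, *v)]
    print(kept)
```

Merging the intervals first keeps each lookup to a single binary search, which matters when a whole-genome call set is checked against a large annotation.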
Preferably, S5 specifically comprises the following steps:
(I) designing specific primers for the target sites using Premier5 software, and verifying their specificity by agarose gel electrophoresis;
(II) mixing all forward primers and all reverse primers at equal concentrations to form a single forward primer mixture and a single reverse primer mixture;
(III) amplifying the fragments containing the target sites, purifying the amplified PCR products with magnetic beads, performing end repair on the PCR products, and performing size selection to obtain blunt-ended DNA fragments;
(IV) ligating adapters to obtain adapter-ligated DNA fragments;
(V) performing PCR amplification and purification on the adapter-ligated DNA fragments to obtain a small fragment library B, determining the library concentration and fragment size of the small fragment library B, and diluting it to obtain the sequencing library.
Preferably, the concentration of the single forward primer mixture and of the single reverse primer mixture is 10 μM.
Preferably, in step (V) of S5, after the library concentration and fragment size of the small fragment library B are determined, the library is diluted to 8 μM to give the final sequencing library.
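Reaching a stated working concentration, such as the primer mixture or library concentrations given above, is ordinary C1·V1 = C2·V2 dilution arithmetic. The sketch below shows that calculation; the function name and the 100 μM stock in the example are assumptions for illustration and are not values from the patent.

```python
def dilution_volumes(stock_conc, target_conc, final_volume):
    """Return (stock_volume, diluent_volume) using C1*V1 = C2*V2.

    Concentrations must share one unit (e.g. uM) and volumes another (e.g. uL).
    """
    if stock_conc <= 0 or target_conc <= 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_conc > stock_conc:
        raise ValueError("cannot dilute to a concentration above the stock")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


if __name__ == "__main__":
    # Hypothetical example: 100 uM primer stock diluted to 10 uM in 50 uL.
    stock_ul, diluent_ul = dilution_volumes(100, 10, 50)
    print(f"stock: {stock_ul} uL, diluent: {diluent_ul} uL")
```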
Preferably, S7 specifically includes the following steps:
(I) converting the raw sequencing data from BCL format to FASTQ format;
(II) removing adapter sequences and low-quality bases at the 3' end;
(III) aligning the processed sequencing data to the human genome, sorting by genome coordinate and building an index;
(IV) genotyping all target sites.
Preferably, S8 specifically comprises the following steps:
(I) for the sites that failed verification in S7 or whose PCR product is larger than 350 bp, designing specific primers using Premier5 software and verifying their specificity by agarose gel electrophoresis;
(II) mixing all forward primers and all reverse primers at equal concentrations to form a single forward primer mixture and a single reverse primer mixture;
(III) amplifying the target sites by PCR and purifying the products;
(IV) subjecting the purified products to Sanger sequencing.
Preferably, the identification method for identifying monozygotic twins can be applied in judicial fields such as criminal case investigation.
Compared with the prior art, the invention has the following beneficial effects: the short-fragment sequence variation identification method and the established sequencing data analysis workflow provided by the invention offer high identification efficiency, better cost-effectiveness and reliable results; in addition, the identification method can detect SNVs present at low levels, and addresses the practical problem of low DNA content in trace test materials, degraded samples and the like; therefore, the monozygotic twin identification method provided by the invention can be widely applied in judicial fields such as forensic examination in criminal cases.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and constitute a part of this specification; together with the embodiments they serve to illustrate the invention and do not limit it. In the drawings:
FIG. 1 is the experimental flow diagram of the example;
FIG. 2 shows the number of SNVs detected in the monozygotic twin samples by whole genome sequencing in the example;
FIG. 3 shows the karyotype distribution of the SNVs detected in the monozygotic twin samples by whole genome sequencing in the example;
FIG. 4 is an overlap diagram of the SNVs detected in the monozygotic twin samples by whole genome sequencing in the example;
FIG. 5 is an IGV view of multiplex PCR targeted resequencing in the example, taking chr1:104081131, A>G as an example;
FIG. 6 shows Sanger sequencing verification in the example, taking chr4:185690326, C>T as an example;
FIG. 7 shows the results of the DNA sensitivity analysis in the example, taking chr20:38761028, G>A as an example.
Detailed Description
The following describes preferred implementations of the embodiments of the invention. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art without creative effort, and without departing from the principles of the embodiments of the invention, fall within the scope of protection of the invention.
Unless otherwise specified, the test methods used in the examples are conventional methods; unless otherwise indicated, the materials, reagents and the like used are commercially available.
Example 1: monozygotic twin identification was carried out using free DNA in the peripheral blood of a pair of female monozygotic twins as the sample. The specific process is as follows:
S1: extracting DNA from the peripheral blood of the monozygotic twins to obtain sample DNA. Specifically: peripheral blood was collected from the monozygotic twins at 27 and 33 years of age and stored in EDTA anticoagulant tubes; sample DNA was extracted from the blood samples using the commercially available QIAamp DNA Blood Mini Kit (Qiagen, Hilden, Germany), and the DNA concentration was determined using the Qubit dsDNA HS Assay Kit (ThermoFisher Scientific, Carlsbad, USA);
S2: the sample DNA was fragmented to an average size of 350 bp using a Covaris ultrasonicator (Covaris, Woburn, MA, USA); a small fragment library A was prepared using the Illumina TruSeq Nano DNA Kit (Illumina, San Diego, CA, USA) according to the manufacturer's instructions, with an initial DNA input of 100 ng;
Specifically, the fragmented DNA was purified with magnetic beads, followed by end repair and A-tailing to obtain blunt-ended DNA fragments; adapters were ligated, and PCR amplification and purification were performed to obtain the small fragment library A; the concentration of the small fragment library A was measured using the Quant-iT PicoGreen dsDNA assay kit (Invitrogen, Carlsbad, CA, USA), qPCR quantification was performed using the Universal KAPA library quantification kit, and purity analysis was performed on an Agilent 2100;
S3: whole genome sequencing: paired-end sequencing of the small fragment library A from S2 was performed on an Illumina NovaSeq 6000 to obtain sequencing data in FASTQ format, with an average sequencing depth of at least 30× and a read length of 150 bp;
S4: processing the whole genome sequencing data, screening for SNV and INDEL sites that differ between the monozygotic twins, and taking these as target sites. Specifically:
Quality control was performed on the sequencing data obtained in S3 using FastQC, and low-quality bases were removed; the results for the four samples are as follows:
Table one:
Sample | Total reads (M) | % ≥ Q30 |
Twin A, 27 years old | 31.304 | 89.19 |
Twin B, 27 years old | 31.109 | 89.41 |
Twin A, 33 years old | 31.097 | 89.08 |
Twin B, 33 years old | 31.099 | 89.27 |
The filtered reads were aligned to the human reference genome hg38 using BWA-MEM (0.7.17) to obtain BAM files; duplicate reads in the BAM files were marked using the MarkDuplicates command of Picard (2.19.0); base quality scores were recalibrated using the BaseRecalibrator command of GATK (3.8.1) together with known SNV and INDEL sites from the 1000 Genomes Project (ftp://ftp.broadinstitute.org/bundle/hg38/Mills_and_1000G_gold_standard.indels.hg38.vcf.gz) and the dbSNP database (ftp://ftp.broadinstitute.org/bundle/hg38/dbsnp_146.hg38.vcf.gz);
Somatic SNVs and INDELs were detected following the GATK standard procedure. Specifically, a Panel of Normals (PoN) file was first created with Mutect2 from in-house whole genome sequencing data; Mutect2 was then run in tumor-only mode, and somatic SNVs and INDELs were detected in combination with the PoN and germline polymorphism information (gnomAD); high-quality variant sites were retained using FilterMutectCalls, and variant sites located in known genomic structures were further excluded, including repeat structures, copy number variations, long polymorphism-rich homopolymers and sequences with very high GC content; SNV and INDEL sites with at least 5 reads supporting each allele were selected as target sites for subsequent analysis; functional annotation was performed using ANNOVAR;
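The read-support criterion described above (at least 5 reads on each allele) can be illustrated with a small filter over VCF records. The sketch assumes a single-sample, tab-separated VCF line whose FORMAT column contains the standard `AD` (allelic depths) key; the function names and the depth values in the example records are hypothetical, and this is not the pipeline used in the patent.

```python
MIN_READS_PER_ALLELE = 5  # threshold stated in the text


def allele_depths(vcf_line, sample_index=0):
    """Extract per-allele read depths (the AD field) from one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")
    sample_values = fields[9 + sample_index].split(":")
    ad_value = sample_values[format_keys.index("AD")]
    return [int(depth) for depth in ad_value.split(",")]


def passes_read_support(vcf_line, min_reads=MIN_READS_PER_ALLELE):
    """Return True if the reference and every alternate allele have enough reads."""
    depths = allele_depths(vcf_line)
    return len(depths) >= 2 and all(depth >= min_reads for depth in depths)


if __name__ == "__main__":
    # Made-up records for illustration; the depths are not results from the patent.
    well_supported = "chr1\t104081131\t.\tA\tG\t.\tPASS\t.\tGT:AD:DP\t0/1:18,7:25"
    weakly_supported = "chr1\t104081131\t.\tA\tG\t.\tPASS\t.\tGT:AD:DP\t0/1:30,3:33"
    print(passes_read_support(well_supported))
    print(passes_read_support(weakly_supported))
```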
S5: verifying the target sites screened in step S4 through targeted resequencing. Specifically, site-specific primers were designed using Premier5 software and their specificity was verified by agarose gel electrophoresis; all forward primers and all reverse primers were mixed at equal concentrations to form a single forward primer mixture and a single reverse primer mixture, each at a final concentration of 10 μM; the target sites were amplified by a two-step PCR reaction and adapters were ligated, with the reaction system prepared according to Table two;
Table two:
The reaction system was mixed thoroughly, centrifuged briefly, placed in a PCR instrument and run according to the program in Table three;
Table three:
The four multiplex-amplified PCR samples were combined at equimolar concentrations into a mixed sample to obtain the small fragment library B; fragment size (< 350 bp) was checked on a 2% agarose gel, and the library was quantified with a Qubit fluorometer (ThermoFisher Scientific, Carlsbad, USA); based on the qPCR result of the previous step, the small fragment library B was diluted to 8 μM to obtain the final sequencing library;
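Equimolar pooling requires converting a mass concentration (as reported by a fluorometer) into a molar concentration. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; that constant, the function names and the example numbers are assumptions added for illustration and are not taken from the patent.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate molar mass of one dsDNA base pair


def ng_per_ul_to_nM(conc_ng_per_ul, mean_fragment_bp):
    """Convert a dsDNA concentration in ng/uL to nM for a given mean fragment length."""
    if mean_fragment_bp <= 0:
        raise ValueError("fragment length must be positive")
    return conc_ng_per_ul * 1e6 / (AVG_BP_MASS_G_PER_MOL * mean_fragment_bp)


def pooling_volume_ul(target_fmol, conc_nM):
    """Volume (uL) that contributes target_fmol of library; 1 nM equals 1 fmol/uL."""
    if conc_nM <= 0:
        raise ValueError("concentration must be positive")
    return target_fmol / conc_nM


if __name__ == "__main__":
    # Hypothetical libraries: name -> (ng/uL, mean fragment length in bp).
    libraries = {"lib_1": (10.0, 300), "lib_2": (6.0, 280)}
    for name, (conc, length) in libraries.items():
        molarity = ng_per_ul_to_nM(conc, length)
        volume = pooling_volume_ul(100, molarity)
        print(f"{name}: {molarity:.1f} nM, pool {volume:.2f} uL for 100 fmol")
```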
S6: multiplex PCR targeted resequencing: paired-end deep sequencing of the sequencing library from S5 was performed on an Illumina NovaSeq 6000 to obtain sequencing data in BCL format;
S7: targeted resequencing data analysis: the sequencing data from S6 (BCL format) were converted to FASTQ format using bcl2fastq software; adapter sequences and low-quality bases at the 3' end were removed using fastp software, and the filtering results for the four samples are shown in Table four;
Table four:
The processed FASTQ files were aligned to the human reference genome hg38 with the BWA-MEM algorithm to obtain BAM files; the BAM files were coordinate-sorted and indexed using samtools to obtain processed BAM files; all target sites were genotyped with GATK HaplotypeCaller (--genotyping-mode GENOTYPE_GIVEN_ALLELES) to obtain the genotype, sequencing depth, mutation frequency and other information for each site;
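The per-site mutation frequency mentioned above is the fraction of reads carrying the alternate allele. A minimal sketch of that calculation, and of a simple check for whether a site separates the two twins, follows. The two thresholds (`present` and `absent`) and the read counts in the example are illustrative assumptions; the patent does not state the cut-offs it used.

```python
def variant_allele_frequency(ref_reads, alt_reads):
    """Return alt / (ref + alt), or None when the site has no coverage."""
    depth = ref_reads + alt_reads
    if depth == 0:
        return None
    return alt_reads / depth


def is_twin_discordant(vaf_a, vaf_b, present=0.2, absent=0.01):
    """Flag a site where one twin carries the variant and the other does not.

    present and absent are hypothetical thresholds chosen for illustration.
    """
    if vaf_a is None or vaf_b is None:
        return False
    a_only = vaf_a >= present and vaf_b <= absent
    b_only = vaf_b >= present and vaf_a <= absent
    return a_only or b_only


if __name__ == "__main__":
    # Made-up read counts (ref, alt) for one site in each twin.
    vaf_twin_a = variant_allele_frequency(600, 400)
    vaf_twin_b = variant_allele_frequency(1000, 0)
    print(vaf_twin_a, vaf_twin_b, is_twin_discordant(vaf_twin_a, vaf_twin_b))
```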
S8: performing Sanger sequencing on target sites that failed verification in S7 or whose PCR product is larger than 350 bp. Specifically, primers specific to the target sites were designed using Premier5 software, and their specificity was verified by agarose gel electrophoresis; all forward primers and all reverse primers were mixed at equal concentrations to form a single forward primer mixture and a single reverse primer mixture, each at a final concentration of 10 μM; the target sites were amplified by a two-step PCR reaction, with the reaction system prepared according to Table five;
Table five:
Reaction component | Volume (μL) |
DNA (1 ng) | 1 |
2× PCR amplification mixture | 10 |
Site-specific forward primer (10 μM) | 1 |
Site-specific reverse primer (10 μM) | 1 |
PCR sterile water | 7 |
Total volume | 20 |
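When several reactions are set up at once, the per-reaction volumes in Table five are usually scaled into a master mix that excludes the template DNA. The sketch below does that arithmetic; the 10% pipetting overage and the function name are assumptions added for illustration, while the per-reaction volumes are those listed in Table five.

```python
# Per-reaction volumes in uL, taken from Table five (template DNA kept separate).
TABLE_FIVE_UL = {
    "2x PCR amplification mixture": 10.0,
    "Site-specific forward primer (10 uM)": 1.0,
    "Site-specific reverse primer (10 uM)": 1.0,
    "PCR sterile water": 7.0,
}
DNA_UL_PER_REACTION = 1.0


def master_mix(n_reactions, overage=0.1):
    """Scale Table five to n reactions, adding a fractional pipetting overage.

    overage=0.1 (10% extra) is an illustrative assumption, not a patent value.
    """
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    scale = n_reactions * (1.0 + overage)
    return {name: round(volume * scale, 2) for name, volume in TABLE_FIVE_UL.items()}


if __name__ == "__main__":
    per_reaction_total = sum(TABLE_FIVE_UL.values()) + DNA_UL_PER_REACTION
    print("per-reaction total (uL):", per_reaction_total)
    for name, volume in master_mix(8).items():
        print(f"{name}: {volume} uL")
```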
The reaction system was mixed thoroughly, centrifuged briefly, placed in a GeneAmp PCR System 9700 thermal cycler (ThermoFisher Scientific, Carlsbad, USA) and run according to the program in Table six;
Table six:
The amplification products were purified using the QIAquick PCR purification kit (Qiagen, Hilden, Germany); the purified products were sequenced according to standard protocols on an ABI 3730xl capillary sequencer equipped with a 50 cm capillary and POP7 polymer;
In total, 9 sites that differ between the monozygotic twins were obtained; details are shown in Table seven:
Table seven:
Note: * indicates that the SNV was verified by both multiplex PCR targeted resequencing and Sanger sequencing.
Double-blind test to verify the effectiveness of individual identification: 6 samples were drawn at random from the DNA samples of the 27-year-old monozygotic twins, and PCR reactions were performed for the 9 SNV sites identified above; DNA preparation and PCR analysis were carried out independently by two investigators, neither of whom knew the sample information; the blind test results are shown in Table eight:
Table eight:
Tested SNV | Sample_A | Sample_B | Sample_C | Sample_D | Sample_E | Sample_F |
chr1:104081131_A>G | A | A/G | A | A | A/G | A |
chr3:89767141_T>C | T | T/C | T | T | T/C | T |
chr4:185690326_C>T | C | C/T | C | C | C/T | C |
chr5:98825091_A>T | A | A/T | A | A | A/T | A |
chr9:89734622_G>C | G | G/C | G | G | G/C | G |
chr15:38160979_T>A | T | T/A | T | T | T/A | - |
chr17:68016720_T>G | T/G | T | T/G | T/G | T | T/G |
chr18:60663612_C>T | C/T | C | C/T | C/T | C | C/T |
chr20:38763028_G>A | G/A | G | G/A | G/A | G | G/A |
Monozygotic twin | A | B | A | A | B | A |
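The assignment in the last row of Table eight can be reproduced with a simple vote over the sites. In the sketch below, the mapping of each SNV to the twin that carries it is read off Table eight itself (heterozygous calls in samples assigned to that twin); the function name and the rule of returning no assignment when sites disagree are assumptions made for illustration, not the patent's stated procedure.

```python
# Which twin carries the variant allele at each site, as read from Table eight.
CARRIER_TWIN = {
    "chr1:104081131_A>G": "B",
    "chr3:89767141_T>C": "B",
    "chr4:185690326_C>T": "B",
    "chr5:98825091_A>T": "B",
    "chr9:89734622_G>C": "B",
    "chr15:38160979_T>A": "B",
    "chr17:68016720_T>G": "A",
    "chr18:60663612_C>T": "A",
    "chr20:38763028_G>A": "A",
}


def assign_twin(genotypes):
    """Assign a blind sample to twin A or B; return None if sites disagree.

    genotypes maps site name -> call such as "A", "A/G", or "-" for no call.
    """
    votes = set()
    for site, call in genotypes.items():
        if site not in CARRIER_TWIN or call in ("-", "", None):
            continue                      # skip unknown sites and missing calls
        carrier = CARRIER_TWIN[site]
        other = "A" if carrier == "B" else "B"
        heterozygous = "/" in call        # variant allele present in this sample
        votes.add(carrier if heterozygous else other)
    return votes.pop() if len(votes) == 1 else None


if __name__ == "__main__":
    sites = list(CARRIER_TWIN)
    sample_f = dict(zip(sites, ["A", "T", "C", "A", "G", "-", "T/G", "C/T", "G/A"]))
    print("Sample_F ->", assign_twin(sample_f))
```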
DNA sensitivity analysis: genomic DNA from the 33-year-old monozygotic twins was diluted to produce 8 concentration gradients of 1, 0.5, 0.25, 0.125, 0.075, 0.05, 0.025 and 0.0125 ng/μL; 1 μL of DNA from each dilution was used in turn to construct a library; specifically, the target sites were amplified by a two-step PCR reaction, with the reaction system prepared according to Table nine;
Table nine:
The reaction system was mixed thoroughly, centrifuged briefly, placed in a GeneAmp PCR System 9700 thermal cycler (ThermoFisher Scientific, Carlsbad, USA) and run according to the program in Table ten;
Table ten:
The amplification products were purified using the QIAquick PCR purification kit (Qiagen, Hilden, Germany); the purified products were sequenced on an ABI 3730xl capillary sequencer according to the instructions;
According to the capillary electrophoresis results, all 9 SNV sites could be detected when the DNA content was as low as 0.25 ng; when it was reduced further to 0.075 ng, 7 SNV sites remained detectable.
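To put these input amounts in perspective, the DNA mass per reaction can be converted into an approximate number of genome copies. The sketch below assumes roughly 3.3 pg of DNA per haploid human genome, a commonly quoted approximation that is not stated in the patent; the function names are likewise hypothetical. The concentration series and the 1 μL input volume are those given in the sensitivity analysis.

```python
HAPLOID_GENOME_PG = 3.3  # approximate mass of one haploid human genome (assumption)

# Concentration series from the sensitivity analysis, in ng/uL.
DILUTION_SERIES_NG_PER_UL = [1, 0.5, 0.25, 0.125, 0.075, 0.05, 0.025, 0.0125]
INPUT_VOLUME_UL = 1.0    # 1 uL of each dilution was used


def input_mass_ng(conc_ng_per_ul, volume_ul=INPUT_VOLUME_UL):
    """Mass of DNA entering the reaction, in ng."""
    return conc_ng_per_ul * volume_ul


def haploid_genome_copies(mass_ng):
    """Approximate number of haploid genome copies in a given DNA mass."""
    return mass_ng * 1000.0 / HAPLOID_GENOME_PG


if __name__ == "__main__":
    for conc in DILUTION_SERIES_NG_PER_UL:
        mass = input_mass_ng(conc)
        print(f"{conc} ng/uL -> {mass} ng -> ~{haploid_genome_copies(mass):.0f} haploid copies")
```

Under this approximation a heterozygous variant is present on about half of the haploid copies, so at the lowest inputs only a handful of variant-bearing template molecules may enter the reaction.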
In conclusion, the results show that the 9 SNVs can effectively distinguish the monozygotic twins in the example, and that identification can still be carried out successfully at reduced DNA content.
Finally, it should be noted that the foregoing describes only preferred embodiments of the present application, and the scope of protection of the present application is not limited thereto; any change or substitution that would readily occur to those skilled in the art within the technical scope disclosed by the present application shall fall within the scope of protection of the present application. The embodiments of the present application and the features in the embodiments may be combined with each other where no conflict arises. The scope of protection of the application is therefore defined by the claims.
Claims (9)
1. A method for identifying monozygotic twins based on SNV and INDEL, characterized in that the method comprises the following steps:
S1: extracting DNA from the peripheral blood of the monozygotic twins to obtain sample DNA;
S2: preparing a small fragment library A from the sample DNA, and measuring the library concentration and fragment size of small fragment library A;
S3: performing whole genome sequencing on the small fragment library A obtained in step S2 to obtain sequencing data;
S4: processing the whole genome sequencing data, screening for SNV and INDEL sites that differ between the monozygotic twins, and taking these SNV and INDEL sites as target sites;
S5: designing specific primers for the target sites of step S4, and constructing a sequencing library by PCR amplification and purification;
S6: performing multiplex PCR targeted resequencing;
S7: analyzing the targeted resequencing data, and verifying the authenticity of the target sites of step S4;
S8: performing Sanger sequencing on target sites that failed verification in step S7 or whose PCR product is larger than 350 bp.
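The routing rule in step S8 can be sketched as a simple predicate: a target site goes to Sanger sequencing when S7 verification failed or when its PCR product exceeds 350 bp. The Python below is a minimal illustration under that reading; the `TargetSite` record, its field names, and the example sites are hypothetical and do not come from the application.

```python
from dataclasses import dataclass

MAX_AMPLICON_BP = 350  # size limit stated in step S8


@dataclass
class TargetSite:
    name: str          # hypothetical site label
    amplicon_bp: int   # length of the PCR product in base pairs
    verified: bool     # outcome of the S7 verification


def needs_sanger(site: TargetSite) -> bool:
    """True when S7 verification failed or the PCR product is larger than 350 bp."""
    return (not site.verified) or site.amplicon_bp > MAX_AMPLICON_BP


# Hypothetical sites, for illustration only
sites = [
    TargetSite("site_A", 210, True),    # verified, short amplicon
    TargetSite("site_B", 280, False),   # failed verification
    TargetSite("site_C", 420, True),    # amplicon larger than 350 bp
]
for_sanger = [s.name for s in sites if needs_sanger(s)]
print(for_sanger)
```

A product of exactly 350 bp is not routed to Sanger sequencing here, since the claim says "larger than 350 bp".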
2. The method for identifying monozygotic twins based on SNV and INDEL according to claim 1, characterized in that S2 specifically comprises the following steps:
(I) fragmenting the sample DNA, performing end repair and A-tailing, and screening to obtain blunt-ended DNA fragments;
(II) ligating adapters to obtain adapter-ligated DNA fragments;
(III) performing PCR amplification and purification to obtain the small fragment library, and measuring the library concentration and fragment size of the small fragment library.
3. The method for identifying monozygotic twins based on SNV and INDEL according to claim 1, characterized in that S4 specifically comprises the following steps:
(I) performing quality control on the raw sequencing data, removing low-quality reads, and aligning the reads to the human genome sequence;
(II) marking duplicate reads and performing base quality score recalibration;
(III) detecting somatic SNVs and INDELs, and removing mutation sites located in known genome structures;
(IV) from the filtering result of the previous step, further screening for SNV and INDEL sites containing at least 5 reads, and taking these SNV and INDEL sites as target sites;
(V) performing functional annotation using ANNOVAR software.
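The two filters in this claim (removing sites in known genome structures, then keeping sites with at least 5 reads) can be sketched as follows. This is a minimal Python illustration, not the applicant's pipeline: the `Candidate` record, the reading of "at least 5 reads" as a per-site read count, the `in_known_structure` flag, and the example candidates are all assumptions.

```python
from typing import List, NamedTuple

MIN_READS = 5  # threshold stated in the claim


class Candidate(NamedTuple):
    chrom: str
    pos: int
    kind: str                 # "SNV" or "INDEL"
    reads: int                # reads at the site (assumed meaning of "at least 5 reads")
    in_known_structure: bool  # assumed flag for sites in known genome structures


def select_target_sites(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates outside known structures that have at least MIN_READS reads."""
    return [
        c for c in candidates
        if not c.in_known_structure and c.reads >= MIN_READS
    ]


# Hypothetical candidates, for illustration only
candidates = [
    Candidate("chr1", 1_000_100, "SNV", 12, False),
    Candidate("chr2", 2_500_300, "INDEL", 4, False),  # too few reads
    Candidate("chr7", 7_800_900, "SNV", 30, True),    # in a known structure
    Candidate("chr9", 9_100_200, "INDEL", 5, False),  # exactly at the threshold
]
targets = select_target_sites(candidates)
print([(t.chrom, t.pos, t.kind) for t in targets])
```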
4. The method for identifying monozygotic twins based on SNV and INDEL according to claim 1, characterized in that S5 specifically comprises the following steps:
(I) designing specific primers for the target sites using Premier5 software, and verifying their specificity by agarose gel electrophoresis;
(II) mixing all forward primers at equal concentrations, and all reverse primers at equal concentrations, to form a single forward primer mixture and a single reverse primer mixture;
(III) amplifying the fragments containing the target sites, purifying the amplified PCR products with magnetic beads, performing end repair on the PCR products, and screening to obtain blunt-ended DNA fragments;
(IV) ligating adapters to obtain adapter-ligated DNA fragments;
(V) performing PCR amplification and purification on the adapter-ligated DNA fragments to obtain a small fragment library B, measuring the library concentration and fragment size of small fragment library B, and diluting it to obtain the sequencing library.
5. The method for identifying monozygotic twins based on SNV and INDEL according to claim 4, characterized in that the concentration of the single forward primer mixture and of the single reverse primer mixture is 10 μM.
6. The method for identifying monozygotic twins based on SNV and INDEL according to claim 4, characterized in that, in step (V) of S5, after the library concentration and fragment size of small fragment library B are measured, the library is diluted to 8 μM to obtain the final sequencing library.
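Diluting a library to a stated target concentration follows the usual relation C1·V1 = C2·V2. The Python sketch below shows that arithmetic only; the function name `dilution_volumes`, the measured stock concentration, and the final volume are hypothetical values chosen for illustration, and concentrations are assumed to be given in the same unit as the target.

```python
from typing import Tuple


def dilution_volumes(stock_conc: float, target_conc: float,
                     final_volume: float) -> Tuple[float, float]:
    """Return (stock volume, diluent volume) from C1*V1 = C2*V2.

    Both concentrations must share one unit; both volumes share another.
    """
    if stock_conc <= 0 or target_conc <= 0 or final_volume <= 0:
        raise ValueError("concentrations and volume must be positive")
    if target_conc > stock_conc:
        raise ValueError("cannot dilute to a higher concentration")
    stock_volume = target_conc * final_volume / stock_conc
    return stock_volume, final_volume - stock_volume


# Hypothetical example: a library measured at 40 units, diluted to 8 units
# (the target named in the claim) in a final volume of 50 uL.
stock_ul, diluent_ul = dilution_volumes(stock_conc=40, target_conc=8, final_volume=50)
print(f"library: {stock_ul} uL, diluent: {diluent_ul} uL")
```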
7. The method for identifying monozygotic twins based on SNV and INDEL according to claim 1, characterized in that S7 specifically comprises the following steps:
(I) converting the raw sequencing data from BCL format to FASTQ format;
(II) removing adapter sequences and low-quality bases at the 3' end;
(III) aligning the processed sequencing data to the human genome, sorting by genomic coordinate, and building an index;
(IV) genotyping all target sites.
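One way to picture the final genotyping step is to compare, for each target site, the fraction of variant-supporting reads in each twin. The Python below is only a sketch of that idea: the depth and allele-fraction thresholds (`min_depth`, `min_vaf`), the function names, and the read counts are hypothetical and are not specified in the application.

```python
from typing import Tuple

ReadCounts = Tuple[int, int]  # (reference reads, variant reads) at one site


def variant_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads supporting the variant allele (0.0 if no reads)."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def call_genotype(ref_reads: int, alt_reads: int,
                  min_depth: int = 20, min_vaf: float = 0.2) -> str:
    """Return 'variant', 'reference', or 'no_call' (thresholds are assumptions)."""
    if ref_reads + alt_reads < min_depth:
        return "no_call"
    if variant_fraction(ref_reads, alt_reads) >= min_vaf:
        return "variant"
    return "reference"


def is_verified(twin_a: ReadCounts, twin_b: ReadCounts) -> bool:
    """A site is treated as verified when exactly one twin carries the variant."""
    calls = {call_genotype(*twin_a), call_genotype(*twin_b)}
    return calls == {"variant", "reference"}


# Hypothetical read counts, for illustration only
print(is_verified((100, 0), (55, 45)))  # twins differ at this site
print(is_verified((100, 0), (98, 2)))   # both look like reference
```

Sites that come back as not verified under such a rule would be candidates for the Sanger sequencing step.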
8. The method for identifying monozygotic twins based on SNV and INDEL according to claim 1, characterized in that S8 specifically comprises the following steps:
(I) taking the sites that failed verification in S7 or whose PCR product is larger than 350 bp, designing specific primers using Premier5 software, and verifying their specificity by agarose gel electrophoresis;
(II) mixing all forward primers at equal concentrations, and all reverse primers at equal concentrations, to form a single forward primer mixture and a single reverse primer mixture;
(III) amplifying the target sites by PCR and purifying the products;
(IV) performing Sanger sequencing on the purified products.
9. Use of the method for identifying monozygotic twins based on SNV and INDEL according to any one of claims 1-8, characterized in that the method is applied in the judicial field.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410029423.XA CN117965748A (en) | 2024-01-09 | 2024-01-09 | Identification method for screening synegg twins based on SNV and INDEL |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410029423.XA CN117965748A (en) | 2024-01-09 | 2024-01-09 | Identification method for screening synegg twins based on SNV and INDEL |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117965748A (en) | 2024-05-03 |
Family
ID=90862510
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410029423.XA Pending CN117965748A (en) | 2024-01-09 | 2024-01-09 | Identification method for screening synegg twins based on SNV and INDEL |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117965748A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |