CN118086534A - Eriocheir sinensis DNA fingerprint and construction method and application thereof - Google Patents
- Publication number: CN118086534A
- Application number: CN202410435641.3A
- Authority: CN (China)
- Prior art keywords: Eriocheir sinensis, DNA, SNP, DNA fingerprint, sequencing
- Legal status: Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/80—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
- Y02A40/81—Aquaculture, e.g. of fish
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention discloses an Eriocheir sinensis DNA fingerprint, together with a method for constructing it and its application. Whole-genome resequencing and site screening of 46 Eriocheir sinensis samples from the Yangtze River and the Liao River (Liaohe) yielded 78 highly specific single nucleotide polymorphism (SNP) sites suitable for identifying Eriocheir sinensis germplasm resources, and the DNA fingerprint was constructed from these sites. The fingerprint can rapidly and accurately distinguish crabs from the Yangtze River and from the Liao River. The invention is the first to construct such a DNA fingerprint from SNP site information, providing a more effective and accurate method for variety identification, population-origin tracing, molecular-assisted breeding and related applications of Eriocheir sinensis.
Description
Technical Field
The invention belongs to the field of molecular biology, and specifically relates to an Eriocheir sinensis DNA fingerprint and to a method for constructing and applying it.
Background
Eriocheir sinensis (Chinese mitten crab, also called river crab or hairy crab) belongs to the order Decapoda, family Varunidae, genus Eriocheir. It is distributed mainly in Chinese waters such as the Yangtze River and the Liao River, and is an economically important aquaculture species. Its natural resources have declined because of overfishing, deteriorating water quality and habitat destruction. As a result, disorderly and indiscriminate stocking of seed crabs across different river systems has become increasingly common, which inevitably degrades river crab germplasm, lowers yield and quality, and seriously restricts the green development of the river crab industry. Moreover, as the scale of Eriocheir sinensis culture in China keeps growing, inadequate seed-industry protection measures and irregular culture management are becoming increasingly prominent problems. River crabs also escape easily during culture and transport, which further mixes germplasm resources. A method that accurately identifies different Eriocheir sinensis varieties is therefore urgently needed, and is particularly important for subsequent germplasm identification and genetic diversity research.
A DNA fingerprint built from highly polymorphic SNP sites is strongly polymorphic, is hardly affected by environmental conditions or subjective factors, and identifies the genetic information of different varieties more accurately. DNA fingerprints have broad application prospects in germplasm identification and genetic diversity studies, but had not yet been applied to germplasm identification in Eriocheir sinensis.
Existing approaches to identifying Eriocheir sinensis varieties have the following problems: 1. They rely mainly on molecular markers such as RFLP, ISSR and SSR, which require long experimental cycles and have low accuracy. 2. Identification based on morphological traits such as the carapace is strongly influenced by human factors and environmental conditions. 3. Distinguishing crabs of different geographic origin by the composition and content of nutrients such as fatty acids cannot exclude the effects of feed composition, culture environment and similar factors on nutrient composition, and so cannot identify germplasm at the genetic level.
Disclosure of Invention
Objective: to address these problems, the invention provides an Eriocheir sinensis DNA fingerprint that can rapidly, efficiently and accurately identify germplasm resources of Eriocheir sinensis from different waters. The invention provides, for the first time, a DNA fingerprint for identifying Eriocheir sinensis, and it has high accuracy.
The invention also provides a method for constructing the Eriocheir sinensis DNA fingerprint and an application of it.
Technical solution: to achieve the above objective, the Eriocheir sinensis DNA fingerprint of the invention comprises 78 SNP sites, with the following chromosomes, positions and nucleotides:
No. | Chromosome | Position | Nucleotides | No. | Chromosome | Position | Nucleotides |
1 | Chr1 | 99217 | T/C | 40 | Chr32 | 2328181 | A/T |
2 | Chr1 | 38720448 | G/A | 41 | Chr33 | 161249 | C/A |
3 | Chr2 | 405837 | G/A | 42 | Chr34 | 1666626 | A/G |
4 | Chr2 | 25617819 | G/A | 43 | Chr35 | 567310 | T/G |
5 | Chr3 | 70473 | A/G | 44 | Chr36 | 672423 | G/A |
6 | Chr3 | 25107852 | T/C | 45 | Chr37 | 424371 | A/C |
7 | Chr4 | 1558939 | C/T | 46 | Chr38 | 323667 | A/G |
8 | Chr4 | 26621963 | C/T | 47 | Chr39 | 135012 | A/T |
9 | Chr5 | 584213 | G/A | 48 | Chr40 | 53631 | C/A |
10 | Chr5 | 25646473 | T/C | 49 | Chr41 | 131332 | A/T |
11 | Chr6 | 215261 | G/C | 50 | Chr42 | 193854 | C/T |
12 | Chr6 | 25243217 | T/C | 51 | Chr43 | 2957 | C/T |
13 | Chr7 | 128231 | C/T | 52 | Chr44 | 374945 | A/G |
14 | Chr7 | 25130734 | A/G | 53 | Chr45 | 841188 | T/C |
15 | Chr8 | 499625 | G/A | 54 | Chr46 | 142856 | C/T |
16 | Chr9 | 486990 | A/G | 55 | Chr47 | 162792 | G/A |
17 | Chr9 | 25518156 | G/A | 56 | Chr48 | 61003 | G/A |
18 | Chr10 | 292307 | A/G | 57 | Chr49 | 243340 | T/G |
19 | Chr11 | 793246 | A/T | 58 | Chr50 | 166519 | C/T |
20 | Chr12 | 145168 | C/G | 59 | Chr51 | 23060 | T/A |
21 | Chr13 | 1008227 | T/G | 60 | Chr52 | 55932 | G/A |
22 | Chr14 | 54272 | A/G | 61 | Chr53 | 2239866 | T/A |
23 | Chr15 | 179693 | A/G | 62 | Chr54 | 14560 | A/G |
24 | Chr16 | 2699367 | G/A | 63 | Chr55 | 23007 | G/A |
25 | Chr17 | 73954 | C/G | 64 | Chr56 | 189487 | G/A |
26 | Chr18 | 192820 | T/C | 65 | Chr57 | 254078 | C/T |
27 | Chr19 | 116958 | G/C | 66 | Chr58 | 65370 | T/A |
28 | Chr20 | 282617 | A/T | 67 | Chr59 | 157051 | A/G |
29 | Chr21 | 419905 | T/G | 68 | Chr60 | 86713 | C/T |
30 | Chr22 | 769751 | C/T | 69 | Chr61 | 587207 | A/G |
31 | Chr23 | 11879 | G/T | 70 | Chr62 | 10220 | G/C |
32 | Chr24 | 22105 | A/C | 71 | Chr63 | 390533 | C/T |
33 | Chr25 | 4578 | T/G | 72 | Chr64 | 278579 | T/A |
34 | Chr26 | 118662 | C/T | 73 | Chr65 | 546629 | C/A |
35 | Chr27 | 688832 | C/T | 74 | Chr66 | 108002 | G/C |
36 | Chr28 | 45286 | G/A | 75 | Chr67 | 114412 | A/G |
37 | Chr29 | 1000395 | A/G | 76 | Chr68 | 15155 | T/C |
38 | Chr30 | 72370 | C/T | 77 | Chr69 | 1028650 | A/G |
39 | Chr31 | 41525 | G/A | 78 | Chr70 | 962197 | A/G |
The Eriocheir sinensis DNA fingerprint comprises a fingerprint for the Yangtze River population and a fingerprint for the Liao River population, both covering the same 78 SNP sites. The SNP sites and specific nucleotides of the Yangtze River fingerprint are listed in Table 4, and those of the Liao River fingerprint in Table 5.
The method for constructing the Eriocheir sinensis DNA fingerprint comprises the following steps:
(1) collecting Eriocheir sinensis from the Yangtze River and the Liao River;
(2) extracting DNA from Eriocheir sinensis muscle tissue;
(3) resequencing the DNA that passes quality control;
(4) filtering the resulting sequencing data, checking it for contamination and assessing its quality;
(5) aligning the reads of each sample to a reference genome with Sentieon software and performing variant detection, to build a database of Eriocheir sinensis-specific SNP sites;
(6) further filtering by missing rate, MAF, single-copy flanking sequence, site heterozygosity and depth, and selecting highly polymorphic SNP sites for constructing the DNA fingerprint.
In step (3), the DNA that passes quality control is resequenced as follows: the DNA sample is enzymatically fragmented; sequencing adapters are ligated to the fragmented DNA with ligase; the ligation product is PCR-amplified; and the PCR products are size-selected with magnetic beads. The linear library is then denatured into single strands and circularized to form a single-stranded circular library, which undergoes rolling circle replication to form DNA nanoballs (DNBs). Finally, the DNBs are loaded onto a sequencing chip with the MGIDL-T7 loader and resequenced using combinatorial probe-anchored polymerization technology.
In step (4), filtering the sequencing data, checking it for contamination and assessing its quality comprise summarizing the sequencing output and quality metrics, examining the base composition of the data, and detecting contamination. The reference genome in step (5) is the Eriocheir sinensis whole genome at NCBI (Taxonomy ID 95602). Variant detection in step (5) produces a gVCF for each sample; joint calling is then performed with Sentieon, analysing the gVCFs of all samples together to obtain the variants of each individual. The SNP sites from the joint analysis are given a preliminary hard filter, removing any site that satisfies QD < 2.0 || FS > 60.0 || MQ < 40.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0, to obtain the Eriocheir sinensis-specific SNP site database.
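The hard-filter expression above marks a site for removal if any one of its conditions holds. A minimal Python sketch of that logic follows, assuming each site's INFO annotations have already been parsed into a dictionary; in practice this step is run by the variant-calling toolkit directly on the VCF, and the function names here are hypothetical.

```python
# SNP hard-filter thresholds quoted in the text; a site meeting ANY condition is removed.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "SOR": lambda v: v > 3.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}


def fails_hard_filter(info):
    """True if any annotation present in `info` meets its filter condition.

    Annotations absent from `info` are skipped (assumption: rank-sum tests
    are not reported at every site, e.g. homozygous-only sites)."""
    return any(key in info and test(info[key]) for key, test in HARD_FILTERS.items())


def passing_sites(sites):
    """Keep sites (dicts with an 'info' mapping) that pass every hard filter."""
    return [s for s in sites if not fails_hard_filter(s["info"])]
```

Treating a missing annotation as "not failing" mirrors how such expressions usually behave on sites where a test statistic is undefined, but this is an assumption rather than something the text specifies.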
The screening criteria in step (6) are: a missing rate of 0; MAF ≥ 0.1; copy-number analysis of the sequences 100 bp upstream and downstream of each site, retaining only sites whose flanking sequences are unique in the genome; and, finally, site heterozygosity < 0.15 and site depth > 10.
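As an illustration only, the missing-rate, MAF, heterozygosity and depth criteria of step (6) can be expressed per site as below. The sketch assumes genotypes coded as alternative-allele counts (0, 1, 2) with `None` for a missing call, and uses the mean depth across samples as the site depth (Example 1 refers to average depth). The single-copy check on the 100 bp flanking sequences needs an alignment of those flanks against the genome and is left out; `site_passes` is a hypothetical helper.

```python
def site_passes(genotypes, depths, maf_min=0.1, het_max=0.15, depth_min=10):
    """Apply the step (6) site filters (except the single-copy flank check).

    genotypes: alt-allele count per sample (0, 1 or 2), None if missing.
    depths: read depth per sample at this site."""
    # Missing rate must be 0: every sample needs a called genotype.
    if any(g is None for g in genotypes):
        return False
    n = len(genotypes)
    alt_freq = sum(genotypes) / (2 * n)        # alternative-allele frequency
    maf = min(alt_freq, 1 - alt_freq)          # minor-allele frequency
    het_rate = sum(g == 1 for g in genotypes) / n
    mean_depth = sum(depths) / len(depths)
    return maf >= maf_min and het_rate < het_max and mean_depth > depth_min
```

For example, a site genotyped as 0, 0, 2, 2, 0, 2 with depth 12 in every sample passes (MAF 0.5, no heterozygotes), whereas the same genotypes at depth 10 fail because the depth must be strictly greater than 10.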
The Eriocheir sinensis DNA fingerprint is applied to identifying germplasm resources of Eriocheir sinensis from different waters.
The application comprises the following steps:
(1) extracting DNA from the Eriocheir sinensis sample to be tested and resequencing it;
(2) performing quality control on the sequencing data, applying the preliminary SNP hard filter, further filtering by missing rate, MAF, single-copy flanking sequence, site heterozygosity and depth, and finally selecting sites spaced more than 10 Mb apart and evenly distributed across the chromosomes, to obtain SNP sites that completely separate all samples;
(3) calculating the genetic distances between samples from the selected highly polymorphic SNP sites with Plink software and constructing a phylogenetic tree;
(4) comparing the selected SNP sites with the DNA fingerprints of Eriocheir sinensis from different waters, the sample being judged to belong to an Eriocheir sinensis population when the concordance rate is ≥ 95%; then performing cluster analysis on the genotypes at the highly polymorphic SNP sites, and assigning crabs of unknown origin to the Liao River or Yangtze River population according to how they cluster in the phylogenetic tree.
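The text does not define how the concordance rate in step (4) is computed. One plausible reading, sketched below as an assumption, is the fraction of fingerprint sites at which the test sample's genotype matches the reference fingerprint genotype, ignoring allele order. The 95% threshold comes from step (4); the function names, site keys and genotype strings are hypothetical.

```python
def _norm(gt):
    """Unordered genotype, so 'A/G' and 'G/A' compare equal."""
    return tuple(sorted(gt.split("/")))


def concordance_rate(sample, reference):
    """Fraction of fingerprint sites where sample and reference genotypes agree.

    Both arguments map (chrom, pos) -> genotype string such as 'A/G'.
    Sites not genotyped in the sample are skipped (an assumption)."""
    shared = [site for site in reference if sample.get(site) is not None]
    if not shared:
        raise ValueError("no fingerprint sites genotyped in sample")
    matches = sum(_norm(sample[s]) == _norm(reference[s]) for s in shared)
    return matches / len(shared)


def matches_fingerprint(sample, reference, threshold=0.95):
    """Step (4) decision rule: concordance rate >= 95%."""
    return concordance_rate(sample, reference) >= threshold
```

Skipping uncalled sites rather than counting them as mismatches is a design choice of this sketch; the text does not say how missing calls were handled.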
Preferably, the Eriocheir sinensis sample to be tested in step (1) is muscle tissue, and the Eriocheir sinensis populations are caught at random in the Yangtze River and the Liao River respectively.
In steps (3) and (4), the genetic distances between samples are calculated to compare their relatedness, and population cluster analysis is then used to identify whether a sample belongs to the Yangtze River or the Liao River population.
Eriocheir sinensis is distributed mainly in the Yangtze River and the Liao River, whose rich regional germplasm resources can supply high-quality breeding material for developing new varieties. The DNA fingerprint is therefore built from the combination of SNP sites specific to Eriocheir sinensis from these two rivers. Resequencing and site screening of 46 samples from the two rivers yielded 78 highly specific SNP sites usable for germplasm identification, from which the new fingerprint was constructed. With this fingerprint, crabs from the Yangtze River and the Liao River can be identified rapidly and accurately, providing a more effective and accurate method for variety identification, population-origin tracing and molecular-assisted breeding of Eriocheir sinensis.
The invention is the first to construct an Eriocheir sinensis DNA fingerprint from highly polymorphic SNP sites. Because SNP sites identify crabs from different waters at the genetic level, the method avoids the errors introduced when origin is judged from indicators such as morphological traits or nutrient composition. In addition, because SNP sites are polymorphic, widely distributed and highly stable, identification is more accurate than with DNA fingerprints built from other molecular markers. With the fingerprint constructed here, a crab of unknown origin can be assigned to the Liao River or the Yangtze River simply by genotype cluster analysis of the 78 screened highly polymorphic SNPs, without genotyping all SNP sites. Comparing the features of different molecular markers: first-generation markers, represented by RFLP, are costly, involve many experimental steps and long cycles, and are poorly stable. Second-generation markers such as microsatellites, when they cannot be retrieved directly from DNA databases, must first be sequenced before primers can be designed, which makes their development expensive. The SNP markers used in the invention are abundant in the genomes of all organisms, have a low mutation rate and are cheap to obtain.
Beneficial effects: compared with the prior art, the invention has the following notable advantages:
1. The invention provides an identification method that builds an Eriocheir sinensis DNA fingerprint from specific SNP markers obtained by whole-genome resequencing.
2. SNPs, the third-generation molecular markers, are numerous, widely and evenly distributed across the genome, highly stable and suitable for rapid large-scale screening. Building the DNA fingerprint from SNP markers overcomes the long sequencing time, high cost and low throughput of germplasm identification with first- and second-generation markers such as RFLP, ISSR and SSR.
3. The fingerprint is drawn from highly polymorphic SNPs obtained by whole-genome resequencing of Eriocheir sinensis, and the genetic distances between samples are calculated, so that different varieties are accurately identified at the genetic level. This avoids the influence of subjective factors, environmental conditions and the like on classification when varieties are identified from morphological traits or nutrient content.
4. Converting the genotypes of the highly polymorphic SNP sites yields an Eriocheir sinensis SNP molecular identity card; variety identification then only requires cluster analysis of the genotypes at the SNP sites in the DNA fingerprint.
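Point 4 mentions converting genotypes into an SNP "molecular identity card" but does not give the encoding. The sketch below shows one hypothetical scheme: one character per fingerprint site, giving the number of copies of the second listed nucleotide, or `N` for an uncalled site. Only the first three rows of the 78-site table are included, and treating the second listed nucleotide as the counted allele is an assumption.

```python
# First three rows of the 78-site table: (chromosome, position, nucleotides).
PANEL = [
    ("Chr1", 99217, ("T", "C")),
    ("Chr1", 38720448, ("G", "A")),
    ("Chr2", 405837, ("G", "A")),
]


def id_card(genotypes, panel=PANEL):
    """Encode a sample's genotypes as a compact string (hypothetical scheme).

    genotypes maps (chrom, pos) -> (allele1, allele2). Each panel site becomes
    '0', '1' or '2' copies of the second listed nucleotide, or 'N' if not called."""
    code = []
    for chrom, pos, (_, second) in panel:
        gt = genotypes.get((chrom, pos))
        code.append("N" if gt is None else str(sum(a == second for a in gt)))
    return "".join(code)
```

A fixed site order makes two cards directly comparable character by character, which is what cluster analysis or concordance checks on the card would rely on.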
Drawings
FIG. 1 is the base quality distribution of the sample sequencing data;
FIG. 2 is the base content distribution of the sample sequencing data;
FIG. 3 is the distribution of the 78 SNPs across the chromosomes;
FIG. 4 is the phylogenetic tree of the 46 Eriocheir sinensis samples constructed from the 78 SNPs;
FIG. 5 is the cluster analysis of the genotypes of 30 Eriocheir sinensis at the 78 SNP sites.
Detailed Description
For a better understanding of the invention, the technical solution is described clearly and completely below with reference to the embodiments and the accompanying drawings. The described embodiments are only some of the embodiments of the invention; all other embodiments obtained by a person skilled in the art without inventive effort fall within the scope of protection of the invention.
Example 1
Construction of the Eriocheir sinensis DNA fingerprint
1. A total of 46 Eriocheir sinensis samples were collected at random from the Yangtze River and the Liao River: 28 from the Yangtze River, labelled CJ-1 to CJ-28, and 18 from the Liao River, labelled LH-1 to LH-18.
2. DNA extraction and quality control from sample muscle tissue.
DNA was extracted from the sample tissue by the magnetic bead method, its concentration and integrity were checked, and samples showing a single, clear DNA band without smearing under a gel imaging system were retained. In this example, DNA from all 46 Eriocheir sinensis muscle samples passed quality control and was suitable for subsequent resequencing.
3. Resequencing of the DNA that passed quality control.
The DNA sample was enzymatically fragmented, sequencing adapters were ligated to the fragmented DNA with ligase, the ligation product was PCR-amplified, and the PCR products were size-selected with magnetic beads. The linear library was then denatured into single strands and circularized to form a single-stranded circular library, which underwent rolling circle replication to form DNA nanoballs (DNBs). Finally, the DNBs were loaded onto the sequencing chip with the MGIDL-T7 loader and resequenced using combinatorial probe-anchored polymerization technology.
4. Preprocessing and quality control of the resequencing data.
(1) Summary of sequencing output and quality metrics. The raw sequencing data were filtered and their quality was assessed. To ensure SNP accuracy, a preliminary hard filter was applied (SNP hard-filter criteria: QD < 2.0 || FS > 60.0 || MQ < 40.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0). After low-quality data were removed, the quality of the sequencing data was assessed. The base quality value reflects the sequencing error rate through the Phred score (Qphred): a Phred score of 20 corresponds to 99% base-call accuracy (Q20), and a score of 30 to 99.9% accuracy (Q30). Quality assessment of the clean data of all samples showed Clean Q20 above 97.08% and Clean Q30 above 91.63%. To show how stable the sequencing quality was during the run, the mean quality value at each base position of the clean reads was plotted against that position, giving the sample sequencing quality distribution (FIG. 1). As FIG. 1 shows, the mean quality values are all above 30, indicating stable sequencing quality.
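The Phred relationship behind Q20 and Q30 is Q = -10 · log10(P), where P is the probability that a base call is wrong. The sketch below converts between the two and computes the share of bases at or above a quality threshold from a FASTQ quality string, assuming the common Phred+33 ASCII encoding; the function names are illustrative.

```python
import math


def phred_to_error(q):
    """Error probability for a Phred score: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score for an error probability: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


def decode_quals(qual_string, offset=33):
    """FASTQ quality characters -> integer Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in qual_string]


def fraction_at_least(quals, threshold):
    """Share of bases with quality >= threshold (20 for Q20, 30 for Q30)."""
    return sum(q >= threshold for q in quals) / len(quals)
```

For example, `1 - phred_to_error(20)` gives an accuracy of 0.99, and a quality string `"II5+"` decodes to scores 40, 40, 20 and 10, of which half reach Q30.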
(2) Base composition of the sequencing data. The base content distribution (FIG. 2) was obtained by plotting the proportion of A, T, C, G and N bases at each position of the clean reads against that position. By the base-pairing principle and the randomness of sequencing, A and T, and likewise G and C, should normally occur in equal proportions at every sequencing cycle. However, because of random-primer amplification bias and similar effects, the first ten or so bases of each read fluctuate more widely before stabilizing.
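The per-position composition described above can be computed as in the sketch below, which assumes uppercase read sequences and tolerates reads of different lengths. The balance check uses a tolerance of 0.1 that is purely illustrative and not taken from the text; the function names are hypothetical.

```python
from collections import Counter

BASES = "ATCGN"


def base_composition(reads):
    """Proportion of A, T, C, G and N at each read position (sequencing cycle)."""
    max_len = max(len(r) for r in reads)
    profile = []
    for i in range(max_len):
        counts = Counter(r[i] for r in reads if i < len(r))
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in BASES})
    return profile


def unbalanced_positions(profile, tol=0.1):
    """Positions where A vs T or G vs C proportions differ by more than tol."""
    return [
        i for i, p in enumerate(profile)
        if abs(p["A"] - p["T"]) > tol or abs(p["G"] - p["C"]) > tol
    ]
```

On real data, the unbalanced positions would be expected to cluster in the first cycles of each read, matching the fluctuation the text describes.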
(3) Contamination check of the sequencing data. For each sample, 10,000 reads were randomly selected from the FASTQ files and searched against the NCBI NT database with blastn to assess contamination (Table 1). The alignment results show no obvious contamination from other species in any sample.
TABLE 1 Contamination check results for the sequencing data
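The random draw of 10,000 reads per sample can be done in a single pass over a FASTQ file with reservoir sampling (Algorithm R), as sketched below. This is an assumed implementation, not the one used in the patent. The sampled reads would then be written as FASTA and searched with blastn against NT, a step not shown here; all function names are hypothetical.

```python
import random


def read_fastq(lines):
    """Yield (read_id, sequence) from FASTQ text, four lines per record."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between or after records
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality line
        yield header.strip()[1:], seq


def reservoir_sample(records, k, seed=0):
    """Uniform random sample of k records in one pass (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, rec in enumerate(records):
        if i < k:
            sample.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                sample[j] = rec
    return sample


def to_fasta(records):
    """Format (read_id, sequence) pairs as FASTA text for a BLAST query."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Usage would look like `with open(path) as fh: subset = reservoir_sample(read_fastq(fh), 10_000)`. Reservoir sampling avoids loading a whole sequencing file into memory and gives every read the same chance of selection.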
5. The Eriocheir sinensis whole genome at NCBI (Taxonomy ID 95602) was used as the reference genome, and the clean reads were aligned to it to assess the quality of the samples, library construction, sequencing and reference sequence; the alignment metrics are shown in Table 2. A gVCF was obtained for each sample, joint calling was performed with Sentieon, and the gVCFs of all samples were analysed jointly to obtain the variants of each individual. To ensure SNP accuracy, the SNP sites from the joint analysis were given a preliminary hard filter (using the SNP hard-filter criteria given above) to obtain the Eriocheir sinensis-specific SNP site database.
Table 2 Data alignment metrics
6. Screening of fingerprint sites.
In variant detection, the hard-filtered VCF of the 46 Eriocheir sinensis samples contained 61,760,064 SNP sites in total. Filtering for a site missing rate of 0 and MAF ≥ 0.1 left 3,743,841 SNP sites. Copy-number analysis of the sequences 100 bp upstream and downstream of each site, retaining only sites whose flanking sequences are unique in the genome, left 2,808,092 SNP sites. Retaining sites with heterozygosity < 0.15 and mean depth > 10 left 20,145 SNPs. Finally, sites were selected at intervals > 10 Mb, evenly distributed across the chromosomes, yielding 78 SNP sites (see Table 3). Their distribution across the chromosomes is shown in FIG. 3, and these 78 sites completely separate all samples.
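The last step of this funnel (61,760,064 → 3,743,841 → 2,808,092 → 20,145 → 78 sites) keeps sites more than 10 Mb apart. The patent does not say how sites were chosen within that constraint, so the sketch below uses a simple greedy left-to-right pass on each chromosome as an assumed stand-in; it will not necessarily reproduce the 78 sites of Table 3, and `space_sites` is a hypothetical name.

```python
from collections import defaultdict


def space_sites(sites, min_gap=10_000_000):
    """Greedy spacing filter (assumed method).

    sites: iterable of (chrom, pos). On each chromosome, walk positions in
    ascending order and keep a site only if it lies more than min_gap bp
    beyond the previously kept site. The first site on a chromosome is always kept."""
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)
    kept = []
    for chrom, positions in by_chrom.items():  # chromosomes in input order
        last = None
        for pos in sorted(positions):
            if last is None or pos - last > min_gap:
                kept.append((chrom, pos))
                last = pos
    return kept
```

A greedy pass is simple and deterministic but favours sites near chromosome starts. An alternative that better matches "evenly distributed" would pick the best-scoring site within each 10 Mb window, which would need a per-site quality score.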
TABLE 3 DNA fingerprint of Eriocheir sinensis
7. Construction of the phylogenetic tree.
Based on the 78 SNP sites in Table 3, the genetic distances between samples were calculated with Plink software and a phylogenetic tree was constructed (FIG. 4). The distances reflect how different the samples are, and because the tree groups samples by water area according to their genetic relationships, it allows Eriocheir sinensis from different waters to be identified intuitively.
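The text says only that Plink was used for the genetic distances and does not name the tree-building algorithm. The sketch below assumes a 1 - IBS (identity-by-state) distance, the kind PLINK 1.9 reports with `--distance 1-ibs`. It substitutes plain UPGMA (average-linkage) clustering for whatever tree method was actually used. Genotypes are alternative-allele counts with no missing data. The CJ/LH sample names and genotypes are illustrative only.

```python
def ibs_distance(g1, g2):
    """1 - IBS similarity between two samples; genotypes are alt-allele counts."""
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))  # alleles shared per site
    return 1 - shared / (2 * len(g1))


def upgma(genos):
    """UPGMA (average-linkage) clustering; returns a Newick topology string.

    genos maps sample name -> list of alt-allele counts. Branch lengths are omitted."""
    names = list(genos)
    size = {n: 1 for n in names}  # cluster label (Newick) -> number of samples
    dist = {frozenset((a, b)): ibs_distance(genos[a], genos[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    while len(size) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        a, b = sorted(pair)
        merged = f"({a},{b})"
        sa, sb = size.pop(a), size.pop(b)
        del dist[pair]
        for c in size:  # size-weighted average distance to the new cluster
            d = (dist.pop(frozenset((a, c))) * sa
                 + dist.pop(frozenset((b, c))) * sb) / (sa + sb)
            dist[frozenset((merged, c))] = d
        size[merged] = sa + sb
    return next(iter(size)) + ";"
```

With two toy "Yangtze" and two toy "Liao" samples, e.g. `upgma({"CJ-1": [0, 0, 2, 2], "CJ-2": [0, 0, 2, 1], "LH-1": [2, 2, 0, 0], "LH-2": [2, 1, 0, 0]})`, the samples from each river join first, giving the topology `((CJ-1,CJ-2),(LH-1,LH-2));`.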
8. Construction of DNA fingerprints for Eriocheir sinensis from different waters.
DNA fingerprints of Eriocheir sinensis from the Yangtze River and the Liao River were constructed from the 78 specific SNP sites. The results are shown in Tables 4 and 5.
TABLE 4 DNA fingerprint of Eriocheir sinensis from the Yangtze River
TABLE 5 DNA fingerprint of Eriocheir sinensis from the Liao River
In this example, 78 specific SNP sites were finally selected, and their genotypes were used to compile the overall Eriocheir sinensis DNA fingerprint and the fingerprints for the Yangtze River and the Liao River.
Example 2
To verify the accuracy of the Eriocheir sinensis DNA fingerprint, the 78 specific SNP sites from Example 1 were used to identify the germplasm of 30 randomly selected Eriocheir sinensis.
Twenty Eriocheir sinensis were collected at random from the Yangtze River (labelled A-1 to A-20) and 10 from the Liao River (labelled B-1 to B-10). DNA was extracted from muscle tissue and, after passing quality control, sequenced as in Example 1. Comparing the genotypes at the 78 highly polymorphic SNP sites with the Eriocheir sinensis DNA fingerprint gave a concordance rate of 99.49%. Cluster analysis of the genotypes at the 78 SNP sites then identified the populations from the different waters: as shown in FIG. 5, A-1 to A-20 cluster in one branch and B-1 to B-10 in another, so the populations of the two major rivers were successfully distinguished. The Eriocheir sinensis DNA fingerprint can therefore identify Yangtze River and Liao River populations effectively and with high accuracy.
Claims (10)
1. An Eriocheir sinensis DNA fingerprint, characterized in that it comprises 78 SNP loci, wherein the SNP loci and their specific nucleotides are as follows:
2. The Eriocheir sinensis DNA fingerprint, characterized in that it preferably comprises an Eriocheir sinensis DNA fingerprint for the Yangtze River water area and one for the Liaohe River water area. Each of the two DNA fingerprints comprises 78 SNP sites, and the SNP sites and specific nucleotides of the Yangtze River fingerprint are as follows:
The SNP sites and specific nucleotides of the Eriocheir sinensis DNA fingerprint for the Liaohe River water area are as follows:
3. A method for constructing the Eriocheir sinensis DNA fingerprint according to claim 1 or 2, comprising the following steps:
(1) Collecting Eriocheir sinensis from the Yangtze River and Liaohe River water areas;
(2) Extracting DNA from Eriocheir sinensis muscle tissue;
(3) Resequencing, on a sequencer, the DNA fragments that pass quality control;
(4) Filtering the resulting sequencing data, checking it for contamination and evaluating its quality;
(5) Using Sentieon software to align the reads of each sample to a reference genome and perform variant detection, thereby constructing a specific SNP locus database of Eriocheir sinensis;
(6) Further filtering by missing rate, MAF value, single-copy status, site heterozygosity and depth, and screening highly polymorphic SNP sites for constructing the DNA fingerprint.
4. The method for constructing the Eriocheir sinensis DNA fingerprint according to claim 3, wherein resequencing the quality-controlled DNA fragments in step (3) specifically comprises the following:
- enzymatically fragmenting the DNA sample;
- ligating sequencing adapters to the fragmented DNA with ligase;
- PCR-amplifying the ligation product and size-selecting the PCR product with magnetic beads;
- denaturing the linear library into single strands and circularizing them to form a single-stranded circular library;
- generating DNA nanoballs (DNB) by rolling circle replication;
- loading the DNBs onto a sequencing chip with the MGIDL-T7 loader and performing resequencing using probe-anchored polymerization technology.
5. The method for constructing the Eriocheir sinensis DNA fingerprint according to claim 3, wherein the filtering, contamination detection and quality evaluation of the sequencing data in step (4) comprise the following: summarizing the sequencing data and its quality indexes, analyzing the base content of the sequencing data, and detecting contamination in the sequencing data.
6. The method for constructing the Eriocheir sinensis DNA fingerprint according to claim 3, wherein the reference genome in step (5) is the Eriocheir sinensis whole genome at NCBI (TXID95602).
7. The method for constructing the Eriocheir sinensis DNA fingerprint according to claim 3, wherein the steps are as follows:
- The variant detection in step (5) produces a gVCF for each sample.
- Joint calling is then performed with Sentieon, analyzing the gVCFs of all samples together to obtain the variants of each individual.
- The SNP loci obtained from the joint analysis are preliminarily filtered to obtain the specific SNP locus database of Eriocheir sinensis. The SNP hard-filter criteria are: QD < 2.0 || FS > 60.0 || MQ < 40.0 || SOR > 3.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0.
8. The method for constructing the Eriocheir sinensis DNA fingerprint according to claim 3, wherein the screening criteria in step (6) are as follows:
- the missing rate is 0;
- the MAF value is ≥ 0.1;
- sequences 100 bp upstream and downstream of each locus are extracted for copy-number analysis, and only loci whose upstream and downstream sequences are unique in the genome are retained;
- sites are further filtered to site heterozygosity < 0.15 and site depth > 10.
9. Use of the Eriocheir sinensis DNA fingerprint in identifying germplasm resources of Eriocheir sinensis from different water areas.
10. The use according to claim 9, wherein the process comprises:
(1) Extracting DNA from the Eriocheir sinensis sample to be tested and resequencing it;
(2) Screening SNP sites that fully separate all samples:
- performing quality control on the sequencing data;
- preliminarily filtering according to the SNP hard-filter criteria;
- further filtering by missing rate, MAF value, single-copy status, site heterozygosity and depth;
- finally selecting sites distributed evenly along the chromosomes at intervals of more than 10 Mb;
(3) Using Plink software to calculate genetic distances between samples from the screened highly polymorphic SNP loci, and constructing a phylogenetic tree;
(4) Comparing the screened SNP loci with the DNA fingerprints of Eriocheir sinensis from the different water areas, and determining that the sample to be tested belongs to the Eriocheir sinensis group when the coincidence rate is ≥ 95%. Cluster analysis is then performed on the genotypes of the highly polymorphic SNP loci. From the clustering in the phylogenetic tree, Eriocheir sinensis from an unknown water area is judged to belong to either the Liaohe River or the Yangtze River population.
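Claim 5 summarizes the sequencing data, its quality indexes and its base content. Commonly used metrics for this are the read and base counts, Q20/Q30 fractions, GC content and N content. The sketch below computes these from FASTQ text. It assumes Phred+33 quality encoding, and the choice of metrics is itself an assumption. Contamination detection, which usually screens reads against other genomes, is not covered.

```python
def fastq_records(lines):
    """Yield (sequence, quality) pairs from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield seq, qual


def qc_summary(lines, phred_offset=33):
    """Summarize reads, bases, GC/N content and Q20/Q30 rates (percentages)."""
    reads = bases = gc = n_bases = q20 = q30 = 0
    for seq, qual in fastq_records(lines):
        seq = seq.upper()
        reads += 1
        bases += len(seq)
        gc += seq.count("G") + seq.count("C")
        n_bases += seq.count("N")
        scores = [ord(ch) - phred_offset for ch in qual]
        q20 += sum(1 for q in scores if q >= 20)
        q30 += sum(1 for q in scores if q >= 30)
    if bases == 0:
        raise ValueError("no bases found")
    return {
        "reads": reads,
        "bases": bases,
        "gc_pct": 100 * gc / bases,
        "n_pct": 100 * n_bases / bases,
        "q20_pct": 100 * q20 / bases,
        "q30_pct": 100 * q30 / bases,
    }


# Two hypothetical 4-base reads ('I' = Q40, '#' = Q2, '5' = Q20 in Phred+33).
lines = ["@r1", "GGCA", "+", "II#5", "@r2", "ATNT", "+", "IIII"]
print(qc_summary(lines))
# {'reads': 2, 'bases': 8, 'gc_pct': 37.5, 'n_pct': 12.5, 'q20_pct': 87.5, 'q30_pct': 75.0}
```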
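Claims 7, 8 and 10 together describe the SNP screening chain:

1. hard filtering;
2. population-level filters (missing rate 0, MAF ≥ 0.1, heterozygosity < 0.15, depth > 10);
3. a single-copy check on the 100 bp flanks;
4. thinning to sites more than 10 Mb apart.

The sketch below implements each step as a separate function. It is not the patent's actual implementation, and it rests on these assumptions:

- Genotypes are alternate-allele dosages.
- The depth threshold applies to every sample.
- Heterozygosity is the observed fraction of heterozygous individuals.
- An absent INFO annotation does not fail a hard filter.
- Exact two-strand string matching stands in for the copy-number analysis, where an aligner would normally be used.
- Greedy left-to-right thinning is one possible reading of step (2).

It does not check that the retained sites fully separate all samples.

```python
from collections import defaultdict

# Claim 7 hard filters: a SNP fails if ANY of these tests is true.
HARD_FILTERS = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "SOR": lambda v: v > 3.0,
    "MQRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}
MIN_MAF, MAX_HET, MIN_DEPTH, FLANK = 0.1, 0.15, 10, 100  # claim 8
MIN_GAP_BP = 10_000_000  # claim 10, step (2): "more than 10 Mb"
_COMP = str.maketrans("ACGTN", "TGCAN")


def passes_hard_filters(info_field):
    """Apply the claim 7 filters to a VCF INFO string like 'QD=25.0;FS=0.0;DB'."""
    # partition('=')[::2] keeps (key, value); flags without '=' get value ''.
    info = dict(item.partition("=")[::2] for item in info_field.split(";"))
    for key, fails in HARD_FILTERS.items():
        raw = info.get(key, "")
        if raw not in ("", ".") and fails(float(raw)):
            return False
    return True  # assumption: missing annotations never fail a filter


def passes_site_filters(dosages, depths):
    """Claim 8 population filters; dosages are alt-allele counts 0/1/2 or None."""
    if any(g is None for g in dosages):
        return False  # missing rate must be exactly 0
    n, alt = len(dosages), sum(dosages)
    maf = min(alt, 2 * n - alt) / (2 * n)  # integer counts avoid float edge cases
    het = sum(1 for g in dosages if g == 1) / n  # observed heterozygosity
    return maf >= MIN_MAF and het < MAX_HET and all(d > MIN_DEPTH for d in depths)


def copies(genome, seq):
    """Exact, possibly overlapping occurrences of seq on either strand."""
    rc = seq.translate(_COMP)[::-1]
    total = 0
    for chrom_seq in genome.values():
        for target in {seq, rc}:  # set avoids double-counting reverse palindromes
            start = chrom_seq.find(target)
            while start != -1:
                total += 1
                start = chrom_seq.find(target, start + 1)
    return total


def flanks_single_copy(genome, chrom, pos, flank=FLANK):
    """True if both flanks of the 1-based position pos occur once in the genome."""
    seq, i = genome[chrom], pos - 1
    if i - flank < 0 or i + flank >= len(seq):
        return False  # not enough sequence to extract full flanks
    upstream, downstream = seq[i - flank:i], seq[i + 1:i + 1 + flank]
    return copies(genome, upstream) == 1 and copies(genome, downstream) == 1


def thin_by_distance(sites, min_gap=MIN_GAP_BP):
    """Greedily keep (chrom, pos) sites more than min_gap apart per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, pos in sites:
        by_chrom[chrom].append(pos)
    kept = []
    for chrom in sorted(by_chrom):
        last = None
        for pos in sorted(by_chrom[chrom]):
            if last is None or pos - last > min_gap:
                kept.append((chrom, pos))
                last = pos
    return kept


print(thin_by_distance([("chr1", 1_000_000), ("chr1", 5_000_000),
                        ("chr1", 11_000_001), ("chr2", 500)]))
# [('chr1', 1000000), ('chr1', 11000001), ('chr2', 500)]
```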
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410435641.3A CN118086534A (en) | 2024-04-11 | 2024-04-11 | Eriocheir sinensis DNA fingerprint and construction method and application thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN118086534A (en) | 2024-05-28 |
Family
ID=91153338
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410435641.3A Pending CN118086534A (en) | 2024-04-11 | 2024-04-11 | Eriocheir sinensis DNA fingerprint and construction method and application thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination |