CN111154849B - Method for identifying size and ploidy of Acipenser dabryanus genome
- Publication number
- CN111154849B
- Authority
- CN
- China
- Prior art keywords
- mer
- genome
- size
- depth
- ploidy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A40/00—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
- Y02A40/80—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
- Y02A40/81—Aquaculture, e.g. of fish
Abstract
The invention discloses a method for identifying the size and ploidy of the Acipenser dabryanus genome. Total DNA of Acipenser dabryanus is subjected to end repair, A-tailing, sequencing adapter ligation, purification and PCR amplification to complete library preparation; the library is sequenced by paired-end (PE) sequencing; the raw data are filtered to obtain clean data; the genome size is estimated by a K-mer-based analysis method; and the heterozygosity rate is calculated. The method provides a reference scheme for identifying the genome size and ploidy of polyploid species.
Description
Technical Field
The invention belongs to the technical field of biological detection, and particularly relates to a method for identifying the size and ploidy of the Acipenser dabryanus genome.
Background
Acipenser dabryanus is a freshwater resident fish that usually lives in the middle and lower water layers of rivers. It prefers slow-flowing bays and deep pools with sandy or pebble bottoms that are rich in humus and benthic organisms. It grows quickly, generally reaches a body length of 0.8-1.0 m and a body weight of 5-10 kg, and is a first-class protected animal in China.
Zhang Siming et al. (1999) used cellular DNA content determination to predict the chromosome number of Acipenser dabryanus. The somatic genome size (DNA content) of Acipenser dabryanus was determined by Feulgen microspectrophotometry, with chicken red blood cells as the DNA standard (3.22 pg/2C). The results showed a DNA content of 8.26 pg, and Acipenser dabryanus was preliminarily judged to be an octaploid species. Later, Rajkov et al. (2014) used the co-dominant inheritance of microsatellites to evaluate the ploidy of Acipenser dabryanus by counting the number of alleles at different loci in each individual. They analyzed the alleles at 20 microsatellite loci in 9 Acipenser dabryanus individuals and concluded that the species is tetraploid.
Both cellular DNA content measurement and microsatellite analysis have limitations, which explains the inconsistent results. For cellular DNA content measurement, the DNA standard may be human lymphocytes, chicken erythrocytes, and so on. Because these standards differ in absolute DNA content, the values measured for the same fish are not always the same. In addition, Acipenser dabryanus is a polyploid species. On the one hand, its larger nucleus can cause an uneven distribution of nuclear material; on the other hand, sturgeons have microchromosomes, which amount to 1/2 to 1/3 of the total number of macrochromosomes, and whether they follow the same doubling relationship as the macrochromosomes has not been studied thoroughly. The accuracy of ploidy identification in polyploid species by cellular DNA content measurement is therefore limited.
Identifying ploidy with microsatellite markers is also open to question. On the one hand, animal chromosomes contain many repetitive regions, and if a microsatellite happens to lie in such a region, the ploidy estimate is doubled. On the other hand, the number of markers used is very limited, so the result is subject to chance and cannot accurately reflect the true ploidy of a species.
Disclosure of Invention
The invention aims to provide a method for identifying the size and ploidy of the Acipenser dabryanus genome that estimates the genome size at the whole-genome level and yields an accurate genome size and ploidy.
The invention is realized by the following technical scheme:
A method for identifying the size and ploidy of the Acipenser dabryanus genome comprises the following steps:
1) Carrying out total DNA extraction on the Acipenser dabryanus muscle sample;
2) Randomly fragmenting the sample DNA into 350 bp fragments, and preparing the library by end repair, A-tailing, sequencing adapter ligation, purification and PCR amplification;
3) Performing paired-end (PE) sequencing on the constructed library, and filtering the raw data to obtain clean data;
4) Estimating genome size using a K-mer based analysis method;
5) Calculating the heterozygosity rate by the following formula;
wherein $a_{1/2}$ is the percentage of heterozygous K-mer species, $n_{kspecies}$ is the number of all K-mer species, and $K$ is the length of the K-mer.
Further, the filtering specifically comprises the following steps:
1) Filtering out reads containing adapter sequence;
2) Removing a read pair when the proportion of N (undetermined) bases in either single-end read exceeds 10% of the read length;
3) Removing a read pair when the number of low-quality bases (quality value below 5) in either single-end read exceeds 20% of the read length.
Further, in step 4), it is assumed that the K-mers extracted base by base from the reads cover the whole genome and that the K-mer depth frequency distribution follows a Poisson distribution. The K-mer frequency distribution is therefore counted from all sequencing reads, and the K-mer depth estimated from it is used to estimate the genome size.
The genome size is calculated as: genome size = K-mer number/depth, where K-mer number is the total number of K-mers obtained with the SOAPdenovo software and depth is the K-mer depth.
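The K-mer counting and the genome size formula above can be illustrated with a minimal Python sketch. It is not the SOAPdenovo implementation used in the invention; the function names (`count_kmers`, `depth_histogram`, `main_peak_depth`, `genome_size`) and the `min_depth` cut-off for ignoring the low-depth error tail are assumptions made only for illustration.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every K-mer obtained by sliding a window one base at a time."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:  # skip windows with undetermined bases
                counts[kmer] += 1
    return counts


def depth_histogram(counts):
    """Map depth -> number of distinct K-mers seen exactly that many times."""
    return Counter(counts.values())


def main_peak_depth(hist, min_depth=2):
    """Depth holding the most distinct K-mers, ignoring depths below min_depth
    (assumed here to be dominated by sequencing errors)."""
    candidates = {d: n for d, n in hist.items() if d >= min_depth}
    return max(candidates, key=candidates.get)


def genome_size(hist, depth):
    """Genome size = K-mer number / depth, where K-mer number is the total
    number of K-mer occurrences (depth times distinct K-mers, summed)."""
    kmer_number = sum(d * n for d, n in hist.items())
    return kmer_number / depth
```

For real data the histogram would come from a dedicated K-mer counter, and the depth would be read from the fitted peak of the distribution rather than from a simple maximum.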
The beneficial effects of the invention are as follows:
Based on 468.89 G of sequencing data from the Acipenser dabryanus genome, the results are as follows: the size of the complete Acipenser dabryanus genome is about 8.25 Gbp, the corrected genome size is 8.09 Gbp, the heterozygosity rate of the genome is 0.23%, and the proportion of repetitive sequences is 92.47%. That is, the genome size of Acipenser dabryanus is 8.09 Gbp and its ploidy is octaploid. The method provides a reference scheme for identifying the genome size and ploidy of polyploid species. The results also provide basic information for subsequent research on the evolution, taxonomic status and karyotype of Acipenser dabryanus. In addition, the genome survey results lay a foundation for understanding and using the potential gene resources of Acipenser dabryanus and for clarifying its functional genes, biosynthetic pathways and regulatory mechanisms, so that targeted genetic improvement can be achieved by molecular biology methods.
Drawings
FIG. 1 is an agarose gel electrophoresis image of the DNA sample of the present invention, wherein M-1 is Trans 2K plus, 2 µl loaded; M-2 is Trans 15K plus, 2 µl loaded; S is the standard, 5 µl loaded (10 ng/µl); and lane 1 is the stock solution diluted 2-fold, 1 µl loaded;
FIG. 2 is a plot of K-mer depth against the number of K-mers (K-mer = 19) of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to specific embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
This embodiment provides a method for identifying the size and ploidy of the Acipenser dabryanus genome, which specifically comprises the following steps:
(1) Total DNA is extracted from the Acipenser dabryanus muscle sample.
The Acipenser dabryanus DNA sample is extracted with a DNA extraction kit. The integrity and purity of the extracted DNA are checked by agarose gel electrophoresis, and its concentration is measured with a Qubit Fluorometer. The DNA test results are shown in Table 1. FIG. 1 shows that the main DNA band is above 30 kb; the sample quality meets the requirements for library construction and sequencing, and the total amount is sufficient for two or more library constructions.
TABLE 1 Summary of DNA test results
Sample No. | Qubit concentration (ng/µl) | Volume (µl) | Total amount (µg) |
---|---|---|---|
AD | 98 | 100 | 9.8 |
(2) The qualified DNA sample is randomly fragmented into 350 bp fragments with a Covaris ultrasonicator, and the library is prepared by end repair, A-tailing, sequencing adapter ligation, purification, PCR amplification and related steps. The constructed library is subjected to PE sequencing on the Illumina HiSeq platform, and the sequencing results are shown in Table 2.
(3) The raw sequencing data are filtered by the following steps to obtain clean data:
a. reads containing adapter sequence are filtered out;
b. when the proportion of N (undetermined) bases in either single-end read exceeds 10% of the read length, the read pair is removed;
c. when the number of low-quality bases (quality value below 5) in either single-end read exceeds 20% of the read length, the read pair is removed.
The resulting high-quality data are used for subsequent analysis of genome size, heterozygosity, GC content and so on; the quality control results are shown in Table 2.
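The three filtering rules can be sketched in Python as follows. This is only an illustration under stated assumptions: each read is represented as a `(sequence, qualities)` tuple with integer Phred scores, the adapter list is supplied by the caller, and a pair is discarded if either mate contains adapter sequence (the text states this rule per read, so treating it per pair is an assumption). The names `keep_pair`, `fraction_n` and `fraction_low_quality` are hypothetical.

```python
def fraction_n(seq):
    """Proportion of undetermined (N) bases in a non-empty read."""
    return seq.upper().count("N") / len(seq)


def fraction_low_quality(quals, threshold=5):
    """Proportion of bases whose quality value is below the threshold."""
    return sum(q < threshold for q in quals) / len(quals)


def keep_pair(read1, read2, adapters, max_n=0.10, max_low_q=0.20, q_threshold=5):
    """Return True if a read pair passes all three filters.

    Each read is a (sequence, qualities) tuple; qualities are Phred integers.
    """
    for seq, quals in (read1, read2):
        # Rule a: discard when adapter sequence is present.
        if any(adapter in seq for adapter in adapters):
            return False
        # Rule b: discard when N bases exceed 10% of the read length.
        if fraction_n(seq) > max_n:
            return False
        # Rule c: discard when low-quality bases exceed 20% of the read length.
        if fraction_low_quality(quals, q_threshold) > max_low_q:
            return False
    return True
```

A caller would apply `keep_pair` to each pair parsed from the two FASTQ files and write out only the pairs for which it returns True.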
TABLE 2 sequencing data yield statistics
(4) Heterozygosity rate and repeat sequence prediction
For a heterozygous genome, all K-mers can be divided into two categories: heterozygous K-mers and homozygous K-mers. The heterozygous sites are scattered across the whole genome. Each heterozygous site is covered by 2 × K heterozygous K-mers, so the expected depth of a heterozygous K-mer is C/2, where C is the expected depth of a homozygous K-mer. Under this assumption, the number of heterozygous sites and the genome size can be estimated from $a_{1/2}$ (the percentage of heterozygous K-mer species; the percentage of homozygous K-mer species is $a_1$) and $n_{kspecies}$ (the number of all K-mer species), and the heterozygosity rate is calculated from equation (1).
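Equation (1) itself is not reproduced in this text. The sketch below follows the reasoning of the paragraph above and should be read as a reconstruction under that reasoning, not as the patent's verbatim formula: the number of heterozygous sites is taken as $a_{1/2} \cdot n_{kspecies} / (2K)$, the genome size in K-mer species as $n_{kspecies} - a_{1/2} \cdot n_{kspecies} / 2$, and their ratio simplifies to $a_{1/2} / (K(2 - a_{1/2}))$. The function name `heterozygosity_rate` is hypothetical.

```python
def heterozygosity_rate(a_half, n_kspecies, k):
    """Reconstructed heterozygosity estimate (an assumption, see lead-in).

    a_half     -- fraction (0..1) of K-mer species that are heterozygous
    n_kspecies -- number of all K-mer species
    k          -- K-mer length
    """
    if not 0.0 <= a_half <= 1.0:
        raise ValueError("a_half must be a fraction between 0 and 1")
    # Each heterozygous site is covered by 2*K heterozygous K-mer species.
    heterozygous_sites = a_half * n_kspecies / (2 * k)
    # Heterozygous K-mers come in allelic pairs; count each pair once.
    genome_kmer_species = n_kspecies - a_half * n_kspecies / 2
    return heterozygous_sites / genome_kmer_species
```

Because $n_{kspecies}$ cancels in the ratio, the estimate under this reconstruction depends only on $a_{1/2}$ and $K$.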
(5) K-mer analysis of genome size
The genome size is calculated as: genome size = (K-mer number)/depth, where K-mer number is the total number of K-mers obtained with the SOAPdenovo software and depth is the K-mer depth. The genomic sequencing statistics for Acipenser dabryanus are shown in Table 3.
TABLE 3 statistical conditions of genomic characteristics of Acipenser dabryanus
Wherein:
1) K-mer: the k-mer size selected for the analysis;
2) K-mer number: the total number of k-mers, obtained with the SOAPdenovo software;
3) K-mer Depth: the k-mer depth, i.e. the expected value of the corresponding Poisson distribution;
4) Genome size: the calculated genome size, i.e. K-mer number/K-mer Depth, in units of M (million);
5) Revise Genome size: the corrected genome size, which removes the error introduced by erroneous k-mers, in units of M (million);
6) Heterozygous ratio: the heterozygosity rate, calculated according to formula (1) in step (4);
7) Repeat: the repeat rate from the k-mer analysis, calculated as the total number of k-mers with depth beyond 1.8 times the main-peak depth, as a percentage of all k-mers.
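The repeat rate definition in item 7) can be sketched as below. The sketch assumes the k-mer spectrum is available as a mapping from depth to the number of distinct k-mers at that depth, and that "total number of k-mers" means k-mer occurrences (depth times distinct k-mers); both the function name `repeat_rate` and that reading are assumptions.

```python
def repeat_rate(hist, peak_depth, factor=1.8):
    """Share of k-mer occurrences lying beyond factor * peak_depth.

    hist       -- dict mapping depth -> number of distinct k-mers at that depth
    peak_depth -- depth of the main peak of the k-mer spectrum
    factor     -- multiple of the main-peak depth used as the cut-off (1.8 in the text)
    """
    total = sum(depth * n for depth, n in hist.items())
    repeated = sum(depth * n for depth, n in hist.items()
                   if depth > factor * peak_depth)
    return repeated / total
```

For a toy spectrum such as `{10: 100, 20: 50, 40: 25}` with a main peak at depth 10, the cut-off is 18, so the k-mers at depths 20 and 40 are counted as repetitive.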
In FIG. 2, the horizontal axis is the k-mer depth, i.e. the number of times a k-mer occurs, and the vertical axis is the number of k-mers at that depth. According to the survey analysis, the k-mer plot in FIG. 2 has 4 fairly distinct peaks.
Taking the first peak (depth = 43) as the K-mer depth, the size of the complete genome obtained from (K-mer number)/depth is about 8.25 Gbp, the corrected genome size is 8.09 Gbp, the heterozygosity rate of the genome is 0.23%, and the proportion of repetitive sequences is 92.47%. That is, the genome size of Acipenser dabryanus is 8.09 Gbp and its ploidy is octaploid.
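As a rough consistency check on these figures, the short Python sketch below back-calculates the quantities implied by the reported values. Table 3 is not reproduced in this text, so the K-mer number here is derived from the stated genome size and depth rather than taken from the data, and because the 8.25 Gbp figure is given as approximate, the outputs are approximate too. All variable names are illustrative.

```python
# Values stated in the text.
KMER_DEPTH = 43                 # depth of the first peak
GENOME_SIZE_BP = 8.25e9         # genome size before correction (approximate)
REVISED_GENOME_SIZE_BP = 8.09e9 # genome size after correction

# Genome size = K-mer number / depth, so K-mer number = genome size * depth.
implied_kmer_number = GENOME_SIZE_BP * KMER_DEPTH

# Relative reduction introduced by the correction for erroneous k-mers.
correction_fraction = (GENOME_SIZE_BP - REVISED_GENOME_SIZE_BP) / GENOME_SIZE_BP

print(f"implied K-mer number: {implied_kmer_number:.4e}")
print(f"correction fraction:  {correction_fraction:.2%}")
```

Under these stated values the implied K-mer number is on the order of 3.5 × 10^11, and the correction lowers the estimate by a little under 2%.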
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (2)
1. A method for identifying the size and ploidy of the Acipenser dabryanus genome, characterized by comprising the following steps:
1) Carrying out total DNA extraction on the Acipenser dabryanus muscle sample;
2) Randomly fragmenting the sample DNA into 350 bp fragments, and preparing the library by end repair, A-tailing, sequencing adapter ligation, purification and PCR amplification;
3) Performing paired-end (PE) sequencing on the constructed library, and filtering the raw data to obtain clean data;
4) Estimating genome size with a K-mer based analysis method, genome size = K-mer number/depth;
wherein the K-mer number represents the total number of the K-mers obtained by adopting Soapdenovo software, and depth represents the depth of the K-mers;
5) Calculating the heterozygosity rate by the following formula;
wherein $a_{1/2}$ is the percentage of heterozygous K-mer species, $n_{kspecies}$ is the number of all K-mer species, and $K$ is the length of the K-mer;
in step 4), it is assumed that the K-mers extracted base by base from the reads cover the whole genome and that the K-mer depth frequency distribution follows a Poisson distribution; the K-mer frequency distribution is therefore counted from all sequencing reads, and the K-mer depth estimated from it is used to estimate the genome size.
2. The method for identifying the genomic size and ploidy of Acipenser dabryanus according to claim 1, wherein the filtering specifically comprises the following steps:
1) Filtering out reads containing adapter sequence;
2) Removing a read pair when the proportion of N bases in either single-end read exceeds 10% of the read length;
3) Removing a read pair when the number of low-quality bases (quality value below 5) in either single-end read exceeds 20% of the read length.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010136936.2A CN111154849B (en) | 2020-03-02 | 2020-03-02 | Method for identifying size and ploidy of Acipenser dabryanus genome |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111154849A (en) | 2020-05-15 |
CN111154849B (en) | 2023-04-07 |
Family
ID=70566825
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010136936.2A Active CN111154849B (en) | Method for identifying size and ploidy of Acipenser dabryanus genome | 2020-03-02 | 2020-03-02 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111154849B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106434949A (en) * | 2016-10-26 | 2017-02-22 | 四川省农业科学院水产研究所 | Acipenser dabryanus microsatellite marker as well as screening method and application of acipenser dabryanus microsatellite molecular marker |
CN107153777A (en) * | 2017-05-03 | 2017-09-12 | 武汉菲沙基因信息有限公司 | A kind of method for the diplodization degree for estimating tetraploid species gene group |
Non-Patent Citations (2)
Title |
---|
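Huamei Yue et al. Sequencing and De Novo Assembly of the Gonadal Transcriptome of the Endangered Chinese Sturgeon (Acipenser sinensis). PLOS ONE. 2015, pp. 1-22. * |
Zhang Xiaomin et al. Investigating the polyploidy of Huso dauricus using microsatellite genetic markers. Chinese Journal of Zoology. 2013, Vol. 48 (No. 4), pp. 507-512. * |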
Also Published As
Publication number | Publication date |
---|---|
CN111154849A (en) | 2020-05-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111705140B (en) | SNPs molecular marker related to weight traits and application thereof in Hu sheep assisted breeding | |
CN112342302B (en) | Method for identifying candidate gene markers of milk production traits of buffalos and application | |
CN110951889B (en) | Haplotype molecular marker related to chicken body weight and application thereof | |
Zhang et al. | Human-mediated introgression of haplotypes in a modern dairy cattle breed | |
CN110289048A (en) | QTL relevant to buffalo milk production trait and its screening technique and application | |
CN110438242B (en) | Portunus trituberculatus microsatellite marked primer and application thereof | |
CN116516029A (en) | Golden pomfret whole genome breeding chip and application | |
Karimi et al. | Linkage disequilibrium, effective population size and genomic inbreeding rates in American mink using genotyping-by-sequencing data | |
EP3775279B1 (en) | Methods involving nucleic acid analysis of milk | |
CN113789394B (en) | Molecular marker C13 for identifying ammonia nitrogen tolerance character of portunus trituberculatus and application thereof | |
CN106755371A (en) | Method and its application using PCR RFLP detection sheep PCNP gene mononucleotide polymorphisms | |
CN111154849B (en) | Method for identifying size and ploidy of Acipenser dabryanus genome | |
CN107868832A (en) | The SNP marker related to the multiple economic characters of pig and its application | |
CN105671189A (en) | Molecular breeding method based on single nucleotide polymorphism of cattle Angpt18 genes | |
CN106701930B (en) | Method for detecting sheep FTH-1 gene insertion deletion polymorphism by using PCR-SSCP (polymerase chain reaction-single strand conformation polymorphism) and application thereof | |
CN105543362B (en) | Detection method and molecular breeding method for single nucleotide polymorphism of cattle PPAR β gene | |
CN103789407A (en) | Method for quickly detecting polymorphism of cattle APOA2 gene and application thereof | |
CN110628916B (en) | Composite molecular marker for detecting lean meat type meat duck and application thereof | |
Timoshkina et al. | Molecular-genetic markers in studies of intra-and interspecies polymorphism in sturgeon (acipenseriformes) | |
CN110452990B (en) | SNP molecular marker for selecting laying rate of hens in later laying period and application thereof | |
CN113186299A (en) | Trachinotus ovatus cryptocaryon irritans disease associated SNP molecular marker, primer and application thereof | |
CN111926091A (en) | Method for identifying relationship of black bear in northeast China by using microsatellite markers | |
CN115961035B (en) | Molecular marker for detecting susceptibility to cervical cancer, kit and application | |
CN113930518B (en) | Molecular marker C49 for identifying ammonia nitrogen tolerance character of portunus trituberculatus and application thereof | |
CN113897443B (en) | SNP molecular marker related to milk fat percentage of southern Holstein cows, kit and application and breeding method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||