CN111154849B - Method for identifying size and ploidy of Acipenser dabryanus genome - Google Patents



Publication number
CN111154849B
Authority
CN
China
Prior art keywords
mer
genome
size
depth
ploidy
Prior art date
Legal status
Active
Application number
CN202010136936.2A
Other languages
Chinese (zh)
Other versions
CN111154849A (en)
Inventor
Chen Yeyu (陈叶雨)
Gong Quan (龚全)
Lai Jiansheng (赖见生)
Liu Ya (刘亚)
Song Mingjiang (宋明江)
Current Assignee
Institute Of Aquaculture Sichuan Academy Of Agricultural Sciences
Original Assignee
Institute Of Aquaculture Sichuan Academy Of Agricultural Sciences
Priority date
2020-03-02
Filing date
2020-03-02
Publication date
2023-04-07
Application filed by Institute Of Aquaculture Sichuan Academy Of Agricultural Sciences
Priority to CN202010136936.2A
Publication of CN111154849A
Application granted
Publication of CN111154849B
Legal status: active (current)
Anticipated expiration



Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869 Methods for sequencing
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/80 Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in fisheries management
    • Y02A40/81 Aquaculture, e.g. of fish

Abstract

The invention discloses a method for identifying the size and ploidy of the Acipenser dabryanus genome. Total DNA of Acipenser dabryanus is subjected to end repair, A-tailing, sequencing adapter ligation, purification and PCR amplification to construct a whole-genome library, which is sequenced in paired-end (PE) mode; the raw data are filtered to obtain clean data, the genome size is estimated by a K-mer-based analysis, and the heterozygosity rate is calculated. The method provides a reference scheme for identifying the genome size and ploidy of polyploid species.

Description

Method for identifying size and ploidy of Acipenser dabryanus genome
Technical Field
The invention belongs to the technical field of biological detection, and particularly relates to a method for identifying the size and ploidy of the Acipenser dabryanus genome.
Background
Acipenser dabryanus is a freshwater resident fish that usually lives in the middle and lower water layers of rivers. It prefers slow-flowing reaches with sandy bottoms, bays, or deep pools (tuo) on pebble and gravel shoals rich in humus and benthic organisms. It grows fast, generally reaching a body length of 0.8-1.0 m and a body weight of 5-10 kg, and is a first-class state-protected animal in China.
Zhang Siming et al. (1999) used cellular DNA content measurement to infer the chromosome number of Acipenser dabryanus. They determined the somatic genome size (DNA content) by Feulgen microspectrophotometry, using chicken erythrocytes as the DNA standard (3.22 pg/2C). The DNA content of Acipenser dabryanus was found to be 8.26 pg, and the species was preliminarily judged to be octaploid. Rajkov et al. (2014) later assessed the ploidy of Acipenser dabryanus from the co-dominant inheritance of microsatellites, by counting the number of alleles at different loci in each individual. Using microsatellite genetic markers, they analyzed alleles at 20 loci in 9 Acipenser dabryanus individuals and concluded that the species is tetraploid.
Both cellular DNA content measurement and microsatellite analysis have limitations, which is why their results are inconsistent. In cellular DNA assays, the reference standard is typically human lymphocytes, chicken erythrocytes or similar cells. Because different absolute DNA contents have been reported for chicken erythrocytes, measurements of the same fish are not always identical and the results vary to some extent. In addition, Acipenser dabryanus is a polyploid species. On the one hand, its large nucleus can cause an uneven distribution of nuclear material; on the other hand, sturgeons carry microchromosomes, numbering 1/2 to 1/3 of the macrochromosomes, which are duplicated together with the macrochromosomes but have not been thoroughly studied. For these reasons, ploidy identification of polyploid species by cellular DNA content measurement is not very accurate.
The use of microsatellite markers to identify ploidy is also open to question. On the one hand, animal chromosomes contain many repetitive regions, and if a microsatellite happens to lie within such a region, the inferred ploidy is doubled. On the other hand, only a very limited number of markers are used in microsatellite-based ploidy identification, so the result is partly a matter of chance and may not reflect the true ploidy of the species.
Disclosure of Invention
The invention aims to provide a method for identifying the size and ploidy of the Acipenser dabryanus genome that analyzes the genome at the whole-genome level and yields accurate estimates of its size and ploidy.
The invention is realized by the following technical scheme:
A method for identifying the size and ploidy of the Acipenser dabryanus genome comprises the following steps:
1) Extracting total DNA from an Acipenser dabryanus muscle sample;
2) Randomly fragmenting the sample DNA into 350 bp fragments, and constructing the whole library through end repair, A-tailing, sequencing adapter ligation, purification and PCR amplification;
3) Performing paired-end (PE) sequencing on the constructed library, and filtering the raw data to obtain clean (effective) data;
4) Estimating the genome size with a K-mer-based analysis method;
5) Calculating the heterozygosity rate by the following formula:
[Formula (1): heterozygosity rate; shown as an image in the original publication]
where a_{1/2} is the percentage of heterozygous K-mer species, n_kspecies is the number of all K-mer species, and K is the K-mer length.
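Because the heterozygosity formula survives only as an image, the following is a minimal Python sketch of one closed form consistent with the model described in step (4) of the detailed description. It assumes that each heterozygous site turns K homozygous K-mer species into 2K heterozygous ones and that neighbouring heterozygous sites do not share K-mers; under these assumptions the rate reduces to a_{1/2} / (K(2 - a_{1/2})). The function names are illustrative, and the patent's own formula may be arranged differently.

```python
def heterozygosity_rate(a_half: float, k: int) -> float:
    """Heterozygosity rate from the heterozygous K-mer fraction (assumed closed form).

    a_half: fraction of distinct K-mer species that are heterozygous, in [0, 1)
    k: K-mer length (K = 19 was used for FIG. 2)
    """
    if not 0.0 <= a_half < 1.0:
        raise ValueError("a_half must lie in [0, 1)")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return a_half / (k * (2.0 - a_half))


def heterozygosity_from_counts(a_half: float, n_kspecies: float, k: int) -> float:
    """Same quantity, computed step by step from n_kspecies (algebraically equal to the above)."""
    het_kmer_species = a_half * n_kspecies   # heterozygous K-mer species
    het_sites = het_kmer_species / (2 * k)   # each heterozygous site contributes 2K such species
    genome = n_kspecies - k * het_sites      # homozygous-equivalent genome length
    return het_sites / genome
```

The second function spells out the intermediate quantities (number of heterozygous sites and genome size) mentioned in the text; the first is the simplified ratio, in which n_kspecies cancels.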
Further, the filtering specifically comprises the following steps:
1) Removing reads containing adapter (linker) sequences;
2) Removing a read pair when N bases make up more than 10% of the length of either single-end read;
3) Removing a read pair when low-quality bases (quality value below 5) make up more than 20% of the length of either single-end read.
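The three filtering rules can be sketched in Python as below. This is an illustration only: the patent does not name a filtering tool, so adapter detection here is a plain exact-substring test against a hypothetical default adapter (the 13-base Illumina adapter prefix AGATCGGAAGAGC), qualities are assumed to be Phred+33 encoded, and "low quality" is read as a quality value below 5. A pair is discarded if either mate fails any rule.

```python
from typing import List, Tuple

# Hypothetical default: the patent does not say which adapter sequence was used.
DEFAULT_ADAPTER = "AGATCGGAAGAGC"


def phred33(qual_line: str) -> List[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(c) - 33 for c in qual_line]


def read_fails(seq: str, quals: List[int], adapter: str = DEFAULT_ADAPTER,
               max_n_frac: float = 0.10, low_q: int = 5,
               max_low_q_frac: float = 0.20) -> bool:
    """True if a single read breaks any of the three filtering rules."""
    length = len(seq)
    if length == 0 or len(quals) != length:
        return True  # empty or malformed record
    seq = seq.upper()
    if adapter and adapter in seq:                 # rule 1: adapter present (exact match only)
        return True
    if seq.count("N") / length > max_n_frac:       # rule 2: N bases > 10% of read length
        return True
    low = sum(1 for q in quals if q < low_q)       # rule 3: bases with quality < 5 ...
    return low / length > max_low_q_frac           # ... make up > 20% of read length


def keep_pair(read1: Tuple[str, List[int]], read2: Tuple[str, List[int]],
              adapter: str = DEFAULT_ADAPTER) -> bool:
    """Keep a pair only if neither mate fails; otherwise both mates are removed."""
    return not (read_fails(read1[0], read1[1], adapter) or
                read_fails(read2[0], read2[1], adapter))
```

Production pipelines usually allow mismatches and partial adapter matches at the read end; the exact-match test keeps the sketch short.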
Further, in step 4), it is assumed that the K-mers taken base by base from the reads cover the whole genome and that the K-mer depth frequency distribution follows a Poisson distribution. The K-mer frequency distribution is therefore counted from all sequencing reads, and the estimated K-mer depth is used to estimate the genome size.
The genome size is calculated as: genome size = K-mer number / depth, where K-mer number is the total number of K-mers obtained with SOAPdenovo software and depth is the K-mer depth.
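A minimal pure-Python sketch of the K-mer workflow in step 4) follows: count K-mers base by base from the reads, build the depth histogram, take the main peak depth as the K-mer depth, and divide the total K-mer count by it. It stands in for, and does not reproduce, the SOAPdenovo tooling named in the text. Counting canonical K-mers (each K-mer merged with its reverse complement) and ignoring depths below 3 as sequencing errors are assumptions, and the approach is only practical for small test data.

```python
from collections import Counter
from typing import Dict, Iterable

# Bases accepted in a k-mer; k-mers containing N or other symbols are skipped.
_BASES = frozenset("ACGT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count canonical k-mers taken base by base (step 1) from every read."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if _BASES.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return counts


def depth_histogram(counts: Counter) -> Dict[int, int]:
    """Map each depth (occurrence count) to the number of distinct k-mers at that depth."""
    return dict(Counter(counts.values()))


def main_peak_depth(hist: Dict[int, int], min_depth: int = 3) -> int:
    """Depth with the most distinct k-mers, ignoring the low-depth error region."""
    candidates = {d: n for d, n in hist.items() if d >= min_depth}
    return max(candidates, key=candidates.get)


def genome_size(hist: Dict[int, int], peak_depth: int) -> float:
    """Genome size = total number of k-mers / k-mer depth at the main peak."""
    total_kmers = sum(depth * n for depth, n in hist.items())
    return total_kmers / peak_depth
```

For example, with K = 19 (the value used for FIG. 2), `hist = depth_histogram(count_kmers(reads, 19))` followed by `genome_size(hist, main_peak_depth(hist))` gives the uncorrected estimate.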
The invention has the beneficial effects that:
Based on 468.89 Gb of Acipenser dabryanus sequencing data, the size of the full octaploid Acipenser dabryanus genome is about 8.25 Gbp, the revised genome size is 8.09 Gbp, the heterozygosity rate is 0.23%, and repetitive sequences account for 92.47%. That is, the genome size of Acipenser dabryanus is 8.09 Gbp and the species is octaploid. The method provides a reference scheme for identifying the genome size and ploidy of polyploid species. The results also provide basic information for subsequent studies of the evolution, taxonomic status and karyotype of Acipenser dabryanus. Moreover, the genome survey results lay a foundation for understanding and utilizing the potential gene resources of Acipenser dabryanus and for elucidating its functional genes, biosynthetic pathways and regulatory mechanisms, enabling targeted genetic improvement of the species by molecular biology methods.
Drawings
FIG. 1 is an agarose gel electrophoresis image of the DNA sample of the present invention, where M-1 is the Trans 2K Plus marker (2 µl loaded); M-2 is the Trans 15K Plus marker (2 µl loaded); S is the standard (5 µl at 10 ng/µl loaded); and lane 1 is the stock solution diluted 2-fold (1 µl loaded);
FIG. 2 is the K-mer depth distribution for K = 19, plotting K-mer depth against the number of K-mers.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to specific embodiments of the present invention, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
This embodiment provides a method for identifying the size and ploidy of the Acipenser dabryanus genome, which specifically comprises the following steps:
(1) Total DNA extraction from an Acipenser dabryanus muscle sample.
Acipenser dabryanus DNA was extracted with a DNA extraction kit. The integrity and purity of the extracted DNA were checked by agarose gel electrophoresis, and its concentration was measured with a Qubit Fluorometer. The DNA test results are shown in Table 1. FIG. 1 shows that the main DNA band is above 30 kb, so the sample quality meets the requirements for library construction and sequencing, and the total amount is sufficient for at least two library constructions.
Table 1 Summary of DNA test results
Sample ID | Qubit concentration (ng/µl) | Volume (µl) | Total amount (µg)
AD | 98 | 100 | 9.8
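As a check on Table 1, the total amount is the Qubit concentration multiplied by the volume:

```latex
98\ \mathrm{ng}/\mu\mathrm{L} \times 100\ \mu\mathrm{L} = 9800\ \mathrm{ng} = 9.8\ \mu\mathrm{g}
```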
(2) The qualified DNA sample was randomly sheared into 350 bp fragments with a Covaris ultrasonicator, and the whole library was constructed through end repair, A-tailing, sequencing adapter ligation, purification, PCR amplification and related steps. The constructed library was sequenced in paired-end (PE) mode on an Illumina HiSeq platform; the sequencing results are shown in Table 2.
(3) The raw sequencing data were filtered as follows to obtain clean data:
a. reads containing adapter (linker) sequences were removed;
b. when N bases made up more than 10% of the length of either single-end read, the read pair was removed;
c. when low-quality bases (quality value below 5) made up more than 20% of the length of either single-end read, the read pair was removed.
The resulting high-quality data were used for subsequent analyses of genome size, heterozygosity, GC content and other features; the quality control results are shown in Table 2.
Table 2 Sequencing data yield statistics
[Table 2 is shown as an image in the original publication.]
(4) Heterozygosity rate and repeat sequence prediction
For a heterozygous genome, all K-mers fall into two categories: heterozygous K-mers and homozygous K-mers. Heterozygous sites are scattered sparsely across the genome. Each heterozygous site is covered by 2 × K heterozygous K-mers (K from each allele), so the expected depth of a heterozygous K-mer is C/2, where C is the expected depth of a homozygous K-mer. Under this assumption, the number of heterozygous sites and the genome size can be estimated from a_{1/2} (the percentage of heterozygous K-mer species), a_1 (the percentage of homozygous K-mer species) and n_kspecies (the number of all K-mer species), and the heterozygosity rate is calculated by formula (1).
[Formula (1): heterozygosity rate; shown as an image in the original publication]
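One derivation consistent with the assumptions above is sketched below. It introduces G (genome size), H (number of heterozygous sites), and n_het and n_hom (numbers of heterozygous and homozygous K-mer species) as illustrative symbols, and assumes that every heterozygous site replaces K homozygous K-mer species with 2K heterozygous ones and that the K-mers of different heterozygous sites do not overlap. The published formula (1) may be written differently.

```latex
\begin{aligned}
n_{\mathrm{het}} &= 2KH, \qquad n_{\mathrm{hom}} = G - KH, \qquad n_{kspecies} = n_{\mathrm{het}} + n_{\mathrm{hom}} = G + KH,\\
a_{1/2} &= \frac{n_{\mathrm{het}}}{n_{kspecies}}, \qquad a_1 = \frac{n_{\mathrm{hom}}}{n_{kspecies}} = 1 - a_{1/2},\\
H &= \frac{a_{1/2}\, n_{kspecies}}{2K}, \qquad G = n_{kspecies} - KH = n_{kspecies}\left(1 - \frac{a_{1/2}}{2}\right),\\
\text{heterozygosity rate} &= \frac{H}{G} = \frac{a_{1/2}}{K\,(2 - a_{1/2})}.
\end{aligned}
```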
(5) K-mer analysis of genome size
The genome size is calculated as: genome size = (K-mer number)/depth, where K-mer number is the total number of K-mers obtained with SOAPdenovo software and depth is the K-mer depth. The genomic statistics for Acipenser dabryanus are shown in Table 3.
Table 3 Genomic characteristics of Acipenser dabryanus
[Table 3 is shown as an image in the original publication.]
In Table 3: 1) K-mer: the k-mer size selected for analysis; 2) K-mer number: the total number of k-mers, obtained with SOAPdenovo software; 3) K-mer Depth: the k-mer depth, i.e., the expected value of the corresponding Poisson distribution; 4) Genome size: the calculated genome size, K-mer number/K-mer Depth, in M (million); 5) Revised genome size: the genome size corrected to remove the error caused by erroneous k-mers, in M (million); 6) Heterozygous ratio: the heterozygosity rate calculated by formula (1) in step (4); 7) Repeat: the repeat rate from k-mer analysis, calculated as the percentage of all k-mers accounted for by k-mers whose depth exceeds 1.8 times the main-peak depth.
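The repeat rate in note 7) and the revised genome size in note 5) can both be computed from a depth histogram (depth mapped to the number of distinct k-mers). The sketch below assumes that "total number of k-mers" means total occurrences (depth multiplied by count), and implements the correction by discarding k-mers at or below a hypothetical `error_cutoff` depth; the patent does not state how its correction was done.

```python
from typing import Dict


def total_kmers(hist: Dict[int, int]) -> int:
    """Total k-mer occurrences: sum of depth x number of distinct k-mers at that depth."""
    return sum(depth * n for depth, n in hist.items())


def repeat_fraction(hist: Dict[int, int], peak_depth: float, factor: float = 1.8) -> float:
    """Share of all k-mers whose depth exceeds factor x main-peak depth (note 7)."""
    repeats = sum(depth * n for depth, n in hist.items() if depth > factor * peak_depth)
    return repeats / total_kmers(hist)


def revised_genome_size(hist: Dict[int, int], peak_depth: float, error_cutoff: int) -> float:
    """Genome size after discarding k-mers at or below a hypothetical error_cutoff depth (note 5)."""
    kept = sum(depth * n for depth, n in hist.items() if depth > error_cutoff)
    return kept / peak_depth
```

A fixed cutoff is the simplest choice; a common alternative is to place the cutoff at the trough between the error peak and the first real peak.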
In FIG. 2, the horizontal axis is the k-mer depth, i.e., the number of times a k-mer occurs, and the vertical axis is the number of k-mers at that depth. According to the survey analysis, the k-mer plot in FIG. 2 shows four fairly distinct peaks.
Using the first peak at depth = 43 for the estimate, the size of the complete genome calculated as (K-mer number)/depth is about 8.25 Gbp, the revised genome size is 8.09 Gbp, the heterozygosity rate is 0.23%, and repetitive sequences account for 92.47%. That is, the genome size of Acipenser dabryanus is 8.09 Gbp and the species is octaploid.
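To locate peaks such as the four in FIG. 2 programmatically, a simple approach is a strict local-maximum search over the depth histogram, reporting each peak depth relative to the first one. These are hypothetical helpers, not part of the patent: real histograms are noisy and usually need smoothing first, and reading the peak ratios as copy-number multiples is an interpretation rather than a step stated in the text.

```python
from typing import Dict, List


def local_peaks(hist: Dict[int, int], min_depth: int = 3) -> List[int]:
    """Depths (>= min_depth) whose k-mer count is strictly higher than both neighbours."""
    peaks = []
    for d in sorted(hist):
        if d < min_depth:
            continue  # skip the low-depth region dominated by sequencing errors
        count = hist[d]
        if count > hist.get(d - 1, 0) and count > hist.get(d + 1, 0):
            peaks.append(d)
    return peaks


def peak_ratios(peaks: List[int]) -> List[float]:
    """Each peak depth divided by the first (lowest-depth) peak depth."""
    if not peaks:
        return []
    first = peaks[0]
    return [d / first for d in peaks]
```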
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (2)

1. A method for identifying the size and ploidy of the Acipenser dabryanus genome, characterized by comprising the following steps:
1) Extracting total DNA from an Acipenser dabryanus muscle sample;
2) Randomly fragmenting the sample DNA into 350 bp fragments, and constructing the whole library through end repair, A-tailing, sequencing adapter ligation, purification and PCR amplification;
3) Performing paired-end (PE) sequencing on the constructed library, and filtering the raw data to obtain clean (effective) data;
4) Estimating the genome size with a K-mer-based analysis method, where genome size = K-mer number/depth;
wherein K-mer number is the total number of K-mers obtained with SOAPdenovo software and depth is the K-mer depth;
5) Calculating the heterozygosity rate by the following formula:
[Formula: heterozygosity rate; shown as an image in the original publication]
wherein a_{1/2} is the percentage of heterozygous K-mer species, n_kspecies is the number of all K-mer species, and K is the K-mer length;
in step 4), it is assumed that the K-mers taken base by base from the reads cover the whole genome and that the K-mer depth frequency distribution follows a Poisson distribution; the K-mer frequency distribution is therefore counted from all sequencing reads, and the estimated K-mer depth is used to estimate the genome size.
2. The method for identifying the size and ploidy of the Acipenser dabryanus genome according to claim 1, wherein the filtering specifically comprises the following steps:
1) Removing reads containing adapter (linker) sequences;
2) Removing a read pair when N bases make up more than 10% of the length of either single-end read;
3) Removing a read pair when bases with a quality value below 5 make up more than 20% of the length of either single-end read.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010136936.2A CN111154849B (en) 2020-03-02 2020-03-02 Method for identifying size and ploidy of Acipenser dabryanus genome

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010136936.2A CN111154849B (en) 2020-03-02 2020-03-02 Method for identifying size and ploidy of Acipenser dabryanus genome

Publications (2)

Publication Number Publication Date
CN111154849A CN111154849A (en) 2020-05-15
CN111154849B CN111154849B (en) 2023-04-07

Family

ID=70566825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010136936.2A Active CN111154849B (en) 2020-03-02 2020-03-02 Method for identifying size and ploidy of Acipenser dabryanus genome

Country Status (1)

Country Link
CN (1) CN111154849B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106434949A (en) * 2016-10-26 2017-02-22 Institute Of Aquaculture Sichuan Academy Of Agricultural Sciences Acipenser dabryanus microsatellite marker, and screening method and application of the Acipenser dabryanus microsatellite molecular marker
CN107153777A (en) * 2017-05-03 2017-09-12 Wuhan Frasergen Bioinformatics Co., Ltd. Method for estimating the degree of diploidization of a tetraploid species genome

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
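Huamei Yue et al. Sequencing and De Novo Assembly of the Gonadal Transcriptome of the Endangered Chinese Sturgeon (Acipenser sinensis). PLOS ONE, 2015, pp. 1-22. *
Zhang Xiaomin et al. Exploring the ploidy of polyploid Acipenser dabryanus using microsatellite genetic markers. Chinese Journal of Zoology, 2013, 48(4): 507-512. *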

Also Published As

Publication number Publication date
CN111154849A (en) 2020-05-15

Similar Documents

Publication Publication Date Title
CN111705140B (en) SNPs molecular marker related to weight traits and application thereof in Hu sheep assisted breeding
CN112342302B (en) Method for identifying candidate gene markers of milk production traits of buffalos and application
CN110951889B (en) Haplotype molecular marker related to chicken body weight and application thereof
Zhang et al. Human-mediated introgression of haplotypes in a modern dairy cattle breed
CN110289048A (en) QTL relevant to buffalo milk production trait and its screening technique and application
CN110438242B (en) Portunus trituberculatus microsatellite marked primer and application thereof
CN116516029A (en) Golden pomfret whole genome breeding chip and application
Karimi et al. Linkage disequilibrium, effective population size and genomic inbreeding rates in American mink using genotyping-by-sequencing data
EP3775279B1 (en) Methods involving nucleic acid analysis of milk
CN113789394B (en) Molecular marker C13 for identifying ammonia nitrogen tolerance character of portunus trituberculatus and application thereof
CN106755371A (en) Method and its application using PCR RFLP detection sheep PCNP gene mononucleotide polymorphisms
CN111154849B (en) Method for identifying size and ploidy of Acipenser dabryanus genome
CN107868832A (en) The SNP marker related to the multiple economic characters of pig and its application
CN105671189A (en) Molecular breeding method based on single nucleotide polymorphism of cattle Angpt18 genes
CN106701930B (en) Method for detecting sheep FTH-1 gene insertion deletion polymorphism by using PCR-SSCP (polymerase chain reaction-single strand conformation polymorphism) and application thereof
CN105543362B (en) Detection method and molecular breeding method for single nucleotide polymorphism of cattle PPAR β gene
CN103789407A (en) Method for quickly detecting polymorphism of cattle APOA2 gene and application thereof
CN110628916B (en) Composite molecular marker for detecting lean meat type meat duck and application thereof
Timoshkina et al. Molecular-genetic markers in studies of intra-and interspecies polymorphism in sturgeon (acipenseriformes)
CN110452990B (en) SNP molecular marker for selecting laying rate of hens in later laying period and application thereof
CN113186299A (en) Trachinotus ovatus cryptocaryon irritans disease associated SNP molecular marker, primer and application thereof
CN111926091A (en) Method for identifying relationship of black bear in northeast China by using microsatellite markers
CN115961035B (en) Molecular marker for detecting susceptibility to cervical cancer, kit and application
CN113930518B (en) Molecular marker C49 for identifying ammonia nitrogen tolerance character of portunus trituberculatus and application thereof
CN113897443B (en) SNP molecular marker related to milk fat percentage of southern Holstein cows, kit and application and breeding method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant