CN114255822A - Screening method and application of Pinus massoniana high-fat-production functional SNP marker - Google Patents


Info

Publication number
CN114255822A
Authority
CN
China
Prior art keywords
pinus massoniana
snp
lipid
seq
base
Legal status
Granted
Application number
CN202210079030.0A
Other languages
Chinese (zh)
Other versions
CN114255822B (en)
Inventor
白青松
何波祥
汪迎利
连辉明
陈杰连
Current Assignee
Guangdong Academy of Forestry
Original Assignee
Guangdong Academy of Forestry
Application filed by Guangdong Academy of Forestry
Priority to CN202210079030.0A
Publication of CN114255822A
Application granted
Publication of CN114255822B
Legal status: Active (current)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G16B20/30 Detection of binding sites or motifs
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10 Sequence alignment; Homology search
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/6895 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/13 Plant traits
    • C12Q2600/156 Polymorphic or mutational markers

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biotechnology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Botany (AREA)
  • General Engineering & Computer Science (AREA)
  • Mycology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a method for screening functional SNP markers associated with high lipid (resin) production in Pinus massoniana, and an application thereof. The method comprises the following steps: collecting pine resin from Pinus massoniana clones and measuring the lipogenic power of each; collecting tissue from clones with high, medium and low lipogenic power and extracting RNA; performing PacBio full-length transcriptome sequencing together with second-generation transcriptome, metabolome and proteome sequencing; performing differential expression analysis to screen candidate genes; and developing functional SNP markers. Because transcriptome and other multi-omics sequencing is needed for only a few samples, SNP sites are developed more efficiently and at lower cost, and the resulting SNP markers support early selection for the lipid-producing trait of Pinus massoniana and can be used in molecular marker-assisted selection breeding.

Description

Screening method and application of Pinus massoniana high-fat-production functional SNP marker
Technical Field
The invention belongs to the technical field of forestry, and particularly relates to a method for screening functional SNP markers associated with high lipid production in masson pine (Pinus massoniana) and an application thereof.
Background
Developing functional markers for a target trait is a common approach in plant molecular marker-assisted breeding and an important means of selecting superior individuals early. Commonly used SNP development techniques include reduced-representation genome sequencing based on restriction-enzyme digestion (SLAF-seq, GBS, RAD, etc.) and whole-genome resequencing. These methods usually require collecting a large set of population genetic resources; after genomic DNA is extracted, the DNA of every individual in the population is sequenced following restriction-enzyme digestion or random fragmentation, and population SNP markers are developed by sequence alignment. Genome-wide association studies (GWAS) are then used to develop markers associated with the target trait.
Because these SNP development techniques require sequencing every individual in a population, they usually incur substantial development costs. SNP sites developed by restriction-based reduced-representation sequencing are usually located in non-coding regions and are of little use for mining functional genes. Whole-genome resequencing is priced by genome size and sequencing depth, so it is costly and of limited applicability for species with large genomes (usually above 20 Gb), such as masson pine. In addition, association markers obtained by GWAS generally need to be verified by PCR in a population, and their false-positive rate is high.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing a method for screening functional SNP markers associated with high lipid production in Pinus massoniana.
Another aim of the invention is to provide an application of this screening method.
The purpose of the invention is realized by the following technical scheme:
A method for screening functional SNP markers associated with high lipid production in Pinus massoniana comprises the following steps:
(1) Selection of materials based on phenotype
Collect resin from m Pinus massoniana clones, sampling n plants of each clone. Calculate the lipogenic power of each plant with the formula below and the average lipogenic power of each clone. Within each clone, take the plant with the highest lipogenic power as the high-lipid-producing plant, the plant with the lowest lipogenic power as the low-lipid-producing plant, and the plant whose lipogenic power is equal or closest to the clone average as the medium-lipid-producing plant. Collect secondary xylem tissue from the high-, medium- and low-lipid-producing plants of each clone. Here m ≥ 3 and n ≥ 3. The lipogenic power RYC is calculated as follows:
[Formula: RYC as a function of Wt, D, Wd and C; the original equation image is not reproduced]
where Wt is the total weight of rosin; D is the number of resin-tapping cuts per tree; Wd is the width of the cut face; and C is the trunk circumference at the position where the bark is cut;
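The within-clone selection in step (1) can be sketched in Python. This is a minimal sketch, assuming each plant's lipogenic power has already been calculated with the RYC formula (the formula is not reproduced here, so the sketch takes RYC values as input). The plant IDs and the `classify_plants` helper are hypothetical, and the medium plant is taken as the one closest to the clone median, which is the preferred option stated for step (1).

```python
from statistics import mean, median


def classify_plants(ryc_by_plant):
    """Select high-, medium- and low-lipid-producing plants within one clone.

    ryc_by_plant maps a (hypothetical) plant ID to its lipogenic power (RYC),
    assumed to be precomputed with the patent's RYC formula.
    """
    if len(ryc_by_plant) < 3:
        raise ValueError("step (1) requires n >= 3 plants per clone")
    high = max(ryc_by_plant, key=ryc_by_plant.get)
    low = min(ryc_by_plant, key=ryc_by_plant.get)
    # Medium plant: the remaining plant closest to the clone median
    # (ties broken by plant ID so the result is deterministic).
    clone_median = median(ryc_by_plant.values())
    candidates = [p for p in ryc_by_plant if p not in (high, low)]
    medium = min(candidates, key=lambda p: (abs(ryc_by_plant[p] - clone_median), p))
    return {
        "clone_mean": mean(ryc_by_plant.values()),
        "high": high,
        "medium": medium,
        "low": low,
    }


# Hypothetical RYC values for the five plants of one clone.
print(classify_plants({"P1": 4.2, "P2": 6.8, "P3": 5.1, "P4": 5.0, "P5": 3.9}))
```

Excluding the high and low plants before choosing the medium one guarantees three distinct plants whenever n ≥ 3.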
(2) Screening candidate genes by joint analysis of multi-omics data
Perform transcriptome and metabolome analysis on the tissue of the high-, medium- and low-lipid-producing plants of each clone collected in step (1), and perform PacBio full-length transcriptome sequencing on a pool of all the tissue material; then carry out differential expression analysis on the multi-omics data and preliminarily screen candidate genes by joint transcriptome-metabolome analysis;
(3) qRT-PCR verification of gene expression level
Extract RNA from the tissue of the high-, medium- and low-lipid-producing plants of each clone, reverse-transcribe it into cDNA, verify the expression levels of the candidate genes screened in step (2) by qRT-PCR, and select differentially expressed genes for SNP site development;
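The text does not say how relative expression is computed from the qRT-PCR data. A common choice is the 2^(-ΔΔCt) method; the sketch below assumes it, with a hypothetical reference gene and with the low-lipid-producing plant used as the calibrator. Ct values are averaged over replicates before the differences are taken, and `relative_expression` is a hypothetical helper name.

```python
from statistics import mean


def relative_expression(target_ct, ref_ct, target_ct_cal, ref_ct_cal, amp_base=2.0):
    """Relative expression by the 2^(-ΔΔCt) method (assumed, not stated in the patent).

    target_ct / ref_ct: replicate Ct values of the candidate gene and of a
    (hypothetical) reference gene in the sample of interest.
    *_cal: the same for the calibrator sample (e.g. the low-lipid-producing plant).
    amp_base=2.0 assumes roughly 100% amplification efficiency.
    """
    delta_sample = mean(target_ct) - mean(ref_ct)
    delta_calibrator = mean(target_ct_cal) - mean(ref_ct_cal)
    delta_delta = delta_sample - delta_calibrator
    return amp_base ** (-delta_delta)


# Hypothetical Ct values: candidate gene in a high- vs a low-lipid-producing plant.
fold = relative_expression([22.1, 21.9, 22.0], [18.0, 18.1, 17.9],
                           [25.0, 25.1, 24.9], [18.0, 18.0, 18.0])
print(round(fold, 2))
```

A value above 1 indicates higher expression in the sample than in the calibrator; if primer efficiencies differ markedly from 100%, an efficiency-corrected model would be more appropriate.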
(4) Development of SNP sites
Extract DNA from the tissue of the high-, medium- and low-lipid-producing plants of each clone, amplify the full-length genes by PCR, directly align the sequences of the high-, medium- and low-lipid-producing plants against one another to develop SNP molecular markers, and screen out SNP loci (SNP/InDel allelic variation sites);
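Once the full-length amplicons of the high-, medium- and low-lipid-producing plants are aligned, candidate SNP and InDel sites are the alignment columns where the sequences disagree. A minimal sketch, assuming the sequences have already been aligned to equal length with '-' marking gaps (the alignment tool is not named in the text). Positions are alignment columns, not coordinates in an ungapped reference, and ambiguous bases such as N are not handled; `call_variants` is a hypothetical name.

```python
def call_variants(aligned):
    """Report variable columns in a multiple alignment.

    aligned: dict of sample name -> aligned sequence (equal length, '-' = gap).
    Returns a list of (1-based column, "snp" or "indel", "allele/allele/...").
    """
    seqs = [s.upper() for s in aligned.values()]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    variants = []
    for col in range(length):
        states = sorted({s[col] for s in seqs})
        if len(states) < 2:
            continue  # monomorphic column
        kind = "indel" if "-" in states else "snp"
        variants.append((col + 1, kind, "/".join(states)))
    return variants


# Hypothetical aligned fragments from one plant of each class.
print(call_variants({"high": "ATGCA-T", "medium": "ATGTACT", "low": "ATGCACT"}))
```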
(5) Development of functional SNP sites for the lipid-producing trait
Using the SNP loci screened in step (4), carry out association analysis between the genotypes of the high-, medium- and low-lipid-producing plants of each clone and the lipid-producing trait to obtain SNP markers significantly associated with the trait, and screen these further to obtain high-lipid-production functional SNP markers of Pinus massoniana.
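Step (5) does not name the statistical test used for the genotype-trait association. As one possibility, the sketch below swaps in a one-way ANOVA F statistic of lipogenic power across genotype classes with a permutation p-value, which avoids distributional assumptions. The function names, permutation count and seed are assumptions, and with only a handful of individuals such a test has little power.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of value groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def genotype_association(genotypes, ryc, n_perm=2000, seed=1):
    """F statistic and permutation p-value for RYC differences between genotypes."""
    labels = sorted(set(genotypes))
    if len(labels) < 2 or len(genotypes) <= len(labels):
        raise ValueError("need at least two genotype classes and more samples than classes")

    def grouped(gts):
        return [[y for g, y in zip(gts, ryc) if g == lab] for lab in labels]

    observed = f_statistic(grouped(genotypes))
    rng = random.Random(seed)
    shuffled = list(genotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype link
        if f_statistic(grouped(shuffled)) >= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical genotypes at one SNP and the RYC of the same individuals.
gts = ["TT"] * 4 + ["TC"] * 4 + ["CC"] * 4
ryc = [1.0, 1.1, 0.9, 1.0, 2.0, 2.1, 1.9, 2.0, 3.0, 3.1, 2.9, 3.0]
print(genotype_association(gts, ryc))
```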
The number of clones (m) in step (1) can be chosen as needed; preferably m ≥ 20, more preferably m ≥ 50, and still more preferably m ≥ 150.
The number of plants per clone (n) in step (1) can be chosen as needed; preferably n ≥ 5, and more preferably 5 ≤ n ≤ 20.
The high-, medium- and low-lipid-producing plants in step (1) can be biologically replicated as needed, preferably 3 or more times (i.e. 3 or more individuals of each clone are used as biological replicates).
The resin in step (1) is preferably collected from July to September each year.
In step (1), tissue is collected at the same height as the resin-tapping position (about 15 g of tissue per individual plant), preferably between 12:00 and 13:00 on clear days.
The Pinus massoniana clones in step (1) are 10 years old or older, or have a diameter at breast height (DBH) of 16 cm or more; 29-year-old clones are preferred.
The medium-lipid-producing plant in step (1) is preferably the plant whose lipogenic power is the median value for its clone.
The tissue in step (1) comprises secondary xylem, mature needles, mature branches or immature branches; secondary xylem is preferred.
In step (2), differentially expressed genes are screened from the transcriptome using the criteria $\log_2(\mathrm{Ratio}) \geq 1.0$ and $P < 0.05$.
In step (2), differential metabolites are screened from the metabolome using the criteria $\log_2(\text{fold change}) > 0.5$ and $q < 0.01$.
In step (2), differential proteins are screened from the proteome using the criteria $\text{fold change} \geq 1.2$ and $P < 0.5$.
In step (2), differential genes are screened in the joint analysis by requiring the correlation between the fold changes of the different omics datasets to satisfy $R^2 > 0.9$.
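The screening thresholds above translate directly into filter predicates. This sketch applies them literally as printed (no absolute value on the transcriptome log ratio, and P < 0.5 for the proteome). For the joint criterion, it assumes, hypothetically, that a gene and its matched metabolite or protein are compared through their fold changes across the same set of comparisons (e.g. high vs low, high vs medium, medium vs low), since the text does not specify what the correlation is computed over.

```python
import math
from statistics import mean


def is_deg(log2_ratio, p_value):
    """Transcriptome: log2(Ratio) >= 1.0 and P < 0.05, applied as printed."""
    return log2_ratio >= 1.0 and p_value < 0.05


def is_diff_metabolite(fold_change, q_value):
    """Metabolome: log2(fold change) > 0.5 and q < 0.01."""
    return math.log2(fold_change) > 0.5 and q_value < 0.01


def is_diff_protein(fold_change, p_value):
    """Proteome: fold change >= 1.2 and P < 0.5 (threshold copied as printed)."""
    return fold_change >= 1.2 and p_value < 0.5


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length fold-change vectors."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no correlation information
    return sxy * sxy / (sxx * syy)


def passes_joint(gene_fc, partner_fc, threshold=0.9):
    """Joint analysis: R^2 of matched fold changes (hypothetical pairing) > 0.9."""
    return r_squared(gene_fc, partner_fc) > threshold


# Hypothetical fold changes of one gene and its matched protein over three comparisons.
print(passes_joint([1.0, 2.0, 3.0], [1.1, 2.1, 2.9]))
```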
The candidate genes in step (2) may be transcription factors or other genes; preferably 3 or more candidate genes are selected, more preferably 5 or more, and most preferably the genes MG7, MG25, MG26, MG27 and MG36, whose nucleotide sequences are shown in SEQ ID NO. 1-5.
The screening method further comprises at least one of the following steps after step (2) and before step (3):
a. annotating the functions and metabolic pathways of the differential genes using databases such as GO and KEGG, selecting functional genes, and analyzing expression using the FPKM values of the candidate genes;
b. collecting other tissues from the high-, medium- and low-lipid-producing plants of each clone and analyzing the expression of the candidate genes in them, i.e. verifying candidate-gene expression;
c. obtaining the full-length sequences of the candidate genes from the PacBio full-length transcriptome sequencing database, and determining the gene families of the differential genes with the NCBI Conserved Domain Search tool and the Pfam conserved-domain database.
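Step a relies on FPKM values. The text does not say which quantification pipeline produced them, so the sketch below only shows the standard definition: fragments (reads, for single-end data) per kilobase of transcript per million mapped fragments. The input numbers are hypothetical.

```python
def fpkm(read_count, gene_length_bp, total_mapped_reads):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = count * 1e9 / (gene length in bp * total mapped fragments)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical: 500 fragments on a 2,000 bp transcript in a 25M-fragment library.
print(fpkm(500, 2000, 25_000_000))  # 10.0
```

Normalizing by both transcript length and library size is what makes FPKM values comparable across the high-, medium- and low-lipid-producing samples of one gene.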
The primer sequence used in the qRT-PCR method in the step (3) is preferably shown as SEQ ID NO. 19-28.
The qRT-PCR amplification program in step (3) is: denaturation at 95 ℃ for 10 s, annealing at 57-62 ℃ for 30 s, and extension at 72 ℃ for 20 s.
The PCR program for full-length gene amplification in step (4) is: pre-denaturation at 94 ℃ for 5 min; denaturation at 94 ℃ for 1 min, annealing at 55-63 ℃ for 30 s, and extension at 72 ℃ for 1-3 min (about 1 kb per minute).
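The extension time for full-length amplification follows from the amplicon length at about 1 kb per minute. A small helper makes the arithmetic explicit; rounding up to whole minutes and rejecting amplicons that would need more than the stated 3 minutes are assumptions of this sketch, and the amplicon lengths in the example are hypothetical.

```python
import math


def extension_minutes(amplicon_bp, rate_bp_per_min=1000, max_minutes=3):
    """Extension time at 72 °C for full-length amplification (~1 kb per minute)."""
    if amplicon_bp <= 0:
        raise ValueError("amplicon length must be positive")
    # Round up to whole minutes, with 1 min as the floor of the stated 1-3 min range.
    minutes = max(1, math.ceil(amplicon_bp / rate_bp_per_min))
    if minutes > max_minutes:
        raise ValueError(f"{amplicon_bp} bp needs {minutes} min, beyond the 1-3 min range")
    return minutes


for length in (930, 1001, 2400):
    print(length, extension_minutes(length))
```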
The sequence of the primer used for PCR full-length amplification in the step (4) is preferably shown as SEQ ID NO. 29-38.
The SNP molecular markers in step (4) are preferably developed on transcription factor gene sequences.
The screening method further comprises, after step (4) and before step (5), verifying the polymorphism of the SNP sites in a population, specifically: designing specific primers flanking the site of each SNP marker, extracting genomic DNA from Pinus massoniana germplasm resources, performing PCR amplification and sequence alignment, and verifying the association between the SNP markers and the lipid-producing trait.
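Population verification ends in genotype counts such as those plotted in FIG. 7, but the calculation is not described. The sketch below shows the usual counting for one biallelic site across a germplasm panel; the input format (two-letter genotype strings, None for a failed call) is an assumption, and heterozygotes written in either order (e.g. "TC" and "CT") are merged.

```python
from collections import Counter


def genotype_frequencies(calls):
    """Genotype and allele frequencies for one SNP site.

    calls: list of two-letter genotypes such as "TT", "TC", "CC"; None = missing.
    """
    observed = [c.upper() for c in calls if c]
    if not observed:
        raise ValueError("no genotype calls")
    n = len(observed)
    # Sort the two alleles so "TC" and "CT" count as the same heterozygote.
    genotypes = Counter("".join(sorted(c)) for c in observed)
    alleles = Counter(a for c in observed for a in c)
    genotype_freq = {g: k / n for g, k in genotypes.items()}
    allele_freq = {a: k / (2 * n) for a, k in alleles.items()}
    return genotype_freq, allele_freq


# Hypothetical calls at one site; the missing call is excluded from the denominator.
print(genotype_frequencies(["TT", "TC", "CT", "CC", None]))
```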
The screening method can be applied to screening high-lipid-production functional SNP markers of Pinus massoniana or to Pinus massoniana breeding. The markers obtained facilitate early selection for high lipid production in Pinus massoniana (e.g. screening high-lipid-producing trees) and can be used in molecular marker-assisted selection breeding.
A high-lipid-production functional SNP marker of Pinus massoniana, being any one of the following:
(1) the nucleotide sequence shown in SEQ ID NO.6, in which the allele at base 76 from the 5' end is T/C;
(2) the nucleotide sequence shown in SEQ ID NO.7, in which the alleles at bases 78 and 1073 from the 5' end are both T/C;
(3) the nucleotide sequence shown in SEQ ID NO.8, in which the allele at base 169 from the 5' end is C/T;
(4) the nucleotide sequence shown in SEQ ID NO.9, in which the allele at base 128 from the 5' end is T/G;
(5) the nucleotide sequence shown in SEQ ID NO.10, in which the allele at base 103 from the 5' end is G/T, or the allele at base 194 is A/G;
(6) the nucleotide sequence shown in SEQ ID NO.11, in which the allele at base 130 from the 5' end is G/T;
(7) the nucleotide sequence shown in SEQ ID NO.12, in which the allele at base 29 from the 5' end is C/T;
(8) the nucleotide sequence shown in SEQ ID NO.13, in which the allele at base 73 from the 5' end is T/A, or the allele at base 486 is G/C;
(9) the nucleotide sequence shown in SEQ ID NO.14, in which the allele at base 109 from the 5' end is A/G;
(10) the nucleotide sequence shown in SEQ ID NO.15, in which the allele at base 143 from the 5' end is C/T;
(11) the nucleotide sequence shown in SEQ ID NO.16, in which the allele at base 125 from the 5' end is G/T, the allele at base 527 is G/T, the allele at base 895 is G/C, or the allele at base 1087 is T/G;
(12) the nucleotide sequence shown in SEQ ID NO.17, in which the allele at base 138 from the 5' end is A/G;
(13) the nucleotide sequence shown in SEQ ID NO.18, in which the allele at base 97 from the 5' end is A/C.
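The thirteen markers can be held as a lookup table, for example to check a newly sequenced amplicon at each marker position. This is a sketch: the table is transcribed from markers (1)-(13), but SEQ ID NO.6-18 themselves are not reproduced in this section, so `read_markers` (a hypothetical helper) assumes the query sequence is already in the same 5'-to-3' coordinates as the corresponding SEQ ID.

```python
# Transcribed from markers (1)-(13): SEQ ID NO -> {1-based position from 5' end: alleles}
MARKERS = {
    6: {76: ("T", "C")},
    7: {78: ("T", "C"), 1073: ("T", "C")},
    8: {169: ("C", "T")},
    9: {128: ("T", "G")},
    10: {103: ("G", "T"), 194: ("A", "G")},
    11: {130: ("G", "T")},
    12: {29: ("C", "T")},
    13: {73: ("T", "A"), 486: ("G", "C")},
    14: {109: ("A", "G")},
    15: {143: ("C", "T")},
    16: {125: ("G", "T"), 527: ("G", "T"), 895: ("G", "C"), 1087: ("T", "G")},
    17: {138: ("A", "G")},
    18: {97: ("A", "C")},
}


def read_markers(seq_id, sequence):
    """Return {position: (observed base, base is one of the listed alleles)}."""
    result = {}
    for pos, alleles in MARKERS[seq_id].items():
        if pos > len(sequence):
            raise ValueError(f"sequence too short for position {pos}")
        base = sequence[pos - 1].upper()  # convert 1-based position to 0-based index
        result[pos] = (base, base in alleles)
    return result


# Hypothetical query already in SEQ ID NO.6 coordinates, carrying C at base 76.
print(read_markers(6, "A" * 75 + "C" + "A" * 10))
```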
Among these SNP markers, Pinus massoniana clones carrying markers (2), (3), (6) and (7), or marker (5) with the A/G allele at base 194, have relatively higher lipid-producing capacity than clones carrying the other SNP markers.
Compared with the prior art, the invention has the following advantages and effects:
(1) The invention provides a technique for developing functional SNP sites for a target trait. Taking the high lipid-producing trait of Pinus massoniana as an example, it develops functional markers associated with the target trait by effectively combining multi-omics sequencing with extreme trait phenotypes, which greatly reduces development cost. For material selection, tissue need only be collected from the Pinus massoniana plants with the highest, medium and lowest lipid-producing capacity; RNA is extracted and mixed in equal amounts for PacBio full-length transcriptome sequencing, while second-generation transcriptome, metabolome and proteome sequencing and differential expression analysis are carried out in parallel. The sequencing cost is low, the approach is applicable to different traits in different species, and it has clear advantages for species with large genomes.
(2) The invention simplifies the workflow for developing functional SNPs. The prior art usually proceeds by phenotyping, resequencing of all samples, development of population SNP sites, and genome-wide association analysis to develop SNP functional markers, whereas the invention proceeds by phenotyping, sequencing of a small number of samples, screening of candidate genes, and development of SNP functional markers.
(3) Targeting phenotypic differences in the trait, the invention uses multi-omics data to locate candidate genes precisely, verifies their differential expression by qRT-PCR, and then validates specific fragments in a population by PCR. Because differential genes are selected first and only then amplified and aligned, SNP sites are developed more efficiently.
(4) When developing SNPs for newly selected genes, SNP loci associated with the target trait can be developed from only 3 masson pine clones with high, medium and low lipid-producing capacity, which reduces SNP development cost. Because transcriptome and other multi-omics sequencing is needed for only a few samples, the cost is low, and the resulting SNP markers facilitate early selection for lipid-producing capacity in masson pine.
(5) The invention is more widely applicable and cheaper to develop. The prior art usually develops population SNP markers by whole-genome resequencing, which requires sequencing many individuals, is costly, and is unsuitable for species with large genomes, while the reduced-representation sequencing commonly used for large-genome species is poorly suited to mining functional genes. The invention only requires multi-omics sequencing of a small number of individuals with extreme trait phenotypes to locate candidate functional genes directly, so development costs are lower, and it is suitable for various traits and for species with genomes of different sizes, especially large genomes.
Drawings
FIG. 1 is a flow chart of the technique of the present invention.
FIG. 2 shows the numbers of candidate genes located by multi-omics analysis (transcriptome, metabolome and proteome).
FIG. 3 shows the FPKM expression analysis of the candidate genes.
FIG. 4 shows the results of conserved-domain identification for the candidate genes.
FIG. 5 shows the results of qRT-PCR validation of the candidate genes.
FIG. 6 shows the allele types of the SNP markers.
FIG. 7 is a diagram showing the genotype frequencies of SNP sites.
Detailed Description
The present invention is described in further detail below with reference to examples, but the embodiments of the present invention are not limited thereto. Unless otherwise indicated, the reagents, methods and apparatus used are conventional in the art. Where specific experimental conditions are not given in the following examples, conventional experimental conditions were used.
Example 1
(1) Technical process
The invention mainly uses multi-omics analysis to screen candidate genes, then uses PCR to develop SNP loci in the candidate genes, and obtains functional markers through population verification; the technical flow chart is shown in FIG. 1.
(2) Selection of materials based on phenotype
Resin Yield (RYC) was calculated using the germplasm resources of 150 Pinus massoniana clones collected and preserved in a Pinus massoniana seed orchard (150 clones in total, each with 5-20 individuals that serve as biological replicates; all experimental materials are conventional and are preserved in the seed orchard of the Guangdong Academy of Forestry). Resin can be collected from trees of any age that has reached tapping size (generally Pinus massoniana 10 years or older, or with a DBH of 16 cm or more); this experiment used data measured on 29-year-old trees. Although the trees differ in DBH, this difference is eliminated by the resin yield formula. Because of the way Pinus massoniana resin is tapped (the tapping position is generally below breast height), the trunk circumference at the tapping position was measured during resin collection and used to calculate resin yield (for details see Zhang et al. Annual variation and genetic analysis of resin yield in open-pollinated families of Pinus massoniana [J]. Scientia Silvae Sinicae, 2013, 49(1): 48-52). Resin yield measurement and sampling are best carried out from July to September each year. RYC is calculated as follows:
[Formula: RYC as a function of Wt, D, Wd and C; the original equation image is not reproduced]
where Wt is the total weight of rosin, D is the number of resin-tapping cuts per tree, Wd is the width of the cut, and C is the trunk circumference at the position where the bark is cut.
Clones with extreme phenotypes (high, medium and low lipogenic power, with significant phenotypic differences; F test, P < 0.05) were selected based on the average lipogenic power of each clone calculated above. Within each clone, the plant with the highest lipogenic power was taken as the high-lipid-producing plant, the plant with the lowest lipogenic power as the low-lipid-producing plant, and the plant whose lipogenic power was equal or close to the average as the medium-lipid-producing plant (this experiment used the plant with the median value). The 3 clones screened in this experiment were GW2 (high lipid production), GW9 (medium lipid production) and GW92 (low lipid production) (see Qingsong Bai, Yanling Cai, Boxiang He, et al. Core set construction and association analysis of Pinus massoniana from Guangdong province in southern China using SLAF-seq. Scientific Reports, 2019, 9: 13157. doi:10.1038/s41598-019-49737-2). From each clone, 3 individuals were selected for collection of secondary xylem tissue (3 individuals each of high-, medium- and low-lipid-producing masson pine, i.e. 3 biological replicates). Tissue was collected at the same height as the tapping position: the bark was peeled off and, once the secondary xylem was exposed, the tissue was scraped off with a knife, about 15 g per individual. Collection is best done between 12:00 and 13:00 on clear days.
Multi-omics sequencing was then performed on the secondary xylem tissue collected from the 9 individuals (high, medium and low lipid-producing capacity) of the 3 clones, comprising transcriptome, metabolome and proteome sequencing, with the 3 individuals of each clone sequenced independently as biological replicates. In addition, RNA from the 9 individuals of the 3 clones was mixed in equal amounts for PacBio full-length transcriptome sequencing. Transcriptome, metabolome and proteome sequencing were commissioned from Beijing Novogene Bioinformatics Technology Co., Ltd.
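Mixing the 9 RNA samples "in equal amounts" is taken here to mean an equal mass of RNA per individual, which is an assumption; the volume to take from each sample is then the target mass divided by its concentration. The sample IDs and concentrations below are hypothetical.

```python
def pooling_volumes(concentration_ng_per_ul, mass_per_sample_ng):
    """Volume (µL) of each RNA sample to pool so every individual contributes equal mass."""
    volumes = {}
    for sample, conc in concentration_ng_per_ul.items():
        if conc <= 0:
            raise ValueError(f"{sample}: concentration must be positive")
        volumes[sample] = mass_per_sample_ng / conc
    return volumes


# Hypothetical concentrations for two of the nine individuals, 500 ng from each.
print(pooling_volumes({"GW2-1": 100.0, "GW9-1": 250.0}, 500.0))
```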
(3) Precise localization and screening of candidate genes based on multi-omics data
Differential expression analysis and joint analysis of the multi-omics data were used to screen candidate genes, with the following criteria: for differentially expressed genes in the transcriptome, $\log_2(\mathrm{Ratio}) \geq 1.0$ and $P < 0.05$; for differential metabolites in the metabolome, $\log_2(\text{fold change}) > 0.5$ and $q < 0.01$; for differential proteins in the proteome, $\text{fold change} \geq 1.2$ and $P < 0.5$; and for differential genes in the joint analysis, a correlation between the fold changes of the different omics datasets of $R^2 > 0.9$. Combining the transcriptome with the metabolome and proteome greatly reduces the number of candidate genes located (FIG. 2): the transcriptome alone yields 28075 differentially expressed genes, joint transcriptome-proteome analysis reduces this to 2821, and joint transcriptome-metabolome analysis yields 226.
The functions and metabolic pathways of the differential genes were annotated using databases such as GO and KEGG, and functional genes such as the transcription factors MG7, MG25, MG26, MG27 and MG36 were selected (these 5 genes are used mainly to illustrate the high diversity among populations); their expression was analyzed using the FPKM values obtained from the transcriptome (FIG. 3). For expression analysis, other tissues free of disease and insect damage (fully developed mature needles, immature branches and mature branches) were also collected from the same 9 individuals with high, medium and low lipogenic power of the 3 clones to verify candidate-gene expression; MG7, MG25, MG26, MG27, MG36 and other candidate genes were differentially expressed in different tissues of high-, medium- and low-lipid-producing masson pine. The candidate gene sequences are as follows:
MG7(SEQ ID NO.1):
ATGCAGAAGATTGTGGATCAAACTGATGCAACTATAAGAAAAGCCCGAGTTTCAGTCCGAGCTAGAACCGAGTCACCCATGATAAGTGATGGTTGCCAATGGAGGAAATATGGACAAAAGATGGCCAAGGGCAATCCATGCCCCAGAGCTTACTACAGATGCACCATGTCACCGTGCTGTCCAGTGCGCAAGCAGGTGCAACGCCTGGCCGAAGACAGATCGATTCTGATAACAACATACGAGGGCAGCCATAACCACATGCTTCCTCCAGCAGCCACAGCCATGGCATCCACTACAGCAGCAGCTGCCTCAATGCTTCTGTCCGGTTCTTCCACATCAGCTGATAACATCGCACTAAATGCAAGCTTCATGGCAGGTGCCCTCATGCAGCATCCTTGCAACACATCAACTGCTAGCATCTCGGCCTCAGCTCCATTCCCCACTATCACACTGGATCTCACACACAATCCCAACCAAATGGCCAACGCTCCAGGTCACATGGCAGCCTCAAACCCAAGAGCACTAGCAGGACTCCCAGCTCATGCCATGCCCTTTGTAGGCATGCCACACCAATTCCCCACCAACACACCCCAGGGGGCATTATTCCATGGGCAATCAATTTACAATGCTCCTTCCATGTTTGCACCCTTAGCAGGCCAGCTGCAACGCCCCCAACAACCCATGATGCCCACACCACCAAAGCTCAATATTCAAGCTGGCCAACCACCAACACCACCACAACAACAACCATCATTCATAGACACAGTCAGTGCAGCTACAGCTGCCATAACTTCTGATCCCAATTTCACAGCAGCCCTTGCAGCAGCCATCACATCCCTTATGAACAACAACAACAATGCTGCCAATGCCATTTCTAAACCCAACTCCTCTCAGGCACTCAATCCTAGTCCCCACGTTGCAGCTCATGTCAAGACAGATCACACCCAA;
MG25(SEQ ID NO.2):
ATGGAAAACCTCCCCAATCAGCAACCTGACCTTGAAATTGCTCAAACACACGAGGATCCCGGGTGCCGCCGATTTAAGGGAATTCGACTGCGAAAATGGGGAAGGTGGGTATCGGAAATCCGGATGCCAAAATCTCGAGAGAAAATATGGCTGGGCTCTTATACGACTCCCGAGCAGGCTGCCCGTGCTTACGACGCCGCAGTGTATTGTCTGAGAGGGCCCAACGCCAAATTTAACTTTCCGGAATCCGTGCACGACATTCCGTCTGTGACTTCTGTTTCCCGTCAGGAAATTCAGCACGCCGCCTTCAAATATGCCTTGGGCCAGCCCCCTCCGAGTTTGCAGTCTCTGGAAGGGCACGCCGCCCTCAAATATGCCTTGGGCCAGCCCCCTCCGAGTTTGCAGTCTCTGGAAGAAGGGCATGCCGCCCTCAAATATGCCTTGGGCCAGCCTCCTCCGAGTTTGCAGTCTCTGGAAGGGCACGCGTCGCCGTCACAGTCGTCTACGGTTTCGGAAACGGAGTTATCGGGAGAACAGCTGAAGATATCGGAAGAGTGCCCCGACTTAGCACTGTGTTGGTCGCTGTTTGCGGCAGACGACACTGGGATTCCCAATTCGGAAAAAGTCCCGTCGATTGACGAATATTTCAGTGCGACTTTGCAGGAGCAGCGGGAGGAGGGTTACATTTTCACAGATTTGTGGAATTTCCAAGATCAAGATGTT;
MG26(SEQ ID NO.3):
ATGGAGAAATCCTCCCAACAGGAGGATGACCATGCCCATACTCCAGAAGAAGAAGTTCGCGGGCAAAAGTGCCGTCAATTTAAGGGAACCCGATTGCGAAAGTGGGGGAGATGGGTAGCAGAAATTCGAATGCCCAAATCTCGAGAGAAGTTATGGCTGGGGTCATACAAAAAGCCCGAGCAGGCCGCCCGCGCCTACGACGCCGCAGTGTATTGTCTGAGAGGGCCGAACGCCAAATTCAATTTACCCAATTCTCTACCTGACATTCCGTCTGCGTCTTCTCTTTCCCGCCGGCAGATTCAACTCGCTGCTGCCAAATGTGCGTTGGATCAATTCCCTTCGAGTGCGCCCCCTCTGCAGAATTTTAATAATAAGGCCATGGACGAGGCCGCATCGCCGTCAAGACTGGATCCGGTATCAGAAACTGAGTTGTCGAGCGATGGTCATCAAATATCAGAGGAAGGGGAGTTGGATTTGTGGGAAAGTCCGTTTGAGGTATCAGGCGGCAATTATGAAGGGCGCATGAACCTGAATTTAGAGAGAATGCCATCGATTGAGGAGTTCTCGGCCTTGGAAATTATTTACAGTATTTGTCAGCAGCATGAGGAGGAGGAACACATAAACCTTTTCCTCGACCCCACAGAGTTGTGGAACTTT;
MG27(SEQ ID NO.4):
ATGGATAATGGATCCTCTCTCGTGCCCATCGCCATGCCCAATTCCTTGACAGACATTGAAGCAATAAGCAATTCTCCTTTTGCGGATAAGGGTGGAAACAAGAGAATTCGAACGCAAGACGAAGCTGCCTCTTCGCCTTCACAGCAAAGCAGGCTGAACCTGCAAAATCCAGCATACAGAGGCGTGCGCCGTCGTAGCTGGGGCAAATGGGTGTCTGAAATTCGGGAACCAAAAAAGAAAAACCGAATCTGGCTCGGCTCCTACGATACACCAGAGATGGCCGCTCGAGCTCACGATGTCGCTGCATTCTACTTGAAAGGAAAGAAACATTCGTTGCTCAATTTTCCAGAGCTCATTGATCAACTTCCAGAACCAATTTCTTCGGCTCCGCCCCACATTCAAGCTGCCGCGGCAGCAGCAGCCGTCGCTTTCAATTCTGCATCCCGTAGTGCACAGAACTCTGGAATGTCAAGCGATATCAATAAACAAGACAGCGGAAGGCCAAGAAATACTCCAGTAAGCAATGAGCACGCAGGTATCTCCTCTTCGAATCAACTATCGGCAGTAATTAGCTCAGAGTTAAACTTGGAGACGGCTAATGTCGAGTTATTAGGGAAACCGAATTATACTAATTCATCATCAATGGAGATCGTAACAGAGGAGGACTTGTTCGAATCCACCAACTTCTATACGAATTTGGCAGAAGGCCTTATGCTTCCTCCACCTCTGTTCAGTATTCCTGAATTAGACATAGAGGAGCAGAGATTGGAGGAAGGATTTCTTTGGTCTGGTTTT;
MG36(SEQ ID NO.5):
ATGGCTGGCATGGATGACGGAGACATTAATTTTAGCAGCAATATAGTAGATGATTTTGGCAATGGGTCCTCCATGGAAAGCTTTTTCGAGGAGATTTTGAGGGATACTACTCATGCCTGCACTCATACACACACCTGCAACCCTCCCGGGCCAGATAACACACATACACACACATGCTTCCACACGCACACAAAAATCCTTGCTGCTCCCGATGACGAGAAGTCTGCAGACACTGCTGAGTCTCCACAAAACAGCTCTTCCAAGCCAAAAAAACGACCAGTAGGTAATCGAGAGGCAGTTAGAAAATACAGGGAAAAAAAAAAGGCCCGGACGGCCTCCCTGGAGGAGCAGGTTGTTCAACTGACCACTGTTAATCAGCAATTGCATAGGAGATTACAGGGTCAAGCAGCTTTAGAGGCTGAGATTGCAAGATTGAAGTGTCTGCTGGCTGACTTTAGGGGCCGGATCGATGGGGAATTGGGGTCCTATCCTTACCAAAAGTCAATTAGAATGGATAAAACTTGCAATGATGCACCATTCCGGCAACCGATGCCTGGGGGATATGTCCTGGATCCCTGCAATATCTGGTGCAATGCAGATGCAGCTTGCCGTGAACCGACTCTGGCATCCAACAGTGAGGGTGGTGTGCAGCACGAACGTGATAGTGCTGCACGTTGGAATGGCGATTGTGGCCAGATCGCAGGCCATTGTCAAGGTTTGAAGGGTGATATGGCAGTAACCTCTAGTGGACTTTCTGGATGCTCAGAGGGTACAGCAACAAAAACAGTGCCTGCTGCCATGGCTTCTTCTGGAAAAGAGAGAAAAGGTGCATTTGGTGTG。
In addition to GO and KEGG annotation, full-length sequences of the candidate genes were obtained from a PacBio full-length transcriptome sequencing database, and the gene families of the differential genes were determined using the NCBI Conserved Domain Search tool and the Pfam conserved domain database (Figure 4), providing a basis for subsequent research on the biological functions of these genes.
(4) qRT-PCR verification of gene expression level
RNA was extracted by the TRIzol method from different tissues (mature needles, mature branches, immature branches and secondary xylem) of masson pine with high, medium and low lipid-production capacity, cDNA was synthesised with a reverse transcription kit, and the expression levels of the initially screened differential genes were verified by qRT-PCR. Primers were designed with Primer3 based on the full-length sequences of the candidate genes (Table 1). Actin was used as the internal reference gene, and the amplification program was: denaturation at 95 ℃ for 10 s, annealing at 57-62 ℃ for 30 s, and extension at 72 ℃ for 20 s. Each gene was run in three replicates for each sample.
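Before running qRT-PCR, one can check in silico that a primer pair designed from the full-length sequence actually flanks a product on the template. The sketch below is a hypothetical helper, not part of Primer3: it finds the forward primer and the reverse complement of the reverse primer on the sense strand and reports the amplicon length, assuming exact matches and a single binding site per primer.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def amplicon_length(template: str, fwd: str, rev: str) -> Optional[int]:
    """Product length from fwd to rev on the sense strand, or None if no product.

    The reverse primer is given 5'->3' as ordered, so its reverse complement
    is searched for downstream of the forward primer (exact matches only).
    """
    t = template.upper()
    start = t.find(fwd.upper())
    if start < 0:
        return None
    site = t.find(reverse_complement(rev).upper(), start + len(fwd))
    if site < 0:
        return None
    return site + len(rev) - start

# Synthetic template: primer sites flank a 6 bp insert
template = "AAAA" + "ATGCC" + "GGGGGG" + "TTACG" + "CC"
print(amplicon_length(template, "ATGCC", "CGTAA"))
```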
TABLE 1 qRT-PCR primers for candidate genes
(Table 1 image not reproduced)
After the reaction, the relative expression of the candidate genes was analysed by the 2^-ΔΔCt method. Using high lipid-producing masson pine as the control for each tissue (secondary xylem and others), the expression of MG7, MG25, MG26, MG27 and MG36 in medium and low lipid-producing masson pine was analysed (Figure 5), and the differentially expressed genes were selected for SNP site development.
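The 2^-ΔΔCt (Livak) calculation normalises the target gene's Ct to the reference gene (Actin here) and then to the control group (high lipid-producing trees). A minimal sketch, assuming roughly equal and near-100% amplification efficiency for target and reference (an assumption of the method) and averaging the three replicates; the Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence

def relative_expression(target_sample: Sequence[float], ref_sample: Sequence[float],
                        target_control: Sequence[float], ref_control: Sequence[float]) -> float:
    """Fold change of a target gene relative to the control by the 2^-ΔΔCt method."""
    delta_ct_sample = mean(target_sample) - mean(ref_sample)     # ΔCt in a medium/low tree
    delta_ct_control = mean(target_control) - mean(ref_control)  # ΔCt in the high (control) tree
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2 ** (-delta_delta_ct)

# Hypothetical triplicate Ct values for one gene in secondary xylem
fold = relative_expression([25.1, 25.0, 24.9], [20.0, 20.1, 19.9],
                           [24.0, 24.1, 23.9], [20.0, 19.9, 20.1])
print(f"relative expression vs. high-lipid control: {fold:.2f}")
```

A value below 1 means the gene is expressed less than in the high lipid-producing control.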
(5) Development of SNP sites
SNP molecular markers were developed for the differentially expressed genes MG7, MG25, MG26, MG27 and MG36 verified by qRT-PCR (Figure 6). Mature needles were collected from different masson pine clones and genomic DNA was extracted (separately from individual high, medium and low lipid-producing trees). The 5'-end sequence, coding sequence and 3'-end sequence of each candidate gene were obtained from the transcriptome data, and amplification primers were designed (Table 2) for full-length PCR amplification. The PCR program was: pre-denaturation at 94 ℃ for 5 min; denaturation at 94 ℃ for 1 min; annealing for 30 s at a temperature generally between 55 ℃ and 63 ℃, depending on the primers; and extension at 72 ℃ for a time set by the sequence length (about 1 kb/min), generally 1 to 3 min. Depending on their characteristics, some sequences had to be amplified in several PCRs and spliced into the complete sequence. Preliminary analysis showed that SNP molecular markers were developed in transcription factor gene sequences such as WRKY, AP2/ERF and bZIP, with allele types including C/G, G/T, A/C and C/T.
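The full-length amplification program above, with its extension time scaled to amplicon length at about 1 kb per minute and kept within 1-3 min, can be written out as follows. This is a sketch under stated assumptions: the cycle number is not given in the text and is therefore omitted, and the function name and step labels are hypothetical.

```python
from typing import List, Tuple

Step = Tuple[str, float, int]  # (step name, temperature in °C, duration in seconds)

def full_length_pcr_program(amplicon_bp: int, annealing_c: float) -> List[Step]:
    """Per-step program for full-length amplification as described in the text."""
    if not 55 <= annealing_c <= 63:
        raise ValueError("annealing temperature is generally 55-63 °C")
    # About 1 kb per minute, bounded to the stated 1-3 minute range
    extension_min = min(max(amplicon_bp / 1000, 1.0), 3.0)
    return [
        ("pre-denaturation", 94, 5 * 60),
        ("denaturation", 94, 60),
        ("annealing", annealing_c, 30),
        ("extension", 72, round(extension_min * 60)),
    ]

for step in full_length_pcr_program(2500, 58):
    print(step)
```

Under this rule an amplicon much longer than 3 kb is simply capped at 3 min; as the text notes, such sequences may instead be amplified as several fragments and spliced.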
TABLE 2 Amplification primers for candidate genes
(Table 2 image not reproduced)
(6) Verification of SNP site polymorphism in the population
Primers (Table 2) were designed from the specific sequences flanking each SNP marker, and the genomic DNA of 150 masson pine germplasm resources was PCR-amplified and the products aligned. The analysis showed high amplification efficiency and success rate at the SNP sites of the 5 genes MG7, MG25, MG26, MG27 and MG36, all of which carried both homozygous and heterozygous genotypes (Figure 7). The genotypes of MG7 comprise C/G, C/C and G/G; of MG25, G/T, G/G and T/T; of MG26, A/C, A/A and C/C; of MG27, C/T, C/C and T/T; and of MG36, G/C, C/C and G/G. Genotype frequency analysis showed high heterozygosity for MG7 and MG36, whose heterozygous C/G genotype accounted for 48% and 57% of the population respectively, whereas MG25 and MG26 were largely homozygous, with homozygous genotypes (G/G, T/T) and (A/A, C/C) accounting for 83% and 94% of the population respectively. Developing SNP sites, and markers associated with important traits, in important differential genes of masson pine is therefore highly feasible.
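Genotype frequencies and observed heterozygosity like those quoted above can be tallied from per-tree genotype calls. The sketch assumes calls are strings such as "C/G" with allele order ignored (so "G/C" and "C/G" are counted together) and uses a hypothetical set of 100 calls; it does not reproduce the study's data.

```python
from collections import Counter
from typing import Dict, Iterable

def normalize(genotype: str) -> str:
    """Sort the two alleles so that 'G/C' and 'C/G' are the same genotype."""
    a, b = genotype.upper().split("/")
    return "/".join(sorted((a, b)))

def genotype_frequencies(calls: Iterable[str]) -> Dict[str, float]:
    """Proportion of individuals carrying each genotype."""
    counts = Counter(normalize(g) for g in calls)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}

def observed_heterozygosity(calls: Iterable[str]) -> float:
    """Proportion of individuals carrying two different alleles."""
    pairs = [normalize(g).split("/") for g in calls]
    return sum(1 for a, b in pairs if a != b) / len(pairs)

def allele_frequencies(calls: Iterable[str]) -> Dict[str, float]:
    """Frequency of each allele among all 2N allele copies."""
    alleles = Counter(a for g in calls for a in normalize(g).split("/"))
    total = sum(alleles.values())
    return {a: n / total for a, n in alleles.items()}

calls = ["C/G"] * 48 + ["C/C"] * 30 + ["G/G"] * 22  # hypothetical site
print(genotype_frequencies(calls))
print(observed_heterozygosity(calls), allele_frequencies(calls))
```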
(7) Development of functional SNP sites for the lipid-production trait
A total of 1,014 single nucleotide polymorphism (SNP) and insertion-deletion (Indel) allelic variation sites (SNP/Indel for short), 10 of them Indels, were obtained by jointly screening masson pine with high, medium and low lipid-production capacity based on the multi-omics sequencing data. Primers were designed from the sequence information and verified by PCR amplification and sequence alignment. Based on the differential expression and sequence alignment information, 19 SNP sites were developed in 13 genes (Table 3), with allelic variation types including C/T, T/G, C/G, A/G, A/C and T/A. Cluster-35640.0 contains the most SNP sites (4); Cluster-19799.0, Cluster-24038.0 and Cluster-32834.3 each contain 2; and each of the remaining 9 genes contains only 1.
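Identifying candidate SNP/Indel sites by sequence comparison amounts to scanning aligned sequences column by column. A minimal sketch, assuming the sequences from high, medium and low lipid-producing trees are already aligned to equal length with "-" for gaps (the alignment step itself is outside this sketch, and the function name is hypothetical): a column with two or more distinct bases is reported as a SNP, and a column containing a gap as an Indel.

```python
from typing import List, Sequence, Tuple

def variant_columns(aligned: Sequence[str]) -> List[Tuple[int, str, str]]:
    """Return (1-based position, 'SNP' or 'Indel', observed states) per variable column."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    variants = []
    for i, column in enumerate(zip(*aligned), start=1):
        states = {base.upper() for base in column}
        if len(states) < 2:
            continue  # invariant column
        kind = "Indel" if "-" in states else "SNP"
        variants.append((i, kind, "/".join(sorted(states))))
    return variants

aligned = [
    "ATGCCGTA",  # hypothetical high lipid-producing tree
    "ATGCTGTA",  # hypothetical medium lipid-producing tree
    "ATG-TGTA",  # hypothetical low lipid-producing tree
]
for pos, kind, states in variant_columns(aligned):
    print(pos, kind, states)
```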
Three trees each of high, medium and low lipid-producing masson pine were selected, and the genotypes of the SNP molecular markers were verified by PCR and sequence alignment (Table 3). In the high lipid-producing masson pine, all genotypes were homozygous except at the two genes Cluster-19799.0 (C/T) and Cluster-31423.0 (T/C), which were heterozygous, whereas 15 SNP sites were heterozygous in the low lipid-producing masson pine. Genotype correlation analysis across high, medium and low lipid-producing masson pine showed that the SNP sites developed by this technique are highly correlated with lipid-production capacity. The multiple variation sites within each of the 4 genes Cluster-19799.0 (2), Cluster-24038.0 (2), Cluster-32834.3 (2) and Cluster-35640.0 (4) are genetically linked SNP markers. In addition, Cluster-24038.0, Cluster-29744.0, Cluster-32834.3 and Cluster-35640.0 are homozygous in high lipid-producing masson pine and heterozygous in low lipid-producing masson pine. The functional SNP markers closely associated with the lipid-production trait obtained in this study therefore have high application value in molecular marker-assisted selection breeding.
TABLE 3 19 functional SNP sites closely associated with the lipid-production trait
(Table 3 image not reproduced)
Cluster-19758.0:
The SNP site lies in the nucleotide sequence of Cluster-19758.0; the allele at base 76 from the 5' end is T/C. The sequence is as follows (SEQ ID NO. 6):
GTTTTGGACATATCCAGCAATCATGACAGTCCAGGACACAACACTCCTCTCAGACATTTTGTCAAACAATTGTCT(T/C)GCAATTTCTATATATCCACATCTAGCATACATGTCAATAAGAGCACCCCTAACATAGATATCGGACTCAAAACCAGTTTTGATTACATAATCATGAACTTCTTTGCCCTTTTCCAGGGACTTGAAGTAAGCACATGCTGAGAG;
Cluster-19799.0:
The SNP sites lie in the nucleotide sequence of Cluster-19799.0; the alleles at bases 78 and 1073 from the 5' end are both T/C. The sequence is as follows (SEQ ID NO. 7):
GTTTCTGCTCTTTCTAACAGTTTCCAAGCTTCATCAACATTGCATGCCTTACAGAAGCCACCAATAAGTGTATTAAA(T/C)AAAGAAGCATTAGGGGGGATGCCTTTCTGGTACATCTCATTTAAAAGGGTATGAGCTTCATATACTCTGCCTCCTTCACAAAGACAGTTCATTAGCAGACTATAGCTAAAGGCATTAGGCACACAACCCTTCTTCTGCATGTAATTGAGTAATTTATAGGCCTTATCTATTTTGCCTCCAATACTAAGGCCTCGCATCAGTAAATTGTAGGTTCCAGTTGTAGGAGCACACTCCCTTGTTTGCATCTCACTTAAGAGTTTGAAAGCCATTGACAATTCTCCATGCTTACAAAAGCCATCTATTAGAGTACTATAAGAGATAGCATCTGGGTGAAGGCCTTTCTCCAACATCTGAATAAAAAACCTCTTTGCTTCATCCAATCTGCCCTCTTTGCAGAGGCCATCAATGACTGTTGTATATGTGACAACATCAGGAAAGCAGTTGTCATGTTTAACTGAATTTACCAGCCCCAAAGAAGCATCGTCAGTCTCATAGAAATCACCAATAGCAGCACTGCCATGAAGCCATATGCGATCCATCATTTTAATCGCTTTGTTTACTTTGCCATTTTTACAAAGGCCTTCTATCATAATATTACAAGTTATCGTATCTGGAGTGAGACCTTTTTCCCTCATTTTGTGTAGAAGATGTTCTGCTTCTGAAATCCTGCCCTCCTTGCAGACACTTTGGACCAAAATGTTGTAAGTAACCGTATTAGCAAAACAGCCTTTGCTCAACATTTCACCAAGCAATTGATTAGCTCGTTCAACATTACCTCTCTTGCAATACCCATGAAGAAGCGTGCTGTATGTTATGCTGTCTGGAACAACTCCATTTCGAAGCATCGCATTCAGGATCCTCTCTGCATCAGATAGCATACCCTCCCTACACATGCCGTCGACTAATACATTGTAGGAGATCACATCGGGAGTGATGTTAGCTGTCATCATCGCTTCCAGGAGCTTGCGAGCCTCAGAAATTCTCCCTAGCTTGG(T/C)CAGTCCACTTAAAAGAATGTTGTAGGACACCACATTGCAGGAATGACAATTATCCTTCATGTGCTCCATAACCTTGAAAGCTTCGTCAAGTTTGCCTTC;
Cluster-20708.0:
The SNP site lies in the nucleotide sequence of Cluster-20708.0; the allele at base 169 from the 5' end is C/T. The sequence is as follows (SEQ ID NO. 8):
TTAAGGCCTACATAACCTCCGAAGCTCTCTTAAGCGTTCTATGTCGAAATTCCTTTTCTTCATCAGTCTGCTCTGCCTGGCACTGAAGCGAGTATTGAAGAAGCTCTTGAATGAAGGATTTGTGGTTGTGTTAAGGAAGAATAGATACCCCGCCATGAAGTATATAGA(C/T)GCGAGAAAGTAGCATACGGGTTCCATCACATCCCATGACAGCTCCCAGAATGTCAGCCTCATTAAACCCGCAGTTTGAAGCCCGAAAAATCCCAAGCCCAACCGTAGTTCAGTCATGGCGTCACTCTTAGCTTTCTGA;
Cluster-23480.0:
The SNP site lies in the nucleotide sequence of Cluster-23480.0; the allele at base 128 from the 5' end is T/G. The sequence is as follows (SEQ ID NO. 9):
TGCCCATGTAGAGCATAGCCTACAATCATCGCAGTCCACGAAACCACATCTGTCTTCGGTATCATTTTAAAGACTTCACATGCATCACCTATGGAACTACATTTAGCATACATTGTAATAAGAGCAG(T/G)CCCCACAGAGGCATCCTCCTGAAATCCATTTTTGATGGTCTGGCAATGAAACTGCTTGCCTTGATCAATAATTGCCAGGTCCGCACACACG;
Cluster-24038.0:
The SNP sites lie in the nucleotide sequence of Cluster-24038.0; the alleles at bases 103 and 194 from the 5' end are G/T and A/G respectively. The sequence is as follows (SEQ ID NO. 10):
ATGGTGCCTTCTCTAATTCAAATTGATTCATCACAGGTGATGGAGTCAGAAACAGCAGCAGTAACAGGGTCTCTACTTTTTGGAACAGATAAAATGACTATG(G/T)GACCAGGATCTGAAAATGTTTGCCCTGTTTTTTCCTCTTTAACAGGCCTGCAATCAAACCAGATGCTTATTTCTCATTCCAACCAAATGG(A/G)CTTCCACAAAAGTTCTGTAGTCCCAATCCATGCAGCAGAATTAAGCAGAGAAATATATGATATGGAAAGTACCCAATACAACAATATTCACTTTT;
Cluster-29744.0:
The SNP site lies in the nucleotide sequence of Cluster-29744.0; the allele at base 130 from the 5' end is G/T. The sequence is as follows (SEQ ID NO. 11):
GGTTGTTCCGCAAATCCAGTTGGATTACTAAACCCTTGCCATTTTTAAGCACCCTGCTTAAATTCTCGTCATATTCTTCAAGAGGGGACCATTCCTCTGATTCTTCTTCTATTTCACACTCGTCCAATA(G/T)CCAAAATCAATCTCCTATGCTCACCCCTGCGCGCGTGCACAAATTAATTGCTAAGCAGACAGATGCATTGTTAGCAATGGAAATTTTTG;
Cluster-31423.0:
The SNP site lies in the nucleotide sequence of Cluster-31423.0; the allele at base 29 from the 5' end is C/T. The sequence is as follows (SEQ ID NO. 12):
TTATATGCGAGACCTTGGATTACCATTG(C/T)TTGGTGATGCACGGTAGGAGTAGACGCCCTTCCCACTCTTCCGACCAAGGTACCCTGCATCAACATATTGCACAAGCAGTGGGCATGGGGAATATTTGCTTTCTCCAAGCCCGTGATGGAGAATCTTCATAATTGAGAGGCAAACATCTAATCCAATGAAGTCTGCAAGTTCCAGGGGTCCCATGG;
Cluster-32834.3:
The SNP sites lie in the nucleotide sequence of Cluster-32834.3; the alleles at bases 73 and 486 from the 5' end are T/A and G/C respectively. The sequence is as follows (SEQ ID NO. 13):
CAACAACATACCCAAACCAGTCTCAGTTCCCAAGGAGAATCTAACCGACTAACGAGGTCGTCATCCTCCAGA(T/A)CCAAAGCAAACACAATACCTGCAAGTGCTGCTTTCATAGATGGGGTGGCCACCAAAGACATCAACATTGATCCTAGTACACTGGTTTCCCTTCGAATTTTTCTGCCAGAACCAGAATTAGCAGTAGCAGCCCTCCCACAGGGCCAGCAGAGAGATTCAAGGGATGTGGATCTCACTCTCAGAGCTTGGGGTTTGGCCAAGAGAGAAACCATACATGAAGTGGATTCAGCTCAAAGTTCCAGGAAATTGAACAGGAATGCCAGTACTAATGCAAAAGGGGTCAGTTACGAATCAAGTTCAACACACAGGAGGAAAAGTTATGGTGGTATTGTAGAAAACATGAGAAGCAGCTATGGTGGTATTGGGGACAAGCAGCAGCGAGTTTGTCAGGATTCGTTGCTCAGACAAGGAAT(G/C)GACGAAGACGTGAATCCATCAGAGAAGGATAGTTTGCTCAGGTGTACGGATAAGACATCAGGATCAGGCCAAATAAATGAT;
Cluster-32938.2:
The SNP site lies in the nucleotide sequence of Cluster-32938.2; the allele at base 109 from the 5' end is A/G. The sequence is as follows (SEQ ID NO. 14):
AGACAAAACACCAATAAAGGTGACATGGTTTGGTTTCACATGGGAGTTCCTCATATGTTCAAACAGCTTGAGAGCCTTCTTTCCTTGACCATGTATGGCATATCCTAC(A/G)ATCATGGCAGTCCATGAAACAACATTTCGCCTAGGCATTTTGTCAAATACTCCGCATGCATCCTCTAGGCACCCACATTTGGCATACATATCTATAAGGGCATTCCCCAC;
Cluster-34813.0:
The SNP site lies in the nucleotide sequence of Cluster-34813.0; the allele at base 143 from the 5' end is C/T. The sequence is as follows (SEQ ID NO. 15):
GGTATCTAACTCTGTATTGTCAACAATTTTTGCCTGCTGTTTGGATGTCAAAAGACCTGCATCAGAAGATTCATTGCAAGTTTGTGCAGCTGTGAAATCCTCATTTCTGACCACCATCTCTGAATCCCTTTTGCAATGATAT(C/T)CATGCTTTCCAGTATGGGATACGCCCTCCAGTCTGTTAAATTTTGGTTCCAGAACAACAGTTGTTTCCCCTTTATCTG;
Cluster-35640.0:
The SNP sites lie in the nucleotide sequence of Cluster-35640.0; the alleles at bases 125, 527, 895 and 1087 from the 5' end are G/T, G/T, G/C and T/G respectively. The sequence is as follows (SEQ ID NO. 16):
GGAATGTGGTGTCATGGAATGCAATAATTGCGGCATCTATACAGCATGGTTGGGTTAAGCAGGCATTGGAATTATTTGCTGAAATGCAATTAGCAGGCATAAAGGCGAGCCACGTTACTTTTGG(G/T)CTCGTTCTCAACGCGTTTGCGATTCTAGGAAACCTCGAACAGGGTAAGCAACTCCATGCCTGTGTGTTTAGAAATGGATTTGGATCGGATTTGCTTGTGGGTAGCTCTGTTATTCTAATGTATATAAAGTGTGGAAACATAGATGGTGCCCGCCAAGTGTTTGACAAAATGCCTATGGTAGACCTGGGGTTATTCAATGCTACAATTCAAGGATATAGCAGCGAGGGCCATAATAATGAGGCAATGGAACTATTTGGTCAACTACTACGAACGGGTTTGAAACCAAATGATATTACCTTCACCATTGTTCTTAGAGCCTGTGCCAGCTTTGAAACTGCCCTTGAACAGGGCAGGCAACTCCATGTACAGATAATGAAATCTGTGTTTCAGCACAATGTTTC(G/T)GTGATTAGTTCTCTCATCACTGTGTATGCTAAAAGTGGTTGCATAGCTGATGCACAAAAAGTGTTTGACAAGATGACTGTACAAAACAATGTAGTTTTATGGACTGCATTGATAGCTGGTTACACCCAGAAAGGGTATACAGACGAAGCCTTGAAACTCTTTCTCTTAATGCAACGTGCAGGTGTTAAACCTAATCAATTCACCTACCCAACTGTTCTCCGTGCTTGTGCAAACTTAGCTTCTATAGAAGAAGGAAAACAAGTCCATTGTCATATTATCAAAGCTGGGGCTGCGTCGGATACTTTCGTTGCCAGCGCCATTGTTGACATGTATATTAAGTGCGGTAGTCTAGAGGATGGCCAACGAG(G/C)GTTTGATAGAATTCCTAGACGAGACATTGTATCGTGGAATACAATGATTGCAGGATATGCTCAACATGGGTATGTTGACAAGGCACTTTTAATCTTTGAAGAAATGCAACAGTATGGCATGAAACCCAACCACGTAACTTTTGTTGCAGTTCTCTCTGCATGCAGCCATGGAGGACTTGTTCGTTTAGGGC(T/G)CCGATACTTTAGTTCCTTGAGCCGAACTCATGGCATTATTCCAAGAATGGAGCATTATGCTTGCATAGTTGACCTCCTTGGCCGTGTTGGGCACTTGTATGAGGCAGAAGACTTTATCAACAATATGCCTTTCGAGCCA;
Cluster-39957.3:
The SNP site lies in the nucleotide sequence of Cluster-39957.3; the allele at base 138 from the 5' end is A/G. The sequence is as follows (SEQ ID NO. 17):
TTGCAGCGCATGCCTTAAGAACACAGGGAAATGTGAAATTATCCGGTCGCAAACCCATCCCTTGCATCTGATCATATAATGCAAGCGCCTCTCTGCAGCCACCCTGCCTAGCATACGCTCCAATCATGGCATTCCAC(A/G)TAAACTTATTCCTTTGAGCCATTTTGTCAAACACTTCGCGTGCATATTCCAAGTTTCCACACTTTGCATACATGGTGATG;
Cluster-40412.10076:
The SNP site lies in the nucleotide sequence of Cluster-40412.10076; the allele at base 97 from the 5' end is A/C. The sequence is as follows (SEQ ID NO. 18):
AAATTGGTTTTTTGTTTTAGAAAAGACATAATAGAAGAACTGAGAATCATTGAAATACATAAAGAGCAGTTTTGTGGAAGACTGATGTCTGCTCGT(A/C)TAGTTACTCTAATAACACAGATCATGGTAATGGCGGTGGTGATATCAGTTATTTTGCTTTTCCTTGGAATTGGAATTCTGGTTTCAATTCATCTGTGTATTGTGGGTAGAGCACTCAGGA。
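Each sequence above marks its SNPs in-line as (X/Y), while the sequence listing replaces the same base with n and records its position in a misc_feature entry. A small hypothetical parser can convert one form into the other and recover the 1-based positions quoted in the text, which is a convenient way to cross-check the two representations. It assumes only single-base A/C/G/T alleles, as in all 19 sites here.

```python
import re
from typing import List, Tuple

_SNP = re.compile(r"\(([ACGT])/([ACGT])\)")

def parse_snp_notation(seq: str) -> Tuple[str, List[Tuple[int, str, str]]]:
    """Return (lower-case sequence with 'n' at each SNP, [(1-based position, allele1, allele2)])."""
    seq = seq.strip().rstrip(";.")  # drop the trailing punctuation used in the text
    parts: List[str] = []
    sites: List[Tuple[int, str, str]] = []
    pos = 0   # number of bases emitted so far
    last = 0  # end of the previous (X/Y) match in the input string
    for m in _SNP.finditer(seq):
        chunk = seq[last:m.start()]
        parts.append(chunk)
        pos += len(chunk) + 1  # the SNP itself occupies one base
        sites.append((pos, m.group(1), m.group(2)))
        parts.append("n")
        last = m.end()
    parts.append(seq[last:])
    return "".join(parts).lower(), sites

masked, sites = parse_snp_notation("AC(G/T)TT(A/G)C;")
print(masked, sites)
```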
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to them. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent and is intended to fall within the scope of the present invention.
Sequence listing
<110> Guangdong Academy of Forestry
<120> screening method of Pinus massoniana high-lipid-production functional SNP marker and application thereof
<160> 38
<170> SIPOSequenceListing 1.0
<210> 1
<211> 948
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
atgcagaaga ttgtggatca aactgatgca actataagaa aagcccgagt ttcagtccga 60
gctagaaccg agtcacccat gataagtgat ggttgccaat ggaggaaata tggacaaaag 120
atggccaagg gcaatccatg ccccagagct tactacagat gcaccatgtc accgtgctgt 180
ccagtgcgca agcaggtgca acgcctggcc gaagacagat cgattctgat aacaacatac 240
gagggcagcc ataaccacat gcttcctcca gcagccacag ccatggcatc cactacagca 300
gcagctgcct caatgcttct gtccggttct tccacatcag ctgataacat cgcactaaat 360
gcaagcttca tggcaggtgc cctcatgcag catccttgca acacatcaac tgctagcatc 420
tcggcctcag ctccattccc cactatcaca ctggatctca cacacaatcc caaccaaatg 480
gccaacgctc caggtcacat ggcagcctca aacccaagag cactagcagg actcccagct 540
catgccatgc cctttgtagg catgccacac caattcccca ccaacacacc ccagggggca 600
ttattccatg ggcaatcaat ttacaatgct ccttccatgt ttgcaccctt agcaggccag 660
ctgcaacgcc cccaacaacc catgatgccc acaccaccaa agctcaatat tcaagctggc 720
caaccaccaa caccaccaca acaacaacca tcattcatag acacagtcag tgcagctaca 780
gctgccataa cttctgatcc caatttcaca gcagcccttg cagcagccat cacatccctt 840
atgaacaaca acaacaatgc tgccaatgcc atttctaaac ccaactcctc tcaggcactc 900
aatcctagtc cccacgttgc agctcatgtc aagacagatc acacccaa 948
<210> 2
<211> 723
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
atggaaaacc tccccaatca gcaacctgac cttgaaattg ctcaaacaca cgaggatccc 60
gggtgccgcc gatttaaggg aattcgactg cgaaaatggg gaaggtgggt atcggaaatc 120
cggatgccaa aatctcgaga gaaaatatgg ctgggctctt atacgactcc cgagcaggct 180
gcccgtgctt acgacgccgc agtgtattgt ctgagagggc ccaacgccaa atttaacttt 240
ccggaatccg tgcacgacat tccgtctgtg acttctgttt cccgtcagga aattcagcac 300
gccgccttca aatatgcctt gggccagccc cctccgagtt tgcagtctct ggaagggcac 360
gccgccctca aatatgcctt gggccagccc cctccgagtt tgcagtctct ggaagaaggg 420
catgccgccc tcaaatatgc cttgggccag cctcctccga gtttgcagtc tctggaaggg 480
cacgcgtcgc cgtcacagtc gtctacggtt tcggaaacgg agttatcggg agaacagctg 540
aagatatcgg aagagtgccc cgacttagca ctgtgttggt cgctgtttgc ggcagacgac 600
actgggattc ccaattcgga aaaagtcccg tcgattgacg aatatttcag tgcgactttg 660
caggagcagc gggaggaggg ttacattttc acagatttgt ggaatttcca agatcaagat 720
gtt 723
<210> 3
<211> 657
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
atggagaaat cctcccaaca ggaggatgac catgcccata ctccagaaga agaagttcgc 60
gggcaaaagt gccgtcaatt taagggaacc cgattgcgaa agtgggggag atgggtagca 120
gaaattcgaa tgcccaaatc tcgagagaag ttatggctgg ggtcatacaa aaagcccgag 180
caggccgccc gcgcctacga cgccgcagtg tattgtctga gagggccgaa cgccaaattc 240
aatttaccca attctctacc tgacattccg tctgcgtctt ctctttcccg ccggcagatt 300
caactcgctg ctgccaaatg tgcgttggat caattccctt cgagtgcgcc ccctctgcag 360
aattttaata ataaggccat ggacgaggcc gcatcgccgt caagactgga tccggtatca 420
gaaactgagt tgtcgagcga tggtcatcaa atatcagagg aaggggagtt ggatttgtgg 480
gaaagtccgt ttgaggtatc aggcggcaat tatgaagggc gcatgaacct gaatttagag 540
agaatgccat cgattgagga gttctcggcc ttggaaatta tttacagtat ttgtcagcag 600
catgaggagg aggaacacat aaaccttttc ctcgacccca cagagttgtg gaacttt 657
<210> 4
<211> 795
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 4
atggataatg gatcctctct cgtgcccatc gccatgccca attccttgac agacattgaa 60
gcaataagca attctccttt tgcggataag ggtggaaaca agagaattcg aacgcaagac 120
gaagctgcct cttcgccttc acagcaaagc aggctgaacc tgcaaaatcc agcatacaga 180
ggcgtgcgcc gtcgtagctg gggcaaatgg gtgtctgaaa ttcgggaacc aaaaaagaaa 240
aaccgaatct ggctcggctc ctacgataca ccagagatgg ccgctcgagc tcacgatgtc 300
gctgcattct acttgaaagg aaagaaacat tcgttgctca attttccaga gctcattgat 360
caacttccag aaccaatttc ttcggctccg ccccacattc aagctgccgc ggcagcagca 420
gccgtcgctt tcaattctgc atcccgtagt gcacagaact ctggaatgtc aagcgatatc 480
aataaacaag acagcggaag gccaagaaat actccagtaa gcaatgagca cgcaggtatc 540
tcctcttcga atcaactatc ggcagtaatt agctcagagt taaacttgga gacggctaat 600
gtcgagttat tagggaaacc gaattatact aattcatcat caatggagat cgtaacagag 660
gaggacttgt tcgaatccac caacttctat acgaatttgg cagaaggcct tatgcttcct 720
ccacctctgt tcagtattcc tgaattagac atagaggagc agagattgga ggaaggattt 780
ctttggtctg gtttt 795
<210> 5
<211> 840
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<400> 5
atggctggca tggatgacgg agacattaat tttagcagca atatagtaga tgattttggc 60
aatgggtcct ccatggaaag ctttttcgag gagattttga gggatactac tcatgcctgc 120
actcatacac acacctgcaa ccctcccggg ccagataaca cacatacaca cacatgcttc 180
cacacgcaca caaaaatcct tgctgctccc gatgacgaga agtctgcaga cactgctgag 240
tctccacaaa acagctcttc caagccaaaa aaacgaccag taggtaatcg agaggcagtt 300
agaaaataca gggaaaaaaa aaaggcccgg acggcctccc tggaggagca ggttgttcaa 360
ctgaccactg ttaatcagca attgcatagg agattacagg gtcaagcagc tttagaggct 420
gagattgcaa gattgaagtg tctgctggct gactttaggg gccggatcga tggggaattg 480
gggtcctatc cttaccaaaa gtcaattaga atggataaaa cttgcaatga tgcaccattc 540
cggcaaccga tgcctggggg atatgtcctg gatccctgca atatctggtg caatgcagat 600
gcagcttgcc gtgaaccgac tctggcatcc aacagtgagg gtggtgtgca gcacgaacgt 660
gatagtgctg cacgttggaa tggcgattgt ggccagatcg caggccattg tcaaggtttg 720
aagggtgata tggcagtaac ctctagtgga ctttctggat gctcagaggg tacagcaaca 780
aaaacagtgc ctgctgccat ggcttcttct ggaaaagaga gaaaaggtgc atttggtgtg 840
<210> 6
<211> 219
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (76)..(76)
<223> n is c or t
<400> 6
gttttggaca tatccagcaa tcatgacagt ccaggacaca acactcctct cagacatttt 60
gtcaaacaat tgtctngcaa tttctatata tccacatcta gcatacatgt caataagagc 120
acccctaaca tagatatcgg actcaaaacc agttttgatt acataatcat gaacttcttt 180
gcccttttcc agggacttga agtaagcaca tgctgagag 219
<210> 7
<211> 1172
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (78)..(78)
<223> n is c or t
<221> misc_feature
<222> (1073)..(1073)
<223> n is c or t
<400> 7
gtttctgctc tttctaacag tttccaagct tcatcaacat tgcatgcctt acagaagcca 60
ccaataagtg tattaaanaa agaagcatta ggggggatgc ctttctggta catctcattt 120
aaaagggtat gagcttcata tactctgcct ccttcacaaa gacagttcat tagcagacta 180
tagctaaagg cattaggcac acaacccttc ttctgcatgt aattgagtaa tttataggcc 240
ttatctattt tgcctccaat actaaggcct cgcatcagta aattgtaggt tccagttgta 300
ggagcacact cccttgtttg catctcactt aagagtttga aagccattga caattctcca 360
tgcttacaaa agccatctat tagagtacta taagagatag catctgggtg aaggcctttc 420
tccaacatct gaataaaaaa cctctttgct tcatccaatc tgccctcttt gcagaggcca 480
tcaatgactg ttgtatatgt gacaacatca ggaaagcagt tgtcatgttt aactgaattt 540
accagcccca aagaagcatc gtcagtctca tagaaatcac caatagcagc actgccatga 600
agccatatgc gatccatcat tttaatcgct ttgtttactt tgccattttt acaaaggcct 660
tctatcataa tattacaagt tatcgtatct ggagtgagac ctttttccct cattttgtgt 720
agaagatgtt ctgcttctga aatcctgccc tccttgcaga cactttggac caaaatgttg 780
taagtaaccg tattagcaaa acagcctttg ctcaacattt caccaagcaa ttgattagct 840
cgttcaacat tacctctctt gcaataccca tgaagaagcg tgctgtatgt tatgctgtct 900
ggaacaactc catttcgaag catcgcattc aggatcctct ctgcatcaga tagcataccc 960
tccctacaca tgccgtcgac taatacattg taggagatca catcgggagt gatgttagct 1020
gtcatcatcg cttccaggag cttgcgagcc tcagaaattc tccctagctt ggncagtcca 1080
cttaaaagaa tgttgtagga caccacattg caggaatgac aattatcctt catgtgctcc 1140
ataaccttga aagcttcgtc aagtttgcct tc 1172
<210> 8
<211> 307
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (169)..(169)
<223> n is c or t
<400> 8
ttaaggccta cataacctcc gaagctctct taagcgttct atgtcgaaat tccttttctt 60
catcagtctg ctctgcctgg cactgaagcg agtattgaag aagctcttga atgaaggatt 120
tgtggttgtg ttaaggaaga atagataccc cgccatgaag tatatagang cgagaaagta 180
gcatacgggt tccatcacat cccatgacag ctcccagaat gtcagcctca ttaaacccgc 240
agtttgaagc ccgaaaaatc ccaagcccaa ccgtagttca gtcatggcgt cactcttagc 300
tttctga 307
<210> 9
<211> 219
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (128)..(128)
<223> n is g or t
<400> 9
tgcccatgta gagcatagcc tacaatcatc gcagtccacg aaaccacatc tgtcttcggt 60
atcattttaa agacttcaca tgcatcacct atggaactac atttagcata cattgtaata 120
agagcagncc ccacagaggc atcctcctga aatccatttt tgatggtctg gcaatgaaac 180
tgcttgcctt gatcaataat tgccaggtcc gcacacacg 219
<210> 10
<211> 289
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (103)..(103)
<223> n is g or t
<221> misc_feature
<222> (194)..(194)
<223> n is a or g
<400> 10
atggtgcctt ctctaattca aattgattca tcacaggtga tggagtcaga aacagcagca 60
gtaacagggt ctctactttt tggaacagat aaaatgacta tgngaccagg atctgaaaat 120
gtttgccctg ttttttcctc tttaacaggc ctgcaatcaa accagatgct tatttctcat 180
tccaaccaaa tggncttcca caaaagttct gtagtcccaa tccatgcagc agaattaagc 240
agagaaatat atgatatgga aagtacccaa tacaacaata ttcactttt 289
<210> 11
<211> 219
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (130)..(130)
<223> n is g or t
<400> 11
ggttgttccg caaatccagt tggattacta aacccttgcc atttttaagc accctgctta 60
aattctcgtc atattcttca agaggggacc attcctctga ttcttcttct atttcacact 120
cgtccaatan ccaaaatcaa tctcctatgc tcacccctgc gcgcgtgcac aaattaattg 180
ctaagcagac agatgcattg ttagcaatgg aaatttttg 219
<210> 12
<211> 215
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (29)..(29)
<223> n is c or t
<400> 12
ttatatgcga gaccttggat taccattgnt tggtgatgca cggtaggagt agacgccctt 60
cccactcttc cgaccaaggt accctgcatc aacatattgc acaagcagtg ggcatgggga 120
atatttgctt tctccaagcc cgtgatggag aatcttcata attgagaggc aaacatctaa 180
tccaatgaag tctgcaagtt ccaggggtcc catgg 215
<210> 13
<211> 567
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (73)..(73)
<223> n is a or t
<221> misc_feature
<222> (486)..(486)
<223> n is c or g
<400> 13
caacaacata cccaaaccag tctcagttcc caaggagaat ctaaccgact aacgaggtcg 60
tcatcctcca ganccaaagc aaacacaata cctgcaagtg ctgctttcat agatggggtg 120
gccaccaaag acatcaacat tgatcctagt acactggttt cccttcgaat ttttctgcca 180
gaaccagaat tagcagtagc agccctccca cagggccagc agagagattc aagggatgtg 240
gatctcactc tcagagcttg gggtttggcc aagagagaaa ccatacatga agtggattca 300
gctcaaagtt ccaggaaatt gaacaggaat gccagtacta atgcaaaagg ggtcagttac 360
gaatcaagtt caacacacag gaggaaaagt tatggtggta ttgtagaaaa catgagaagc 420
agctatggtg gtattgggga caagcagcag cgagtttgtc aggattcgtt gctcagacaa 480
ggaatngacg aagacgtgaa tccatcagag aaggatagtt tgctcaggtg tacggataag 540
acatcaggat caggccaaat aaatgat 567
<210> 14
<211> 219
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (109)..(109)
<223> n is a or g
<400> 14
agacaaaaca ccaataaagg tgacatggtt tggtttcaca tgggagttcc tcatatgttc 60
aaacagcttg agagccttct ttccttgacc atgtatggca tatcctacna tcatggcagt 120
ccatgaaaca acatttcgcc taggcatttt gtcaaatact ccgcatgcat cctctaggca 180
cccacatttg gcatacatat ctataagggc attccccac 219
<210> 15
<211> 221
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (143)..(143)
<223> n is c or t
<400> 15
ggtatctaac tctgtattgt caacaatttt tgcctgctgt ttggatgtca aaagacctgc 60
atcagaagat tcattgcaag tttgtgcagc tgtgaaatcc tcatttctga ccaccatctc 120
tgaatccctt ttgcaatgat atncatgctt tccagtatgg gatacgccct ccagtctgtt 180
aaattttggt tccagaacaa cagttgtttc ccctttatct g 221
<210> 16
<211> 1226
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (125)..(125)
<223> n is g or t
<221> misc_feature
<222> (527)..(527)
<223> n is g or t
<221> misc_feature
<222> (895)..(895)
<223> n is c or g
<221> misc_feature
<222> (1087)..(1087)
<223> n is g or t
<400> 16
ggaatgtggt gtcatggaat gcaataattg cggcatctat acagcatggt tgggttaagc 60
aggcattgga attatttgct gaaatgcaat tagcaggcat aaaggcgagc cacgttactt 120
ttggnctcgt tctcaacgcg tttgcgattc taggaaacct cgaacagggt aagcaactcc 180
atgcctgtgt gtttagaaat ggatttggat cggatttgct tgtgggtagc tctgttattc 240
taatgtatat aaagtgtgga aacatagatg gtgcccgcca agtgtttgac aaaatgccta 300
tggtagacct ggggttattc aatgctacaa ttcaaggata tagcagcgag ggccataata 360
atgaggcaat ggaactattt ggtcaactac tacgaacggg tttgaaacca aatgatatta 420
ccttcaccat tgttcttaga gcctgtgcca gctttgaaac tgcccttgaa cagggcaggc 480
aactccatgt acagataatg aaatctgtgt ttcagcacaa tgtttcngtg attagttctc 540
tcatcactgt gtatgctaaa agtggttgca tagctgatgc acaaaaagtg tttgacaaga 600
tgactgtaca aaacaatgta gttttatgga ctgcattgat agctggttac acccagaaag 660
ggtatacaga cgaagccttg aaactctttc tcttaatgca acgtgcaggt gttaaaccta 720
atcaattcac ctacccaact gttctccgtg cttgtgcaaa cttagcttct atagaagaag 780
gaaaacaagt ccattgtcat attatcaaag ctggggctgc gtcggatact ttcgttgcca 840
gcgccattgt tgacatgtat attaagtgcg gtagtctaga ggatggccaa cgagngtttg 900
atagaattcc tagacgagac attgtatcgt ggaatacaat gattgcagga tatgctcaac 960
atgggtatgt tgacaaggca cttttaatct ttgaagaaat gcaacagtat ggcatgaaac 1020
ccaaccacgt aacttttgtt gcagttctct ctgcatgcag ccatggagga cttgttcgtt 1080
tagggcnccg atactttagt tccttgagcc gaactcatgg cattattcca agaatggagc 1140
attatgcttg catagttgac ctccttggcc gtgttgggca cttgtatgag gcagaagact 1200
ttatcaacaa tatgcctttc gagcca 1226
<210> 17
<211> 218
<212> DNA
<213> Artificial Sequence (Artificial Sequence)
<221> misc_feature
<222> (138)..(138)
<223> n is a or g
<400> 17
ttgcagcgca tgccttaaga acacagggaa atgtgaaatt atccggtcgc aaacccatcc 60
cttgcatctg atcatataat gcaagcgcct ctctgcagcc accctgccta gcatacgctc 120
caatcatggc attccacnta aacttattcc tttgagccat tttgtcaaac acttcgcgtg 180
catattccaa gtttccacac tttgcataca tggtgatg 218
<210> 18
<211> 217
<212> DNA
<213> Artificial Sequence
<221> misc_feature
<222> (97)..(97)
<223> n is a or c
<400> 18
aaattggttt tttgttttag aaaagacata atagaagaac tgagaatcat tgaaatacat 60
aaagagcagt tttgtggaag actgatgtct gctcgtntag ttactctaat aacacagatc 120
atggtaatgg cggtggtgat atcagttatt ttgcttttcc ttggaattgg aattctggtt 180
tcaattcatc tgtgtattgt gggtagagca ctcagga 217
<210> 19
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 19
ccacaccacc aaagctcaat 20
<210> 20
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 20
taagggatgt gatggctgct 20
<210> 21
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 21
catgccgccc tcaaatatgc 20
<210> 22
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 22
aaacagcgac caacacagtg 20
<210> 23
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 23
tcagaggaag gggagttgga 20
<210> 24
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 24
tggggtcgag gaaaaggttt 20
<210> 25
<211> 21
<212> DNA
<213> Artificial Sequence
<400> 25
gagttaaact tggagacggc t 21
<210> 26
<211> 21
<212> DNA
<213> Artificial Sequence
<400> 26
agaggtggag gaagcataag g 21
<210> 27
<211> 19
<212> DNA
<213> Artificial Sequence
<400> 27
gaaccgactc tggcatcca 19
<210> 28
<211> 20
<212> DNA
<213> Artificial Sequence
<400> 28
ctgtaccctc tgagcatcca 20
<210> 29
<211> 25
<212> DNA
<213> Artificial Sequence
<400> 29
atgcagaaga ttgtggatca aactg 25
<210> 30
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 30
ttgggtgtga tctgtcttga catg 24
<210> 31
<211> 22
<212> DNA
<213> Artificial Sequence
<400> 31
atggaaaacc tccccaatca gc 22
<210> 32
<211> 30
<212> DNA
<213> Artificial Sequence
<400> 32
aacatcttga tcttggaaat tccacaaatc 30
<210> 33
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 33
atggagaaat cctcccaaca ggag 24
<210> 34
<211> 21
<212> DNA
<213> Artificial Sequence
<400> 34
aaagttccac aactctgtgg g 21
<210> 35
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 35
atggataatg gatcctctct cgtg 24
<210> 36
<211> 27
<212> DNA
<213> Artificial Sequence
<400> 36
aaaaccagac caaagaaatc cttcctc 27
<210> 37
<211> 24
<212> DNA
<213> Artificial Sequence
<400> 37
atgccgaatc cgaagaagaa tgtg 24
<210> 38
<211> 22
<212> DNA
<213> Artificial Sequence
<400> 38
attgacgatg agttgtcgag gg 22

Claims (10)

1. A method for screening high-lipid-production functional SNP markers of Pinus massoniana, characterized by comprising the following steps:
(1) Phenotype-based selection of materials
Collecting turpentine from m clones of Pinus massoniana, with n plants selected from each clone; calculating the lipid-producing capacity of each plant according to the formula below, and the average lipid-producing capacity of each clone; then, within each clone, taking the plant with the highest lipid-producing capacity as the high-lipid-producing plant, the plant with the lowest lipid-producing capacity as the low-lipid-producing plant, and the plant whose lipid-producing capacity equals or is closest to the clone average as the medium-lipid-producing plant; and collecting secondary xylem tissue from the high-, medium- and low-lipid-producing plants of each clone; wherein m ≥ 3 and n ≥ 3; the lipid-producing capacity RYC is calculated as follows:
[Formula image FDA0003485177160000011: RYC expressed in terms of wt, d, wd and c; not reproduced in the text]
wherein wt is the total weight of rosin; d is the number of tapping operations per tree; wd is the width of the tapping face; and c is the trunk circumference at the tapping position;
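The within-clone selection rule of step (1) can be sketched as follows. This is a minimal illustration only: it assumes an RYC value has already been computed for every plant (the RYC formula is available only as an image and is not reimplemented here), and the function name `select_plants` and identifiers such as "C01" and "p1" are hypothetical. The `use_median` flag mirrors the median-based variant recited in claim 5; choosing the medium plant only among the remaining plants, so that it can never coincide with the high or low plant, is a design assumption.

```python
from statistics import mean, median


def select_plants(clones, use_median=False):
    """Pick the high-, medium- and low-lipid-producing plant within each clone.

    clones: dict mapping a clone ID to a dict {plant_id: RYC value}.
    Returns a dict clone_id -> {"high": plant_id, "medium": plant_id, "low": plant_id}.
    """
    selection = {}
    for clone_id, plants in clones.items():
        if len(plants) < 3:
            raise ValueError(f"clone {clone_id}: at least 3 plants are required")
        high = max(plants, key=plants.get)
        low = min(plants, key=plants.get)
        # Reference value: clone average (claim 1) or median (claim 5 variant).
        values = list(plants.values())
        center = median(values) if use_median else mean(values)
        # Medium plant: among the remaining plants, the one whose RYC is closest to the reference.
        others = [p for p in plants if p not in (high, low)]
        medium = min(others, key=lambda p: abs(plants[p] - center))
        selection[clone_id] = {"high": high, "medium": medium, "low": low}
    return selection


# Hypothetical RYC values for one clone.
example = {"C01": {"p1": 2.4, "p2": 5.1, "p3": 3.3, "p4": 3.9, "p5": 1.8}}
print(select_plants(example))
```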
(2) Screening of candidate genes by joint multi-omics analysis
Performing transcriptome sequencing and metabolome profiling on the tissue collected in step (1) from the high-, medium- and low-lipid-producing plants of each clone, and additionally pooling all tissue samples for PacBio full-length transcriptome sequencing; then performing differential expression analysis on the multi-omics data, and obtaining candidate genes by preliminary screening based on the joint analysis of the transcriptome and metabolome;
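The patent does not state which statistics underlie the joint transcriptome and metabolome screening. One common approach, assumed here purely for illustration, is to keep differentially expressed genes whose expression correlates strongly with a metabolite across samples. The sketch below uses a plain Pearson coefficient and an arbitrary |r| ≥ 0.8 cut-off; the function names and the gene and metabolite names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return sxy / sqrt(sxx * syy)


def joint_screen(expression, metabolites, de_genes, r_min=0.8):
    """Keep differentially expressed genes that track at least one metabolite.

    expression: gene -> expression values, one per sample.
    metabolites: metabolite -> abundances in the same sample order.
    de_genes: genes already called as differentially expressed.
    Returns gene -> [(metabolite, r), ...] for pairs with |r| >= r_min.
    """
    hits = {}
    for gene in de_genes:
        pairs = [(m, pearson(expression[gene], levels)) for m, levels in metabolites.items()]
        strong = [(m, r) for m, r in pairs if abs(r) >= r_min]
        if strong:
            hits[gene] = strong
    return hits


# Hypothetical profiles over six samples (e.g. two clones x high/medium/low).
expression = {"geneA": [1, 2, 3, 4, 5, 6], "geneB": [3, 1, 4, 1, 5, 9]}
metabolites = {"metabolite_1": [2, 4, 6, 8, 10, 12]}
print(joint_screen(expression, metabolites, ["geneA", "geneB"]))
```

In this toy data only geneA passes, because geneB's profile correlates with the metabolite at roughly r = 0.70.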
(3) Verification of gene expression levels by qRT-PCR
Extracting RNA from the tissue of the high-, medium- and low-lipid-producing plants of each clone, reverse-transcribing it into cDNA, verifying the expression levels of the candidate genes screened in step (2) by qRT-PCR, and selecting differentially expressed genes for SNP site development;
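The claim does not specify how qRT-PCR results are quantified. A widely used option is the 2^-ΔΔCt (Livak) method, sketched below as an assumption: Ct values of a candidate gene are normalized to a reference gene, and the sample of interest (here a high-lipid-producing plant) is compared with a calibrator sample (here the low-lipid-producing plant). The reference gene, the calibrator choice, the function name and the Ct values are all hypothetical.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Fold change of a target gene by the 2^-ddCt method.

    Each argument is a list of Ct values from technical replicates:
    target/reference gene in the sample of interest, then the same two
    genes in the calibrator sample.
    """
    delta_ct_sample = mean(target_ct) - mean(reference_ct)
    delta_ct_calibrator = mean(calib_target_ct) - mean(calib_reference_ct)
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: high-lipid-producing plant vs the low-lipid-producing calibrator.
fold = relative_expression([22.0, 22.2, 21.8], [18.0, 18.1, 17.9],
                           [24.0, 24.1, 23.9], [18.0, 18.0, 18.0])
print(round(fold, 2))
```

The method assumes near-100% amplification efficiency for both genes; if efficiencies differ, an efficiency-corrected model is usually preferred.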
(4) Development of SNP sites
Extracting DNA from the tissue of the high-, medium- and low-lipid-producing plants of each clone, performing full-length PCR amplification, directly comparing the sequences of the high-, medium- and low-lipid-producing plants with one another to develop SNP molecular markers, and screening to obtain SNP loci;
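Direct comparison of the high-, medium- and low-lipid-producing sequences amounts to scanning an alignment for columns that carry different nucleotides. The sketch assumes the amplicon sequences have already been aligned to a common length (the patent does not name an alignment tool); `find_snps` and the sample labels are hypothetical, and the reported positions are alignment columns rather than coordinates in a SEQ ID NO.

```python
def find_snps(aligned):
    """List alignment columns where the compared sequences carry different bases.

    aligned: dict sample -> aligned sequence (all the same length, '-' for gaps).
    Returns [(column, {sample: base}), ...] with 1-based alignment columns.
    """
    if len({len(s) for s in aligned.values()}) != 1:
        raise ValueError("sequences must be aligned to a common length")
    snps = []
    for i, column in enumerate(zip(*aligned.values())):
        nucleotides = {b.upper() for b in column} & set("ACGT")
        # Gaps and ambiguity codes are ignored; a SNP needs two or more distinct nucleotides.
        if len(nucleotides) >= 2:
            snps.append((i + 1, dict(zip(aligned, column))))
    return snps


# Hypothetical aligned amplicons from one clone.
aligned = {"high": "ATGCTAGG", "medium": "ATGCTAGG", "low": "ATGTTAGC"}
for column, bases in find_snps(aligned):
    print(column, bases)
```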
(5) Development of functional SNP (single nucleotide polymorphism) sites for the lipid-producing trait
Performing association analysis between the lipid-producing trait and the genotypes of the high-, medium- and low-lipid-producing plants of each clone at the SNP loci screened in step (4), obtaining SNP markers significantly associated with the lipid-producing trait, and further screening them to obtain the high-lipid-production functional SNP markers of Pinus massoniana.
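The patent does not name the statistical test used in the association analysis. As one possibility, assumed here for illustration, allele counts at a SNP can be compared between high- and low-lipid-producing plants with a 2×2 chi-square test. For one degree of freedom the upper-tail probability equals erfc(sqrt(χ²/2)), so only the standard library is needed. The function name and the allele counts in the usage lines are made up.

```python
from math import erfc, sqrt


def allele_chi_square(high_counts, low_counts):
    """2x2 chi-square test of allele counts, high vs low lipid-producing plants.

    high_counts, low_counts: (count of allele 1, count of allele 2).
    Returns (chi-square statistic, p-value for 1 degree of freedom).
    """
    table = [list(high_counts), list(low_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each group and each allele needs at least one count")
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical allele counts (two alleles per diploid plant).
stat, p = allele_chi_square((30, 10), (10, 30))
print(f"chi2 = {stat:.2f}, p = {p:.2e}")
```

No continuity correction is applied; with small expected counts, Fisher's exact test would be the more conservative choice.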
2. The method for screening high-lipid-production functional SNP markers of Pinus massoniana according to claim 1, characterized in that:
in step (1), m ≥ 20;
in step (1), n ≥ 5;
in step (2), at least 3 candidate genes are obtained.
3. The method for screening high-lipid-production functional SNP markers of Pinus massoniana according to claim 2, characterized in that:
in step (1), m ≥ 50;
in step (1), 5 ≤ n ≤ 20;
in step (2), at least 5 candidate genes are obtained.
4. The method for screening high-lipid-production functional SNP markers of Pinus massoniana according to claim 3, characterized in that:
the candidate genes in step (2) are the genes MG7, MG25, MG26, MG27 and MG36, whose nucleotide sequences are shown in SEQ ID NOs. 1-5.
5. The method for screening high-lipid-production functional SNP markers of Pinus massoniana according to claim 1, characterized in that:
in step (1), the turpentine is collected from July to September each year;
in step (1), the tissue is collected at the same height as the tapping face, between 12:00 noon and 1:00 pm on a clear day;
in step (1), the medium-lipid-producing plant is the plant whose lipid-producing capacity corresponds to the median value;
in step (1), the clonal Pinus massoniana trees are more than 10 years old or have a diameter at breast height of more than 16 cm;
in step (1), the tissue comprises secondary xylem, mature needles, mature shoots or immature shoots.
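The tree and season conditions of claim 5 can be expressed as a simple filter. This sketch uses strict lower bounds for age and diameter at breast height, checks only the July to September collection window, and uses a hypothetical function name; the sampling time of day, sampling height and tissue type are not checked.

```python
from datetime import date


def eligible_for_sampling(age_years, dbh_cm, collection_date):
    """Check the tree and season conditions recited in claim 5.

    The tree must be more than 10 years old or have a diameter at breast
    height (DBH) of more than 16 cm, and turpentine is collected in July,
    August or September.
    """
    tree_ok = age_years > 10 or dbh_cm > 16
    season_ok = 7 <= collection_date.month <= 9
    return tree_ok and season_ok


print(eligible_for_sampling(12, 14.5, date(2021, 8, 15)))
```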
6. The method for screening high-lipid-production functional SNP markers of Pinus massoniana according to claim 1, characterized in that:
the primer sequences used for qRT-PCR in step (3) are shown in SEQ ID NOs. 19-28;
the qRT-PCR amplification program in step (3) is: denaturation at 95 ℃ for 10 seconds, annealing at 57-62 ℃ for 30 seconds, and extension at 72 ℃ for 20 seconds;
the primer sequences used for full-length PCR amplification in step (4) are shown in SEQ ID NOs. 29-38;
the full-length PCR program in step (4) is: pre-denaturation at 94 ℃ for 5 minutes, denaturation at 94 ℃ for 1 minute, annealing at 55-63 ℃ for 30 seconds, and extension at 72 ℃ for 1-3 minutes.
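A quick composition check of the primers listed in SEQ ID NOs. 19-38 can be sketched with two textbook estimates: GC fraction, and a basic melting temperature (the Wallace rule below 14 nt, otherwise Tm = 64.9 + 41 × (nGC - 16.4) / N). These formulas and the function names are assumptions for illustration. They ignore salt and primer concentration and can differ by several degrees from nearest-neighbour estimates, so they are not expected to reproduce the annealing temperatures recited in claim 6.

```python
def gc_content(primer):
    """Fraction of G and C bases in a primer sequence."""
    seq = primer.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(primer):
    """Rough melting temperature in degrees C from base composition alone.

    Wallace rule (2*AT + 4*GC) for primers shorter than 14 nt, otherwise
    Tm = 64.9 + 41 * (nGC - 16.4) / N. Salt and primer concentration are ignored.
    """
    seq = primer.upper()
    n_gc = seq.count("G") + seq.count("C")
    n_at = seq.count("A") + seq.count("T")
    if len(seq) < 14:
        return 2 * n_at + 4 * n_gc
    return 64.9 + 41 * (n_gc - 16.4) / len(seq)


# Primers taken from the sequence listing (SEQ ID NOs. 19 and 20).
for name, primer in {"SEQ ID NO.19": "ccacaccaccaaagctcaat",
                     "SEQ ID NO.20": "taagggatgtgatggctgct"}.items():
    print(name, len(primer), f"GC={gc_content(primer):.2f}", f"Tm~{basic_tm(primer):.1f}")
```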
7. The method for screening high-lipid-production functional SNP markers of Pinus massoniana according to claim 1, characterized by further comprising at least one of the following steps after step (2) and before step (3):
a. annotating the functions and metabolic pathways of the differentially expressed genes using the GO and KEGG databases, selecting functional genes, and analyzing the expression of the candidate genes using their FPKM values;
b. collecting other tissues from the high-, medium- and low-lipid-producing plants of each clone and analyzing the expression levels of the candidate genes in them, thereby verifying candidate gene expression;
c. obtaining the full-length sequences of the candidate genes from the PacBio full-length transcriptome sequencing database, and determining the gene families of the differentially expressed genes using NCBI Conserved Domain Search and the Pfam conserved domain database.
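The FPKM-based expression analysis of step a can be illustrated by ranking genes by their log2 fold change between high- and low-lipid-producing samples. The pseudocount of 1, the |log2FC| ≥ 1 cut-off, the function names and the gene names are assumptions for this sketch; the patent gives no thresholds, and a real screen would normally also apply a statistical test to replicate data.

```python
from math import log2
from statistics import mean


def log2_fold_change(high_fpkm, low_fpkm, pseudocount=1.0):
    """log2 ratio of mean FPKM in high- vs low-lipid-producing samples."""
    return log2((mean(high_fpkm) + pseudocount) / (mean(low_fpkm) + pseudocount))


def rank_candidates(fpkm, min_abs_lfc=1.0):
    """Genes passing the |log2 fold change| cut-off, largest change first.

    fpkm: gene -> {"high": [FPKM per replicate], "low": [FPKM per replicate]}.
    """
    scored = {gene: log2_fold_change(v["high"], v["low"]) for gene, v in fpkm.items()}
    kept = [(gene, lfc) for gene, lfc in scored.items() if abs(lfc) >= min_abs_lfc]
    return sorted(kept, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical FPKM values for two genes.
fpkm = {"geneA": {"high": [15.0, 15.0], "low": [3.0, 3.0]},
        "geneB": {"high": [5.0, 5.0], "low": [4.0, 4.0]}}
print(rank_candidates(fpkm))
```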
8. The method for screening high-lipid-production functional SNP markers of Pinus massoniana according to claim 1, characterized in that, after step (4) and before step (5), the method further comprises verifying the polymorphism of the SNP sites in a population, as follows:
designing primers on the sequences flanking each SNP marker site, extracting genomic DNA from Pinus massoniana germplasm resources, performing PCR amplification and sequence comparison, and verifying the association between the SNP marker and the lipid-producing trait.
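Verifying that a SNP is polymorphic in a germplasm population can be sketched as computing its minor allele frequency from genotype calls. The 0.05 threshold, the two-letter genotype format and the function names are assumptions for illustration only.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Minor allele frequency from diploid calls such as "TC" or "TT".

    Calls that are missing or contain N are skipped.
    """
    counts = Counter()
    for call in genotypes:
        call = call.upper()
        if len(call) == 2 and "N" not in call:
            counts.update(call)
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0
    return min(counts.values()) / total


def is_polymorphic(genotypes, min_maf=0.05):
    """Treat a site as polymorphic when its minor allele frequency reaches min_maf."""
    return minor_allele_frequency(genotypes) >= min_maf


# Hypothetical calls at one SNP across germplasm accessions.
calls = ["TT", "TC", "CC", "TT", "NN"]
print(minor_allele_frequency(calls), is_polymorphic(calls))
```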
9. Use of the method for screening high-lipid-production functional SNP markers of Pinus massoniana according to any one of claims 1 to 8 in screening high-lipid-production functional SNP markers of Pinus massoniana or in Pinus massoniana breeding.
10. A high-lipid-production functional SNP marker of Pinus massoniana, characterized in that the SNP marker is any one of the following:
(1) in the nucleotide sequence shown in SEQ ID NO.6, the allele at base 76 from the 5' end is T/C;
(2) in the nucleotide sequence shown in SEQ ID NO.7, the alleles at bases 78 and 1073 from the 5' end are both T/C;
(3) in the nucleotide sequence shown in SEQ ID NO.8, the allele at base 169 from the 5' end is C/T;
(4) in the nucleotide sequence shown in SEQ ID NO.9, the allele at base 128 from the 5' end is T/G;
(5) in the nucleotide sequence shown in SEQ ID NO.10, the allele at base 103 from the 5' end is G/T, or the allele at base 194 is A/G;
(6) in the nucleotide sequence shown in SEQ ID NO.11, the allele at base 130 from the 5' end is G/T;
(7) in the nucleotide sequence shown in SEQ ID NO.12, the allele at base 29 from the 5' end is C/T;
(8) in the nucleotide sequence shown in SEQ ID NO.13, the allele at base 73 from the 5' end is T/A, or the allele at base 486 is G/C;
(9) in the nucleotide sequence shown in SEQ ID NO.14, the allele at base 109 from the 5' end is A/G;
(10) in the nucleotide sequence shown in SEQ ID NO.15, the allele at base 143 from the 5' end is C/T;
(11) in the nucleotide sequence shown in SEQ ID NO.16, the allele at base 125 from the 5' end is G/T, the allele at base 527 is G/T, the allele at base 895 is G/C, or the allele at base 1087 is T/G;
(12) in the nucleotide sequence shown in SEQ ID NO.17, the allele at base 138 from the 5' end is A/G;
(13) in the nucleotide sequence shown in SEQ ID NO.18, the allele at base 97 from the 5' end is A/C.
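To locate a claimed SNP in the sequence listing, the rows of a listed sequence can be joined and indexed from the 5' end. The sketch below uses the first two rows of SEQ ID NO.18, where base 97 is the ambiguity code n (A or C). The helper names are hypothetical, and `call_allele` assumes the sample sequence is already aligned to the listing coordinates.

```python
def parse_listing_rows(rows):
    """Join sequence-listing rows such as "ttgcagcgca tgccttaaga ... 60".

    Spaces are removed and the trailing position counter of each row is dropped.
    """
    parts = []
    for row in rows:
        tokens = row.split()
        if tokens and tokens[-1].isdigit():
            tokens = tokens[:-1]
        parts.append("".join(tokens))
    return "".join(parts).lower()


def base_at(sequence, position):
    """Base at a 1-based position counted from the 5' end."""
    return sequence[position - 1]


def call_allele(sample_sequence, position, alleles):
    """Return the marker allele a sample carries at position, or None if neither."""
    base = base_at(sample_sequence, position).upper()
    return base if base in alleles else None


# First two rows of SEQ ID NO.18; the claimed SNP is at base 97 (A/C).
rows = [
    "aaattggttt tttgttttag aaaagacata atagaagaac tgagaatcat tgaaatacat 60",
    "aaagagcagt tttgtggaag actgatgtct gctcgtntag ttactctaat aacacagatc 120",
]
seq18 = parse_listing_rows(rows)
print(len(seq18), base_at(seq18, 97))
# A hypothetical sample carrying C at that position.
sample = seq18[:96] + "c" + seq18[97:]
print(call_allele(sample, 97, "AC"))
```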
CN202210079030.0A 2022-01-24 2022-01-24 Screening method and application of Pinus massoniana high-fat-production functional SNP marker Active CN114255822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210079030.0A CN114255822B (en) 2022-01-24 2022-01-24 Screening method and application of Pinus massoniana high-fat-production functional SNP marker

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210079030.0A CN114255822B (en) 2022-01-24 2022-01-24 Screening method and application of Pinus massoniana high-fat-production functional SNP marker

Publications (2)

Publication Number Publication Date
CN114255822A true CN114255822A (en) 2022-03-29
CN114255822B CN114255822B (en) 2023-04-07

Family

ID=80796776

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210079030.0A Active CN114255822B (en) 2022-01-24 2022-01-24 Screening method and application of Pinus massoniana high-fat-production functional SNP marker

Country Status (1)

Country Link
CN (1) CN114255822B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110129420A (en) * 2019-05-24 2019-08-16 Nanjing Forestry University An HRM-based SNP genotyping method for masson pine
AU2019101185A4 (en) * 2019-10-02 2020-01-23 Li, Feiqi MISS A method for predicting genetic risk of cyclosporine drug response based on next-generation sequencing
CN110846429A (en) * 2019-05-23 2020-02-28 Beijing Academy of Agriculture and Forestry Sciences Corn whole genome InDel chip and application thereof
CN111996264A (en) * 2020-09-17 2020-11-27 Institute of Animal Husbandry and Veterinary Medicine, Hubei Academy of Agricultural Sciences Application of pig SNP molecular marker in pig breeding character screening and pig breeding
CN112342302A (en) * 2020-11-27 2021-02-09 Buffalo Research Institute of Guangxi Zhuang Autonomous Region Method for identifying candidate gene marker of milk production traits of buffalo and application
CN112575010A (en) * 2020-12-14 2021-03-30 Yunnan Agricultural University Reference gene for fluorescence quantification of different tissues of Chinese yam as well as primer and application thereof
CN113151542A (en) * 2021-03-18 2021-07-23 Southwest Forestry University Development method and application of Pinus armandii genome SNP

Non-Patent Citations (2)

Title
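Bai Qingsong; Liu Wanchuan; Qin Ji; Pan Qingyou; Lan Yanqun; Zhang Qian; He Boxiang: "Analysis of population characteristics of Pinus massoniana plus trees" *
Chen Xiaoming; Li Kuipeng; Chen Bowen; Liu Qinghua; Zhou Zhichun: "Characterization of SSR sequences in the Pinus massoniana transcriptome and development of molecular markers" *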

Also Published As

Publication number Publication date
CN114255822B (en) 2023-04-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant