CN111534631B

CN111534631B - 2 SNP molecular markers related to oil content of oil-tea camellia kernel and application thereof

Info

Publication number: CN111534631B
Application number: CN202010479308.4A
Authority: CN
Inventors: Lin Ping (林萍); Chang Jun (常君); Wang Kailiang (王开良)
Original assignee: Research Institute of Subtropical Forestry of Chinese Academy of Forestry
Current assignee: Research Institute of Subtropical Forestry of Chinese Academy of Forestry
Priority date: 2020-05-29
Filing date: 2020-05-29
Publication date: 2023-03-21
Anticipated expiration: 2040-05-29
Also published as: CN111534631A

Abstract

The invention relates to the technical field of molecular markers, and in particular to two SNP molecular markers related to the kernel oil content of oil-tea camellia and their application. The invention provides two SNP molecular markers, PB.70158.1-930 and PB.70158.1-935, that are highly correlated with the kernel oil content of oil-tea camellia and explain 25.13% and 23.88% of the phenotypic variance in oil content, respectively. By detecting these SNP markers, plants can be identified and pre-screened at the seedling stage, which greatly reduces production cost and improves selection efficiency.

Description


Technical Field

The invention relates to the technical field of molecular markers, and in particular to two SNP molecular markers related to the kernel oil content of oil-tea camellia (Camellia oleifera) and their application.

Background

Camellia oleifera (Camellia oleifera Abel.) belongs to the genus Camellia (Camellia L.) of the family Theaceae and is a woody oil-bearing tree species. Camellia seed oil is a high-quality, nutrient-rich edible oil: its unsaturated fatty acid content exceeds 90%, consisting mainly of oleic acid and linoleic acid. It has antioxidant, antitumor and blood-lipid-lowering effects and therefore high nutritional and health value. Oil-tea breeding, which has relied mainly on selection and cross-breeding with fruit yield as the main target, has made important progress, but breeding research aimed at increasing kernel oil content and improving oil quality remains limited. Because the conventional breeding cycle of C. oleifera is long, new varieties are developed slowly, and the pace of improved-variety breeding cannot meet the needs of the industry; this has become one of the important factors limiting the development of the C. oleifera industry.

Compared with traditional breeding, marker-assisted breeding allows selection from the seedling stage and greatly shortens the breeding cycle, an advantage that is particularly pronounced in economic forest species grown for their fruit. Marker-assisted breeding depends on effective molecular markers, so developing markers associated with kernel oil content and oil quality phenotypes is of great importance for marker-assisted breeding for oil yield and quality in oil-tea camellia and for the genetic improvement of related traits.

Oil yield per unit area of oil tea is directly determined by indices such as fruit yield, seed yield from fresh fruit and kernel oil content. Research on kernel oil content is therefore one of the important ways to raise oil-tea yield and is of great significance for the growth and healthy development of the oil-tea industry.

Disclosure of Invention

One object of the invention is to provide SNP molecular markers related to the kernel oil content of Camellia oleifera; another object is to provide the use of these markers in identifying the kernel oil content phenotype and in breeding.

The development of loci associated with kernel oil content rests on the fact that oil-tea camellia is a typical outcrossing species in which linkage disequilibrium (LD) usually decays rapidly over short distances, so LD (association) mapping of important traits is feasible. In the present invention, the complete kernel transcriptome of Camellia oleifera serves as the region for marker development. Given a natural oil-tea population carrying abundant, pronounced genetic variation, markers significantly associated with variation in kernel oil content can be developed efficiently.

The SNP molecular markers of the invention were developed essentially as follows:

(1) Oil-tea germplasm resources were collected widely across the entire distribution range of oil tea, and a natural population with widely segregating kernel oil content was established.

(2) Fully mature seeds were collected from 500 oil-tea germplasm accessions of the natural population, and kernel oil content was measured by Soxhlet extraction.

(3) Kernels of the 500 individual oil-tea plants of the natural population were collected during the period of rapid oil synthesis. Total RNA was extracted with the RNAprep Pure plant total RNA extraction kit for polysaccharide- and polyphenol-rich samples (spin column, TIANGEN Cat. No. DP441), a cDNA library was constructed for each sample, and next-generation transcriptome sequencing was performed on the Illumina HiSeq™ 4000 platform.

(4) Roots, young leaves, mature leaves, petals and immature seeds of Camellia oleifera 'Changlin No. 4' were collected, and RNA was extracted from each tissue separately with the same TIANGEN kit (spin column, Cat. No. DP441). The RNA from all tissues was pooled in equal amounts, a PacBio SMRTbell library was constructed, and third-generation transcriptome sequencing was performed on the PacBio Sequel platform. After low-quality data and redundant sequences were filtered out of the sequencing results, all transcripts were annotated. The software used in this step comprised LoRDEC (http://www.atgc-montpellier.fr/LoRDEC/), CD-HIT v4.6 (Fu L, Niu B, Zhu Z, Wu S, Li W, 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150-2), the Coding Potential Calculator (CPC) (Kong L, Zhang Y, Ye Z-Q, et al., 2007. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Research 35, W345-9) and Nonflex/CI, all of which are publicly available.

(5) Using the full-length transcriptome obtained in step (4) as the reference, SNP sites were identified in the 500 transcriptome sequences obtained in step (3) by multiple sequence alignment. The SNP data were strictly filtered according to the following criteria: exactly 2 alleles per site; genotype missing rate ≤ 20%; minor allele frequency ≥ 5%; SNP quality value ≥ 100; more than 10 samples with a homozygous genotype; heterozygous genotype rate ≤ 70%. The freely available software bcftools v1.9 (http://www.htslib.org/doc/bcftools.html) was used in this step.

(6) The population genotype data were imported into GCTA v1.25.2 (Yang J, Lee SH, Goddard ME, Visscher PM, 2011. GCTA: a tool for genome-wide complex trait analysis. American Journal of Human Genetics 88, 76-82) for principal component analysis (PCA).

(7) The population genotype data, the first 10 principal components (PCs), the kernel oil content phenotype data and the kinship matrix were loaded into TASSEL 5.0 (http://www.maizegenetics.net/TASSEL), and the association between the SNP markers and kernel oil content was analysed with the unified mixed linear model (MLM) method.
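Steps (6) and (7) rely on a genomic relationship matrix (GRM) and its leading principal components to control for population structure. The pure-Python sketch below illustrates the idea under stated assumptions. It assumes there are no missing genotypes. It applies the standardized form A = ZZᵀ/N to every entry, whereas GCTA uses a different estimator on the diagonal. It extracts only the first PC, by power iteration; the study used the first 10 PCs, which would need deflation or a proper eigensolver. The function names are hypothetical, and this is not GCTA's or TASSEL's code.

```python
import math

def allele_freqs(G):
    """ALT-allele frequency per SNP; G[j][i] is the 0/1/2 dosage of individual j at SNP i."""
    n = len(G)
    return [sum(row[i] for row in G) / (2 * n) for i in range(len(G[0]))]

def grm(G):
    """Standardized genomic relationship matrix A = Z Z^T / N (no missing genotypes assumed)."""
    p = allele_freqs(G)
    keep = [i for i, f in enumerate(p) if 0 < f < 1]  # monomorphic SNPs carry no information
    if not keep:
        raise ValueError("no polymorphic SNPs")
    Z = [[(row[i] - 2 * p[i]) / math.sqrt(2 * p[i] * (1 - p[i])) for i in keep] for row in G]
    n, N = len(G), len(keep)
    return [[sum(a * b for a, b in zip(Z[j], Z[k])) / N for k in range(n)] for j in range(n)]

def leading_pc(A, iters=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix, by power iteration."""
    n = len(A)
    # Non-uniform start: the all-ones vector lies in the null space of a centred GRM.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(A[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector has no component along the leading PC")
        v = [x / norm for x in w]
    lam = sum(v[j] * sum(A[j][k] * v[k] for k in range(n)) for j in range(n))
    return lam, v
```

Individuals whose PC1 scores share a sign tend to be genetically closer. Passing such scores as covariates is what keeps population structure from producing spurious associations in the MLM.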

Using the above approach, the invention finally identified two SNP markers in the same transcript, PB.70158.1-930 and PB.70158.1-935, that are highly significantly associated with kernel oil content (P < 10⁻⁵); they explain 25.13% and 23.88% of the phenotypic variation, respectively (Table 1).

TABLE 1 SNP molecular marker information

No. | Associated SNP | REF | ALT | Position | P value | Contribution (R², %)
1 | PB.70158.1-930 | C | G | downstream | 3.53E-14 | 25.13
2 | PB.70158.1-935 | T | G | downstream | 2.08E-13 | 23.88

Specifically, the invention provides the following technical scheme:

In a first aspect, the invention provides an SNP molecular marker related to the kernel oil content of oil-tea camellia, namely PB.70158.1-930 or PB.70158.1-935;

wherein PB.70158.1-930 is a C/G polymorphism at position 930 of the sequence shown in SEQ ID NO. 3, and PB.70158.1-935 is a T/G polymorphism at position 935 of the sequence shown in SEQ ID NO. 3.

Further, the SNP molecular markers can be obtained by PCR amplification of oil-tea cDNA template with the primer pair whose nucleotide sequences are shown in SEQ ID NO. 1-2.

For SNP marker PB.70158.1-930, genotype C/C at the polymorphic site corresponds to high oil content, C/G to candidate high oil content, and G/G to low oil content;

for SNP marker PB.70158.1-935, genotype T/T at the polymorphic site corresponds to high oil content, T/G to candidate high oil content, and G/G to low oil content.

In this invention, "oil content" means kernel oil content.

The SNP markers can be used alone or together to identify the kernel oil content phenotype of oil-tea camellia; identification accuracy is higher when they are used together.
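The genotype rules above amount to counting copies of the REF allele: C at PB.70158.1-930 and T at PB.70158.1-935, with G as ALT at both (Table 1). The minimal sketch below assumes genotypes are written as unphased "X/Y" strings. `classify_single` is a hypothetical helper, and its output is only as predictive as the association reported here.

```python
# Favourable (REF) and unfavourable (ALT) alleles per marker, as given in Table 1.
ALLELES = {
    "PB.70158.1-930": ("C", "G"),
    "PB.70158.1-935": ("T", "G"),
}

def classify_single(marker: str, genotype: str) -> str:
    """Map an unphased genotype such as 'C/G' to the oil-content class stated in the text."""
    favourable, other = ALLELES[marker]
    alleles = genotype.upper().split("/")
    if len(alleles) != 2 or any(a not in (favourable, other) for a in alleles):
        raise ValueError(f"unexpected genotype {genotype!r} for {marker}")
    n_fav = alleles.count(favourable)  # 0, 1 or 2 copies of the favourable allele
    return {2: "high", 1: "candidate high", 0: "low"}[n_fav]
```

For example, `classify_single("PB.70158.1-930", "G/C")` yields `"candidate high"`. Allele order in the string does not matter.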

The invention also provides an SNP molecular marker combination related to the oil content of the oil-tea camellia kernel, which comprises PB.70158.1-930 and PB.70158.1-935;

wherein PB.70158.1-930 and PB.70158.1-935 are as described above.

In a second aspect, the present invention provides primers for amplifying the SNP molecular markers or a combination thereof.

In one embodiment, the primers are those shown in SEQ ID NO. 1-2.

The invention also provides a reagent or a kit containing the primer.

In a third aspect, the invention provides any one of the following uses of the SNP molecular marker, the SNP molecular marker combination, the primers, or the reagent or kit:

(1) use in identifying the kernel oil content phenotype of oil tea;

(2) use in the identification, improvement or marker-assisted breeding of oil-tea germplasm resources, where the target trait is kernel oil content;

(3) use in early prediction of the kernel oil content of oil-tea camellia;

(4) use in screening for high-oil-content oil-tea camellia.

When the two SNP molecular markers of the invention are used for kernel oil content phenotyping or marker-assisted breeding in oil tea, a person skilled in the art may use either marker alone or both together as required; using both together gives higher identification accuracy.

In a fourth aspect, the invention provides a method for identifying the kernel oil content phenotype of oil-tea camellia, comprising:

(1) extracting total RNA from the Camellia oleifera plant to be identified and synthesizing cDNA by reverse transcription;

(2) performing PCR amplification with the above primers, using the cDNA as template;

(3) determining the genotype of the SNP molecular marker or marker combination in the PCR product, and inferring the kernel oil content phenotype of the plant to be identified from the genotype.

In step (1) of the above method, the Camellia oleifera plant to be identified may be any breeding material, including individuals of natural populations and of sexually produced populations.

Total RNA is extracted with the RNAprep Pure plant total RNA extraction kit for polysaccharide- and polyphenol-rich samples (spin column, TIANGEN Cat. No. DP441), and single-stranded cDNA is synthesized by reverse transcription with the PrimeScript RT Master Mix kit (TaKaRa, Dalian, China).

In step (2), the PCR program is: 94-95 °C for 3-5 min; 38-45 cycles of 94-95 °C for 15-30 s and 65-69 °C for 40-60 s; then 67-70 °C for 3-6 min. Preferably: pre-denaturation at 95 °C for 3 min (1 cycle); 40 cycles of denaturation at 95 °C for 15 s and extension at 68 °C for 45 s; final extension at 68 °C for 5 min (1 cycle).

In step (2), after amplification, the PCR product is checked and recovered by agarose gel electrophoresis.

In one embodiment, electrophoresis is performed on a 1.2% agarose gel, and the DNA is recovered from the gel with the AxyPrep DNA Gel Extraction Kit (Axygen, Cat. No. AP-GX-50).

In step (3), the genotype of the SNP marker can be determined by conventional means in the field, such as sequencing, using SEQ ID NO. 1-2 as sequencing primers.
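The broad ranges and the preferred program above can be encoded as data. That makes it easy to check a thermocycler program against the stated ranges and to estimate run time. The sketch below makes several assumptions. The `Step` type and the function names are hypothetical. The 65-69 °C step is treated as a single combined annealing/extension step, matching the two-step program in the text. The time estimate counts hold times only and ignores ramping between temperatures.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int

# Preferred program from the text, as a list of (steps, repeat count) stages.
PREFERRED_PROGRAM = [
    ([Step("pre-denaturation", 95, 180)], 1),
    ([Step("denaturation", 95, 15), Step("extension", 68, 45)], 40),
    ([Step("final extension", 68, 300)], 1),
]

# Broad ranges from the text: per step (min C, max C, min s, max s); per stage (min, max) repeats.
RANGES = [
    ([(94, 95, 180, 300)], (1, 1)),
    ([(94, 95, 15, 30), (65, 69, 40, 60)], (38, 45)),
    ([(67, 70, 180, 360)], (1, 1)),
]

def hold_time_seconds(program) -> int:
    """Total hold time in seconds; ramping between temperatures is ignored."""
    return sum(repeats * sum(step.seconds for step in steps) for steps, repeats in program)

def within_ranges(program, ranges=RANGES) -> bool:
    """True if every stage and step lies inside the ranges stated in the text."""
    if len(program) != len(ranges):
        return False
    for (steps, repeats), (limits, (rmin, rmax)) in zip(program, ranges):
        if len(steps) != len(limits) or not rmin <= repeats <= rmax:
            return False
        for step, (tmin, tmax, smin, smax) in zip(steps, limits):
            if not (tmin <= step.temp_c <= tmax and smin <= step.seconds <= smax):
                return False
    return True
```

For the preferred program, `hold_time_seconds(PREFERRED_PROGRAM)` gives 180 + 40 × 60 + 300 = 2880 s, i.e. about 48 min of hold time.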

In step (3), the kernel oil content phenotype of the plant to be identified is judged as follows:

if the genotype at the polymorphic site of SNP marker PB.70158.1-930 is C/C, the plant has high oil content; if it is C/G, the plant is a candidate high-oil-content plant; if it is G/G, the plant has low oil content; and/or

if the genotype at the polymorphic site of SNP marker PB.70158.1-935 is T/T, the plant has high oil content; if it is T/G, the plant is a candidate high-oil-content plant; if it is G/G, the plant has low oil content.

The invention further provides a method for identifying high-oil-content oil-tea plants, comprising steps (1) and (2) of the above method, followed by:

(3) determining the genotype of the SNP molecular marker or marker combination in the PCR product, and judging from the genotype whether the plant to be identified is a high-oil-content oil-tea plant.

If the genotype at the polymorphic site of SNP marker PB.70158.1-930 is C/C, the plant has high oil content; if it is C/G, the plant is a candidate high-oil-content plant; and/or

if the genotype at the polymorphic site of SNP marker PB.70158.1-935 is T/T, the plant has high oil content; if it is T/G, the plant is a candidate high-oil-content plant.

Beneficial effects: the invention provides two SNP loci highly associated with the kernel oil content of Camellia oleifera, which explain 25.13% and 23.88% of the phenotypic variance in oil content, respectively. When the two markers were used for assisted selection in a sexually produced oil-tea population, 79.76% of individuals carrying the high-oil genotype at both loci had kernel oil content above the population mean, and 71.27% of individuals carrying high-oil or candidate high-oil genotypes at both loci had kernel oil content above the population mean. This shows that the markers are useful for assisted selection.

In conventional selection breeding of oil tea, evaluating kernel oil content requires growing seedlings in plantations for 5-6 years, which is time-consuming and labour-intensive. The SNP loci of the invention have defined positions, and the detection method is simple and fast, is unaffected by the environment, is well targeted, and has low workload, high efficiency and low cost. Detecting these SNP loci therefore allows identification and assisted screening at the seedling stage, greatly reducing production cost and improving selection efficiency. In Camellia oleifera breeding, the markers and the detection method can be used to identify high-oil-content plants for breeding, which improves selection efficiency and accelerates the breeding process.

Detailed Description

The following examples are intended to illustrate the invention but are not intended to limit the scope of the invention. Unless otherwise specified, the technical means used in the examples are conventional means well known to those skilled in the art.

The 500 individual plants of the natural population used in the following examples were collected, evaluated and conserved by the woody oil plant breeding and cultivation research group of the Research Institute of Subtropical Forestry, Chinese Academy of Forestry, in the germplasm resource garden of the eastern Hongling farm in the Jinhua Wutomu Wuzhou area, Zhejiang.

Example 1 Construction of an oil-tea population segregating for kernel oil content and measurement of the trait

This example used a natural population of 500 germplasm accessions in a common oil-tea resource collection garden; their origins cover most of the main oil-tea production areas in China, including Zhejiang, Hunan, Jiangxi, Guangxi, Fujian and Guangdong. When the fruits of the 500 individuals were fully ripe (5% of fruits cracked), seeds were collected and kernel oil content was measured by Soxhlet extraction as follows:

(1) Place a medium-speed filter paper bag in an aluminium box, dry to constant mass at 105 °C, and record the mass of the aluminium box and filter paper bag (W₁).

(2) Remove the hard seed coat from an appropriate amount of oil-tea seed, dry the kernels at 105 °C to constant mass, grind them with a grinder, pack the powder into the filter paper bag, and record the total mass of aluminium box, filter paper bag and sample (W₂).

(3) Using a Büchi (Switzerland) B-811 LSV Soxhlet extractor, place the weighed sample bag in the extraction vessel, add about 100 ml of petroleum ether and extract for 6 h. Recover the petroleum ether, return the filter paper bag (with the residue inside) to the aluminium box, dry at 105 °C to constant mass, and record the mass of aluminium box, filter paper bag and residue (W₃).

Kernel oil content = [(W₂ - W₃)/(W₂ - W₁)] × 100%
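The calculation can be written as a small function. Because the kernels are dried to constant mass before weighing, the result is oil content on a dry-kernel basis. The function name and the sanity checks (W₁ < W₂ and W₁ ≤ W₃ ≤ W₂) are additions for illustration and are not part of the source procedure.

```python
def kernel_oil_content(w1: float, w2: float, w3: float) -> float:
    """Kernel oil content (%) from the three weighings of the Soxhlet procedure.

    w1: aluminium box + empty filter paper bag (dried to constant mass)
    w2: box + bag + dried, ground kernel sample
    w3: box + bag + extraction residue (dried to constant mass)
    """
    sample_mass = w2 - w1
    if sample_mass <= 0:
        raise ValueError("W2 must be greater than W1")
    if not w1 <= w3 <= w2:
        raise ValueError("W3 must lie between W1 and W2")
    oil_mass = w2 - w3  # mass removed by the petroleum ether extraction
    return oil_mass / sample_mass * 100
```

With hypothetical weighings W₁ = 30.0 g, W₂ = 35.0 g and W₃ = 32.5 g, `kernel_oil_content(30.0, 35.0, 32.5)` returns 50.0.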

The results show that kernel oil content in the natural population is normally distributed, indicating that it behaves as a quantitative trait.

Example 2 Third-generation transcriptome sequencing and annotation of Camellia oleifera

1. RNA extraction for third-generation sequencing:

Roots, young leaves, mature leaves, petals and immature seeds of Camellia oleifera 'Changlin No. 4' were collected, and RNA was extracted from each tissue separately with the RNAprep Pure plant total RNA extraction kit for polysaccharide- and polyphenol-rich samples (spin column, TIANGEN Cat. No. DP441), as follows:

(1) Add 500 µl of lysis buffer SL (check that β-mercaptoethanol has been added before use) to a 1.5 ml centrifuge tube. Grind 0.1 g of sample thoroughly in liquid nitrogen, quickly transfer the powder to the tube, and immediately vortex vigorously to mix.

(2) Centrifuge at 12000 rpm for 2 minutes.

(3) Transfer the supernatant to filter column CS (placed in a collection tube) and centrifuge at 12000 rpm for 2 minutes. Carefully pipette the supernatant in the collection tube into a new RNase-free centrifuge tube, keeping the pipette tip away from the cell-debris pellet in the collection tube as far as possible.

(4) Slowly add absolute ethanol at 0.4 volumes of the supernatant and mix (a precipitate may form). Transfer the solution together with any precipitate to spin column CR3, centrifuge at 12000 rpm for 15 seconds, discard the flow-through in the collection tube, and return CR3 to the collection tube.

Note: if the supernatant volume is reduced, adjust the amount of ethanol accordingly.

(5) Add 350 µl of deproteinizing buffer RW1 to spin column CR3, centrifuge at 12000 rpm for 15 seconds, discard the flow-through in the collection tube, and return CR3 to the collection tube.

(6) Prepare DNase I working solution: add 10 µl of DNase I stock to a new RNase-free centrifuge tube, add 70 µl of buffer RDD, and mix gently.

(7) Add 80 µl of DNase I working solution to the centre of spin column CR3 and leave at room temperature for 15 minutes.

(8) Add 350 µl of deproteinizing buffer RW1 to spin column CR3, centrifuge at 12000 rpm for 15 seconds, discard the flow-through in the collection tube, and return CR3 to the collection tube.

(9) Add 500 µl of wash buffer RW (check beforehand that ethanol has been added), centrifuge at 12000 rpm for 15 seconds, discard the flow-through in the collection tube, and return CR3 to the collection tube.

(10) Repeat step (9).

(11) Centrifuge at 12000 rpm for 2 minutes, then place spin column CR3 in a new RNase-free centrifuge tube. Add 30-50 µl of RNase-free ddH₂O dropwise to the centre of the membrane, leave at room temperature for 2 minutes, and centrifuge at 12000 rpm for 1 minute to obtain the RNA solution.

2. Third-generation transcriptome sequencing and annotation:

Total RNA from the five tissue samples, after purity and concentration checks, was pooled in equal amounts and reverse-transcribed into single-stranded cDNA with the Clontech PCR cDNA synthesis kit. A first round of PCR with the KAPA HiFi PCR kit, using the single-stranded cDNA as template, generated double-stranded cDNA. The double-stranded cDNA was size-fractionated with BluePippin into three pools of 0.5-2 kb, 2-3 kb and 3-6 kb. A second round of PCR then generated enough cDNA to construct PacBio SMRTbell libraries, which were sequenced on the PacBio Sequel platform. Sequencing data were processed with SMRT Link 5.0. Low-quality data and redundant sequences were filtered from the sequencing results to generate circular consensus sequences (CCS). All CCS reads were classified as full-length or non-full-length according to whether they contained the 5' primer, the 3' primer and the poly(A) tail. Full-length CCS reads were clustered with the ICE algorithm under default parameters to produce consensus sequences (CS). The CS were further corrected with Arrow and LoRDEC (http://www.atgc-montpellier.fr/LoRDEC/), and redundant sequences were removed with CD-HIT v4.6 (Fu L, Niu B, Zhu Z, Wu S, Li W, 2012. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150-2).

The protein-coding potential of the transcripts was predicted with the Coding Potential Calculator (CPC) (Kong L, Zhang Y, Ye Z-Q, et al., 2007). Transcripts with no detected protein-coding potential were further aligned against the Swiss-Prot database, and those still unannotated in Swiss-Prot were regarded as long non-coding RNAs (lncRNAs). The remaining transcripts were aligned against databases including NR, Swiss-Prot, COG, KEGG and GO for annotation.
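The full-length criterion above can be illustrated with a deliberately simplified check. The sketch assumes exact primer matches, a read already in the forward orientation, and the 3' primer given exactly as it appears at the read's end. SMRT Link's own classification tolerates sequencing errors and checks both strands, so this is not its algorithm. The primer sequences and the `min_polya` length are placeholders, and `is_full_length` is a hypothetical helper.

```python
def is_full_length(read: str, five_prime: str, three_prime: str, min_polya: int = 8) -> bool:
    """Simplified full-length test for one CCS read.

    True if the read starts with the 5' primer, ends with the 3' primer, and has a
    poly(A) run of at least `min_polya` bases immediately before the 3' primer.
    Exact matching only.
    """
    read, five, three = read.upper(), five_prime.upper(), three_prime.upper()
    if len(read) < len(five) + len(three):
        return False
    if not (read.startswith(five) and read.endswith(three)):
        return False
    insert = read[len(five):len(read) - len(three)]  # transcript sequence between primers
    return insert.endswith("A" * min_polya)
```

Reads failing any of the three conditions would be counted as non-full-length and excluded from ICE clustering.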
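The transcript triage described above reduces to a two-step decision rule. The sketch below assumes the CPC verdict and the best Swiss-Prot hit have already been computed for each transcript. The labels and the function name are illustrative, not taken from the source pipeline.

```python
from typing import Optional

def classify_transcript(cpc_coding: bool, swissprot_hit: Optional[str]) -> str:
    """Triage rule from the text.

    cpc_coding: True if CPC predicts protein-coding potential.
    swissprot_hit: accession of the best Swiss-Prot match, or None if there is no match.
    """
    if cpc_coding:
        return "coding"  # goes on to NR, Swiss-Prot, COG, KEGG and GO annotation
    if swissprot_hit is None:
        return "lncRNA"  # no coding potential and no Swiss-Prot annotation
    return "coding"  # non-coding by CPC but annotated via Swiss-Prot, so retained
```

Only transcripts that fail both checks are labelled lncRNA; everything else proceeds to database annotation.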

Example 3 Kernel transcriptome sequencing during the rapid oil-synthesis period and identification of polymorphic sites

1. Extraction of total RNA from the kernels of 500 Camellia oleifera clones during the rapid oil-synthesis period:

Total RNA of the immature kernels of each clone was extracted separately with the RNAprep Pure plant total RNA extraction kit for polysaccharide- and polyphenol-rich samples (spin column, TIANGEN Cat. No. DP441) (see Example 2).

2. Next-generation transcriptome sequencing:

After purity and concentration checks, ribosomal RNA was removed from the total RNA of each sample so as to retain as much coding RNA and ncRNA as possible. The RNA was randomly fragmented, and first-strand cDNA was synthesized from the fragments with random hexamer primers. Buffer, dNTPs (with dUTP in place of dTTP), RNase H and DNA polymerase I were then added to synthesize the second cDNA strand. The product was purified with the QIAquick PCR kit and eluted with EB buffer, then end-repaired, A-tailed and ligated to sequencing adapters, after which the second strand was degraded with UNG (uracil-N-glycosylase). Fragments were size-selected by agarose gel electrophoresis and PCR-amplified. The resulting sequencing libraries were sequenced on the Illumina HiSeq™ 4000 platform.

3. Identification of polymorphic sites:

To ensure data quality, the clean reads obtained by preliminary filtering after sequencing were further strictly filtered to obtain high-quality clean reads for downstream analysis. The filtering steps were as follows:

(1) Removing reads containing the linker;

(2) Removing reads which are all A bases;

(3) Removing reads with the N proportion of more than 10 percent;

(4) Removing low-quality reads, i.e. reads in which bases with quality value Q ≤ 20 account for more than 50% of the read.
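The four read filters can be expressed as a single predicate. The sketch below makes three assumptions. FASTQ qualities are Phred+33 encoded. Adapter detection is an exact substring match, whereas real trimmers also catch partial and mismatched adapters. The thresholds are applied as stated in the text (N > 10% and low-quality bases > 50% remove the read). `passes_read_filters` is a hypothetical name.

```python
def passes_read_filters(seq: str, qual: str, adapter: str, phred_offset: int = 33) -> bool:
    """Return True if a read survives filters (1)-(4) described above.

    seq, qual: base calls and FASTQ quality string of equal length.
    adapter: adapter sequence to screen for (exact substring match only).
    phred_offset: 33 for Phred+33 encoding (assumed here).
    """
    if not seq or len(seq) != len(qual):
        raise ValueError("sequence and quality strings must be non-empty and of equal length")
    seq = seq.upper()
    n = len(seq)
    if adapter and adapter.upper() in seq:         # (1) contains adapter
        return False
    if set(seq) == {"A"}:                          # (2) consists entirely of A bases
        return False
    if seq.count("N") / n > 0.10:                  # (3) N proportion above 10%
        return False
    low_q = sum(1 for c in qual if ord(c) - phred_offset <= 20)
    if low_q / n > 0.50:                           # (4) more than 50% of bases with Q <= 20
        return False
    return True
```

A read with exactly 10% N bases, or exactly half its bases at Q ≤ 20, is kept, because the text only removes reads that exceed these thresholds.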

High-quality reads from each sample were aligned to the reference transcriptome (see Example 2) with TopHat v2.1.1 (Trapnell C, Roberts A, Goff L, et al., 2012. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols 7, 562-78). Unaligned reads were discarded, and SNP sites were identified from the remaining reads with bcftools v1.9 (http://www.htslib.org/doc/bcftools.html). The identified SNP sites were strictly filtered to obtain high-quality SNP data according to the following criteria:

(1) Only 2 alleles are present at the site;

(2) The genotype missing rate is ≤ 20%;

(3) The minor allele frequency (MAF) is ≥ 5%;

(4) The SNP quality value is ≥ 100;

(5) More than 10 samples have a homozygous genotype;

(6) The proportion of samples with a heterozygous genotype is ≤ 70%.
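The six site-level criteria can be checked per SNP once genotypes are coded as ALT-allele dosages. The sketch rests on assumptions where the text is ambiguous. "Homozygous" counts both REF and ALT homozygotes. The heterozygous rate is computed over called (non-missing) samples. The missing rate is computed over all samples. The function name is hypothetical; in practice such filters would typically be applied to bcftools output rather than reimplemented.

```python
from typing import Optional, Sequence

def passes_site_filters(n_alleles: int, qual: float, dosages: Sequence[Optional[int]]) -> bool:
    """Apply filters (1)-(6) to one SNP site.

    n_alleles: number of alleles observed at the site (REF + ALTs).
    qual: SNP quality value.
    dosages: ALT-allele count (0, 1 or 2) per sample, or None for a missing call.
    """
    total = len(dosages)
    called = [d for d in dosages if d is not None]
    if n_alleles != 2:                                 # (1) biallelic only
        return False
    if total == 0 or not called:
        return False
    if (total - len(called)) / total > 0.20:           # (2) missing rate <= 20%
        return False
    alt_freq = sum(called) / (2 * len(called))
    if min(alt_freq, 1 - alt_freq) < 0.05:             # (3) MAF >= 5%
        return False
    if qual < 100:                                     # (4) quality value >= 100
        return False
    n_hom = sum(1 for d in called if d in (0, 2))
    if n_hom <= 10:                                    # (5) more than 10 homozygous samples
        return False
    if (len(called) - n_hom) / len(called) > 0.70:     # (6) heterozygous rate <= 70%
        return False
    return True
```

Criterion (6) mainly guards against paralogous or collapsed transcripts, where reads from two copies mimic a heterozygote in most samples.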

Example 4 Screening of SNP sites associated with kernel oil content of Camellia oleifera

1. Population structure analysis:

Principal component analysis (PCA) of the natural Camellia oleifera population was performed with GCTA v1.25.2 (Yang J, Lee SH, Goddard ME, Visscher PM, 2011. GCTA: a tool for genome-wide complex trait analysis. American Journal of Human Genetics 88, 76-82).

TABLE 2 First 10 PC values of selected individuals of the natural population

2. Association analysis:

The SNP data, the first 10 PC values, the phenotype data (see Example 1) and the kinship matrix for all samples were imported into TASSEL 5.0, and the association between SNPs and kernel oil content was analysed with the MLM method to screen for molecular markers significantly associated with kernel oil content. After multiple-testing correction, 2 loci significantly associated with kernel oil content were detected (P < 10⁻⁵; Table 1). Both lie in a non-coding region of transcript PB.70158.1 and explain 25.13% and 23.88% of the variation in oil content, respectively (Table 1).
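TASSEL's MLM fits each marker together with the PCs as fixed effects and a kinship-based random effect, so the contribution values in Table 1 cannot be reproduced without those terms. As a deliberately simplified stand-in, ordinary least squares of phenotype on ALT-allele dosage alone gives a marker's additive effect per allele and the share of phenotypic variance it explains (R²). Expect this naive R² to differ from the MLM values. The function name is hypothetical.

```python
from statistics import mean

def single_marker_fit(dosages, phenotypes):
    """OLS fit y = a + b*x for one marker; returns (intercept, slope, R^2).

    x is the ALT-allele dosage (0/1/2); samples missing x or y (None) are dropped.
    This ignores PCs and kinship, unlike the MLM used in the study.
    """
    pairs = [(x, y) for x, y in zip(dosages, phenotypes) if x is not None and y is not None]
    if len(pairs) < 3:
        raise ValueError("need at least three complete samples")
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("marker is monomorphic among complete samples")
    slope = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    intercept = my - slope * mx
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in pairs)
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return intercept, slope, r2
```

With G as the ALT allele at both markers, a negative slope would be consistent with the text's assignment of G/G to low oil content.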

Example 5 Application of the 2 molecular markers of the invention in high-oil breeding of oil-tea camellia

(1) An F1 hybrid family of Camellia oleifera was used as material (female parent 'Changlin No. 53', male parent 'Changlin No. 40'; both are nationally approved improved varieties, registered as 'National S-SC-CO-012-2008' and 'National S-SC-CO-011-2008', respectively). Young leaves were collected and total RNA was extracted (see Example 2). Using the RNA as template, single-stranded cDNA was synthesized by reverse transcription with a Clontech cDNA synthesis kit and diluted 100-fold as the working solution.

(2) The single-stranded cDNA working solution was PCR-amplified with the primer pair shown in SEQ ID NO. 1-2; the reaction system is shown in Table 3:

TABLE 3

The PCR amplification procedure was:

(3) And carrying out gel detection, purification, recovery, sequencing and genotyping on the PCR amplification product. Gel detection and purification recovery were performed according to AxyPrep DNA gel recovery kit (AxyGEN, code No. AP-GX-50) instructions, and the procedure was as follows:

(1) preparing 1.2% agarose gel, loading 50 μ l of amplification product, electrophoresis voltage is 5V/cm, and stopping electrophoresis after electrophoresis for about 20 min until xylene in loading buffer solution reaches 1cm from the front end of gel.

(2) The agarose gel containing the desired DNA was cut under an ultraviolet lamp, and the surface of the gel was blotted with a paper towel and minced. The gel weight was calculated as a gel volume (e.g. 100mg =100 μ l volume).

(3) Adding 3 volumes of Buffer DE-A, mixing uniformly, heating at 75 ℃, and mixing intermittently every 2-3 minutes until the gel block is completely melted.

(4) 0.5 volume of Buffer DE-B was added and mixed well.

(5) The above solution was transferred to a DNA preparation tube, centrifuged at 12000rpm for 1 minute, and the filtrate was discarded.

(6) Mu.l of Buffer W1 was added and centrifuged at 12000rpm for 30 seconds, and the filtrate was discarded.

(7) Mu.l of Buffer W2 was added and centrifuged at 12000rpm for 30 seconds, and the filtrate was discarded. In the same manner, the mixture was washed once with 700. Mu.l of Buffer W2, centrifuged at 12000rpm for 1 minute, and the filtrate was discarded.

(8) The prepared tube was returned to the centrifuge tube and centrifuged at 12000rpm for 1 minute.

(9) Placing the preparation tube in a clean 1.5ml centrifuge tube, adding 25-30 mul deionized water in the center of the preparation membrane, and standing for 1 minute at room temperature. DNA was eluted by centrifugation at 12000rpm for 1 minute.

The DNA recovered from the gel was sequenced by first-generation (Sanger) sequencing, using the corresponding amplification primers as sequencing primers, to determine the nucleotide sequence of the amplification product. The genotype at each SNP site was then read from the sequencing chromatogram using Chromas software.
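In Chromas the genotype is read by eye. A single clean peak at the SNP position indicates a homozygote, and two overlapping peaks indicate a heterozygote. The sketch below is a hypothetical automated stand-in for that visual call, not the procedure used in the source. It assumes the four trace heights (A, C, G, T) at the SNP position have already been exported by some other tool, which is not shown. The 0.3 secondary/primary peak ratio is an arbitrary illustrative cutoff.

```python
from typing import Mapping

# Hypothetical cutoff (not from the source): a second peak at least 30% as
# tall as the main peak is treated as a heterozygous double peak.
HET_RATIO = 0.3


def call_genotype(peaks: Mapping[str, float], het_ratio: float = HET_RATIO) -> str:
    """Call a diploid genotype such as 'C/C' or 'C/G' from the A/C/G/T trace
    heights at one SNP position of a Sanger chromatogram."""
    ranked = sorted(peaks.items(), key=lambda kv: kv[1], reverse=True)
    (base1, h1), (base2, h2) = ranked[0], ranked[1]
    if h1 <= 0:
        raise ValueError("no signal at this position")
    if h2 / h1 >= het_ratio:
        # Double peak: report both alleles, in alphabetical order.
        a, b = sorted((base1, base2))
        return f"{a}/{b}"
    # Single dominant peak: homozygous for that base.
    return f"{base1}/{base1}"


if __name__ == "__main__":
    print(call_genotype({"A": 50, "C": 1800, "G": 60, "T": 40}))  # C/C
    print(call_genotype({"A": 40, "C": 900, "G": 750, "T": 30}))  # C/G
```

Because alleles are sorted alphabetically, a heterozygote at PB.70158.1-935 is reported as "G/T" rather than the "T/G" used in the text.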

(4) The genotypes at the 2 sites were determined separately for all individuals, and the genotype at each site was compared with oil content. If the genotypes at PB.70158.1-930 and PB.70158.1-935 are C/C and T/T, respectively, the individual is classed as a high-oil-content oil-tea camellia. If one locus carries the high-oil-content genotype and the other is heterozygous, or both loci are heterozygous, the individual is a candidate high-oil-content oil-tea camellia. If both loci carry the low-oil-content genotype G/G, the individual is classed as a low-oil-content oil-tea camellia.
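The two-locus rule in step (4) can be written as a small lookup, sketched below. The rule as stated leaves some combinations undefined, for example one site C/C or heterozygous while the other is G/G, or a missing genotype ("-" in Table 4). Labelling those "unclassified" is this sketch's own choice, not the source's. Allele order is normalised so that "T/G" and "G/T" are treated as the same genotype.

```python
from typing import Optional

HIGH, HET, LOW = "high", "heterozygous", "low"

# Per-locus classes as stated for each site (alleles in alphabetical order).
LOCUS_CLASSES = {
    "PB.70158.1-930": {"C/C": HIGH, "C/G": HET, "G/G": LOW},
    "PB.70158.1-935": {"T/T": HIGH, "G/T": HET, "G/G": LOW},
}


def normalize(genotype: str) -> str:
    """Sort the alleles so that 'T/G' and 'G/T' compare equal."""
    return "/".join(sorted(genotype.strip().upper().split("/")))


def locus_class(locus: str, genotype: str) -> Optional[str]:
    """Class of one site's genotype, or None if missing/unexpected."""
    return LOCUS_CLASSES[locus].get(normalize(genotype))


def classify(g930: str, g935: str) -> str:
    """Combine the two site classes according to the rule in step (4)."""
    c1 = locus_class("PB.70158.1-930", g930)
    c2 = locus_class("PB.70158.1-935", g935)
    if c1 is None or c2 is None:
        return "unclassified"  # e.g. missing genotype '-'
    pair = {c1, c2}
    if pair == {HIGH}:
        return "high oil content"
    if pair in ({HIGH, HET}, {HET}):
        return "candidate high oil content"
    if pair == {LOW}:
        return "low oil content"
    return "unclassified"  # combinations the stated rule does not cover


if __name__ == "__main__":
    print(classify("C/C", "T/T"))  # high oil content
    print(classify("C/G", "T/T"))  # candidate high oil content
    print(classify("G/G", "G/G"))  # low oil content
```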

(5) Fully mature seeds were collected from all F1 individuals, and their kernel oil content was determined (see Example 1). The results (Table 4) show the following:

- 79.76% of individuals with high-oil-content genotypes had a kernel oil content above the population average (37.79%).
- 71.27% of individuals whose genotypes at both sites were high-oil-content or candidate high-oil-content genotypes had a kernel oil content above the population average (37.79%).

The markers are therefore practical and effective for marker-assisted selection. They can be used for early or auxiliary identification, greatly reducing production costs, improving selection efficiency, and accelerating high-oil breeding of oil-tea camellia.
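The percentages in step (5) are the share of individuals in a genotype group whose kernel oil content exceeds the population mean. A minimal sketch of that calculation follows. The Table 4 data are not reproduced here, so the records in the example are made-up placeholders. Only the 37.79% population mean is taken from the text.

```python
from statistics import mean
from typing import Iterable, Optional, Set, Tuple


def share_above_mean(
    records: Iterable[Tuple[str, float]],
    groups: Set[str],
    population_mean: Optional[float] = None,
) -> float:
    """Percentage of individuals in `groups` whose oil content (%) exceeds
    the population mean. If no mean is given, it is computed from all records."""
    records = list(records)
    if population_mean is None:
        population_mean = mean(oil for _, oil in records)
    selected = [oil for cls, oil in records if cls in groups]
    if not selected:
        raise ValueError("no individuals in the selected genotype groups")
    return 100.0 * sum(oil > population_mean for oil in selected) / len(selected)


if __name__ == "__main__":
    # Hypothetical (genotype class, kernel oil %) records, NOT Table 4 data.
    toy = [("high", 41.2), ("high", 39.0), ("high", 36.5),
           ("candidate", 38.9), ("candidate", 35.1), ("low", 33.0)]
    print(share_above_mean(toy, {"high"}, 37.79))               # 2 of 3 above the mean
    print(share_above_mean(toy, {"high", "candidate"}, 37.79))  # 60.0
```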

TABLE 4 Kernel oil content and genotype data of individual F1 plants

Note: "-" in the table indicates a missing genotype.

Although the invention has been described in detail hereinabove by way of general description, specific embodiments and experiments, it will be apparent to those skilled in the art that many modifications and improvements can be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Sequence listing

<110> Research Institute of Subtropical Forestry, Chinese Academy of Forestry

<120> 2 SNP molecular markers related to oil content of oil-tea camellia kernel and application thereof

<130> KHP201112302.2

<160> 3

<170> SIPOSequenceListing 1.0

<210> 1

<211> 21

<212> DNA

<213> Artificial Sequence

<400> 1

acacacacac acagcagagg a 21

<210> 2

<211> 23

<212> DNA

<213> Artificial Sequence

<400> 2

agcagcacca accaagcaat gac 23

<210> 3

<211> 1193

<212> DNA

<213> Artificial Sequence

<400> 3

acacacacac acagcagagg aaaaatgaaa agcattccag agatgttact gtgttgtagt 60

tctgatcata agccaattcc tcttgtgggg tttggaacag ctgtttatcc tctttcatcc 120

tctgaaacca tgaaacaatc catcctccat gcaatcaaac ttggttacag acacttcgac 180

tctgcaactt tataccagtc agagcagcct cttggagaat caattgttga tgccatacgc 240

ctaggcttca ttcaatctcg ccaagacctc ttcatcacct ctaagctttg gtgttctgat 300

gctcaccctc atcatgtcct ccctgctctt caaaattcac tcaagaatct tggattggaa 360

taccttgatc tgtatctcat tcactggcca gtgagctcaa agccaggtaa atttgagtat 420

ccggtgaaca agcaagagct tcttcccatg gatttcaagt ctgtttggga agccatggag 480

gagtgtcaga atcttggcct cacaaaattt attggagtca gtaacttctc atgcaagaag 540

ctccaattat tactagcaac cgcaaagatc cctccagctg tcaaccaggt cgagatgaac 600

ccactttggc aacagaagaa gctaagagag ttttgtgaga aaaatggtat tcatatcaca 660

gcttactctc ctttgggcgc caaaggaaca atttggggga agtgacaaag tcatggaatg 720

tgaggtgctc aaacagattg ccaaagctag aggaaaatct gttgcccagg tttgtctcag 780

atagggttta tgagcaaggg gtgagtgttc tggtgaagag cttcagtgag gagaggatga 840

aagagaacct tcaaatattt gattgggagc taagcgcaca agactccgag atgataaatc 900

aaatttcaca gtataaagga tgtgctggac ttgatttcat atcagatgaa ggcccttaca 960

aatctctcca ggatttatgg gatggtgaaa ttgtttgatc ctgtaaacgt gtagccaaaa 1020

accacttaga taccgtttga taacatttta tgcttacaac acaaattaat gtgtgtttta 1080

tgtttacaaa aactttggaa actgtttggt tttttatttt catttatggt ttgatcattg 1140

atttcacagt tacatttcac tttattgaat gtcattgctt ggttggtgct gct 1193
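As we read the listing, SEQ ID NO. 1 matches the first 21 nt of SEQ ID NO. 3, and SEQ ID NO. 2 is the reverse complement of its last 23 nt. The expected amplicon is therefore the full 1193 bp of SEQ ID NO. 3, and positions 930 and 935 carry C and T in the listed sequence. The sketch below checks this directly from the listing text. Function names are illustrative only.

```python
# SEQ ID NO. 3 copied from the sequence listing (running count at line end).
SEQ3_LISTING = """
acacacacac acagcagagg aaaaatgaaa agcattccag agatgttact gtgttgtagt 60
tctgatcata agccaattcc tcttgtgggg tttggaacag ctgtttatcc tctttcatcc 120
tctgaaacca tgaaacaatc catcctccat gcaatcaaac ttggttacag acacttcgac 180
tctgcaactt tataccagtc agagcagcct cttggagaat caattgttga tgccatacgc 240
ctaggcttca ttcaatctcg ccaagacctc ttcatcacct ctaagctttg gtgttctgat 300
gctcaccctc atcatgtcct ccctgctctt caaaattcac tcaagaatct tggattggaa 360
taccttgatc tgtatctcat tcactggcca gtgagctcaa agccaggtaa atttgagtat 420
ccggtgaaca agcaagagct tcttcccatg gatttcaagt ctgtttggga agccatggag 480
gagtgtcaga atcttggcct cacaaaattt attggagtca gtaacttctc atgcaagaag 540
ctccaattat tactagcaac cgcaaagatc cctccagctg tcaaccaggt cgagatgaac 600
ccactttggc aacagaagaa gctaagagag ttttgtgaga aaaatggtat tcatatcaca 660
gcttactctc ctttgggcgc caaaggaaca atttggggga agtgacaaag tcatggaatg 720
tgaggtgctc aaacagattg ccaaagctag aggaaaatct gttgcccagg tttgtctcag 780
atagggttta tgagcaaggg gtgagtgttc tggtgaagag cttcagtgag gagaggatga 840
aagagaacct tcaaatattt gattgggagc taagcgcaca agactccgag atgataaatc 900
aaatttcaca gtataaagga tgtgctggac ttgatttcat atcagatgaa ggcccttaca 960
aatctctcca ggatttatgg gatggtgaaa ttgtttgatc ctgtaaacgt gtagccaaaa 1020
accacttaga taccgtttga taacatttta tgcttacaac acaaattaat gtgtgtttta 1080
tgtttacaaa aactttggaa actgtttggt tttttatttt catttatggt ttgatcattg 1140
atttcacagt tacatttcac tttattgaat gtcattgctt ggttggtgct gct 1193
"""
PRIMER_F = "acacacacacacagcagagga"    # SEQ ID NO. 1
PRIMER_R = "agcagcaccaaccaagcaatgac"  # SEQ ID NO. 2

COMPLEMENT = str.maketrans("acgt", "tgca")


def revcomp(seq: str) -> str:
    """Reverse complement of a lowercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_listing(text: str) -> str:
    """Join the bases, checking each line's running length against its count."""
    seq = ""
    for line in text.strip().splitlines():
        *groups, count = line.split()
        seq += "".join(groups)
        if len(seq) != int(count):
            raise ValueError(f"length mismatch at position {count}")
    return seq


seq3 = parse_listing(SEQ3_LISTING)
# Forward primer = 5' end; reverse primer = reverse complement of the 3' end.
assert seq3.startswith(PRIMER_F)
assert seq3.endswith(revcomp(PRIMER_R))
# Listed bases at the two SNP positions (1-based, as in the claims).
print(len(seq3), seq3[930 - 1], seq3[935 - 1])  # 1193 c t
```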

Claims

1. An SNP molecular marker related to the oil content of oil-tea camellia seed kernels, characterized in that the SNP molecular marker is PB.70158.1-930 or PB.70158.1-935;

wherein the SNP molecular marker PB.70158.1-930 is located at position 930 of the sequence shown in SEQ ID NO. 3, where the polymorphism is C/G; and the SNP molecular marker PB.70158.1-935 is located at position 935 of the sequence shown in SEQ ID NO. 3, where the polymorphism is T/G.

2. The SNP molecular marker according to claim 1, wherein, for the SNP molecular marker PB.70158.1-930, genotype C/C at the polymorphic site corresponds to high oil content, genotype C/G to candidate high oil content, and genotype G/G to low oil content;

3. An SNP molecular marker combination related to the oil content of oil-tea camellia seed kernels, characterized in that the combination consists of PB.70158.1-930 and PB.70158.1-935;

wherein PB.70158.1-930 and PB.70158.1-935 are as described in claim 1 or 2.

4. A primer for amplifying the SNP molecular marker according to claim 1 or 2 or the SNP molecular marker combination according to claim 3.

5. The primer according to claim 4, comprising the primers shown in SEQ ID NO. 1-2.

6. A reagent or a kit comprising the primer according to claim 4 or 5.

7. Any one of the following uses of the SNP molecular marker according to claim 1 or 2, or the SNP molecular marker combination according to claim 3, or the primer according to claim 4 or 5, or the reagent or kit according to claim 6:

(2) Use in the identification or improvement of oil-tea camellia germplasm resources, or in molecular marker-assisted breeding, wherein the trait concerned is the kernel oil content of oil-tea camellia;

(4) Use in screening for high-oil-content oil-tea camellia.

8. A method for identifying the kernel oil content phenotype of oil-tea camellia, characterized by comprising the following steps:

(2) Performing PCR amplification using the cDNA as a template and the primer according to claim 4 or 5;

(3) Analyzing, in the PCR amplification product, the genotype of the SNP molecular marker of claim 1 or 2 or of the SNP molecular marker combination of claim 3, and determining from the genotype the kernel oil content phenotype of the oil-tea camellia to be identified.

9. The method according to claim 8, wherein in step (2) the PCR amplification program is: 94-95 ℃ for 3-5 min; 38-45 cycles of 94-95 ℃ for 15-30 s and 65-69 ℃ for 40-60 s; then 67-70 ℃ for 3-6 min.
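Claim 9 defines a window of cycling parameters rather than a single program. The hypothetical helper below checks whether a concrete thermocycler program falls inside those ranges and estimates its hold time, ignoring ramp times. The parameter names are this sketch's own. Reading the 65-69 ℃ step as a combined annealing/extension step (two-step PCR) is our interpretation of the claim.

```python
from typing import Dict, List

# Claim 9 ranges as (low, high); temperatures in deg C, times in seconds.
CLAIM9_RANGES = {
    "init_temp": (94, 95), "init_time": (180, 300),
    "denat_temp": (94, 95), "denat_time": (15, 30),
    "anneal_ext_temp": (65, 69), "anneal_ext_time": (40, 60),
    "cycles": (38, 45),
    "final_temp": (67, 70), "final_time": (180, 360),
}


def check_program(program: Dict[str, float]) -> List[str]:
    """Names of parameters that are missing or outside the claim 9 ranges."""
    return [name for name, (lo, hi) in CLAIM9_RANGES.items()
            if not (name in program and lo <= program[name] <= hi)]


def hold_time_min(program: Dict[str, float]) -> float:
    """Total hold time in minutes, excluding temperature ramps."""
    seconds = (program["init_time"]
               + program["cycles"] * (program["denat_time"] + program["anneal_ext_time"])
               + program["final_time"])
    return seconds / 60


if __name__ == "__main__":
    example = {"init_temp": 94, "init_time": 300,
               "denat_temp": 94, "denat_time": 30,
               "anneal_ext_temp": 67, "anneal_ext_time": 60,
               "cycles": 40, "final_temp": 68, "final_time": 300}
    print(check_program(example))                              # []
    print(hold_time_min(example))                              # 70.0
    print(check_program(dict(example, anneal_ext_temp=60)))    # ['anneal_ext_temp']
```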

10. The method according to claim 8 or 9, wherein in step (3) the kernel oil content phenotype of the oil-tea camellia to be identified is determined as follows:

if the genotype at the polymorphic site of the SNP molecular marker PB.70158.1-930 is C/C, the oil-tea camellia to be identified has high oil content; if it is C/G, the oil-tea camellia to be identified is a candidate high-oil-content plant; if it is G/G, the oil-tea camellia to be identified has low oil content; and/or

if the genotype at the polymorphic site of the SNP molecular marker PB.70158.1-935 is T/T, the oil-tea camellia to be identified has high oil content; if it is T/G, the oil-tea camellia to be identified is a candidate high-oil-content plant; and if it is G/G, the oil-tea camellia to be identified has low oil content.