CN112575104A

CN112575104A - Method for rapidly mapping trait-related genes in industrial hemp

Info

Publication number: CN112575104A
Application number: CN202011463353.7A
Authority: CN
Inventors: 赵越; 王晓楠; 曹焜; 张晓艳; 边境; 孙宇峰; 张治国; 朱浩; 王盼; 韩承伟; 姜颖; 徐磊
Original assignee: Heilongjiang Academy of Sciences Daqing Branch
Current assignee: Heilongjiang Academy of Sciences Daqing Branch
Priority date: 2020-12-11
Filing date: 2020-12-11
Publication date: 2021-03-30

Abstract

A method for rapidly mapping trait-related genes in industrial hemp, belonging to the technical field of industrial hemp research. For a given target trait, the method comprises variety selection, extreme population construction, specific-locus amplified fragment (SLAF) library construction and high-throughput sequencing, marker development, association analysis, gene annotation and real-time fluorescent quantitative PCR. In the invention, high and low bulked pools are constructed from the F1 population obtained by crossing parents that differ markedly in the target trait; the parents are resequenced, and the two pools are sequenced by a reduced-representation genome deep sequencing technology. SLAF tags are developed, single nucleotide polymorphisms are detected, and marker association analysis is performed on the genotype frequency differences between the pools to obtain candidate regions related to the target trait. In most plants, gene mapping is possible only after the population has been advanced to at least the F2 generation; because industrial hemp is highly heterozygous, mapping can be carried out in the F1 generation, so that industrial hemp genes can be mapped rapidly, efficiently and at low cost.

Description

Method for rapidly mapping trait-related genes in industrial hemp

Technical Field

The invention belongs to the technical field of industrial hemp research, and particularly relates to a method for rapidly mapping genes related to traits of industrial hemp.

Background

Industrial hemp (Cannabis sativa L.) is an annual herbaceous plant of the genus Cannabis in the family Cannabaceae, with a tetrahydrocannabinol (THC) content below 0.3%. Its stems, flowers, leaves and seeds all have economic value, and it is widely used in the textile, construction, papermaking, pharmaceutical and food industries, so the main breeding objective for a new variety can be set according to the needs of a specific industry. At present, industrial hemp is bred mainly by conventional methods, which take a long time and make it difficult to reach a customized breeding objective. Molecular breeding can improve breeding efficiency and achieve precise breeding, but it presupposes that trait-related genes have been mapped so that reliable molecular markers can be found. Because research on the molecular biology of industrial hemp started late, the relevant functional genes have not yet been mapped. Advances in sequencing technology and bioinformatics now make the rapid mapping of trait genes possible.

Disclosure of Invention

The invention provides a method for rapidly mapping trait-related genes in industrial hemp, which addresses the problem that, because molecular biology research on industrial hemp is still at an early stage, the relevant functional genes have not been mapped.

In order to achieve the purpose, the technical scheme adopted by the invention is as follows:

A method for rapidly mapping trait-related genes in industrial hemp comprises the following steps, carried out according to the target trait: variety selection, extreme population construction, specific-locus amplified fragment (SLAF) library construction and high-throughput sequencing, marker development, association analysis, gene annotation and real-time fluorescent quantitative PCR.

Further, the variety selection is specifically: two varieties that differ markedly in the target trait and perform similarly in the field for other traits (including agronomic traits, yield, disease resistance and the like) are selected as the male and female parents.

Further, the extreme population is constructed as follows: the two varieties are sown at times chosen according to their flowering periods; at the budding stage, the male plants of the female-parent variety are pulled out or their male flowers are removed, and the hybrid seeds obtained are the F1 generation. At the budding stage of the F1 generation, industrial hemp plants of the same sex are selected and tagged, and young leaves are collected, snap-frozen in liquid nitrogen and stored in a freezer at -80 °C until use. The industrial hemp is harvested at the technical maturity stage or the seed maturity stage, and the target trait is measured.

Further, the SLAF library construction and high-throughput sequencing are specifically as follows: according to the statistics of the target trait, the samples are divided into two groups with marked differences, and 30-100 plants are selected for each group (about 5% of the total number of samples; for example, 50 plants per group from 1000 plants). DNA is extracted, and the DNA of the individual plants is mixed in equal amounts to form two bulked pools, high and low. In-silico digestion of the industrial hemp reference genome is predicted with the digestion prediction software SLAF-Predict. The DNA pools are digested with restriction endonucleases, the digested fragments are recovered, an A is added to the 3' ends, and dual-index sequencing adapters are ligated. The DNA fragments at the same loci in each sample are amplified by PCR, purified, pooled and gel-excised. After the SLAF library passes quality inspection, the amplification products are sequenced on a sequencing system. Rice (Oryza sativa) is used as a control, and its sequencing reads are aligned with BWA software to evaluate whether the digestion scheme is effective.
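The choice of plants for the two pools can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `select_extreme_pools`, its parameters and the plant identifiers are hypothetical, and the rule of taking about 5% of the population per tail, kept within 30-100 plants, is one reading of the text rather than a procedure the source spells out.

```python
from typing import Dict, List, Tuple


def select_extreme_pools(
    trait_by_plant: Dict[str, float],
    fraction: float = 0.05,
    min_size: int = 30,
    max_size: int = 100,
) -> Tuple[List[str], List[str]]:
    """Return (low_pool, high_pool) plant IDs from the two tails of the trait distribution."""
    # Rank plants from the lowest to the highest trait value.
    ranked = sorted(trait_by_plant, key=trait_by_plant.get)
    # About 5% of the population per pool, kept within the 30-100 range (assumed rule).
    size = round(len(ranked) * fraction)
    size = max(min_size, min(max_size, size))
    if 2 * size > len(ranked):
        raise ValueError("population too small for two non-overlapping pools")
    return ranked[:size], ranked[-size:]


if __name__ == "__main__":
    # Hypothetical trait values for 1000 F1 plants.
    population = {f"plant_{i:04d}": 20.0 + 0.01 * i for i in range(1000)}
    low, high = select_extreme_pools(population)
    print(len(low), len(high))
```

With 1000 plants the sketch returns 50 plants per pool, matching the example given in the text; the DNA of the plants in each list would then be mixed in equal amounts.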

Further, the marker development is specifically: the reads obtained are clustered, the distribution of the SLAF tags on each chromosome of industrial hemp is plotted, and SNPs are called with the GATK software package from the alignment of the clean reads to the reference genome.

Further, the association analysis is specifically as follows: before the association analysis, the SNP sites are filtered to remove sites with a read support of less than 4, sites with multiple genotypes, sites at which the genotype of the recessive pool does not derive from the recessive parent, and sites at which the genotypes of the two pools are identical. Association analysis is then performed by the SNP-index method, which looks for significant differences in genotype frequency between the pools, measured by Δ(SNP-index); the more strongly an SNP is associated with the target trait, the closer Δ(SNP-index) is to 1. The calculation formulas are: $\text{SNP-index}(aa) = M_{aa}/(M_{aa} + P_{aa})$; $\text{SNP-index}(ab) = M_{ab}/(M_{ab} + P_{ab})$; $\Delta(\text{SNP-index}) = \text{SNP-index}(aa) - \text{SNP-index}(ab)$.

Note: $P_{aa}$ is the depth in the aa pool derived from the male parent, $M_{aa}$ is the depth in the aa pool derived from the female parent, $P_{ab}$ is the depth in the ab pool derived from the male parent, and $M_{ab}$ is the depth in the ab pool derived from the female parent.

False-positive sites are eliminated mainly by using the positions of the markers on the genome: the Δ SNP-index values are fitted by the SNPNUM method, and regions above the association threshold are selected as trait-related candidate regions; the threshold is calculated from computer simulation experiments.
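The SNP-index formulas can be illustrated with a short Python sketch. The names `SnpDepths`, `snp_index` and `delta_snp_index` are hypothetical, and treating "read support less than 4" as the total depth of either pool at the site is an assumption; the source does not say exactly how the support is counted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpDepths:
    maa: int  # depth in the aa pool derived from the female parent (Maa)
    paa: int  # depth in the aa pool derived from the male parent (Paa)
    mab: int  # depth in the ab pool derived from the female parent (Mab)
    pab: int  # depth in the ab pool derived from the male parent (Pab)


def snp_index(maternal: int, paternal: int) -> float:
    """SNP-index of one pool: maternal depth over total depth."""
    return maternal / (maternal + paternal)


def delta_snp_index(d: SnpDepths, min_depth: int = 4) -> Optional[float]:
    """Return SNP-index(aa) - SNP-index(ab), or None if the site is filtered out."""
    # Assumed filter: drop the site when either pool has fewer than min_depth reads.
    if d.maa + d.paa < min_depth or d.mab + d.pab < min_depth:
        return None
    return snp_index(d.maa, d.paa) - snp_index(d.mab, d.pab)


if __name__ == "__main__":
    site = SnpDepths(maa=18, paa=2, mab=5, pab=15)
    print(delta_snp_index(site))
```

For the hypothetical depths above, the two pool indices are 18/20 and 5/20, so the difference is about 0.65; a site at which the pools do not differ gives a value near 0.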

Further, the gene annotation is specifically: the coding genes in the candidate regions obtained by the association analysis are annotated in depth against the NR, Swiss-Prot, GO, KEGG and COG databases, and candidate genes are rapidly screened according to the annotation results.

Further, the real-time fluorescent quantitative PCR is specifically: 2-3 varieties that differ in the target trait are selected, sample RNA is extracted and reverse-transcribed, 1-2 primer pairs are designed for each candidate gene and tested, and the primers that pass testing are used for relative quantitative PCR analysis.

Compared with the prior art, the invention has the following beneficial effects: high and low bulked pools are constructed from the F1 population obtained by crossing parents that differ markedly in the target trait; the parents are resequenced; the pools are sequenced by the reduced-representation genome deep sequencing technology (SLAF-seq); SLAF tags are developed and single nucleotide polymorphisms (SNPs) are detected; and marker association analysis of the genotype frequency differences between the pools yields candidate regions related to the target trait. In most plants, gene mapping is possible only after the population has been advanced to at least the F2 generation; because industrial hemp is highly heterozygous, gene mapping can be achieved in the F1 generation.

Drawings

FIG. 1 is a map of the distribution of the SLAF tags (black lines) on the chromosomes of industrial hemp;

FIG. 2 is a map of the distribution of SNP-index association values on the chromosomes;

Note: the abscissa is the chromosome name, the black dots are the calculated SNP-index (or Δ SNP-index) values, and the black line is the fitted SNP-index (or Δ SNP-index) value. The upper panel shows the distribution of SNP-index values of the recessive pool; the middle panel shows the distribution of SNP-index values of the dominant pool; the lower panel shows the distribution of Δ SNP-index values, where the dashed line is the 99th-percentile threshold line.

FIG. 3 is a pathway distribution map of the genes in the candidate regions;

FIG. 4 is a plot of the fluorescent quantitative PCR results for gene LOC115705530 (*P < 0.05, **P < 0.01);

FIG. 5 is a plot of the fluorescent quantitative PCR results for gene LOC115707511 (*P < 0.05, **P < 0.01);

FIG. 6 is a plot of the fluorescent quantitative PCR results for gene LOC115704794 (*P < 0.05, **P < 0.01);

FIG. 7 is a plot of the fluorescent quantitative PCR results for gene LOC115705371 (*P < 0.05, **P < 0.01);

FIG. 8 is a plot of the fluorescent quantitative PCR results for gene LOC115705688 (*P < 0.05, **P < 0.01).

Detailed Description

The technical solutions of the present invention are further described below with reference to the drawings and the embodiments, but the present invention is not limited thereto, and modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Example 1:

(1) Variety selection: Golden Knife-15, with a high fiber content, was selected as the male parent and Hemp I, with a low fiber content, as the female parent for hybridization.

(2) Population construction: Golden Knife-15 and Hemp I have the same flowering period, so the two varieties were sown at the same time. At the budding stage, the male plants of Hemp I were pulled out, and the hybrid seeds obtained were the F1 generation. At the budding stage of the F1 generation, industrial hemp plants of the same sex were selected and tagged, and young leaves were collected, snap-frozen in liquid nitrogen and stored in a freezer at -80 °C until use. The industrial hemp was harvested at the technical maturity stage; after fresh-stem peeling, the raw stem weight and the fiber weight of each plant were measured, and the fiber content was calculated as: $\text{fiber content} = \dfrac{\text{fiber weight}}{\text{raw stem weight}} \times 100\%$.

(3) SLAF library construction and high-throughput sequencing: according to the fiber content results, 30 high-content plants and 30 low-content plants were selected, DNA was extracted, and the DNA of the individual plants was mixed in equal amounts to form the high and low bulked pools. In-silico digestion of the industrial hemp reference genome was predicted with the digestion prediction software SLAF-Predict, giving 105,823 SLAF tags distributed essentially evenly over the chromosomes, as shown in FIG. 1. RsaI and HaeIII were selected to digest the DNA pools; the digested fragments were recovered, an A was added to the 3' ends, and dual-index sequencing adapters were ligated. The DNA fragments at the same loci in each sample were amplified by polymerase chain reaction (PCR), purified, pooled and gel-excised. After the SLAF library passed quality inspection, the amplification products were sequenced on an Illumina HiSeq™ 2500 sequencing system. Rice (Oryza sativa) was used as a control, and its sequencing reads were aligned with BWA software to evaluate the validity of the digestion scheme.
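The idea behind in-silico digestion can be sketched as follows. This is not SLAF-Predict, whose interface is not described in the source; it is a minimal stand-in that cuts a sequence at the recognition sites of the two enzymes named above (RsaI, GT^AC, and HaeIII, GG^CC, both blunt cutters) and counts fragments within a size window. The function names and the size window are hypothetical, since the source does not give the fragment length range used.

```python
import re
from typing import List

# Recognition site and cut offset (bases from the site start) for each enzyme.
ENZYMES = {"RsaI": ("GTAC", 2), "HaeIII": ("GGCC", 2)}


def cut_positions(seq: str) -> List[int]:
    """Return sorted cut coordinates for all enzymes on an upper-case sequence."""
    positions = set()
    for site, offset in ENZYMES.values():
        # A lookahead pattern also finds overlapping occurrences of the site.
        for match in re.finditer(f"(?={site})", seq):
            positions.add(match.start() + offset)
    return sorted(positions)


def digest(seq: str) -> List[str]:
    """Split a sequence into the fragments produced by a complete double digest."""
    seq = seq.upper()
    bounds = [0] + cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]


def count_slaf_candidates(seq: str, min_len: int, max_len: int) -> int:
    """Count fragments whose length falls in the chosen (hypothetical) size window."""
    return sum(min_len <= len(frag) <= max_len for frag in digest(seq))


if __name__ == "__main__":
    toy = "AAAGTACTTTGGCCAAA"  # toy sequence, not hemp genome data
    print(digest(toy))
    print(count_slaf_candidates(toy, min_len=6, max_len=10))
```

Both recognition sites are palindromic, so scanning one strand is enough. Running the same count chromosome by chromosome on a reference genome would give a rough picture of how evenly candidate tags are spread.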

(4) Marker development: the reads obtained were clustered, the distribution of the SLAF tags on each chromosome of industrial hemp was plotted, and SNPs were called with the GATK software from the alignment of the clean reads to the reference genome. A total of 389,687 SNP sites were detected between the pools, and SNP molecular markers were developed.

(5) Association analysis: before the association analysis, the SNP sites were filtered to remove sites with a read support of less than 4, sites with multiple genotypes, sites at which the genotype of the recessive pool did not derive from the recessive parent, and sites at which the genotypes of the two pools were identical. Association analysis was then performed by the SNP-index method, whose main purpose is to find significant differences in genotype frequency between the pools, measured by Δ(SNP-index). The more strongly an SNP is associated with the target trait, the closer Δ(SNP-index) is to 1. The calculation formulas are: $\text{SNP-index}(aa) = M_{aa}/(M_{aa} + P_{aa})$; $\text{SNP-index}(ab) = M_{ab}/(M_{ab} + P_{ab})$; $\Delta(\text{SNP-index}) = \text{SNP-index}(aa) - \text{SNP-index}(ab)$.

False-positive sites were eliminated mainly by using the positions of the markers on the genome: the Δ SNP-index values were fitted by the SNPNUM method, and regions above the association threshold, which was calculated from computer simulation experiments, were to be selected as trait-related candidate regions. At a confidence level of 0.90, no candidate region was detected. In theory, the target site and its nearby linked sites should be close to the threshold, and a high peak should appear near a significantly associated region; in this experiment, however, no region exceeded the theoretical threshold, so no significant mapping result was obtained. To make full use of the data, potential mapping regions were sought by lowering the threshold to the 99th percentile of the fitted Δ SNP-index, namely 0.10, as shown in FIG. 2. A total of 4 candidate regions with a combined length of 8.72 Mb, containing 397 genes, were obtained.
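The fitting and thresholding step can be sketched as below. The source names the SNPNUM method but does not define it, so the moving average over a fixed number of neighbouring SNPs used here is an assumption, as are the nearest-rank percentile and all function names. The sketch handles one chromosome at a time.

```python
import math
from typing import List, Sequence, Tuple


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Smooth values with a centred window of consecutive SNPs (shorter at the ends)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    ranked = sorted(values)
    rank = max(1, math.ceil(q * len(ranked) / 100))
    return ranked[rank - 1]


def candidate_regions(
    positions: Sequence[int], fitted: Sequence[float], threshold: float
) -> List[Tuple[int, int]]:
    """Merge runs of consecutive SNPs with fitted value above threshold into intervals."""
    regions = []
    start = prev = None
    for pos, val in zip(positions, fitted):
        if val > threshold:
            if start is None:
                start = pos
            prev = pos
        elif start is not None:
            regions.append((start, prev))
            start = None
    if start is not None:
        regions.append((start, prev))
    return regions


if __name__ == "__main__":
    # Hypothetical positions (bp) and Δ SNP-index values on one chromosome.
    pos = [100, 200, 300, 400, 500]
    delta = [0.01, 0.20, 0.30, 0.02, 0.15]
    print(candidate_regions(pos, delta, threshold=0.10))
```

In practice the threshold passed to `candidate_regions` would be `percentile(fitted, 99)` computed over the fitted values of all chromosomes, which in the example above corresponds to the reported value of 0.10.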

(6) Gene annotation: the coding genes in the candidate regions obtained by the association analysis were annotated in depth against the NR, Swiss-Prot, GO, KEGG and COG databases, and 389 genes were annotated; the pathway distribution of the genes in the candidate regions is shown in FIG. 3. By comparison with genes of crops such as Arabidopsis thaliana, flax and cotton, the candidate genes LOC115705530, LOC115707511, LOC115703881, LOC115704794, LOC115705010, LOC115705371, LOC115705568, LOC115705688, LOC115705891, LOC115705892 and LOC115706200 were obtained.

(7) Real-time fluorescent quantitative PCR: three varieties with different fiber contents, Hemp I (22.1%), Hemp 10 (27.1%) and Golden Knife-15 (33.1%), were selected for verification at the seedling stage and the technical maturity stage. Sample RNA was extracted following the Tiangen plant RNA extraction kit (DP432). Reverse transcription was performed with the FastKing RT Kit (KR116) in a 20 µl system: a buffer mixture was prepared (2 µl FQ-RT Primer Mix, 2 µl 10× King RT Buffer, 1 µl FastKing RT Enzyme Mix, 5 µl RNase-free ddH₂O), 1 µg of RNA was added to the 10 µl of buffer mixture, the volume was made up to 20 µl with ddH₂O, and the reaction was run at 42 °C for 30 min and then at 95 °C for 3 min. The cDNA obtained by reverse transcription was diluted 10-fold and used for quantitative PCR. For each candidate gene, 1-2 primer pairs were designed and tested, and the primers that passed testing were used for quantitative PCR analysis. Quantification used Power qPCR PreMix (Genecopoeia) in a 20 µl SYBR Green system in 96-well plates (10 µl Mix, 1 µl cDNA, 0.5 µl each of the forward and reverse primers, 8 µl H₂O); the reaction program is shown in the following table.

Standards for the target genes and the reference gene were prepared and standard curves were constructed by the standard curve method; the amplification efficiencies of the target-gene and reference-gene primers were calculated, and the fold relationship between the two was obtained by substitution. The copy number was calculated as: $\text{copies}/\mu\text{l} = \dfrac{(\text{ng}/\mu\text{l}) \times 10^{-9} \times 6.02 \times 10^{23}}{\text{bp} \times 660}$, where $6.02 \times 10^{23}$ is the Avogadro constant and 660 is the average molecular weight of one base pair. From the quantitative results, the genes LOC115705530, LOC115707511, LOC115704794, LOC115705371 and LOC115705688 were found to be related to fiber content, as shown in FIGS. 4-8.
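As a worked illustration of the copy-number formula, the Python sketch below converts a standard's concentration and length into copies per microlitre. The function names and the example values are hypothetical. The efficiency helper uses the conventional relation between standard-curve slope and amplification efficiency, E = 10^(-1/slope) - 1; the source does not state which relation was used, so that part is an assumption.

```python
AVOGADRO = 6.02e23  # copies per mole, rounded as in the text
BP_MASS = 660.0     # average molecular weight of one base pair, g/mol


def copies_per_ul(conc_ng_per_ul: float, length_bp: int) -> float:
    """Copy number per microlitre of a double-stranded DNA standard."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    return grams_per_ul * AVOGADRO / (length_bp * BP_MASS)


def amplification_efficiency(slope: float) -> float:
    """Efficiency from the slope of Ct plotted against log10(copy number)."""
    # Assumed conventional relation; a slope near -3.32 corresponds to 100%.
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Hypothetical standard: 10 ng/µl of a 3000 bp construct.
    print(f"{copies_per_ul(10.0, 3000):.3e} copies/µl")
    print(f"{amplification_efficiency(-3.32):.3f}")
```

A dilution series of such a standard gives the points of the standard curve, and the slope of that curve feeds the efficiency calculation for each primer pair.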

Claims

1. A method for rapidly mapping trait-related genes in industrial hemp, characterized in that the method comprises the following steps, carried out according to the target trait: variety selection, extreme population construction, specific-locus amplified fragment (SLAF) library construction and high-throughput sequencing, marker development, association analysis, gene annotation and real-time fluorescent quantitative PCR.

2. The method for rapidly mapping trait-related genes in industrial hemp according to claim 1, characterized in that the variety selection is specifically: two varieties that differ markedly in the target trait and perform similarly in the field for other traits are selected as the male and female parents.

3. The method for rapidly mapping trait-related genes in industrial hemp according to claim 1, characterized in that the extreme population is constructed as follows: the two varieties are sown at times chosen according to their flowering periods; at the budding stage, the male plants of the female-parent variety are pulled out or their male flowers are removed, and the hybrid seeds obtained are the F1 generation; at the budding stage of the F1 generation, industrial hemp plants of the same sex are selected and tagged, and young leaves are collected, snap-frozen in liquid nitrogen and stored in a freezer at -80 °C until use; the industrial hemp is harvested at the technical maturity stage or the seed maturity stage, and the target trait is measured.

4. The method for rapidly mapping trait-related genes in industrial hemp according to claim 1, characterized in that the SLAF library construction and high-throughput sequencing are specifically as follows: according to the statistics of the target trait, the samples are divided into two groups with marked differences, and 30-100 plants are selected for each group; DNA is extracted, and the DNA of the individual plants is mixed in equal amounts to form high and low bulked pools; in-silico digestion of the industrial hemp reference genome is predicted with the digestion prediction software SLAF-Predict; the DNA pools are digested with restriction endonucleases, the digested fragments are recovered, an A is added to the 3' ends, and dual-index sequencing adapters are ligated; the DNA fragments at the same loci in each sample are amplified by PCR, purified, pooled and gel-excised; after the SLAF library passes quality inspection, the amplification products are sequenced on a sequencing system; and rice is used as a control, its sequencing reads being aligned with BWA software to evaluate whether the digestion scheme is effective.

5. The method for rapidly mapping trait-related genes in industrial hemp according to claim 1, characterized in that the marker development is specifically: the reads obtained are clustered, the distribution of the SLAF tags on each chromosome of industrial hemp is plotted, and SNPs are called with the GATK software from the alignment of the clean reads to the reference genome.

6. The method for rapidly mapping trait-related genes in industrial hemp according to claim 1, characterized in that the association analysis is specifically as follows: before the association analysis, the SNP sites are filtered to remove sites with a read support of less than 4, sites with multiple genotypes, sites at which the genotype of the recessive pool does not derive from the recessive parent, and sites at which the genotypes of the two pools are identical; association analysis is then performed by the SNP-index method to find significant differences in genotype frequency between the pools, measured by Δ(SNP-index), wherein the more strongly an SNP is associated with the target trait, the closer Δ(SNP-index) is to 1; the calculation formulas are: $\text{SNP-index}(aa) = M_{aa}/(M_{aa} + P_{aa})$; $\text{SNP-index}(ab) = M_{ab}/(M_{ab} + P_{ab})$; $\Delta(\text{SNP-index}) = \text{SNP-index}(aa) - \text{SNP-index}(ab)$.

7. The method for rapidly mapping trait-related genes in industrial hemp according to claim 1, characterized in that the gene annotation is specifically: the coding genes in the candidate regions obtained by the association analysis are annotated in depth against the NR, Swiss-Prot, GO, KEGG and COG databases, and candidate genes are rapidly screened according to the annotation results.

8. The method for rapidly mapping trait-related genes in industrial hemp according to claim 1, characterized in that the real-time fluorescent quantitative PCR is specifically: 2-3 varieties that differ in the target trait are selected, sample RNA is extracted and reverse-transcribed, 1-2 primer pairs are designed for each candidate gene and tested, and the primers that pass testing are used for relative quantitative PCR analysis.