CN117133354A - Method for efficiently identifying key breeding gene modules of forest tree - Google Patents
- Publication number: CN117133354A
- Application number: CN202311097273.8A
- Authority: CN (China)
- Prior art keywords: snp, key, breeding, genes, specific
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- C12Q1/6895: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
- G16B20/20: Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
- G16B20/30: Detection of binding sites or motifs
- G16B50/10: Ontologies; Annotations
- C12Q2600/13: Plant traits
- C12Q2600/156: Polymorphic or mutational markers
Abstract
The application provides a method for efficiently identifying key breeding gene modules of forest trees and relates to the technical field of molecular genetics. The method accurately and efficiently identifies the key breeding gene module for an important forest tree trait, systematically evaluates the genetic effect of each genotype combination in the module on phenotypic variation, and determines the optimal genotype combination of the key breeding gene module. It can be widely applied to precise seedling-stage breeding for important forest tree traits and to the molecular genetic improvement of forest trees.
Description
Technical Field
The application relates to the technical field of molecular genetics, and in particular to a method for efficiently identifying key breeding gene modules of forest trees.
Background
The important economic traits of forest trees are complex quantitative traits regulated by multiple genes. Forest trees are also long-lived perennials, largely undomesticated, and widely distributed, so the genetic basis of their important traits remains unclear and the underlying regulatory mechanisms are unknown. In recent years, as research in molecular genetics and genomics has deepened, a series of key genes with important breeding value have been discovered. However, these key genes are difficult to apply widely in the molecular genetic improvement of forest trees and in precise seedling-stage breeding for important traits, mainly for three reasons: (1) Natural forest tree resources are widely distributed, so germplasm resources are rich in phenotypic variation and genomic allelic variation, and the prior art cannot fully characterize the genetic effects of the genomic allelic loci that influence phenotypic variation in important traits; (2) Complex forest tree traits are jointly regulated by multiple genes through highly complex genetic regulatory mechanisms, yet most current research focuses only on the biological functions of single genes and neglects the joint genetic effect on phenotypic variation of a breeding module composed of multiple genes; (3) Existing molecular genetic techniques lack accurate identification of the key breeding modules for important traits and in-depth analysis of their genetic effects, so molecular genetic improvement of important economic traits of forest trees is hard to carry out, and seedling-stage screening has low efficiency and precision.
Therefore, the prior art lacks a strategy for efficiently identifying key breeding gene modules of forest trees, which hinders the establishment of a molecular design breeding technology system for forest trees and the effective implementation of their molecular genetic improvement.
Disclosure of Invention
The application aims to provide a method for efficiently identifying key breeding gene modules of forest trees. The method efficiently identifies the key breeding gene modules for important forest tree traits, accurately resolves the genetic effect of each genotype combination in a breeding module on phenotypic variation, and can be widely applied to efficient seedling-stage screening for important traits, thereby providing important technical support for molecular design breeding of forest trees.
The application provides a method for efficiently identifying key breeding gene modules of forest trees, which comprises the following steps:
1) Performing genome-wide association analysis on the genomic single nucleotide polymorphism (SNP) genotype data of each individual in the germplasm resource population of the forest tree under study and the phenotype values of a specific trait for each individual in that population, and determining the SNP loci significantly associated with the specific trait;
the determination condition includes: the SNP genotype locus in the genome is significantly associated with the phenotypic trait, reaching the biostatistical significance level;
2) Performing functional annotation on the transcription module in which each SNP locus significantly associated with the specific trait in step 1) is located, and defining that transcription module as a candidate gene;
3) Determining the expression level data of the candidate genes from step 2) in each individual of the germplasm resource population of the forest tree under study;
4) Calculating the Pearson correlation coefficient r between the population expression levels of the candidate genes from step 3) and the population phenotype values of the specific trait from step 1) to test the "expression-phenotype" correlation between each candidate gene and the specific trait, determining the candidate genes whose expression patterns are highly correlated with phenotypic variation in the specific trait, and defining them as key genes, which indicates that the expression level of a key gene strongly influences phenotypic variation in the specific trait;
the determination condition includes: the Pearson correlation coefficient r > 0.4 or r < -0.4;
5) Based on the significantly associated SNP genotype data within the key genes from step 4), combined with the germplasm resource population phenotype values of the specific trait from step 1), detecting epistatic interaction effects on phenotypic variation among the SNP loci within the key genes, determining the key breeding gene module that influences the specific phenotypic trait, evaluating the genetic effect of each genotype combination in the key breeding gene module on phenotypic variation, and identifying the optimal genotype combination in the key breeding gene module;
the determination condition includes: the epistatic interaction combination influencing the specific phenotypic variation must reach the biostatistical significance level.
Preferably, the forest tree germplasm resource population used in each step contains more than 200 individuals.
Preferably, the SNP genotype frequency in steps 1), 2) and 5) is greater than 10%.
Preferably, the genome-wide association analysis in step 1) uses the mixed linear model in TASSEL v5.0, and the software yields a P value giving the significance of the association between each SNP locus and the specific phenotype; multiple hypothesis testing correction is applied to the P values using a threshold of 1/n (where n is the total number of genome-wide SNPs; Bonferroni method), and SNP loci with P < 1/n are selected as the SNP loci significantly associated with the specific trait.
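The 1/n screening rule described above can be illustrated with a minimal Python sketch. It is not part of the TASSEL workflow; the function name `significant_snps` and the SNP identifiers are hypothetical, and n is assumed to equal the number of SNPs supplied to the function.

```python
def significant_snps(p_values):
    """Return the SNP ids whose P value is below 1/n.

    p_values: dict mapping a SNP id to its association P value.
    n is assumed to be the number of SNPs in the dict.
    """
    n = len(p_values)
    if n == 0:
        return []
    threshold = 1.0 / n  # the 1/n cutoff described in the text
    return sorted(snp for snp, p in p_values.items() if p < threshold)


# Hypothetical P values for four SNPs, so the threshold is 1/4 = 0.25.
example = {"snp_a": 0.01, "snp_b": 0.30, "snp_c": 0.25, "snp_d": 0.20}
print(significant_snps(example))  # ['snp_a', 'snp_d']
```

The comparison is strict, so a SNP whose P value equals the threshold exactly (`snp_c` here) is not retained.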
Preferably, the annotated transcription modules in step 2) include protein-coding genes, long non-coding RNAs, and microRNAs.
Preferably, the software for calculating the Pearson correlation coefficient r in step 4) includes SPSS v19.0.
Preferably, the software for detecting the epistatic interaction effect in step 5) is the EPISNP1 package in the epiSNP software, which calculates the significance P value of the phenotype-associated interaction between SNPs; SNP-SNP interaction pairs significantly associated with the specific trait are determined using the screening criterion P ≤ 0.001.
Preferably, in step 5), a key gene can be incorporated into the key breeding gene module only when its corresponding SNP has a significant epistatic interaction.
Preferably, in step 5), when evaluating the phenotypic genetic effect of each genotype combination in the key breeding gene module, the frequency of each genotype combination in the germplasm resource population is greater than 10%.
Another object of the present application is to provide the use of the above method in molecular design breeding of forest trees.
The application provides a method for efficiently identifying key breeding gene modules of forest trees. Key breeding genes identified by the prior art are difficult to apply to the molecular genetic improvement of forest trees and to precise seedling-stage screening for important traits, because existing research has not fully characterized the patterns of allelic variation in forest tree germplasm resource populations and lacks systematic identification and in-depth analysis of the key breeding modules that have strong genetic effects on phenotypic variation in important traits. The method of the application accurately and efficiently identifies the key breeding gene module for an important forest tree trait and resolves in depth the genetic effect of each genotype combination in the module on phenotypic variation. It can be widely applied to precise seedling-stage screening for important traits and to the molecular genetic improvement of forest trees, and it provides important technical support for molecular design breeding of forest trees.
Using the method provided by the application, the key breeding gene module for xylem xylose content in Populus tomentosa was identified as PtoGAO1-PtoCAMTA5-PtoC3H3-PtoDOF2. For the SNP combination Chr1_34278210-Chr6_5959112-Chr10_12844616-Chr2_21224438, the genotype combination AA/AG/GT/AT corresponds to the highest xylem xylose content and the genotype combination GA/AG/TT/AT to the lowest, so xylem xylose content can be rapidly screened at the seedling stage of Populus tomentosa.
Drawings
FIG. 1 shows the phenotypic effect values of each genotype in the Populus tomentosa xylose content breeding gene module;
FIG. 2 is an analytical flow chart of the identification method of the present application.
Detailed Description
The application provides a method for efficiently identifying key breeding gene modules of forest trees, which comprises the following steps:
1) Performing genome-wide association analysis on the genomic single nucleotide polymorphism (SNP) genotype data of each individual in the germplasm resource population of the forest tree under study and the phenotype values of a specific trait for each individual in that population, and determining the SNP loci significantly associated with the specific trait;
the determination condition includes: the SNP genotype locus in the genome is significantly associated with the phenotypic trait, reaching the biostatistical significance level;
2) Performing functional annotation on the transcription module in which each SNP locus significantly associated with the specific trait in step 1) is located, and defining that transcription module as a candidate gene;
3) Determining the expression level data of the candidate genes from step 2) in each individual of the germplasm resource population of the forest tree under study;
4) Calculating the Pearson correlation coefficient r between the population expression levels of the candidate genes from step 3) and the population phenotype values of the specific trait from step 1) to test the "expression-phenotype" correlation between each candidate gene and the specific trait, determining the candidate genes whose expression patterns are highly correlated with phenotypic variation in the specific trait, and defining them as key genes, which indicates that the expression level of a key gene strongly influences phenotypic variation in the specific trait;
the determination condition includes: the Pearson correlation coefficient r > 0.4 or r < -0.4;
5) Based on the significantly associated SNP genotype data within the key genes from step 4), combined with the germplasm resource population phenotype values of the specific trait from step 1), detecting epistatic interaction effects on phenotypic variation among the SNP loci within the key genes, determining the key breeding gene module that influences the specific phenotypic trait, evaluating the genetic effect of each genotype combination in the key breeding gene module on phenotypic variation, and identifying the optimal genotype combination in the key breeding gene module;
the determination condition includes: the epistatic interaction combination influencing the specific phenotypic variation must reach the biostatistical significance level.
The application first obtains genome-wide SNP genotype data for each individual of the germplasm resource population of the forest tree under study; these data are obtained by whole-genome resequencing of the forest tree. The germplasm resource population contains more than 200 individuals; this applies throughout and is not repeated below. The application requires an SNP genotype frequency greater than 10%; this likewise applies throughout and is not repeated below.
The application also requires the phenotype value of the specific trait for each individual in the germplasm resource population; the method for obtaining these phenotype values is not particularly limited.
Genome-wide association analysis is performed on the genomic SNP genotype data of each individual in the germplasm resource population and the phenotype values of the specific trait for each individual, and the SNP loci significantly associated with the specific trait are determined; the determination condition includes: the SNP genotype locus in the genome is significantly associated with the phenotypic trait, reaching the biostatistical significance level. The genome-wide association analysis preferably uses the mixed linear model in TASSEL v5.0, and the software yields a P value giving the significance of the association between each SNP locus and the specific phenotype; multiple hypothesis testing correction is applied to the P values using a threshold of 1/n (where n is the total number of genome-wide SNPs; Bonferroni method), and SNP loci with P < 1/n are selected as the SNP loci significantly associated with the specific trait.
The application performs functional annotation on the transcription module in which each SNP locus significantly associated with the specific trait is located, and defines that transcription module as a candidate gene. The method of obtaining the annotated transcription modules is not limited in the present application; the annotated transcription modules preferably include, but are not limited to, protein-coding genes, long non-coding RNAs (lncRNA), and microRNAs (miRNA).
The application determines the expression level data of the candidate genes in each individual of the germplasm resource population of the forest tree under study. The method for obtaining the population expression levels of the candidate genes is not particularly limited in the present application.
The application calculates the Pearson correlation coefficient r between the population expression levels of the candidate genes and the population phenotype values of the specific trait to test the "expression-phenotype" correlation between each candidate gene and the specific trait, determines the candidate genes whose expression patterns are highly correlated with phenotypic variation in the specific trait, and defines them as key genes, which indicates that the expression level of a key gene strongly influences phenotypic variation in the specific trait; the determination condition includes: the Pearson correlation coefficient r > 0.4 or r < -0.4. The application preferably uses SPSS v19.0 to calculate the Pearson correlation coefficient r.
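As an illustration of the key-gene criterion (r > 0.4 or r < -0.4), the following Python sketch computes the Pearson correlation coefficient directly from its definition instead of using SPSS. The helper names `pearson_r` and `key_genes`, the gene labels, and the data values are all hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def key_genes(expression, phenotype, cutoff=0.4):
    """Keep candidate genes with r > cutoff or r < -cutoff.

    expression: dict mapping a gene name to its per-individual expression
    levels, listed in the same individual order as `phenotype`.
    """
    selected = {}
    for gene, levels in expression.items():
        r = pearson_r(levels, phenotype)
        if abs(r) > cutoff:
            selected[gene] = r
    return selected


# Hypothetical data for five individuals.
phenotype = [1.0, 2.0, 3.0, 4.0, 5.0]
expression = {
    "gene_up": [2.0, 4.0, 6.0, 8.0, 10.0],   # rises with the phenotype
    "gene_down": [5.0, 4.0, 3.0, 2.0, 1.0],  # falls as the phenotype rises
    "gene_flat": [1.0, 3.0, 2.0, 3.0, 1.0],  # no linear relationship
}
print(sorted(key_genes(expression, phenotype)))  # ['gene_down', 'gene_up']
```

Both positively and negatively correlated genes pass the filter, because the criterion is applied to the absolute value of r.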
Based on the significantly associated SNP genotype data within the key genes, combined with the germplasm resource population phenotype values of the specific trait, the application detects epistatic interaction effects on phenotypic variation among the significantly associated SNP loci within the key genes, determines the key breeding gene module that influences the specific phenotypic trait, evaluates the genetic effect of each genotype combination in the key breeding gene module on phenotypic variation, and identifies the optimal genotype combination in the key breeding gene module; the determination condition includes: the epistatic interaction combination influencing the specific phenotypic variation must reach the biostatistical significance level.
In the application, the software for detecting the epistatic interaction effect is preferably the EPISNP1 package in the epiSNP software, which calculates the significance P value of the phenotype-associated interaction between SNPs; SNP-SNP interaction pairs significantly associated with the specific trait are determined using the screening criterion P ≤ 0.001. In the present application, a key gene can be incorporated into the key breeding gene module only when its corresponding SNP has a significant epistatic interaction. When evaluating the phenotypic genetic effect of each genotype combination in the key breeding gene module, the frequency of each genotype combination in the germplasm resource population is preferably greater than 10%.
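The module-assembly rule above (a key gene joins the module only if one of its SNPs takes part in a significant SNP-SNP interaction) can be sketched in Python as follows. This is not the EPISNP1 interface; the input layout, the function name `breeding_module`, and all identifiers and P values are assumptions made for illustration.

```python
def breeding_module(pairs, snp_to_gene, p_cutoff=0.001):
    """Collect the key genes whose SNPs appear in a significant pair.

    pairs: iterable of (snp_a, snp_b, p_value) tuples, one per tested
           SNP-SNP interaction (hypothetical layout).
    snp_to_gene: dict mapping each SNP id to the key gene containing it.
    """
    genes = set()
    for snp_a, snp_b, p_value in pairs:
        if p_value <= p_cutoff:  # screening criterion P <= 0.001
            genes.add(snp_to_gene[snp_a])
            genes.add(snp_to_gene[snp_b])
    return sorted(genes)


# Hypothetical SNPs, genes, and interaction P values.
snp_to_gene = {"s1": "geneA", "s2": "geneB", "s3": "geneC", "s4": "geneD"}
pairs = [
    ("s1", "s2", 0.0004),  # significant
    ("s2", "s3", 0.0010),  # significant (equal to the cutoff)
    ("s3", "s4", 0.0200),  # not significant
]
print(breeding_module(pairs, snp_to_gene))  # ['geneA', 'geneB', 'geneC']
```

In this example `geneD` is left out because its only tested interaction does not meet the cutoff.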
Another object of the present application is to provide the use of the above method in molecular design breeding of forest trees.
The method for efficiently identifying key breeding gene modules of forest trees according to the application is described in further detail below with reference to a specific example; the technical scheme of the application includes, but is not limited to, the following example.
Example 1
Using the method for efficiently identifying key breeding gene modules of forest trees, the key breeding gene module for xylem xylose content in Populus tomentosa was identified, and the phenotypic effect of each genotype combination in the module on xylem xylose content was analyzed, so that the method can be used for seedling-stage screening of this trait and for establishing a molecular design breeding technology system for forest trees.
Step S1: obtaining genome-wide SNP genotype data for the Populus tomentosa germplasm resource population (303 individuals) by whole-genome resequencing. The specific steps are as follows:
Leaf DNA was extracted from each of the 303 individuals of the Populus tomentosa germplasm resource population as material for genome resequencing. Sequences were aligned against the Populus tomentosa reference genome to obtain genome-wide SNP data and the genomic position of each SNP, and SNPs with genotype frequency greater than 10% were retained for subsequent analysis, giving 12,800,000 SNPs.
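A genotype-frequency filter of this kind might look like the Python sketch below. The source does not spell out how "genotype frequency" is computed, so the sketch assumes that a site is kept when it has at least two observed genotype classes and the least common class exceeds 10% of the non-missing calls; the function name and the data are hypothetical.

```python
from collections import Counter


def passes_genotype_frequency(genotypes, min_freq=0.10):
    """Check one SNP site against the assumed genotype-frequency rule.

    genotypes: per-individual genotype calls such as "AA", "AG", "GG";
               None marks a missing call.
    """
    calls = [g for g in genotypes if g is not None]
    counts = Counter(calls)
    if len(counts) < 2:
        return False  # no calls, or a monomorphic site
    rarest = min(counts.values()) / len(calls)
    return rarest > min_freq


# Hypothetical sites genotyped in ten individuals.
site_kept = ["AA"] * 5 + ["AG"] * 3 + ["GG"] * 2     # rarest class: 20%
site_dropped = ["AA"] * 6 + ["AG"] * 3 + ["GG"] * 1  # rarest class: 10%
print(passes_genotype_frequency(site_kept))     # True
print(passes_genotype_frequency(site_dropped))  # False
```

The second site fails because the text requires a frequency strictly greater than 10%.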
Step S2: obtaining the xylem xylose content of the Populus tomentosa germplasm resource population (303 individuals). The specific steps are as follows:
Mature xylem material was collected from each individual of the Populus tomentosa germplasm resource population and immediately placed in liquid nitrogen (-196 ℃) for preservation. Xylem xylose content was determined by high performance liquid chromatography in accordance with national standard methods GB2677.7-81, GB2677.8-81 and GB2677.10-81. Xylem xylose content ranged from 2.03% to 31.95% across the population, with a mean of 14.08%, and followed a normal distribution, making it suitable for genome-wide association analysis.
Step S3: genome-wide association analysis (Genome-wide Association Study, GWAS) was performed on the genome-wide SNP data from step S1 and the xylose data from step S2 using the mixed linear model (Mixed Linear Model) in TASSEL v5.0 to obtain the significance P value of the association between each SNP and population xylose content. Using P < 7.81E-08 (1/n, where n is the number of genome-wide SNPs, in accordance with the Bonferroni test method) as the screening criterion, SNP loci below this threshold were selected as significantly associated loci. In total, 14 SNP loci were significantly associated with xylem xylose content in Populus tomentosa (P < 7.81E-08); specific results are shown in Table 1.
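The screening threshold in step S3 appears to follow directly from the 12,800,000 SNPs retained in step S1, taken here as n:

```latex
\[
P_{\text{threshold}} = \frac{1}{n} = \frac{1}{12\,800\,000}
= 7.8125 \times 10^{-8} \approx 7.81 \times 10^{-8}
\]
```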
Table 1 Gene information significantly correlated with xylem xylose content of Populus tomentosa
And S4, carrying out gene annotation on the obviously-related SNP loci based on the coding gene annotation information of the populus tomentosa genome protein, namely positioning the genes of the obviously-related SNP loci, and carrying out annotation to obtain 9 candidate genes, wherein the specific results are shown in Table 1.
In step S5, since the trait of interest in this example is xylem xylose content, the expression level data of 9 candidate genes obtained in step S4 in xylem of populus tomentosa germplasm resource group needs to be detected, and the specific steps are as follows:
collecting mature xylem of populus tomentosa germplasm resource group (303 individuals), storing in liquid nitrogen after collecting, extracting the collected mature xylem RNA by using a Plant Qiagen RNAeasy kit (Qiagen China, shanghai, china) kit, and carrying out transcriptome sequencing by a biological company after quality evaluation to obtain the expression quantity of 9 candidate genes in the populus tomentosa germplasm resource group xylem obtained in the step S4.
Step S6, calculating the pearson correlation coefficient r of expression and phenotype between the group expression quantity of 9 candidate genes in the step S5 and the group xylose content in the step S2 by using SPSS v19.0 software, and finding that the expression level of total 6 genes in the group is highly correlated with the group xylose content (r >0.4 or r < -0.4), wherein the 6 key genes are respectively: ptoARF8, ptoWRKY41, ptoGAO1, ptoDOF2, ptoCAMTA5 and PtoC3H3, and the specific information is shown in Table 1.
Table 2 Epistatic interactions between significantly associated marker loci (P < 0.001)
Table 3 Effect analysis of the key breeding gene module for xylem xylose content of Populus tomentosa
Step S7: The significantly associated SNP genotype data corresponding to the 6 key genes from step S6 were combined with the population xylem xylose content data, and SNP-SNP interaction effects were detected with the EPISNP1 package of the epiSNP software. Significant epistatic interactions (P < 0.001) were found among four significantly associated SNPs, namely Chr1_34278210, Chr6_5959112, Chr10_12844616 and Chr2_21224438, indicating epistatic interactions among the 4 key genes PtoGAO1, PtoCAMTA5, PtoC3H3 and PtoDOF2 (Table 2). These 4 key genes were therefore incorporated into a key breeding gene module, i.e. the key breeding gene module for xylem xylose content of Populus tomentosa is PtoGAO1-PtoCAMTA5-PtoC3H3-PtoDOF2. Further, each genotype combination (each with a frequency of more than 10%) of the corresponding significantly associated SNPs (Chr1_34278210-Chr6_5959112-Chr10_12844616-Chr2_21224438) in the breeding module consisting of these 4 key genes was evaluated. The AA/AG/GT/AT genotype combination had the highest genetic effect on xylem xylose content and the GA/AG/TT/AT combination the lowest; specific information is shown in Fig. 1 and Table 3.
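The final evaluation of genotype combinations in step S7 can be illustrated with the sketch below. It does not reproduce the epiSNP epistasis test; it only shows one simple way to summarise the phenotype per genotype combination while applying the 10% frequency filter, here using the group mean as the measure of effect (an assumption, since the text does not state the statistic). All genotypes and phenotype values are made up, and two SNPs are used instead of four for brevity.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def combination_means(genotypes: Sequence[Tuple[str, ...]],
                      phenotype: Sequence[float],
                      min_freq: float = 0.10) -> Dict[str, float]:
    """Mean phenotype per genotype combination with frequency > min_freq."""
    if len(genotypes) != len(phenotype) or not genotypes:
        raise ValueError("genotypes and phenotype must be non-empty and aligned")
    groups: Dict[str, List[float]] = defaultdict(list)
    for combo, value in zip(genotypes, phenotype):
        groups["/".join(combo)].append(value)
    total = len(genotypes)
    return {combo: sum(values) / len(values)
            for combo, values in groups.items()
            if len(values) / total > min_freq}


# Hypothetical toy population of 10 individuals typed at two SNPs.
toy_genotypes = ([("AA", "AG")] * 4) + ([("GA", "AG")] * 5) + [("GG", "AG")]
toy_phenotype = [10.0, 12.0, 11.0, 11.0, 6.0, 7.0, 5.0, 6.0, 6.0, 20.0]

means = combination_means(toy_genotypes, toy_phenotype)
best = max(means, key=means.get)    # combination with the highest mean
worst = min(means, key=means.get)   # combination with the lowest mean
print(means, best, worst)
```

The single GG/AG individual has the largest phenotype value but is excluded because its frequency (10%) does not exceed the cutoff, which shows why the frequency filter matters when ranking combinations.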
From the above, the key breeding gene module for xylem xylose content of Populus tomentosa is PtoGAO1-PtoCAMTA5-PtoC3H3-PtoDOF2. Among the genotype combinations of the significantly associated SNPs Chr1_34278210-Chr6_5959112-Chr10_12844616-Chr2_21224438 in these 4 breeding genes, AA/AG/GT/AT has the highest xylem xylose content and GA/AG/TT/AT has the lowest.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present application, and such modifications and adaptations are intended to fall within the scope of the present application.
Claims (10)
1. A method for efficiently identifying key breeding gene modules of forest trees, comprising the following steps:
1) Performing a genome-wide association analysis on the genome-wide single nucleotide polymorphism (SNP) genotype data of each individual in the forest tree germplasm resource population under test and the phenotypic value of a specific trait of each individual in that population, and determining the SNP loci significantly associated with the specific trait;
the determination condition includes: the association between a genomic SNP locus and the phenotypic trait reaches the biostatistical significance level;
2) Performing functional annotation on the transcription module in which each SNP locus significantly associated with the specific trait in step 1) is located, and defining the transcription module as a candidate gene;
3) Determining the expression level data of the candidate genes of step 2) in each individual of the forest tree germplasm resource population under test;
4) Calculating the Pearson correlation coefficient r between the population expression level of each candidate gene in step 3) and the population phenotypic value of the specific trait in step 1), thereby testing the "expression and phenotype" correlation between the candidate genes and the specific trait; candidate genes whose expression patterns are highly correlated with the phenotypic variation of the specific trait are defined as key genes, indicating that the expression level of the key genes strongly influences the phenotypic variation of the specific trait;
the determination condition includes: the Pearson correlation coefficient r > 0.4 or r < -0.4;
5) Based on the significantly associated SNP genotype data in the key genes of step 4), combined with the germplasm resource population phenotypic values of the specific trait in step 1), detecting the epistatic interaction effects on phenotypic variation among the SNP loci in the key genes, determining the key breeding gene module affecting the specific phenotypic trait, evaluating the genetic effect of each genotype combination in the key breeding gene module on phenotypic variation, and identifying the optimal genotype combination in the key breeding gene module;
the determination condition includes: the epistatic interaction relationship affecting the specific phenotypic variation needs to reach the biostatistical significance level.
2. The method of claim 1, wherein the forest tree germplasm resource population used in each step comprises more than 200 individuals.
3. The method of claim 1, wherein the genotype frequency of each SNP in steps 1), 2) and 5) is greater than 10%.
4. The method according to claim 1, wherein the genome-wide association analysis in step 1) is performed with the mixed linear model in TASSEL v5.0, which gives the significance level (P value) of the association between each SNP locus and the specific phenotype; the P values are corrected for multiple hypothesis testing using 1/n (n being the total number of genome-wide SNPs; Bonferroni method), and SNP loci with P < 1/n are screened, thereby determining the SNP loci significantly associated with the specific trait.
5. The method of claim 1, wherein the annotated transcription module of step 2) comprises a protein-coding gene, a long non-coding RNA, and a microRNA.
6. The method according to claim 1, wherein the software for calculating the Pearson correlation coefficient r in step 4) comprises SPSS v19.0.
7. The method according to claim 1, wherein the epistatic interaction effect in step 5) is detected with the EPISNP1 package of the epiSNP software, which calculates the significance P value of the association between each SNP-SNP pair and the phenotype; SNP-SNP interaction pairs significantly associated with the specific trait are determined using the screening criterion P ≤ 0.001.
8. The method according to claim 1, wherein in step 5), only the key genes corresponding to SNPs that have a significant epistatic interaction relationship are incorporated into the key breeding gene module.
9. The method of claim 1, wherein in step 5), when evaluating the genetic effect of each genotype combination on the phenotype, the frequency of each genotype combination of the key breeding gene module in the germplasm resource population is greater than 10%.
10. Use of the method of any one of claims 1 to 9 in molecular design breeding of forest trees.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311097273.8A CN117133354B (en) | 2023-08-29 | 2023-08-29 | Method for efficiently identifying key breeding gene modules of forest tree |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117133354A (en) | 2023-11-28 |
CN117133354B (en) | 2024-06-14 |
Family
ID=88859460
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311097273.8A Active CN117133354B (en) | 2023-08-29 | 2023-08-29 | Method for efficiently identifying key breeding gene modules of forest tree |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117133354B (en) |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050144664A1 (en) * | 2003-05-28 | 2005-06-30 | Pioneer Hi-Bred International, Inc. | Plant breeding method |
WO2008087185A1 (en) * | 2007-01-17 | 2008-07-24 | Syngenta Participations Ag | Process for selecting individuals and designing a breeding program |
US20100037342A1 (en) * | 2008-08-01 | 2010-02-11 | Monsanto Technology Llc | Methods and compositions for breeding plants with enhanced yield |
CN104293774A (en) * | 2013-07-16 | 2015-01-21 | Beijing Forestry University | Functional SSR labels obviously related with wood quality characters in populus CesAs gene, and applications and kit thereof |
US20160289696A1 (en) * | 2015-04-03 | 2016-10-06 | The United States Of America, As Represented By The Secretary Of Agriculture | Mutant sorghum bicolor having enhanced seed yield |
CN106599607A (en) * | 2016-12-13 | 2017-04-26 | Beijing Forestry University | Method for constructing photosynthetic pathway gene regulation network |
CN108504757A (en) * | 2017-04-14 | 2018-09-07 | Beijing Forestry University | Probe into the method and system of regulatory mechanism between the gene and miRNAs that participate in forest miRNAs biologies formation access |
CN108517368A (en) * | 2017-04-21 | 2018-09-11 | Beijing Forestry University | The method and system of Chinese white poplar LncRNA Pto-CRTG and its target gene Pto-CAD5 interactions are parsed using epistasis |
CN109545278A (en) * | 2018-12-18 | 2019-03-29 | Beijing Forestry University | A kind of method of plant identification lncRNA and interaction of genes |
CN111863127A (en) * | 2020-07-17 | 2020-10-30 | Beijing Forestry University | Method for constructing genetic control network of plant transcription factor to target gene |
CN112204156A (en) * | 2018-05-25 | 2021-01-08 | Pioneer Hi-Bred International, Inc. | Systems and methods for improving breeding by modulating recombination rates |
CN113025741A (en) * | 2021-03-09 | 2021-06-25 | Beijing Forestry University | Haplotype-epistasis site polymerization breeding module for breeding new poplar pulp variety and application thereof |
Non-Patent Citations (2)
Title |
---|
JN SI, et al.: "Genetic interactions among Pto-miR319 family members and their targets influence growth and wood properties in Populus tomentosa", MOLECULAR GENETICS AND GENOMICS, vol. 295, no. 4, 31 July 2020 (2020-07-31), pages 885 - 870 * |
LI Peng, et al.: "Research progress and prospects of genome-wide association studies in forest trees", Scientia Sinica Vitae, vol. 50, no. 2, 31 December 2020 (2020-12-31), pages 144 - 153 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117831637A (en) * | 2024-03-05 | 2024-04-05 | Institute of Crop Sciences, Chinese Academy of Agricultural Sciences | Genotype and environment interaction method based on machine learning and application thereof |
CN117831637B (en) * | 2024-03-05 | 2024-05-28 | Institute of Crop Sciences, Chinese Academy of Agricultural Sciences | Genotype and environment interaction method based on machine learning and application thereof |
Also Published As
Publication number | Publication date |
---|---|
CN117133354B (en) | 2024-06-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||