CN117133354A - Method for efficiently identifying key breeding gene modules of forest tree - Google Patents

Method for efficiently identifying key breeding gene modules of forest tree

Info

Publication number
CN117133354A
Authority
CN
China
Prior art keywords
snp
key
breeding
genes
specific
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311097273.8A
Other languages
Chinese (zh)
Other versions
CN117133354B (en)
Inventor
权明洋
张德强
杜庆章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Forestry University
Original Assignee
Beijing Forestry University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Forestry University
Priority to CN202311097273.8A
Publication of CN117133354A
Application granted
Publication of CN117133354B
Legal status: Active
Anticipated expiration

Classifications

    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68 Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6876 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
    • C12Q1/6888 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms
    • C12Q1/6895 Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10 Ontologies; Annotations
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/13 Plant traits
    • C CHEMISTRY; METALLURGY
    • C12 BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q2600/00 Oligonucleotides characterized by their use
    • C12Q2600/156 Polymorphic or mutational markers

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biotechnology (AREA)
  • Analytical Chemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Molecular Biology (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Medical Informatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Botany (AREA)
  • General Engineering & Computer Science (AREA)
  • Mycology (AREA)
  • Biochemistry (AREA)
  • Microbiology (AREA)
  • Immunology (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The application provides a method for efficiently identifying key breeding gene modules in forest trees and relates to the technical field of molecular genetics. The method accurately and efficiently identifies the key breeding gene module underlying an important forest tree trait, systematically evaluates the genetic effect of each genotype combination within the module on phenotypic variation, and determines the optimal genotype combination of the key breeding gene module. It can be widely applied to precise seedling-stage selection for important traits and to the molecular genetic improvement of forest trees.

Description

Method for efficiently identifying key breeding gene modules of forest tree
Technical Field
The application relates to the technical field of molecular genetics, and in particular to a method for efficiently identifying key breeding gene modules in forest trees.
Background
The economically important traits of forest trees are complex quantitative traits regulated by multiple genes. Because forest trees are perennial, largely undomesticated, and widely distributed, the genetic basis of these traits remains unclear and their regulatory mechanisms unknown. In recent years, as molecular genetics and genomics research has deepened, a series of key genes with important breeding value has been discovered. These genes are nevertheless difficult to apply widely in the molecular genetic improvement of forest trees or in precise seedling-stage selection for important traits, for three main reasons: (1) natural forest tree resources are widely distributed, so germplasm resources are rich in both phenotypic variation and genomic allelic variation, and the prior art does not fully characterize the genetic effects of the genomic allelic loci that affect phenotypic variation in important traits; (2) complex forest tree traits are jointly regulated by multiple genes through very complex genetic mechanisms, yet most current studies focus only on the biological function of single genes and neglect the joint genetic effect on phenotypic variation of a breeding module composed of multiple genes; (3) existing molecular genetic techniques lack the means to accurately identify the key breeding modules of important traits and to analyze their genetic effects in depth, so both the molecular genetic improvement of economically important tree traits and seedling-stage screening suffer from low efficiency and precision.
Therefore, the prior art lacks a strategy for efficiently identifying key breeding gene modules in forest trees, which hinders both the establishment of a molecular design breeding system for trees and the effective implementation of their molecular genetic improvement.
Disclosure of Invention
The application aims to provide a method for efficiently identifying key breeding gene modules in forest trees. The method efficiently identifies the key breeding gene modules of important tree traits, accurately analyzes the genetic effect of each genotype combination within a module on phenotypic variation, and can be widely applied to efficient seedling-stage screening for important traits, thereby providing technical support for molecular design breeding of trees.
The application provides a method for efficiently identifying key breeding gene modules in forest trees, comprising the following steps:
1) Performing a genome-wide association analysis between the genomic single nucleotide polymorphism (SNP) genotype data of each individual in the forest tree germplasm resource population under study and the phenotype value of a specific trait in each individual of that population, and determining the SNP loci significantly associated with the specific trait;
the determination condition is: the association between an SNP genotype locus in the genome and the phenotypic trait reaches the biostatistical significance level;
2) Functionally annotating the transcription module in which each SNP locus significantly associated with the specific trait in step 1) is located, and defining that transcription module as a candidate gene;
3) Determining the expression level of each candidate gene from step 2) in every individual of the germplasm resource population under study;
4) Calculating the Pearson correlation coefficient r between the population expression level of each candidate gene from step 3) and the population phenotype value of the specific trait from step 1) to test the "expression-phenotype" correlation, identifying the candidate genes whose expression pattern is highly correlated with phenotypic variation in the specific trait, and defining them as key genes, the high correlation indicating that the expression level of a key gene largely influences phenotypic variation in the specific trait;
the determination condition is: Pearson correlation coefficient r > 0.4 or r < -0.4;
5) Using the genotype data of the significantly associated SNPs within the key genes from step 4), together with the germplasm resource population phenotype values of the specific trait from step 1), detecting epistatic interaction effects on phenotypic variation among the SNP loci within the key genes, determining the key breeding gene module affecting the specific trait, evaluating the genetic effect of each genotype combination within the key breeding gene module on phenotypic variation, and identifying the optimal genotype combination within the module;
the determination condition is: the epistatic interaction affecting variation in the specific phenotype must reach the biostatistical significance level.
Preferably, the forest tree germplasm resource population used in each step comprises more than 200 individuals.
Preferably, the genotype frequency of the SNPs in steps 1), 2) and 5) is greater than 10%.
Preferably, the genome-wide association analysis in step 1) uses the mixed linear model in TASSEL v5.0, and the software reports a P value giving the significance of the association between each SNP locus and the specific phenotype; the P values are corrected for multiple hypothesis testing with a threshold of 1/n (Bonferroni method, where n is the total number of SNPs in the whole genome), and SNP loci with P < 1/n are taken as significantly associated with the specific trait.
Preferably, the annotated transcription modules in step 2) include protein-coding genes, long non-coding RNAs, and microRNAs.
Preferably, the software used to calculate the Pearson correlation coefficient r in step 4) includes SPSS v19.0.
Preferably, the epistatic interaction effect in step 5) is detected with the EPISNP1 package of the epiSNP software, which calculates the significance P value of the association of each SNP-SNP interaction with the phenotype; SNP-SNP interaction pairs significantly associated with the specific trait are determined using the screening criterion P ≤ 0.001.
Preferably, in step 5), a key gene is incorporated into the key breeding gene module only if its SNP takes part in a significant epistatic interaction.
Preferably, in step 5), when the phenotypic genetic effect of each genotype combination is evaluated, the frequency of each genotype combination within the key breeding gene module is greater than 10% of the germplasm resource population.
Another object of the application is to provide the use of the above method in molecular design breeding of forest trees.
The application provides a method for efficiently identifying key breeding gene modules in forest trees. The key breeding genes identified by the prior art are difficult to apply to molecular genetic improvement of trees and to precise seedling-stage screening for important traits, because previous studies have not fully characterized the patterns of allelic variation in tree germplasm resource populations and lack systematic identification and in-depth analysis of key breeding modules with strong genetic effects on phenotypic variation in important traits. The method of the application accurately and efficiently identifies the key breeding gene modules of important forest tree traits and analyzes in depth the genetic effect of each genotype combination within a module on phenotypic variation. It can be widely applied to precise seedling-stage screening for important traits and to molecular genetic improvement of forest trees, and provides technical support for their molecular design breeding.
Using the method provided by the application, the key breeding gene module for xylem xylose content in Populus tomentosa was identified as PtoGAO1-PtoCAMTA5-PtoC3H3-PtoDOF2. Among the genotype combinations at the corresponding SNPs Chr1_34278210-Chr6_5959112-Chr10_12844616-Chr2_21224438, AA/AG/GT/AT had the highest xylem xylose content and GA/AG/TT/AT the lowest, which allows xylem xylose content to be screened rapidly at the seedling stage of Populus tomentosa.
Drawings
FIG. 1 shows the phenotypic effect value of each genotype combination in the Populus tomentosa xylose content breeding gene module;
FIG. 2 is an analytical flow chart of the identification method of the present application.
Detailed Description
The application provides a method for efficiently identifying key breeding gene modules in forest trees, comprising the following steps:
1) Performing a genome-wide association analysis between the genomic single nucleotide polymorphism (SNP) genotype data of each individual in the forest tree germplasm resource population under study and the phenotype value of a specific trait in each individual of that population, and determining the SNP loci significantly associated with the specific trait;
the determination condition is: the association between an SNP genotype locus in the genome and the phenotypic trait reaches the biostatistical significance level;
2) Functionally annotating the transcription module in which each SNP locus significantly associated with the specific trait in step 1) is located, and defining that transcription module as a candidate gene;
3) Determining the expression level of each candidate gene from step 2) in every individual of the germplasm resource population under study;
4) Calculating the Pearson correlation coefficient r between the population expression level of each candidate gene from step 3) and the population phenotype value of the specific trait from step 1) to test the "expression-phenotype" correlation, identifying the candidate genes whose expression pattern is highly correlated with phenotypic variation in the specific trait, and defining them as key genes, the high correlation indicating that the expression level of a key gene largely influences phenotypic variation in the specific trait;
the determination condition is: Pearson correlation coefficient r > 0.4 or r < -0.4;
5) Using the genotype data of the significantly associated SNPs within the key genes from step 4), together with the germplasm resource population phenotype values of the specific trait from step 1), detecting epistatic interaction effects on phenotypic variation among the SNP loci within the key genes, determining the key breeding gene module affecting the specific trait, evaluating the genetic effect of each genotype combination within the key breeding gene module on phenotypic variation, and identifying the optimal genotype combination within the module;
the determination condition is: the epistatic interaction affecting variation in the specific phenotype must reach the biostatistical significance level.
The application first obtains genome-wide SNP genotype data for each individual of the forest tree germplasm resource population under study; these data are obtained by whole-genome resequencing of the trees. The population comprises more than 200 individuals; this requirement applies throughout and is not repeated below. The application requires an SNP genotype frequency greater than 10%; this likewise applies throughout and is not repeated below.
The application also requires the phenotype value of the specific trait for each individual of the germplasm resource population; the method used to obtain these phenotype values is not particularly limited.
A genome-wide association analysis is performed between the genomic SNP genotype data of each individual of the germplasm resource population and the phenotype values of the specific trait in those individuals to determine the SNP loci significantly associated with the specific trait; the determination condition is: the association between an SNP genotype locus in the genome and the phenotypic trait reaches the biostatistical significance level. The genome-wide association analysis preferably uses the mixed linear model in TASSEL v5.0, and the software reports a P value giving the significance of the association between each SNP locus and the specific phenotype; the P values are corrected for multiple hypothesis testing with a threshold of 1/n (Bonferroni method, where n is the total number of SNPs in the whole genome), and SNP loci with P < 1/n are taken as significantly associated with the specific trait.
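The 1/n screening rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-SNP P values are taken as already exported from the association software, and the function name `significant_snps` and the SNP identifiers are hypothetical, not part of TASSEL.

```python
from typing import Dict, List


def significant_snps(p_values: Dict[str, float]) -> List[str]:
    """Return the IDs of SNPs whose P value is below 1/n.

    n is the number of SNPs tested, matching the 1/n cutoff in the text.
    """
    n = len(p_values)
    if n == 0:
        return []
    threshold = 1.0 / n  # cutoff from the text: P < 1/n
    return sorted(snp for snp, p in p_values.items() if p < threshold)


# Toy input with four hypothetical SNPs, so the cutoff is 1/4 = 0.25.
toy = {"snp_a": 0.30, "snp_b": 0.01, "snp_c": 0.25, "snp_d": 0.10}
hits = significant_snps(toy)  # snp_c is excluded because 0.25 is not < 0.25
```

The comparison is strict, so a SNP whose P value equals the cutoff exactly is not retained.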
The application functionally annotates the transcription module in which each SNP locus significantly associated with the specific trait is located and defines that transcription module as a candidate gene. The method of obtaining the annotated transcription modules is not limited in the application; the annotated transcription modules preferably include, but are not limited to, protein-coding genes, long non-coding RNAs (lncRNAs), and microRNAs (miRNAs).
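Because the application leaves the annotation method open, the following Python sketch shows only one simple way to locate the annotated feature containing a SNP: a positional lookup against feature intervals. The `Feature` type, the function `annotate_snp`, and all names and coordinates are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple, Optional


class Feature(NamedTuple):
    name: str   # e.g. a protein-coding gene, lncRNA, or miRNA identifier
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def annotate_snp(chrom: str, pos: int, features: List[Feature]) -> Optional[str]:
    """Return the name of the first feature that contains the SNP, else None."""
    for feature in features:
        if feature.chrom == chrom and feature.start <= pos <= feature.end:
            return feature.name
    return None


# Hypothetical annotation records; the coordinates are invented.
annotation = [
    Feature("GeneA", "Chr1", 1000, 2000),
    Feature("GeneB", "Chr2", 500, 900),
]
candidate = annotate_snp("Chr1", 1500, annotation)
```

A linear scan is adequate for a handful of significant SNPs; an interval index would be the natural choice for larger inputs.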
The application determines the expression level of each candidate gene in every individual of the forest tree germplasm resource population under study. The method for obtaining the population expression levels of the candidate genes is not particularly limited.
The application calculates the Pearson correlation coefficient r between the population expression level of each candidate gene and the population phenotype value of the specific trait to test the "expression-phenotype" correlation, identifies the candidate genes whose expression pattern is highly correlated with phenotypic variation in the specific trait, and defines them as key genes; the high correlation indicates that the expression level of a key gene largely influences phenotypic variation in the specific trait. The determination condition is: Pearson correlation coefficient r > 0.4 or r < -0.4. The application preferably uses SPSS v19.0 to calculate the Pearson correlation coefficient r.
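The key-gene criterion can be illustrated with a small, dependency-free Python sketch. It computes the Pearson correlation coefficient directly from its definition rather than through SPSS; the function names `pearson_r` and `is_key_gene` and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def is_key_gene(expression: Sequence[float], phenotype: Sequence[float],
                cutoff: float = 0.4) -> bool:
    """Apply the criterion from the text: r > 0.4 or r < -0.4."""
    return abs(pearson_r(expression, phenotype)) > cutoff


# Toy vectors, one value per individual (invented numbers).
positively_linked = is_key_gene([1, 2, 3, 4], [2, 4, 6, 8])
unlinked = is_key_gene([1, 2, 3, 4], [1, -1, -1, 1])
```

Taking the absolute value of r covers both branches of the criterion, so strongly negative correlations qualify as well.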
Using the genotype data of the significantly associated SNPs within the key genes, together with the germplasm resource population phenotype values of the specific trait, the application detects epistatic interaction effects on phenotypic variation among the significantly associated SNP loci within the key genes, determines the key breeding gene module affecting the specific trait, evaluates the genetic effect of each genotype combination within the key breeding gene module on phenotypic variation, and identifies the optimal genotype combination within the module; the determination condition is: the epistatic interaction affecting variation in the specific phenotype must reach the biostatistical significance level.
In the application, the epistatic interaction effect is preferably detected with the EPISNP1 package of the epiSNP software, which calculates the significance P value of the association of each SNP-SNP interaction with the phenotype; SNP-SNP interaction pairs significantly associated with the specific trait are determined using the screening criterion P ≤ 0.001. In the application, a key gene is incorporated into the key breeding gene module only if its SNP takes part in a significant interaction. When the phenotypic genetic effect of each genotype combination within the key breeding gene module is evaluated, the frequency of each genotype combination is preferably greater than 10% of the germplasm resource population.
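The rule for incorporating genes into the module can be sketched as follows. The sketch assumes the pairwise SNP-SNP interaction P values have already been computed by external software and that every SNP maps to exactly one key gene; it does not perform the interaction test itself. The function `build_module` and all identifiers are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def build_module(pairs: Iterable[Tuple[str, str, float]],
                 snp_to_gene: Dict[str, str],
                 p_cutoff: float = 0.001) -> Set[str]:
    """Collect the genes whose SNPs take part in a significant interaction pair.

    pairs holds (snp_a, snp_b, p_value) tuples; a pair counts as significant
    when p_value <= p_cutoff, following the P <= 0.001 criterion in the text.
    """
    module: Set[str] = set()
    for snp_a, snp_b, p_value in pairs:
        if p_value <= p_cutoff:
            module.add(snp_to_gene[snp_a])
            module.add(snp_to_gene[snp_b])
    return module


# Hypothetical interaction results and SNP-to-gene map.
toy_pairs = [("s1", "s2", 0.0005), ("s2", "s3", 0.001), ("s4", "s5", 0.02)]
toy_map = {"s1": "g1", "s2": "g2", "s3": "g3", "s4": "g4", "s5": "g5"}
module = build_module(toy_pairs, toy_map)
```

Genes g4 and g5 stay outside the module because their only interaction pair does not pass the cutoff.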
Another object of the application is to provide the use of the above method in molecular design breeding of forest trees.
The method for efficiently identifying key breeding gene modules in forest trees is described in further detail below with reference to a specific example; the technical scheme of the application includes, but is not limited to, the following example.
Example 1
In this example, the method for efficiently identifying key breeding gene modules in forest trees is used to identify the key breeding gene module for xylem xylose content in Populus tomentosa and to analyze the phenotypic effect of each genotype combination within the module on xylem xylose content, for use in seedling-stage screening for this trait and in establishing a molecular design breeding system for trees.
Step S1: obtain genome-wide SNP genotype data for the Populus tomentosa germplasm resource population (303 individuals) by whole-genome resequencing, as follows:
Leaf DNA was extracted from each of the 303 individuals of the Populus tomentosa germplasm resource population and used for genome resequencing. Reads were aligned against the Populus tomentosa reference genome to obtain genome-wide SNPs and their genomic positions, and SNPs with a genotype frequency greater than 10% were retained for subsequent analysis, giving 12,800,000 SNPs.
Step S2: measure the xylem xylose content of the Populus tomentosa germplasm resource population (303 individuals), as follows:
Mature xylem was collected from each individual of the Populus tomentosa germplasm resource population and placed in liquid nitrogen (-196 ℃) immediately after collection. The xylem xylose content was determined by high performance liquid chromatography in accordance with national standard methods GB2677.7-81, GB2677.8-81 and GB2677.10-81. Across the population, xylem xylose content ranged from 2.03% to 31.95% with a mean of 14.08% and followed a normal distribution, making it suitable for genome-wide association analysis.
Step S3: perform a genome-wide association study (GWAS) between the genome-wide SNP data from step S1 and the xylose content data from step S2, using the mixed linear model (MLM) in TASSEL v5.0, to obtain the significance value P of the association between each SNP and the population xylose content. SNP loci with P < 7.81E-08 (1/n, where n is the number of genome-wide SNPs, following the Bonferroni method) were taken as significantly associated loci. In total, 14 SNP loci were significantly associated with xylem xylose content in Populus tomentosa (P < 7.81E-08); the specific results are shown in Table 1.
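As a quick arithmetic check of the cutoff quoted in step S3, taking n to be the 12,800,000 SNPs retained in step S1:

```latex
\[
\frac{1}{n} = \frac{1}{12\,800\,000} = 7.8125 \times 10^{-8} \approx 7.81 \times 10^{-8}
\]
```

The value agrees with the screening standard P < 7.81E-08 used in this example.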
Table 1 Gene information significantly associated with xylem xylose content in Populus tomentosa
Step S4: annotate the significantly associated SNP loci against the protein-coding gene annotation of the Populus tomentosa genome, that is, locate the gene in which each significantly associated SNP lies. This yielded 9 candidate genes; the specific results are shown in Table 1.
Step S5: because the trait of interest in this example is xylem xylose content, the expression levels of the 9 candidate genes from step S4 must be measured in the xylem of the Populus tomentosa germplasm resource population, as follows:
Mature xylem was collected from the Populus tomentosa germplasm resource population (303 individuals) and stored in liquid nitrogen. RNA was extracted from the mature xylem with the Plant Qiagen RNeasy kit (Qiagen China, Shanghai, China) and, after quality assessment, submitted to a biotechnology company for transcriptome sequencing, giving the expression levels of the 9 candidate genes from step S4 in the xylem of the population.
Step S6: use SPSS v19.0 to calculate the expression-phenotype Pearson correlation coefficient r between the population expression level of each of the 9 candidate genes from step S5 and the population xylose content from step S2. The expression levels of 6 genes were highly correlated with the population xylose content (r > 0.4 or r < -0.4). These 6 key genes are PtoARF8, PtoWRKY41, PtoGAO1, PtoDOF2, PtoCAMTA5 and PtoC3H3; the specific information is shown in Table 1.
Table 2 Epistatic interactions between significant marker loci (P < 0.001)
Table 3 Effect analysis of the key breeding module for xylem xylose content in Populus tomentosa
Step S7: combine the genotype data of the significantly associated SNPs in the 6 key genes from step S6 with the population xylem xylose content data, and detect SNP-SNP interaction effects with the EPISNP1 package of the epiSNP software. Significant epistatic interactions (P < 0.001) were found among four significantly associated SNPs, Chr1_34278210, Chr6_5959112, Chr10_12844616 and Chr2_21224438, indicating epistatic interactions among the 4 key genes PtoGAO1, PtoCAMTA5, PtoC3H3 and PtoDOF2 (Table 2). These 4 key genes were therefore incorporated into the key breeding gene module; that is, the key breeding gene module for xylem xylose content in Populus tomentosa is PtoGAO1-PtoCAMTA5-PtoC3H3-PtoDOF2. Further, the genetic effect on xylem xylose content was evaluated for each genotype combination (each with a frequency greater than 10%) of the corresponding significantly associated SNPs (Chr1_34278210-Chr6_5959112-Chr10_12844616-Chr2_21224438) in the breeding module composed of these 4 key genes. The AA/AG/GT/AT genotype combination had the highest xylem xylose content and the GA/AG/TT/AT combination the lowest; the specific information is shown in FIG. 1 and Table 3.
In summary, the breeding gene module for xylem xylose content in Populus tomentosa is PtoGAO1-PtoCAMTA5-PtoC3H3-PtoDOF2. Among the genotype combinations of the significantly associated SNPs Chr1_34278210-Chr6_5959112-Chr10_12844616-Chr2_21224438 in these 4 breeding genes, AA/AG/GT/AT has the highest xylem xylose content and GA/AG/TT/AT the lowest.
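The evaluation of genotype combinations in step S7 can be sketched in Python as a grouped mean with a frequency filter. The sketch assumes the "genetic effect" of a combination is summarized by the mean phenotype of the individuals carrying it, which the text does not state explicitly. The functions `combo_effects` and `best_and_worst`, the two-locus genotype labels, and the phenotype values are all invented for illustration and are not the results reported in Table 3.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Combo = Tuple[str, ...]


def combo_effects(genotypes: List[Combo], phenotypes: List[float],
                  min_freq: float = 0.10) -> Dict[Combo, float]:
    """Mean phenotype per genotype combination.

    Only combinations carried by more than min_freq of all individuals are
    kept, mirroring the "greater than 10%" requirement in the text.
    """
    if len(genotypes) != len(phenotypes):
        raise ValueError("one phenotype value is needed per individual")
    groups: Dict[Combo, List[float]] = defaultdict(list)
    for combo, value in zip(genotypes, phenotypes):
        groups[combo].append(value)
    total = len(genotypes)
    return {combo: sum(values) / len(values)
            for combo, values in groups.items()
            if len(values) / total > min_freq}


def best_and_worst(effects: Dict[Combo, float]) -> Tuple[Combo, Combo]:
    """Return the combinations with the highest and lowest mean phenotype."""
    return max(effects, key=effects.get), min(effects, key=effects.get)


# Ten toy individuals with invented two-locus genotypes and phenotype values.
toy_genotypes = [("AA", "GT")] * 4 + [("GA", "TT")] * 5 + [("GG", "TT")]
toy_phenotypes = [20.0, 22.0, 18.0, 20.0, 10.0, 12.0, 8.0, 10.0, 10.0, 30.0]
effects = combo_effects(toy_genotypes, toy_phenotypes)
best, worst = best_and_worst(effects)
```

The single ("GG", "TT") individual has the largest phenotype value but is dropped, because a frequency of exactly 10% does not exceed the threshold.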
The foregoing is merely a preferred embodiment of the application. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the application, and such modifications and adaptations are intended to fall within the scope of the application.

Claims (10)

1. A method for efficiently identifying key breeding gene modules of forest trees, comprising the following steps:
1) Carrying out genome-wide association analysis between the genome-wide single nucleotide polymorphism (SNP) genotype data of each individual in the forest tree germplasm resource population under study and the phenotypic values of a specific trait of each individual in that population, and determining the SNP loci significantly associated with the specific trait;
the determination conditions include: the SNP genotype loci in the genome are significantly associated with the phenotypic trait, reaching the level of statistical significance;
2) Functionally annotating the transcription modules in which the SNP loci significantly associated with the specific trait in step 1) are located, and defining these transcription modules as candidate genes;
3) Determining the expression level data of the candidate genes from step 2) in each individual of the forest tree germplasm resource population under study;
4) Calculating the Pearson correlation coefficient r between the population expression levels of the candidate genes from step 3) and the population phenotypic values of the specific trait from step 1), thereby testing the "expression-phenotype" correlation between the candidate genes and the specific trait; candidate genes whose expression patterns are highly correlated with the phenotypic variation of the specific trait are defined as key genes, indicating that the expression levels of the key genes strongly influence the phenotypic variation of the specific trait;
the determination conditions include: the Pearson correlation coefficient r > 0.4 or r < -0.4;
5) Based on the genotype data of the significantly associated SNPs in the key genes from step 4), combined with the germplasm resource population phenotypic values of the specific trait from step 1), detecting epistatic interaction effects among the SNP loci in the key genes that affect phenotypic variation, determining the key breeding gene module affecting the specific phenotypic trait, evaluating the genetic effect of each genotype combination in the key breeding gene module on phenotypic variation, and identifying the optimal genotype combination in the key breeding gene module;
the determination conditions include: the epistatic interaction combinations affecting the specific phenotypic variation must reach the level of statistical significance.
2. The method of claim 1, wherein the forest tree germplasm resource population used in each step comprises more than 200 individuals.
3. The method of claim 1, wherein the SNP genotype frequencies in steps 1), 2), and 5) are greater than 10%.
4. The method according to claim 1, wherein the genome-wide association analysis in step 1) is performed with the mixed linear model in TASSEL v5.0, which gives the significance level (P value) of the association between each SNP site and the specific phenotype; multiple-testing correction is applied to the P values using a threshold of 1/n (n is the total number of SNPs in the whole genome; Bonferroni method), and SNP sites with P < 1/n are retained as the SNP sites significantly associated with the specific trait.
5. The method of claim 1, wherein the transcription modules annotated in step 2) comprise protein-coding genes, long non-coding RNAs and microRNAs.
6. The method according to claim 1, wherein the software for calculating the Pearson correlation coefficient r in step 4) comprises SPSS v19.0.
7. The method according to claim 1, wherein the software for detecting the epistatic interaction effect in step 5) is the EPISNP1 package in the epiSNP software, which calculates the significance P value of each SNP-SNP interaction associated with the phenotype; SNP-SNP interaction pairs significantly associated with the specific trait are determined using a screening criterion of P ≤ 0.001.
8. The method according to claim 1, wherein in step 5), only the key genes whose SNPs show a significant epistatic interaction are incorporated into the key breeding gene module.
9. The method of claim 1, wherein in step 5), when evaluating the genetic effect of each genotype combination in the key breeding gene module on the phenotype, the frequency of each genotype combination in the germplasm resource population is greater than 10%.
10. Use of the method of any one of claims 1 to 9 in molecular design breeding of forest trees.
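The two numeric screening rules recited in the claims (P < 1/n for associated SNPs in claim 4, and r > 0.4 or r < -0.4 for key genes in claim 1) can be sketched as follows. This is only an illustration in plain Python: the claims name TASSEL v5.0 and SPSS v19.0 for the actual computations, and the helper names, SNP labels and numbers below are hypothetical.

```python
from math import sqrt


def significant_snps(p_values, n_total_snps):
    """Keep SNPs whose P value is below the 1/n threshold (n = total SNP count)."""
    threshold = 1.0 / n_total_snps
    return {snp: p for snp, p in p_values.items() if p < threshold}


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def is_key_gene(expression, phenotype, cutoff=0.4):
    """A candidate gene is a key gene when r > cutoff or r < -cutoff."""
    return abs(pearson_r(expression, phenotype)) > cutoff


# Hypothetical association P values; with n = 100,000 SNPs the threshold is 1e-5.
toy_p_values = {"snp_a": 1e-7, "snp_b": 5e-4, "snp_c": 2e-6}
hits = significant_snps(toy_p_values, n_total_snps=100_000)

# Hypothetical expression levels and phenotype values for five individuals.
expression = [1.0, 2.0, 3.0, 4.0, 5.0]
positively_correlated = is_key_gene(expression, [2.1, 3.9, 6.2, 8.0, 9.8])
negatively_correlated = is_key_gene(expression, [10.0, 8.0, 6.0, 4.0, 2.0])
uncorrelated = is_key_gene(expression, [3.0, 1.0, 4.0, 1.0, 3.0])

print(sorted(hits), positively_correlated, negatively_correlated, uncorrelated)
```

Taking the absolute value of r reflects that the claim accepts both strong positive and strong negative expression-phenotype correlations.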
CN202311097273.8A 2023-08-29 2023-08-29 Method for efficiently identifying key breeding gene modules of forest tree Active CN117133354B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311097273.8A CN117133354B (en) 2023-08-29 2023-08-29 Method for efficiently identifying key breeding gene modules of forest tree

Publications (2)

Publication Number Publication Date
CN117133354A true CN117133354A (en) 2023-11-28
CN117133354B CN117133354B (en) 2024-06-14

Family

ID=88859460

Country Status (1)

Country Link
CN (1) CN117133354B (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050144664A1 (en) * 2003-05-28 2005-06-30 Pioneer Hi-Bred International, Inc. Plant breeding method
WO2008087185A1 (en) * 2007-01-17 2008-07-24 Syngenta Participations Ag Process for selecting individuals and designing a breeding program
US20100037342A1 (en) * 2008-08-01 2010-02-11 Monsanto Technology Llc Methods and compositions for breeding plants with enhanced yield
CN104293774A (en) * 2013-07-16 2015-01-21 Beijing Forestry University Functional SSR labels obviously related with wood quality characters in populus CesAs gene, and applications and kit thereof
US20160289696A1 (en) * 2015-04-03 2016-10-06 The United States Of America, As Represented By The Secretary Of Agriculture Mutant sorghum bicolor having enhanced seed yield
CN106599607A (en) * 2016-12-13 2017-04-26 Beijing Forestry University Method for constructing photosynthetic pathway gene regulation network
CN108504757A (en) * 2017-04-14 2018-09-07 Beijing Forestry University Probe into the method and system of regulatory mechanism between the gene and miRNAs that participate in forest miRNAs biologies formation access
CN108517368A (en) * 2017-04-21 2018-09-11 Beijing Forestry University The method and system of Chinese white poplar LncRNA Pto-CRTG and its target gene Pto-CAD5 interactions are parsed using epistasis
CN109545278A (en) * 2018-12-18 2019-03-29 Beijing Forestry University A kind of method of plant identification lncRNA and interaction of genes
CN111863127A (en) * 2020-07-17 2020-10-30 Beijing Forestry University Method for constructing genetic control network of plant transcription factor to target gene
CN112204156A (en) * 2018-05-25 2021-01-08 Pioneer Hi-Bred International, Inc. Systems and methods for improving breeding by modulating recombination rates
CN113025741A (en) * 2021-03-09 2021-06-25 Beijing Forestry University Haplotype-epistasis site polymerization breeding module for breeding new poplar pulp variety and application thereof

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JN SI, et al.: "Genetic interactions among Pto-miR319 family members and their targets influence growth and wood properties in Populus tomentosa", Molecular Genetics and Genomics, vol. 295, no. 4, 31 July 2020 (2020-07-31), pages 885 - 870 *
Li Peng, et al.: "Research progress and prospects of genome-wide association analysis in forest trees" (in Chinese), Scientia Sinica Vitae (《中国科学:生命科学》), vol. 50, no. 2, 31 December 2020 (2020-12-31), pages 144 - 153 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117831637A (en) * 2024-03-05 2024-04-05 Institute of Crop Sciences, Chinese Academy of Agricultural Sciences Genotype and environment interaction method based on machine learning and application thereof
CN117831637B (en) * 2024-03-05 2024-05-28 Institute of Crop Sciences, Chinese Academy of Agricultural Sciences Genotype and environment interaction method based on machine learning and application thereof

Also Published As

Publication number Publication date
CN117133354B (en) 2024-06-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant