CN107058525B - Method for predicting unknown gene function of corn based on dynamic correlation of gene expression quantity and character - Google Patents
- Publication number: CN107058525B (application CN201710169145.8A)
- Authority: CN (China)
- Legal status: Expired - Fee Related (status as listed by Google Patents; not a legal conclusion)
Classifications
- C12Q1/6869: Methods for sequencing
- C12Q1/6895: Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for detection or identification of organisms for plants, fungi or algae
- G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- C12Q2600/158: Expression markers
Abstract
The invention belongs to the field of plant molecular biotechnology and genetic engineering, and particularly relates to a method for predicting the functions of unknown maize genes based on the dynamic correlation between gene expression levels and traits. The method comprises the following steps: sequencing the kernel transcriptome of maize inbred lines 15 days after pollination to obtain gene expression data; establishing a dynamic association analysis (LA) model; assessing LA significance; mining dynamic associations among the co-expression patterns of genes across the maize genome; and functionally annotating genes with significant LA results to predict the functions of unknown genes. The invention uses the dynamic association of gene co-expression patterns in maize kernels as the basis for predicting unknown gene functions. Compared with conventional co-expression network construction, dynamic association analysis can quickly identify the regulatory genes that control co-expression patterns.
Description
Technical Field
The invention belongs to the field of plant molecular biotechnology and genetic engineering, and particularly relates to a method for predicting the functions of unknown maize genes based on the dynamic correlation between gene expression levels and traits.
Background
Maize is one of the world's three major crops. Since the 1990s, total world maize production has surpassed that of rice and wheat, making maize the world's largest grain crop. Maize kernels accumulate large amounts of storage substances, including starch, oil and protein. With rising living standards, changing diets and the growth of the starch and oil processing industries, maize breeding has gradually shifted from yield-oriented to quality-oriented varieties, and kernel quality and specialty traits have become increasingly important.
Complex traits are controlled by multiple loci, and interactions among genes form complex regulatory networks that control the biological reactions in the cell. High-throughput sequencing now makes it possible to obtain large-scale omics data, such as genotype data, gene expression data and protein interaction data. Studies have shown that genes with similar functions tend to have correlated expression patterns, so co-expression networks offer a way to predict the functions of unknown genes. However, when constructing co-expression networks, we found many genes with similar functions whose expression patterns are not correlated, which limits the use of co-expression analysis for predicting unknown gene functions. Studies have also shown that a single gene or protein has only a limited effect on complex quantitative traits and often acts through higher-order cellular organization, and that the expression levels of many functionally related genes are uncorrelated. Moreover, the genetic loci identified as controlling phenotypic traits are relatively independent, the regulatory relationships among them are unknown, and traditional analysis methods require years of multi-site phenotyping, which is time-consuming and labor-intensive.
Disclosure of Invention
To address these problems in the prior art, the invention provides a method for predicting the functions of unknown maize genes based on the dynamic correlation between gene expression levels and traits.
The invention is realized by the following technical scheme:
The invention provides a method for predicting the functions of unknown maize genes based on the dynamic correlation between gene expression levels and traits, comprising the following steps:
(1) collecting kernels of maize inbred lines 15 days after pollination and sequencing their transcripts to obtain gene expression data;
(2) establishing a dynamic association analysis (LA) model;
(3) assessing LA significance;
(4) mining dynamic associations among the co-expression patterns of genes across the maize genome;
(5) functionally annotating genes with significant LA results to predict the functions of unknown genes.
Further, the maize inbred lines are divided into two groups: one tropical, and one subtropical and temperate. Within each group, a completely randomized block design with two replicates is used, and each inbred line is sown in one row per replicate. All materials are self-pollinated, and immature kernels are harvested 15 days after pollination. For each inbred line, 3-4 ears are taken from each of the two replicates and 1-2 kernels are taken from each ear; total RNA is extracted from the pooled kernels, and as many samples as there are inbred lines are randomly selected for RNA-seq.
The RNA-seq comprises the following steps: first, RNA carrying a poly(A) tail, mainly mRNA, is captured from total RNA with poly(T) oligonucleotides; the captured mRNA is randomly fragmented; first-strand cDNA is synthesized with random hexamer primers, and second-strand cDNA is synthesized by adding reverse transcriptase; the cDNA fragments are purified with a kit, end-repaired and ligated to sequencing adapters; fragments of the target size are recovered by agarose gel electrophoresis and PCR-amplified; and sequencing and analysis are performed on an Illumina Genome Analyzer II (GA II) to obtain gene expression data.
Further, the dynamic association analysis (LA) model is established as follows. LA is defined mathematically as

$$LA(X,Y\mid Z) = E\left[g'(Z)\right] \qquad \text{(Formula 1)}$$

where X, Y and Z are gene expression levels in maize kernels. Assume that X, Y and Z are continuous random variables with mean 0 and variance 1, so that the correlation of X and Y is $E(XY)$. For $Z = z$, $g(z) = E(XY \mid Z = z)$ measures the co-expression pattern of the gene pair (X, Y) at that expression level of Z, and its derivative $g'(z)$ measures how the co-expression pattern changes as Z changes; LA is the expected value of this rate of change. When Z follows a standard normal distribution, LA simplifies to $LA(X,Y\mid Z) = E(XYZ)$.

If X, Y and Z are three genes with normally distributed expression profiles measured in m samples, LA(X, Y | Z) is estimated as

$$E(XYZ) = \frac{1}{m}\sum_{i=1}^{m} x_i y_i z_i = \frac{x_1y_1z_1 + x_2y_2z_2 + \cdots + x_my_mz_m}{m} \qquad \text{(Formula 2)}$$

LA reflects dynamic changes in the co-expression pattern of a gene pair. When Z is highly expressed, the expression levels of X and Y are positively correlated (co-regulated), and $E(XY \mid Z = 1)$ is positive; when Z is lowly expressed, X and Y are negatively correlated (contra-regulated), and $E(XY \mid Z = 0)$ is negative. Such a pair switches from co-regulated to contra-regulated as Z expression falls, and its LA value is positive. Conversely, a pair that switches from contra-regulated to co-regulated as Z expression falls has a negative LA value.
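The step from Formula 1 to $E(XYZ)$ is stated without derivation. It follows from Stein's lemma, which says that for $Z \sim N(0,1)$ and a differentiable $g$ with $E|g'(Z)| < \infty$, $E[g'(Z)] = E[Z\,g(Z)]$. This definition appears to match the liquid association statistic of K.-C. Li (2002). A sketch of the reasoning, assuming those regularity conditions hold:

```latex
\begin{aligned}
LA(X,Y\mid Z) &= E\left[g'(Z)\right] \\
              &= E\left[Z\,g(Z)\right]            && \text{(Stein's lemma, } Z \sim N(0,1)) \\
              &= E\left[Z\,E(XY\mid Z)\right]     && \text{(definition of } g) \\
              &= E\left[E(XYZ\mid Z)\right] = E(XYZ) && \text{(tower property)}
\end{aligned}
```

Replacing the expectation with the sample mean over the m samples gives Formula 2.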
The dynamic association analysis model established by the invention is evaluated as follows: the expression values of all genes are pooled; in each simulation, expression values for a gene pair (X, Y) are drawn at random from the pool by sampling with replacement, Z is taken in turn as every gene in the genome, and the LA values of the (X, Y) pair are computed against all of them, recording the largest positive LA value and the most negative LA value; the simulation is repeated one million times to obtain a reference distribution for the positive LA values and one for the negative LA values, and the 99% quantiles of these positive and negative reference distributions are taken as the positive and negative LA significance thresholds.
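A minimal sketch of this resampling scheme in Python with NumPy, assuming the expression data form a genes × samples array that is already standardized per gene. The names `standardize` and `la_null_thresholds` are hypothetical, `n_sim` defaults to 1,000 instead of one million to keep the sketch fast, and reading the negative threshold as the mirror-image 1% quantile (99% in absolute value) is an interpretation rather than something the text states.

```python
import numpy as np


def standardize(v):
    """Center to mean 0 and scale to variance 1 (population SD)."""
    return (v - v.mean()) / v.std()


def la_null_thresholds(expr, n_sim=1000, q=0.99, seed=0):
    """Resampling null for LA: pool all expression values, draw X and Y
    with replacement, scan Z over every gene, and keep the largest positive
    and most negative LA from each simulation.

    expr: genes x samples array, each row standardized.
    Returns (positive threshold, negative threshold).
    """
    rng = np.random.default_rng(seed)
    n_genes, m = expr.shape
    pool = expr.ravel()                 # all expression values mixed together
    max_pos = np.empty(n_sim)
    min_neg = np.empty(n_sim)
    for s in range(n_sim):
        x = standardize(rng.choice(pool, size=m, replace=True))
        y = standardize(rng.choice(pool, size=m, replace=True))
        la = expr @ (x * y) / m         # Formula 2 for every gene Z at once
        max_pos[s] = la.max()
        min_neg[s] = la.min()
    # Assumption: the negative threshold is the 1% quantile of the
    # most-negative distribution, i.e. its 99% quantile in absolute value.
    return np.quantile(max_pos, q), np.quantile(min_neg, 1 - q)
```

Vectorizing over Z with a single matrix-vector product keeps each simulation to one pass over the expression matrix, which matters when the loop runs a million times.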
Further, the results of the genome-wide dynamic association analysis are filtered by LA value, the genes with significant LA are functionally annotated, and the functions of unknown genes are predicted.
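The annotation step can be illustrated with a hypothetical tally: for one candidate regulator Z, keep the triplets whose LA passes either threshold and count how often each functional annotation occurs among the X and Y partner genes; the most frequent annotations suggest a function for Z. The `(x, y, z, la)` tuple layout, the annotation dictionary and the name `predict_from_partners` are assumptions for illustration only; the source describes the idea, not an implementation.

```python
from collections import Counter


def predict_from_partners(la_hits, annotation, z_gene, pos_thr, neg_thr):
    """Hypothetical helper: tally partner-gene annotations for one Z.

    la_hits: iterable of (x, y, z, la) tuples from a genome-wide LA scan.
    annotation: dict mapping gene ID -> functional annotation string.
    Returns (annotation, count) pairs, most frequent first.
    """
    counts = Counter()
    for x, y, z, la in la_hits:
        if z != z_gene:
            continue
        if la >= pos_thr or la <= neg_thr:      # significant LA only
            for g in (x, y):
                if g in annotation:             # unannotated genes are skipped
                    counts[annotation[g]] += 1
    return counts.most_common()
```

Counting every occurrence, rather than each gene once, mirrors how a partner appearing in several significant pairs is reported repeatedly in the example below.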
Two main hypotheses have been proposed to explain why genes with similar functions have uncorrelated expression patterns: first, that the expression of these genes is not regulated at the mRNA level; and second, that their expression patterns are correlated only in specific cellular environments, that is, their co-expression patterns are dynamically correlated. Dynamic association analysis (LA) provides theoretical support for testing the second hypothesis. Building on the scientific hypothesis that genes with similar functions have correlated expression patterns, the invention uses the LA method to identify dynamic associations among the co-expression patterns of genes across the maize genome, predicts the function of an unknown gene from the functional annotations of the genes in its significant LA results, and validates the LA predictions against the functions of the unknown gene's homologs in Arabidopsis thaliana. This approach is novel and has not previously been reported in plant science research.
The invention has the beneficial effects that:
(1) the invention uses the dynamic association of gene co-expression patterns in maize kernels as the basis for predicting the functions of unknown genes. Compared with conventional co-expression network construction, dynamic association analysis can quickly identify the regulatory genes that control co-expression patterns;
(2) the invention infers the functions of unknown genes from the functional annotations of genes with significant LA results and validates the predictions against the functions of homologous genes, providing an effective method for predicting unknown gene functions.
Drawings
Figure 1 shows LA values generated by random simulation, used to assess the significance of the LA analysis.
Detailed Description
The invention is further described below with reference to the figures and specific examples, which are illustrative only and do not limit the scope of the invention.
Example 1
This example discloses a method for predicting the functions of unknown maize genes based on dynamic association analysis.
(1) Collecting gene expression data:
368 inbred lines were planted in Jingzhou, Hubei, in 2010 (any maize variety can be used with the invention; the lines used here include 35 high-oil maize inbred lines (Yang et al., 2010b) bred by Song and Mingzhiu at China Agricultural University). Based on pedigree information, the lines were divided into two groups (tropical; subtropical and temperate). Within each group, a completely randomized block design with two replicates was used, and each inbred line was sown in one row per replicate. All materials were self-pollinated, and immature kernels were harvested 15 days after pollination (15 DAP). For each inbred line, 3-4 ears were taken from each of the two replicates and 1-2 kernels were taken from each ear; total RNA was extracted from the pooled kernels, and 368 samples were randomly selected for RNA-seq. RNA-seq was performed by BGI-Shenzhen. The sequencing method is briefly as follows: RNA carrying a poly(A) tail, mainly mRNA, was captured from total RNA with poly(T) oligonucleotides; the captured mRNA was randomly fragmented; first-strand cDNA was synthesized with random hexamer primers, and second-strand cDNA was synthesized by adding reverse transcriptase and other reagents; the cDNA fragments were purified with a kit (Ampure XP beads), end-repaired and ligated to sequencing adapters; fragments of the target size were recovered by agarose gel electrophoresis and PCR-amplified, completing library construction; and the library was sequenced and analyzed on an Illumina Genome Analyzer II (GA II). Missing values in the gene expression data set were then preprocessed. Transcriptome sequencing yielded expression data for 28,769 genes in the 368 maize inbred lines, some of which are missing because of experimental noise, limitations of the detection technology and similar causes.
For each gene in the data set, if its expression value is missing in more than 30% of the samples, the gene is excluded from subsequent analyses. This leaves expression data for 24,907 genes (some of these data can also be downloaded directly from a database as needed) for the subsequent genome-wide LA analysis;
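The 30% missing-value rule can be sketched as follows, assuming the data sit in a NumPy genes × samples array with NaN marking missing values; `filter_missing` is a hypothetical helper. The source does not say how the remaining missing values are handled, so the sketch leaves them untouched.

```python
import numpy as np


def filter_missing(expr, gene_ids, max_missing_frac=0.30):
    """Drop genes whose expression is missing (NaN) in more than
    max_missing_frac of the samples.

    expr: genes x samples array, NaN where a value is missing.
    gene_ids: one identifier per row of expr.
    Returns the filtered array and the list of retained gene IDs.
    """
    missing_frac = np.isnan(expr).mean(axis=1)   # fraction missing per gene
    keep = missing_frac <= max_missing_frac      # "more than 30%" is discarded
    return expr[keep], [g for g, k in zip(gene_ids, keep) if k]
```

A gene missing in exactly 30% of samples is kept, matching the "more than 30%" wording.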
(2) Establishing the dynamic association analysis (LA) model:
The LA model is established as follows. LA is defined mathematically as

$$LA(X,Y\mid Z) = E\left[g'(Z)\right] \qquad \text{(Formula 1)}$$

where X, Y and Z are gene expression levels in maize kernels. Assume that X, Y and Z are continuous random variables with mean 0 and variance 1, so that the correlation of X and Y is $E(XY)$. For $Z = z$, $g(z) = E(XY \mid Z = z)$ measures the co-expression pattern of the gene pair (X, Y) at that expression level of Z, and its derivative $g'(z)$ measures how the co-expression pattern changes as Z changes; LA is the expected value of this rate of change. When Z follows a standard normal distribution, LA simplifies to $LA(X,Y\mid Z) = E(XYZ)$.

If X, Y and Z are three genes with normally distributed expression profiles measured in m samples, LA(X, Y | Z) is estimated as

$$E(XYZ) = \frac{1}{m}\sum_{i=1}^{m} x_i y_i z_i = \frac{x_1y_1z_1 + x_2y_2z_2 + \cdots + x_my_mz_m}{m} \qquad \text{(Formula 2)}$$

LA reflects dynamic changes in the co-expression pattern of a gene pair. When Z is highly expressed, the expression levels of X and Y are positively correlated (co-regulated), and $E(XY \mid Z = 1)$ is positive; when Z is lowly expressed, X and Y are negatively correlated (contra-regulated), and $E(XY \mid Z = 0)$ is negative. Such a pair switches from co-regulated to contra-regulated as Z expression falls, and its LA value is positive. Conversely, a pair that switches from contra-regulated to co-regulated as Z expression falls has a negative LA value.
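A minimal Python sketch of Formula 2, assuming NumPy. X and Y are standardized, and Z is converted to normal scores by rank so that the standard-normal assumption behind $LA(X,Y\mid Z) = E(XYZ)$ holds approximately; this rank-based normal-score transform is an assumed preprocessing choice, not a step stated in the source. The names `normal_scores` and `liquid_association` are hypothetical. The toy simulation builds a pair whose correlation rises with Z (ρ(z) = tanh z), so LA should come out positive, and flipping the sign of Y should make it negative.

```python
import numpy as np
from statistics import NormalDist


def normal_scores(v):
    """Map values to standard-normal quantiles by rank, then re-standardize."""
    m = len(v)
    ranks = np.argsort(np.argsort(v)) + 1            # ranks 1..m
    nd = NormalDist()
    scores = np.array([nd.inv_cdf(r / (m + 1)) for r in ranks])
    return (scores - scores.mean()) / scores.std()


def liquid_association(x, y, z):
    """Formula 2: LA(X, Y | Z) = (1/m) * sum_i x_i * y_i * z_i."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    z = normal_scores(z)
    return float(np.mean(x * y * z))


# Toy data: the X-Y correlation rises with Z, so the expected LA is positive.
rng = np.random.default_rng(0)
m = 2000
z = rng.normal(size=m)
e1, e2 = rng.normal(size=(2, m))
rho = np.tanh(z)                       # co-expression strength as a function of Z
x = e1
y = rho * e1 + np.sqrt(1 - rho**2) * e2
print(liquid_association(x, y, z))     # positive: co-regulated at high Z
print(liquid_association(x, -y, z))    # negative: pattern reversed
```

Because $E(XY \mid Z = z) = \tanh z$ in this construction, the population LA is $E[\operatorname{sech}^2 Z]$, roughly 0.6, which is why the first printed value should be clearly positive.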
(3) LA significance assessment
The expression values of all genes are pooled; in each simulation, expression values for a gene pair (X, Y) are drawn at random from the pool by sampling with replacement, Z is taken in turn as every gene in the genome, and the LA values of the (X, Y) pair are computed against all of them, recording the largest positive LA value and the most negative LA value; the simulation is repeated one million times to obtain a reference distribution for the positive LA values and one for the negative LA values, and the 99% quantiles of these positive and negative reference distributions are taken as the positive and negative LA significance thresholds. The results are shown in Figure 1.
(4) Genome-wide LA analysis
LA analysis was performed with X, Y and Z each ranging over all genes in the genome, focusing on the top 50 co-expressed gene pairs (LAPs) with the largest absolute LA values. Genes X, Y and Z were functionally annotated; Table 1 lists the genes involved in protein translation whose co-expression is regulated by GRMZM5G858880. The gene GRMZM5G858880 regulates multiple co-expressed gene pairs (LAPs), and the Maize Genome Database annotates it as "encoding a protein containing a WW domain". Among the LAPs regulated by GRMZM5G858880, several genes involved in protein translation, including ribosomal protein synthesis, translation initiation and protein phosphorylation, appear repeatedly: GRMZM2G092663 (ribosomal protein S5 family, 4 times), GRMZM2G099352 (ribosomal protein S3 family), GRMZM2G168149 (ribosomal protein S5 family), GRMZM2G129015 (ribosomal protein S26e family, 2 times), GRMZM2G164352 (protein phosphatase 2A subunit A2, 4 times), GRMZM2G122135 (protein phosphatase 2A subunit A2, 2 times) and GRMZM2G064133 (eukaryotic translation initiation factor 3G1). The regulatory gene GRMZM5G858880 was therefore predicted to participate in protein translation as well. This agrees with published work showing that the Arabidopsis homolog of GRMZM5G858880 (AT3G13225) regulates protein translation through ribosome deceleration and reduced reinitiation efficiency (Tran et al., BMC Genomics, 2008).
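For a fixed Z, Formula 2 can be evaluated for every gene pair at once with one matrix product, since $LA(X_i, X_j \mid Z) = \frac{1}{m}\sum_s x_{is}\,x_{js}\,z_s$. The sketch below, with the hypothetical name `top_la_pairs`, returns the k pairs with the largest |LA| for one Z; excluding pairs that contain Z itself is an assumption the source does not state. For the full 24,907-gene matrix the pair matrix alone would need roughly 5 GB in float64, so in practice one would process it in row blocks and loop Z over all genes.

```python
import numpy as np


def top_la_pairs(expr, z_index, k=50):
    """Return the k gene pairs (i, j, LA) with the largest |LA(X_i, X_j | Z)|
    for Z = expr[z_index].

    expr: genes x samples array, each row standardized (mean 0, variance 1).
    """
    n_genes, m = expr.shape
    z = expr[z_index]
    la = (expr * z) @ expr.T / m              # la[i, j] = mean_s x_i[s] x_j[s] z[s]
    iu, ju = np.triu_indices(n_genes, k=1)    # each unordered pair once
    vals = la[iu, ju]
    # Assumption: drop pairs that include the regulator Z itself.
    mask = (iu != z_index) & (ju != z_index)
    iu, ju, vals = iu[mask], ju[mask], vals[mask]
    order = np.argsort(-np.abs(vals))[:k]     # largest absolute LA first
    return [(int(iu[t]), int(ju[t]), float(vals[t])) for t in order]
```

Ranking by absolute value keeps both positive and negative LA pairs, matching the "largest absolute LA value" criterion in the text.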
Table 1. Genes involved in protein translation whose co-expression is regulated by GRMZM5G858880
These results demonstrate the effectiveness of the invention: by combining genome-wide dynamic association analysis of gene co-expression patterns with functional annotation, the functions of unknown genes can be predicted, providing a new approach for maize functional genomics research.
Claims (4)
1. A method for predicting the functions of unknown maize genes based on the dynamic correlation between gene expression levels and traits, characterized by comprising the following steps:
(1) collecting kernels of maize inbred lines 15 days after pollination and sequencing their transcripts to obtain gene expression data;
(2) establishing a dynamic association analysis (LA) model;
the dynamic association analysis (LA) model is established as follows: LA is defined mathematically as

$$LA(X,Y\mid Z) = E\left[g'(Z)\right] \qquad \text{(Formula 1)}$$

wherein X, Y and Z are gene expression levels in maize kernels; assuming that X, Y and Z are continuous random variables with mean 0 and variance 1, the correlation of X and Y is $E(XY)$; for $Z = z$, $g(z) = E(XY \mid Z = z)$ measures the co-expression pattern of the gene pair (X, Y) at that expression level of Z, its derivative $g'(z)$ measures the change in that co-expression pattern, and when Z follows a standard normal distribution, LA simplifies to $LA(X,Y\mid Z) = E(XYZ)$;
when X, Y and Z are three genes with normally distributed expression profiles measured in m samples, LA(X, Y | Z) is estimated as

$$E(XYZ) = \frac{1}{m}\sum_{i=1}^{m} x_i y_i z_i = \frac{x_1y_1z_1 + x_2y_2z_2 + \cdots + x_my_mz_m}{m} \qquad \text{(Formula 2)}$$

LA reflects dynamic changes in the co-expression pattern of a gene pair: when Z is highly expressed, X and Y are positively correlated (co-regulated) and $E(XY \mid Z = 1)$ is positive; when Z is lowly expressed, X and Y are negatively correlated (contra-regulated) and $E(XY \mid Z = 0)$ is negative; a pair that switches from co-regulated to contra-regulated as Z expression falls has a positive LA value; conversely, a pair that switches from contra-regulated to co-regulated as Z expression falls has a negative LA value;
(3) assessing LA significance;
(4) mining dynamic associations among the co-expression patterns of genes across the maize genome;
(5) functionally annotating genes with significant LA results to predict the functions of unknown genes;
the maize inbred lines are divided into two groups, one tropical and the other subtropical and temperate; within each group a completely randomized block design with 2 replicates is used, each inbred line is sown in 1 row per replicate, all materials are self-pollinated, immature kernels are harvested 15 days after pollination, 3-4 ears are taken from each of the two replicates of each inbred line and 1-2 kernels from each ear, total RNA is extracted from the pooled kernels, and 368 samples are randomly selected for RNA-seq.
2. The method of claim 1, wherein the RNA-seq comprises the steps of: first, capturing RNA carrying a poly(A) tail (mainly mRNA) from total RNA with poly(T) oligonucleotides; randomly fragmenting the captured mRNA; synthesizing first-strand cDNA with random hexamer primers and adding reverse transcriptase to synthesize second-strand cDNA; purifying the cDNA fragments with a kit, end-repairing them and ligating sequencing adapters; recovering fragments of the target size by agarose gel electrophoresis and PCR-amplifying them; and performing sequencing and analysis on an Illumina Genome Analyzer II (GA II) to obtain gene expression data.
3. The method of claim 1, wherein the significance of the dynamic association (LA) is assessed as follows: the expression values of all genes are pooled; in each simulation, expression values for a pair of genes (X, Y) are drawn at random from the pool by sampling with replacement, every gene in the genome is taken in turn as Z, and the LA values of the XY pair across the whole genome are calculated to obtain the largest positive and the smallest negative LA value; the simulation is repeated one million times to obtain reference distributions of the positive and negative LA values, and the 99% quantiles of the positive and negative LA reference distributions are taken as the positive and negative LA significance thresholds.
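The resampling procedure in claim 3 can be sketched as follows, again assuming the LA estimator is the sample mean of standardized X, standardized Y, and normal-scored Z. The claim asks for one million simulations; `n_sim` defaults to far fewer here so the sketch runs quickly. The claim names the 99% quantile for both reference distributions; for the negative side this sketch reads that as the 99% quantile by magnitude, i.e. the 1st percentile of the simulated minima, which is an interpretation. `la_thresholds` and the random demo data are hypothetical.

```python
import random
from statistics import NormalDist, fmean, pstdev, quantiles


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mu, sd = fmean(values), pstdev(values)
    if sd == 0:  # a constant resampled vector carries no co-expression signal
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def normal_scores(values):
    """Rank-based standard-normal scores, re-standardized."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    dist = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = dist.inv_cdf((rank - 0.5) / n)
    return standardize(scores)


def la_thresholds(expression, n_sim=1000, seed=0):
    """Return (positive, negative) LA significance thresholds.

    expression: dict mapping gene ID -> per-sample expression values.
    """
    rng = random.Random(seed)
    pooled = [v for values in expression.values() for v in values]  # mix all genes
    n_samples = len(next(iter(expression.values())))
    z_panel = [normal_scores(v) for v in expression.values()]  # every gene as Z
    maxima, minima = [], []
    for _ in range(n_sim):
        # Random (X, Y) pair drawn from the pooled values with replacement.
        x = standardize(rng.choices(pooled, k=n_samples))
        y = standardize(rng.choices(pooled, k=n_samples))
        xy = [a * b for a, b in zip(x, y)]
        las = [fmean(p * c for p, c in zip(xy, z)) for z in z_panel]
        maxima.append(max(las))  # largest LA of this simulation
        minima.append(min(las))  # smallest LA of this simulation
    upper = quantiles(maxima, n=100)[98]  # 99th percentile of the maxima
    lower = quantiles(minima, n=100)[0]   # 1st percentile of the minima
    return upper, lower


# Hypothetical demo: 10 genes x 20 samples of random expression values.
demo_rng = random.Random(1)
demo = {f"gene{i}": [demo_rng.gauss(5.0, 1.0) for _ in range(20)] for i in range(10)}
upper, lower = la_thresholds(demo, n_sim=200)
print(upper, lower)
```

Fixing `seed` makes the thresholds reproducible between runs, which matters when the same cut-offs are later used to filter genome-wide results.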
4. The method of claim 1, wherein the results of the genome-wide dynamic association analysis of gene co-expression patterns are filtered according to the magnitude of the LA value, and the genes with significant LA are functionally annotated to predict the functions of unknown genes.
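Claim 4 filters by LA magnitude and annotates the significant genes, but it does not say how annotations are transferred to an unknown gene. The sketch below assumes a simple guilt-by-association vote: an unknown gene inherits the most frequent annotation among the known genes that share significant (X, Y, Z, LA) triplets with it. The function names, gene IDs, LA values, thresholds, and annotation terms are all hypothetical.

```python
from collections import Counter


def significant_triplets(triplets, upper, lower):
    """Keep (X, Y, Z, LA) records whose LA passes either threshold, strongest first."""
    kept = [t for t in triplets if t[3] >= upper or t[3] <= lower]
    return sorted(kept, key=lambda t: abs(t[3]), reverse=True)


def predict_function(gene, triplets, annotation, top=1):
    """Guilt-by-association vote (an assumption, not the patent's stated rule):
    count the annotations of known genes that share a triplet with `gene`."""
    votes = Counter()
    for x, y, z, _ in triplets:
        members = (x, y, z)
        if gene not in members:
            continue
        for partner in members:
            if partner != gene and partner in annotation:
                votes[annotation[partner]] += 1
    return [term for term, _ in votes.most_common(top)]


# Hypothetical LA results and annotations, for illustration only.
triplets = [
    ("g1", "g2", "g3", 0.91),
    ("g1", "g4", "g3", 0.84),
    ("g5", "g6", "g7", 0.12),
    ("g1", "g8", "g9", -0.95),
]
annotation = {
    "g2": "starch biosynthesis",
    "g3": "starch biosynthesis",
    "g4": "starch biosynthesis",
    "g8": "lipid metabolism",
    "g9": "storage protein",
}
hits = significant_triplets(triplets, upper=0.5, lower=-0.5)
print(predict_function("g1", hits, annotation))  # ['starch biosynthesis']
```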
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710169145.8A CN107058525B (en) | 2017-03-21 | 2017-03-21 | Method for predicting unknown gene function of corn based on dynamic correlation of gene expression quantity and character |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107058525A (en) | 2017-08-18 |
CN107058525B (en) | 2020-12-29 |
Family ID: 59617881
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201710169145.8A, CN107058525B (en) | Expired - Fee Related | 2017-03-21 | 2017-03-21 | Method for predicting unknown gene function of corn based on dynamic correlation of gene expression quantity and character |
Country Status (1)
Country | Publication |
---|---|
CN (1) | CN107058525B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109817278A (en) * | 2019-03-22 | 2019-05-28 | University of Jinan | Method for predicting unknown gene function in corn based on dynamic correlation between oil-content-associated genes and oil content |
CN114863992B (en) * | 2022-06-27 | 2024-04-05 | Shandong University | Corn alternative splicing isoform function prediction system based on tissue specificity |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2003058503A1 (en) * | 2001-12-26 | 2003-07-17 | The Regents Of The University Of California | System and method for identifying networks of ternary relationships in complex data systems |
CN101128591A (en) * | 2005-02-26 | 2008-02-20 | BASF Plant Science GmbH | Expression cassettes for seed-preferential expression in plants |
CN102277377A (en) * | 2005-02-09 | 2011-12-14 | BASF Plant Science GmbH | Expression cassettes for regulation of expression in monocotyledonous plants |
CN103088018A (en) * | 2012-12-27 | 2013-05-08 | Henan Agricultural University | Intragenic single nucleotide polymorphism (SNP) marker of male sterility restoring gene RF4 of C-type cytoplasm of corn |
CN103160502A (en) * | 2013-02-28 | 2013-06-19 | Nantong Xinhe Biotechnology Co., Ltd. | Single nucleotide polymorphism (SNP) molecular markers for corn germplasm salt-resistant quantitative trait loci (QTL) and application thereof |
WO2013189047A1 (en) * | 2012-06-20 | 2013-12-27 | Nanjing University | Plant microRNA identification and application thereof |
CN103509821A (en) * | 2013-10-18 | 2014-01-15 | Nanjing Agricultural University | Rapid plant phosphorus nutrition diagnosis and visual dynamic monitoring method and application of recombinant expression vector |
Non-Patent Citations (9)
Title |
---|
Dynamic analysis of key hydrolase activities and related gene expression during seed germination in two Sh2 sweet corn inbred lines; Cheng Xinxin et al.; Journal of Plant Resources and Environment; 2015-08-25; Vol. 24, No. 3; pp. 18-24 * |
A system for enhancing genome-wide coexpression dynamics study; Ker-Chau Li et al.; Proc Natl Acad Sci USA; 2004-11-02; Vol. 101, No. 44; p. 15561 right column para. 1, p. 15561 left column "Abstract", p. 15562 left column "Theory", p. 15563 left column "Method", p. 15565 left column "Discussion" * |
Genome-wide analysis of gene expression profiles during the kernel development of maize (Zea Mays L.); Xihui Liu et al.; Genomics; 2008-04-30; Vol. 91, No. 4; pp. 378-387 * |
Genome-wide coexpression dynamics: Theory and application; Ker-Chau Li; Proc Natl Acad Sci USA; 2002-12-24; Vol. 99, No. 26; pp. 16875-16880 * |
Genome-wide identification and transcriptional analysis of folate metabolism-related genes in maize kernels; Tong Lian et al.; BMC Plant Biology; 2015-08-19; Vol. 15; Article 204, pp. 1-14 * |
Genome-wide trait-trait dynamics correlation study dissects the gene regulation pattern in maize kernels; Xiuqin Xu et al.; BMC Plant Biology; 2017-10-16; Vol. 17; Article 163, pp. 1-12 * |
Trait-trait dynamic interaction: 2D-trait eQTL mapping for genetic variation study; Wei Sun et al.; BMC Genomics; 2008-05-23; Vol. 9; Article 242, pp. 1-13 * |
Construction of a plant gene expression database and study of co-expression analysis; Sun Xiuli; China Doctoral Dissertations Full-text Database, Basic Sciences; 2014-04-15 (Issue 04, 2014); p. A006-6 * |
Dynamic association analysis of gene-pair co-expression patterns in maize kernels; Xu Xiuqin; China Master's Theses Full-text Database, Basic Sciences; 2018-07-15 (Issue 07, 2018); p. A006-28 * |
Also Published As
Publication number | Publication date |
---|---|
CN107058525A (en) | 2017-08-18 |
Similar Documents
Publication | Title |
---|---|
McCormick et al. | The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization |
Narsai et al. | Extensive transcriptomic and epigenomic remodelling occurs during Arabidopsis thaliana germination |
Reyes-Chin-Wo et al. | Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce |
Wang et al. | Exploring plant transcriptomes using ultra high-throughput sequencing |
Salih et al. | Genome-wide characterization and expression analysis of MYB transcription factors in Gossypium hirsutum |
Tai et al. | Transcriptomic and anatomical complexity of primary, seminal, and crown roots highlight root type-specific functional diversity in maize (Zea mays L.) |
Tsuda et al. | Construction of a high-density mutant library in soybean and development of a mutant retrieval method using amplicon sequencing |
Chen et al. | Continuous salt stress-induced long non-coding RNAs and DNA methylation patterns in soybean roots |
Wall et al. | Comparison of next generation sequencing technologies for transcriptome characterization |
Ibarra-Laclette et al. | Architecture and evolution of a minute plant genome |
Mutum et al. | Identification of novel miRNAs from drought tolerant rice variety Nagina 22 |
Loraine et al. | RNA-seq of Arabidopsis pollen uncovers novel transcription and alternative splicing |
Wang et al. | Global gene expression responses to waterlogging in roots of sesame (Sesamum indicum L.) |
Vega-Arreguín et al. | Deep sampling of the Palomero maize transcriptome by a high throughput strategy of pyrosequencing |
Liu et al. | Identification of lncRNAs involved in rice ovule development and female gametophyte abortion by genome-wide screening and functional analysis |
Hirsch et al. | Reduced representation approaches to interrogate genome diversity in large repetitive plant genomes |
Bakala et al. | Smart breeding for climate resilient agriculture |
Yang et al. | Designing microarray and RNA-Seq experiments for greater systems biology discovery in modern plant genomics |
Li et al. | Coregulation of ribosomal RNA with hundreds of genes contributes to phenotypic variation |
Abramson et al. | The genome and preliminary single-nuclei transcriptome of Lemna minuta reveals mechanisms of invasiveness |
Alejandri-Ramírez et al. | Small RNA differential expression and regulation in Tuxpeño maize embryogenic callus induction and establishment |
CN107058525B (en) | Method for predicting unknown gene function of corn based on dynamic correlation of gene expression quantity and character |
Zhou et al. | Genome-wide identification and characterization of long noncoding RNAs during peach (Prunus persica) fruit development and ripening |
Lang et al. | The genome of the model moss Physcomitrella patens |
CN106929579B (en) | Method for excavating oil metabolism mechanism of corn kernel based on dynamic correlation analysis |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 2020-12-29 |