CN110349633B - Method for screening radiation biomarkers and predicting radiation dose based on radiation response biological pathway - Google Patents
- Publication number
- CN110349633B (application CN201910631911.7, also published as CN110349633A)
- Authority
- CN
- China
- Prior art keywords
- radiation
- expression
- matrix
- gene
- species
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/10—Ontologies; Annotations
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Theoretical Computer Science (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Physiology (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The invention discloses a method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway, and belongs to the technical field of bioinformatics. Given a radiation response biological pathway, its Gene Ontology (GO) terms are retrieved and the gene set annotated to those terms in a specific species is acquired; this gene set is combined with expression profile data from the same species after radiation exposure, and the corresponding expression profile data are extracted. Differentially expressed features and their matrix are obtained through one-way analysis of variance and protein interaction network analysis. A multivariate regression model is then established using a biostatistical method or a machine learning algorithm, and the set of significantly expressed features and the optimal statistical model type are determined by comparing the predictive performance of the models. Together these steps constitute the method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway. The method predicts well and can provide a new technical approach for biological dose monitoring of nuclear radiation, diagnosis of nuclear radiation damage and risk early warning under emergency conditions.
Description
Technical Field
The invention belongs to the technical field of bioinformatics, and relates to a method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway.
Background
With the wide application of nuclear energy in military and medical fields, monitoring and diagnosing the nuclear radiation dose under emergency conditions such as nuclear terrorist attacks or nuclear accidents has become a practical requirement. Physical detection methods for monitoring the radiation dose received by organisms or personnel and for diagnosing radiation damage have many limitations: monitoring takes a long time, and it is difficult to reflect the absorbed dose comprehensively and accurately or to evaluate radiation damage effects and risks. Several biomarker-based methods have therefore been developed to assess the radiation dose received by an individual. The radiation response of these biomarkers is generally proportional to the radiation dose, and such methods are referred to as radiation biodosimeters.
Currently, the main detection methods used as biodosimeters are cytological detection techniques, cytogenetic detection techniques and molecular biological detection techniques. An ideal radiation biodosimeter has strong specificity and high sensitivity, is suitable for a wide range of radiation doses, can determine the radiation dose received by an organism and the degree of radiation damage in the shortest possible time after irradiation, and is as simple as possible to operate. The first two classes of methods usually require complicated experimental procedures and generally need at least 2-3 days to produce results, so they cannot effectively support nuclear radiation monitoring and diagnosis in an emergency. By contrast, changes in biomolecules after radiation exposure can be used to estimate the nuclear radiation dose quickly and accurately. How to implement such an estimate has therefore become a key problem to be solved urgently for nuclear radiation risk assessment, early warning and response to nuclear threats.
With the development of high-throughput technology, DNA microarray expression profiles and next-generation sequencing data provide good data support for accurate nuclear radiation monitoring and diagnosis. In recent years, several studies have shown that radiation dose can be predicted effectively and accurately from gene expression levels. For example, Paul et al. (Int J Radiat Oncol Biol Phys, 2008, 71, 1236-1244) found that the expression levels of 74 signature genes, most of which are regulated by the TP53 gene, can be used to predict the nuclear radiation dose received by human peripheral blood. Dressman et al. (PLoS Med, 2007, 4, e106) found that specific gene expression levels accurately reflect the radiation dose received by human or mouse peripheral blood cells. However, most of the approaches in these studies ignore an underlying biological property: genes do not exist and function in isolation, but generally act within gene regulatory networks or biological pathways. How to screen radiation biomarkers and predict radiation dose based on a radiation response biological pathway has therefore become a key problem to be solved urgently in the field.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for screening radiation response biomarkers based on a specific radiation response biological pathway and a method for accurately predicting radiation dose based on these markers, thereby providing a new technical approach for biological dose monitoring of nuclear radiation, diagnosis of nuclear radiation damage and risk early warning under emergency conditions.
To achieve this purpose, the invention adopts the following technical scheme:
A method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway, comprising the following steps:
(1) Retrieve a radiation response biological pathway in a Gene Ontology (GO) search tool and obtain the GO terms of that pathway.
(2) Using the GO terms obtained in step (1) as the filter condition, search the genome database of species i and obtain the information of the gene set $A_i$ annotated to those GO terms in species i. The information of gene set $A_i$ includes the gene number (Gene stable ID), gene name (Gene name), protein number (Protein stable ID) and protein reference sequence information (RefSeq peptide ID).
(3) Select any one of the species i and, after exposing it to different doses of radiation, acquire its expression profile data using a high-throughput sequencing technology. The expression profile data form an n × p matrix, where n is the number of features in the matrix and p is the number of radiation exposure treatments of the species in the matrix. Each entry of the matrix is the expression level of a feature under the corresponding radiation dose. The features in the matrix include genes, transcripts, methylation sites, miRNAs and proteins.
(4) Normalize the expression profile matrix of step (3) to obtain a normalized expression profile matrix.
(5) Using the gene set $A_i$ obtained in step (2), extract from the normalized expression profile matrix of step (4) the normalized expression profile matrix restricted to the radiation response biological pathway.
(6) Analyze the expression profile matrix obtained in step (5) by one-way analysis of variance (ANOVA), set the significance level p of the between-group difference to the corresponding threshold, and obtain the differentially expressed feature set $B_i$ and its corresponding differential expression feature matrix.
(7) Using a protein-protein interaction (PPI) network database, analyze the differentially expressed feature set $B_i$ obtained in step (6) and construct a protein interaction network. Retain the features whose protein molecule connectivity is greater than a set threshold to obtain the important differentially expressed feature set $C_i$ and its corresponding important differential expression feature matrix.
(8) Perform statistical modeling on the important differential expression feature matrix obtained in step (7) using a biostatistical method or a machine learning algorithm, and establish a multivariate regression model with the features $C_{ij}$ ($j = 1, 2, \dots, s$) of set $C_i$ as independent variables and the radiation exposure dose $Y_k$ ($k = 1, 2, \dots, m$) as the dependent variable.
When a machine learning algorithm is used to model the expression feature matrix, the matrix is first divided into a training matrix and a test matrix, either in a 70% to 30% ratio or by leave-one-out cross-validation. The choice depends on the sample size: when the sample size is large, the training and test matrices are split 70% to 30%; when it is small, they are split by leave-one-out cross-validation. Second, the parameters of the machine learning algorithm are tuned by stepwise regression analysis to optimize the accuracy of the statistical model and to determine the optimal execution parameters of the algorithm.
(9) Calculate the evaluation indices of the models established in step (8), compare and analyze their predictive performance, and determine the optimal type of biostatistical method or machine learning algorithm. Test the coefficients of the differentially expressed features in the optimal biostatistical method or machine learning algorithm, generally with a t test; a feature is considered significant when its p value is smaller than a set threshold, which gives the set of significant differentially expressed features. The set threshold is 0.05, 0.01 or 0.001.
The resulting set of significant differentially expressed features constitutes the radiation biomarkers screened based on the radiation response biological pathway, and the resulting optimal biostatistical method or machine learning algorithm type is the radiation dose prediction method based on those radiation biomarkers.
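The choice in step (8) between a 70%/30% split and leave-one-out cross-validation can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and the `small_sample_cutoff` of 50 is an assumed value, since the text does not state what counts as a large or small sample size.

```python
import random

def split_70_30(n_samples, seed=0):
    """Return (train_idx, test_idx) for a random 70%/30% split of sample indices."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_train = round(0.7 * n_samples)
    return sorted(idx[:n_train]), sorted(idx[n_train:])

def leave_one_out(n_samples):
    """Yield (train_idx, test_idx) pairs, holding out exactly one sample each time."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, [held_out]

def make_splits(n_samples, small_sample_cutoff=50):
    """Pick the split strategy from the sample size (the cutoff is an assumption)."""
    if n_samples >= small_sample_cutoff:
        return [split_70_30(n_samples)]
    return list(leave_one_out(n_samples))

# With 30 samples, the small-sample branch produces 30 leave-one-out folds.
splits = make_splits(30)
```

Each returned pair holds column indices of the expression feature matrix, so the same splits can be reused when comparing several model types.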
Further, in the above technical solution, the gene ontology search tool in step (1) can be the QuickGO database (https://www.ebi.ac.uk/QuickGO/); the GO terms of known radiation response biological pathways can be retrieved repeatedly using the QuickGO database.
Further, in the above technical solution, the specific species i in step (2) is a species with known genome information in a genome database, including human, mouse and rat; the genome database in step (2) is the animal genome annotation database Ensembl (http://asia.ensembl.org/index.html) or the plant genome annotation database Ensembl Plants (http://plants.ensembl.org/index.html).
Further, in the above technical solution, the high-throughput sequencing technology in step (3) includes gene expression profiling, transcriptome sequencing, methylation sequencing, miRNA expression profiling and protein expression profiling.
Further, in the above technical solution, the expression profile data of any one of the species i in step (3) may include a gene expression profile, a miRNA expression profile and a protein expression profile.
Further, in the above technical solution, the normalization method in step (4) may include the Max-Min normalization method, the standard deviation normalization method and the logarithmic normalization method.
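The three normalization options named for step (4) could look like the sketch below, applied to the expression values of one feature across samples. Several details are assumptions not given in the text: normalizing per feature, using the sample standard deviation, and using a base-2 logarithm with a pseudocount of 1.

```python
import math

def max_min_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant feature: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def std_normalize(values):
    """Centre on the mean and divide by the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]

def log_normalize(values, pseudocount=1.0):
    """Log2-transform after adding a pseudocount (both choices are assumptions)."""
    return [math.log2(v + pseudocount) for v in values]
```

Max-Min normalization keeps every feature on the same bounded scale, which makes regression coefficients easier to compare across features.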
Further, in the above technical solution, the corresponding threshold in step (6) is 0.05, 0.01 or 0.001.
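As an illustration of the one-way ANOVA used in step (6), the sketch below computes the F statistic for a single feature from its expression values grouped by radiation dose. The function name and input layout are hypothetical. Converting F to a p value requires the upper tail of the F distribution, for instance `scipy.stats.f.sf(f_stat, df_between, df_within)` if SciPy is available; that p value is then compared with the chosen threshold.

```python
def one_way_anova_f(groups):
    """One-way ANOVA for one feature.

    groups: list of lists, one list of expression values per dose group.
    Returns (F statistic, between-group df, within-group df).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of individual samples around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

Running this once per row of the pathway-restricted matrix and keeping the rows whose p value is below the threshold would give the differentially expressed feature set.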
Further, in the above technical solution, the PPI network database in step (7) may be the STRING online database (https://string-db.org/cgi/input.pl). The protein molecule connectivity is the interaction score between protein molecules and can be set to highest confidence (0.90), high confidence (0.70), medium confidence (0.40) or low confidence (0.15). The set threshold is high confidence or above, or is adjusted according to the number of features and the actual protein molecule connectivity: if most of the features (protein molecules) are not connected at the chosen level, the connectivity threshold can be lowered.
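The network filtering in step (7) can be sketched as follows, assuming the interactions have already been exported from the PPI database as an edge list with scores scaled to the range 0 to 1. The function name and input format are hypothetical; the default score cutoff of 0.70 is the high-confidence level named above, and the degree cutoff of 3 is the value used in the mouse embodiment.

```python
from collections import defaultdict

def high_degree_features(edges, score_cutoff=0.70, min_degree=3):
    """Return features that keep at least `min_degree` partners after score filtering.

    edges: iterable of (protein_a, protein_b, score) with score in [0, 1].
    """
    neighbours = defaultdict(set)
    for a, b, score in edges:
        if a != b and score >= score_cutoff:   # drop self-loops and weak edges
            neighbours[a].add(b)
            neighbours[b].add(a)
    return sorted(p for p, partners in neighbours.items()
                  if len(partners) >= min_degree)

# Toy edge list: only "A" retains three partners at the 0.70 cutoff.
toy_edges = [("A", "B", 0.90), ("A", "C", 0.80), ("A", "D", 0.75),
             ("B", "C", 0.50), ("D", "E", 0.95)]
kept = high_degree_features(toy_edges)
```

Lowering `score_cutoff` corresponds to the adjustment described above for networks in which most features would otherwise be unconnected.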
Further, in the above technical solution, the biostatistical method in step (8) may include a multiple linear regression method and a multiple nonlinear regression method; the machine learning algorithm may include a neural network algorithm, a support vector machine algorithm and a random forest algorithm.
Further, in the above technical solution, the evaluation indices in step (9) include the square of the correlation coefficient ($R^2$), the root-mean-square error (RMSE) and the mean absolute error (MAE), calculated as $R^2 = [\mathrm{cor}(f_k, Y_k)]^2$, $\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{k=1}^{m}(f_k - Y_k)^2}$ and $\mathrm{MAE} = \frac{1}{m}\sum_{k=1}^{m}\lvert f_k - Y_k\rvert$, where $f_k$ is the model prediction for the kth sample, $Y_k$ is the true value of the kth sample, and m is the sample size.
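The three evaluation indices of step (9) can be computed as in the sketch below, which takes the squared Pearson correlation for the first index and the usual definitions of root-mean-square error and mean absolute error for the other two. The function names are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def evaluate(predicted, true):
    """Return (R^2, RMSE, MAE) for predicted doses f_k against true doses Y_k."""
    m = len(true)
    r2 = pearson(predicted, true) ** 2
    rmse = math.sqrt(sum((f - y) ** 2 for f, y in zip(predicted, true)) / m)
    mae = sum(abs(f - y) for f, y in zip(predicted, true)) / m
    return r2, rmse, mae
```

Because this first index is a squared correlation, it stays at 1 for predictions that are perfectly correlated with the truth but systematically offset, which is why RMSE and MAE are reported alongside it.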
Further, in the above technical solution, the radiation biomarker screening and radiation dose prediction method is a feature selection and statistical modeling algorithm based on a radiation response biological pathway in a genome database.
According to the technical scheme, the invention has the following beneficial effects:
The invention takes into account that genes usually act within gene regulatory networks. Using bioinformatics methods, it retrieves information on the genes participating in a radiation response biological pathway from a gene ontology search tool and a genome database, performs one-way analysis of variance and protein interaction network analysis on the feature expression matrix, and screens out the corresponding radiation biomarkers. It then performs statistical modeling on the resulting radiation biomarkers using a biostatistical method or a machine learning algorithm, establishing a method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway. The invention therefore provides a novel method for screening radiation biomarkers and predicting radiation dose, and can provide a new technical approach for biological dose monitoring of nuclear radiation, diagnosis of nuclear radiation damage and risk early warning under emergency conditions.
Drawings
FIG. 1 is a flow chart of an implementation of the method of the present invention;
FIG. 2 is a diagram of a protein interaction network obtained using a PPI network database;
FIG. 3 is a graph comparing the predicted radiation dose and the actual radiation dose of a multiple linear regression model.
Detailed Description
The following description will be made in detail with reference to the accompanying drawings.
FIG. 1 is a flow chart of an implementation of the method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway according to the present invention.
The invention is described in detail below using peripheral blood gene expression profile data from mice after X-ray exposure (similar data are readily available from other relevant experiments or databases).
The gene expression profile data comprise four groups: C57BL/6 male mice were irradiated with X-rays at a dose rate of 1.03 Gy/min to exposure doses of 0 Gy, 1.1 Gy, 2.2 Gy and 4.4 Gy, with 12, 6, 6 and 6 samples respectively, 30 samples in total. The peripheral blood expression profile data of the mice were acquired 24 h after irradiation. The data can be downloaded from the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (NCBI) under accession number GSE62623.
Using the peripheral blood gene expression profile data of mice exposed to different doses, radiation sensitivity markers are screened and a radiation dose prediction method based on these markers is established as follows:
S1: In the gene ontology search tool QuickGO (https://www.ebi.ac.uk/QuickGO/), retrieve the GO terms of biological pathways related to the DNA damage response (DDR); previous research shows that DDR is one of the important radiation response biological pathways. The terms are "DNA repair" (GO:0006281), "apoptotic process" (GO:0006915), "cell cycle arrest" (GO:0007050), "cellular response to DNA damage stimulus" (GO:0006974) and "telomere maintenance" (GO:0000723);
S2: Load the GO terms of the DDR-related radiation response biological pathways obtained in step S1 into the BioMart tool of the animal genome annotation database Ensembl (http://asia.ensembl.org/index.html), and retrieve the information of the mouse gene sets annotated to those GO terms, including the gene number (Gene stable ID), gene name (Gene name) and gene description (Gene description). The mouse gene sets contain 1026 genes in total, with specific results shown in Table 1; after duplicated genes are removed, 949 genes remain;
Table 1. Number of genes annotated to the DDR-related pathway GO terms, obtained from Ensembl
S3: Acquire the peripheral blood gene expression profile data GSE62623 of C57BL/6 mice after the different X-ray exposure doses from the GEO database. The expression profile is a matrix of 44397 rows and 30 columns, where the rows represent 44397 genes, the columns represent 30 samples (12 controls and 18 radiation-treated samples), and each entry is the expression level of a gene under the corresponding radiation dose;
S4: Apply Max-Min normalization to the expression profile matrix of step S3 to obtain a normalized expression profile matrix;
S5: Using the gene set information annotated to the 5 DDR-related radiation response pathway GO terms obtained in step S2, extract the normalized expression profile matrix of these gene sets from the normalized expression profile matrix obtained in step S4. Of the 949 genes obtained in step S2, 857 were matched in the expression profile matrix of step S4, so the resulting expression profile data form an 857 × 30 matrix;
S6: Perform one-way analysis of variance on the expression profile matrix obtained in step S5 with the significance level of the between-group difference set to p = 0.01, obtaining 152 differentially expressed genes in total and their corresponding expression matrix;
S7: Analyze the differentially expressed gene set obtained in step S6 using the STRING online database (https://string-db.org/cgi/input.pl) with the interaction score between protein molecules set to 0.70 (high confidence), and construct the protein interaction network shown in FIG. 2. Retain the genes whose protein molecule connectivity degree is greater than or equal to 3, obtaining 28 important differentially expressed genes and their corresponding expression matrix;
S8: Perform statistical modeling on the important differentially expressed gene matrix obtained in step S7 using a multiple linear regression model, with the gene features in the expression matrix as independent variables and the radiation exposure dose as the dependent variable. Optimizing the multiple linear regression model by stepwise regression analysis yields 13 significant differentially expressed genes (p < 0.05), namely Cdk7, Foxo1, Fzr1, Mcrs1, Nsmce4a, Pold1, Psmd14, Rad51c, Rfc1, Rnf144b, Sirt1, Usp1 and Xrcc6, as shown in Table 2. These genes can be regarded as radiation biomarkers obtained based on DDR-related radiation response biological pathways;
Table 2. Significant differentially expressed genes obtained by multiple linear regression analysis
Gene name | Gene function description | p value |
---|---|---|
Cdk7 | cyclin-dependent kinase 7 | 5.38e-05 |
Foxo1 | forkhead box O1 | 6.63e-03 |
Fzr1 | fizzy and cell division cycle 20 related 1 | 2.42e-05 |
Mcrs1 | microspherule protein 1 | 4.94e-05 |
Nsmce4a | NSE4 homolog A, SMC5-SMC6 complex component | 6.81e-06 |
Pold1 | polymerase (DNA directed), delta 1, catalytic subunit | 2.44e-04 |
Psmd14 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 14 | 2.27e-05 |
Rad51c | RAD51 paralog C | 6.13e-06 |
Rfc1 | replication factor C (activator 1) 1 | 2.40e-05 |
Rnf144b | ring finger protein 144B | 2.63e-05 |
Sirt1 | sirtuin 1 | 1.11e-04 |
Usp1 | ubiquitin specific peptidase 1 | 4.22e-04 |
Xrcc6 | X-ray repair complementing defective repair in Chinese hamster cells 6 | 1.31e-04 |
S9: Stepwise regression analysis also yields a multiple linear regression model based on the 13 significant differentially expressed genes of step S8. The resulting expression for the biological prediction of radiation dose is
$$Y = 27.58 - 2.37[\mathrm{Cdk7}] + 4.92[\mathrm{Foxo1}] + 6.86[\mathrm{Fzr1}] - 12.64[\mathrm{Mcrs1}] - 18.61[\mathrm{Nsmce4a}] - 4.90[\mathrm{Pold1}] + 9.66[\mathrm{Psmd14}] - 5.65[\mathrm{Rad51c}] + 8.42[\mathrm{Rfc1}] + 3.53[\mathrm{Rnf144b}] + 3.40[\mathrm{Sirt1}] + 8.94[\mathrm{Usp1}] - 9.64[\mathrm{Xrcc6}]$$
where each bracketed gene name denotes the expression level of that gene. The radiation dose predicted by the multiple linear regression model is compared with the true radiation dose in FIG. 3. The square of the correlation coefficient ($R^2$), the root-mean-square error (RMSE) and the mean absolute error (MAE) are calculated as $R^2 = [\mathrm{cor}(f_k, Y_k)]^2$, $\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{k=1}^{m}(f_k - Y_k)^2}$ and $\mathrm{MAE} = \frac{1}{m}\sum_{k=1}^{m}\lvert f_k - Y_k\rvert$, where $f_k$ is the model prediction for the kth sample, $Y_k$ is the true value of the kth sample and m is the sample size, giving the evaluation indices of the multiple linear regression model. The multiple linear regression model based on the 13 gene features predicts well and can be regarded as a method for predicting radiation dose based on the radiation biomarkers.
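The regression expression of step S9 can be applied to a new sample as in the sketch below. The coefficients are copied from the expression above; reading the constant term as +27.58 is an assumption, because the equals sign is not legible in the source text. The function name and the dictionary input are hypothetical, and the expression values are assumed to be normalized in the same way as the training data (Max-Min in step S4).

```python
# Constant term; the sign is an assumption read from a damaged equation.
INTERCEPT = 27.58

# Regression coefficients for the 13 marker genes, as listed in step S9.
COEFFICIENTS = {
    "Cdk7": -2.37, "Foxo1": 4.92, "Fzr1": 6.86, "Mcrs1": -12.64,
    "Nsmce4a": -18.61, "Pold1": -4.90, "Psmd14": 9.66, "Rad51c": -5.65,
    "Rfc1": 8.42, "Rnf144b": 3.53, "Sirt1": 3.40, "Usp1": 8.94,
    "Xrcc6": -9.64,
}

def predict_dose(expression):
    """Predict the radiation dose (Gy) from normalized expression levels.

    expression: dict mapping each of the 13 gene names to its expression level.
    """
    missing = sorted(set(COEFFICIENTS) - set(expression))
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return INTERCEPT + sum(coef * expression[gene]
                           for gene, coef in COEFFICIENTS.items())
```

Raising an error on missing genes, rather than treating them as zero, avoids silently producing a dose estimate from an incomplete marker panel.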
The above is only one embodiment of the present invention. Although the description is specific and detailed, it should not be understood as limiting the scope of the invention. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical solution and inventive concept, shall fall within the scope of protection of the invention.
Claims (10)
1. A method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway, characterized in that the method comprises the following steps:
(1) retrieving a radiation response biological pathway in a gene ontology search tool and obtaining the gene ontology (GO) terms of the radiation response biological pathway;
(2) using the GO terms of the radiation response biological pathway obtained in step (1) as the filter condition, searching the genome database of species i and obtaining the information of the gene set $A_i$ annotated to those GO terms in species i; the information of gene set $A_i$ comprises gene number, gene name, protein number and protein reference sequence information;
(3) selecting any one of the species i and, after exposing it to different doses of radiation, acquiring its expression profile data using a high-throughput sequencing technology; the expression profile data form an n × p matrix, where n is the number of features in the matrix and p is the number of radiation exposure treatments of the species in the matrix; each entry of the matrix is the expression level of a feature under the corresponding radiation dose; the features in the matrix comprise genes, transcripts, methylation sites, miRNAs and proteins;
(4) normalizing the expression profile matrix of step (3) to obtain a normalized expression profile matrix;
(5) using the gene set $A_i$ obtained in step (2), extracting from the normalized expression profile matrix of step (4) the normalized expression profile matrix restricted to the radiation response biological pathway;
(6) analyzing the expression profile matrix obtained in step (5) by one-way analysis of variance (ANOVA), setting the significance level p of the between-group difference to the corresponding threshold, and obtaining the differentially expressed feature set $B_i$ and its corresponding differential expression feature matrix;
(7) using a PPI network database, analyzing the differentially expressed feature set $B_i$ obtained in step (6) and constructing a protein interaction network; retaining the features whose protein molecule connectivity is greater than a set threshold to obtain the important differentially expressed feature set $C_i$ and its corresponding important differential expression feature matrix;
(8) performing statistical modeling on the important differential expression feature matrix obtained in step (7) using a biostatistical method or a machine learning algorithm, and establishing a multivariate regression model with the features $C_{ij}$ ($j = 1, 2, \dots, s$) of set $C_i$ as independent variables and the radiation exposure dose $Y_k$ ($k = 1, 2, \dots, m$) as the dependent variable;
when a machine learning algorithm is used to model the expression feature matrix, the matrix is first divided into a training matrix and a test matrix in a 70% to 30% ratio or by leave-one-out cross-validation; second, the parameters of the machine learning algorithm are tuned by stepwise regression analysis to optimize the accuracy of the statistical model and to determine the optimal execution parameters of the algorithm;
(9) calculating the evaluation indices of the models established in step (8), comparing and analyzing their predictive performance, and determining the optimal type of biostatistical method or machine learning algorithm; testing the coefficients of the differentially expressed features in the optimal biostatistical method or machine learning algorithm with a t test, a feature being considered significant when its p value is smaller than a set threshold, and determining the set of significant differentially expressed features; the set threshold is 0.05, 0.01 or 0.001;
the resulting set of significant differentially expressed features constitutes the radiation biomarkers screened based on the radiation response biological pathway; the resulting optimal biostatistical method or machine learning algorithm type is the radiation dose prediction method based on the radiation biomarkers.
2. The method of claim 1, wherein: the gene ontology search tool in the step I is a QuickGO database; gene ontology GO semantics of known radiation response biological pathways can be repeatedly retrieved using the QuickGO database.
3. The method of claim 1, wherein: the species i in the step II is a species containing known genome information in a genome database, and comprises a human, a mouse and a rat; the genome database in the step II is an animal genome annotation database Ensembl and a plant genome annotation database EnsemblPlants.
4. The method of claim 1, wherein: the high-throughput sequencing technology in the third step comprises gene expression profiling, transcriptome sequencing, methylation sequencing, miRNA expression profiling and protein expression profiling.
5. The method of claim 1, wherein: the expression profile data of any species in the species i in the third step comprises a gene expression profile, an miRNA expression profile and a protein expression profile.
6. The method of claim 1, wherein: the standardization processing method in the step (IV) comprises a Max-Min standardization method, a standard deviation standardization method and a logarithm standardization method.
7. The method of claim 1, wherein: the corresponding threshold values in the step (sixthly) are 0.05, 0.01 and 0.001.
8. The method of claim 1, wherein: the PPI network database in the step (c) is an STRING network online database; the protein molecule connectivity is the score of the interaction between the protein molecules, and comprises the highest confidence, the high confidence, the medium confidence and the low confidence, wherein the highest confidence is 0.90, the high confidence is 0.70, the medium confidence is 0.40 and the low confidence is 0.15; the set threshold is a high confidence or above, or adjusted based on the number of features and the actual protein molecule connectivity.
9. The method of claim 1, wherein: the biometric method in step VIII includes multiple linear regression and multiple non-linear regression; the machine learning algorithm comprises a neural network algorithm, a support vector machine algorithm and a random forest algorithm.
10. The method of claim 1, wherein: the evaluation indices in step IX include the squared correlation coefficient $R^2$, the root mean square error RMSE and the mean absolute error MAE, calculated as $R^2 = [\mathrm{cor}(f_k, Y_k)]^2$, $\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{k=1}^{m}(f_k - Y_k)^2}$ and $\mathrm{MAE} = \frac{1}{m}\sum_{k=1}^{m}\lvert f_k - Y_k\rvert$, where $f_k$ denotes the model prediction for the kth sample, $Y_k$ denotes the true value of the kth sample, and $m$ denotes the sample size.
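The three standardization options named in claim 6 can be illustrated with a minimal Python sketch. The function names, the use of the population standard deviation, and the base-2 logarithm with a pseudocount of 1 are assumptions made for illustration; the claims do not specify these details.

```python
import math
from statistics import mean, pstdev


def min_max_normalize(values):
    """Rescale values linearly onto [0, 1]; constant input maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score_normalize(values):
    """Center on the mean and divide by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population (not sample) deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def log_normalize(values, pseudocount=1.0):
    """Apply log2(v + pseudocount); the pseudocount keeps zero values finite."""
    return [math.log2(v + pseudocount) for v in values]
```

Each function takes the expression values of one feature across samples, so a full expression matrix would be processed one row (or column) at a time.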
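The confidence thresholds listed in claim 8 can likewise be sketched as a simple edge filter. This is a hedged illustration only: the edge list, the protein labels P1 to P4 and their scores are invented toy values, the helper names are hypothetical, and scores are assumed to be on a 0 to 1 scale (scores supplied on a 0 to 1000 scale would first need dividing by 1000).

```python
# Cutoffs as listed in claim 8, on an assumed 0-1 score scale.
CONFIDENCE_CUTOFFS = {
    "highest": 0.90,
    "high": 0.70,
    "medium": 0.40,
    "low": 0.15,
}


def filter_interactions(edges, level="high"):
    """Keep (protein_a, protein_b, score) edges at or above the chosen cutoff."""
    cutoff = CONFIDENCE_CUTOFFS[level]
    return [(a, b, score) for a, b, score in edges if score >= cutoff]


def retained_features(edges, level="high"):
    """Return the sorted proteins that keep at least one qualifying edge."""
    kept = filter_interactions(edges, level)
    return sorted({protein for a, b, _ in kept for protein in (a, b)})


# Toy edge list with made-up labels and scores, for illustration only.
toy_edges = [
    ("P1", "P2", 0.95),
    ("P1", "P3", 0.72),
    ("P2", "P4", 0.41),
    ("P3", "P4", 0.10),
]
```

With the default "high" level, only the first two toy edges survive, so P4 drops out of the feature set; lowering the level to "medium" would bring it back, which mirrors the claim's allowance for adjusting the threshold to the number of features.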
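The evaluation indices of claim 10 can be computed as in the following sketch, where `pred` plays the role of the predictions f_k and `true` the role of the true values Y_k. The function names are hypothetical; R² is taken as the squared Pearson correlation, as the claim states, and RMSE and MAE follow their standard definitions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    m = len(x)
    mean_x = sum(x) / m
    mean_y = sum(y) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def r_squared(pred, true):
    """Squared correlation between predictions and true values."""
    return pearson(pred, true) ** 2


def rmse(pred, true):
    """Root mean square error over m samples."""
    m = len(pred)
    return math.sqrt(sum((f - y) ** 2 for f, y in zip(pred, true)) / m)


def mae(pred, true):
    """Mean absolute error over m samples."""
    m = len(pred)
    return sum(abs(f - y) for f, y in zip(pred, true)) / m
```

Note that a squared correlation of 1 only indicates a perfect linear relationship, not perfect agreement, so it is usually read alongside RMSE and MAE when comparing candidate models.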
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910631911.7A CN110349633B (en) | 2019-07-12 | 2019-07-12 | Method for screening radiation biomarkers and predicting radiation dose based on radiation response biological pathway |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110349633A (en) | 2019-10-18 |
CN110349633B (en) | 2021-03-16 |
Family
ID=68176198
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910631911.7A Active CN110349633B (en) | 2019-07-12 | 2019-07-12 | Method for screening radiation biomarkers and predicting radiation dose based on radiation response biological pathway |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110349633B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111524594A (en) * | 2020-06-12 | 2020-08-11 | Shandong University | Hematological malignancy screening system for a target population |
CN113053453B (en) * | 2021-03-15 | 2022-01-04 | Institute of Quality Standards and Testing Technology for Agro-Products, Chinese Academy of Agricultural Sciences | Method for screening hub genes and key signaling pathways of perfluorooctane sulfonate toxicity using transcriptomics |
CN113537280A (en) * | 2021-05-21 | 2021-10-22 | Beijing University of Chinese Medicine | Intelligent manufacturing industry big data analysis method based on feature selection |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104582477A (en) * | 2012-06-21 | 2015-04-29 | Samsung Life Public Welfare Foundation | Method for preparing patient-specific glioblastoma animal model, and use thereof |
CN109584955A (en) * | 2018-11-27 | 2019-04-05 | Dalian Maritime University | Method for identifying human radiation response biomarkers based on multiple plant genomes |
CN109584968A (en) * | 2018-11-27 | 2019-04-05 | Dalian Maritime University | Method for screening new genes participating in the regulation of biological processes |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105117617B (en) * | 2015-08-26 | 2017-10-24 | Dalian Maritime University | Method for screening environmentally sensitive biomolecules |
EP3678684A1 (en) * | 2016-09-06 | 2020-07-15 | Southwick, Graeme | A clinical management protocol |
CN107766697A (en) * | 2017-09-18 | 2018-03-06 | Xidian University | Pan-cancer gene expression and methylation association analysis method |
Also Published As
Publication number | Publication date |
---|---|
CN110349633A (en) | 2019-10-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110349633B (en) | Method for screening radiation biomarkers and predicting radiation dose based on radiation response biological pathway | |
EP3520006B1 (en) | Phenotype/disease specific gene ranking using curated, gene library and network based data structures | |
Ball et al. | An integrated approach utilizing artificial neural networks and SELDI mass spectrometry for the classification of human tumours and rapid identification of potential biomarkers | |
JP5496650B2 (en) | System, method and computer program product for analyzing spectroscopic data to identify and quantify individual elements in a sample | |
Zhan et al. | A fast small-sample kernel independence test for microbiome community-level association analysis | |
Kinney et al. | Precise physical models of protein–DNA interaction from high-throughput data | |
Boldt et al. | A frequency-based gene selection method to identify robust biomarkers for radiation dose prediction | |
IL147349A (en) | Method for evaluating an inflammatory condition using calibrated gene expression profiles | |
CN111524554B (en) | Cell activity prediction method based on LINCS-L1000 perturbation signal | |
CN111653314B (en) | Method for analyzing and identifying lymphatic infiltration | |
Yao et al. | Potential role of a three-gene signature in predicting diagnosis in patients with myocardial infarction | |
CN108920889B (en) | Chemical health hazard screening method | |
Ansari et al. | A novel pathway analysis approach based on the unexplained disregulation of genes | |
Leek et al. | A statistical approach to selecting and confirming validation targets in-omics experiments | |
CN109584955B (en) | Method for identifying human radiation response biomarker based on multiple plant genomes | |
CN109145403B (en) | Near infrared spectrum modeling method based on sample consensus | |
Nongrum et al. | Identification and preliminary validation of radiation response protein (s) in human blood for a high-throughput molecular biodosimetry technology for the future | |
Lyles et al. | Likelihood‐based methods for regression analysis with binary exposure status assessed by pooling | |
Cook et al. | Characterizing the extracellular matrix transcriptome of endometriosis | |
US11435357B2 (en) | System and method for discovery of gene-environment interactions | |
Long et al. | Landscape of co-expressed genes between the myocardium and blood in sepsis and ceRNA network construction: a bioinformatic approach | |
CN112802546B (en) | Biological state characterization method, device, equipment and storage medium | |
ASCHENBRENNER | MOVING BEYOND THE SINGLE GENE: INTEGRATIVE GENE SET ANALYSIS FOR RNA-SEQ | |
Qiao et al. | A spatio-temporal model and inference tools for longitudinal count data on multicolor cell growth | |
Jayanetti | Statistical Methods for Meta-Analysis in Large-Scale Genomic Experiments |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||