CN110349633B - Method for screening radiation biomarkers and predicting radiation dose based on radiation response biological pathway - Google Patents


Info

Publication number
CN110349633B
Authority
CN
China
Prior art keywords
radiation
expression
matrix
gene
species
Legal status
Active
Application number
CN201910631911.7A
Other languages
Chinese (zh)
Other versions
CN110349633A (en)
Inventor
赵磊
汪燕
李安琪
陈鑫鹏
宓东
孙野青
Current Assignee
Dalian Maritime University
Original Assignee
Dalian Maritime University
Application filed by Dalian Maritime University
Priority to CN201910631911.7A
Publication of CN110349633A
Application granted
Publication of CN110349633B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00 ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10 Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • G16B5/00 ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/10 Ontologies; Annotations

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Genetics & Genomics (AREA)
  • Physiology (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway, and belongs to the technical field of bioinformatics. Given a radiation response biological pathway, its Gene Ontology (GO) terms are retrieved and used to obtain the set of annotated genes in a specific species; this gene set is then combined with expression profile data from the same species after radiation exposure, and the corresponding expression profiles are extracted. Differentially expressed features and their expression matrix are obtained by one-way analysis of variance (ANOVA) and protein interaction network analysis. Multivariate regression models are established with biostatistical methods or machine learning algorithms, and the set of significant expression features and the optimal type of statistical model are determined by comparing the predictive performance of the models; together these steps constitute the method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway. The method has good predictive performance and can provide a new technical means for nuclear radiation biological dose monitoring, nuclear radiation damage diagnosis, and risk early warning in emergencies.

Description

Method for screening radiation biomarkers and predicting radiation dose based on radiation response biological pathway
Technical Field
The invention belongs to the technical field of bioinformatics and relates to a method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway.
Background
With the wide application of nuclear energy in military and medical fields, there is a practical need to monitor radiation exposure and diagnose radiation injury in emergencies such as nuclear terrorist attacks or nuclear accidents. Physical detection methods for monitoring the radiation dose received by organisms or personnel and for diagnosing radiation damage have many limitations: monitoring takes a long time, and these methods have difficulty reflecting the absorbed dose of organisms or personnel comprehensively and accurately, or evaluating radiation damage effects and risks. Several biomarker-based methods have therefore been developed to estimate the radiation dose received by an individual. Because the radiation response of these biomarkers is generally proportional to the radiation dose, they are referred to as biological dosimeters.
At present, the main detection methods used as biological dosimeters are cytological, cytogenetic, and molecular biological techniques. An ideal biological dosimeter is highly specific and sensitive, covers a wide range of radiation doses, can determine the dose received and the degree of radiation damage in the shortest possible time after irradiation, and is as simple to operate as possible. The first two methods usually require complex experimental procedures and generally take at least 2 to 3 days to yield results, so they cannot respond effectively to nuclear radiation monitoring and diagnosis in emergencies. Changes in biomolecules after radiation exposure, by contrast, can be used to estimate the nuclear radiation dose quickly and accurately. How to realize such an estimate has therefore become a key problem that urgently needs to be solved for nuclear radiation risk assessment and early warning and for responding to nuclear threats.
With the development of high-throughput technologies, gene expression profiles and next-generation sequencing data provide good data support for accurate nuclear radiation monitoring and diagnosis. In recent years, several studies have shown that radiation dose can be predicted effectively and accurately from gene expression levels. For example, Paul et al. (Int J Radiat Oncol Biol Phys, 2008, 71, 1236-1244) found that the expression levels of 74 signature genes, most of which are regulated by TP53, can be used to predict the radiation dose received by human peripheral blood. Dressman et al. (PLoS Med, 2007, 4, e106) found that the expression levels of specific genes accurately reflect the radiation dose received by human or mouse peripheral blood cells. However, most of these approaches ignore an essential biological property: genes do not exist and function in isolation but generally act within gene regulatory networks or biological pathways. How to screen radiation biomarkers and predict radiation dose based on radiation response biological pathways has therefore become a key problem to be solved in this field.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for screening radiation response biomarkers based on a specific radiation response biological pathway and a method for accurately predicting radiation dose from these markers, thereby providing a new technical means for nuclear radiation biological dose monitoring, nuclear radiation damage diagnosis, and risk early warning in emergencies.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway, comprising the steps of:
Step 1: retrieve a radiation response biological pathway and obtain its Gene Ontology (GO) terms in a Gene Ontology search tool;
Step 2: using the GO terms of the radiation response biological pathway obtained in Step 1 as the filter condition, search the genome database of species $i$ to obtain information on the gene set $A_i$ of species $i$ annotated to those GO terms; the information on gene set $A_i$ includes the gene number (Gene stable ID), gene name (Gene name), protein number (Protein stable ID), and protein reference sequence information (RefSeq peptide ID);
Step 3: select any one species $i$, expose it to different doses of radiation, and then acquire its expression profile data using high-throughput sequencing technology; the expression profile data form an $n \times p$ matrix, where $n$ is the number of features and $p$ is the number of radiation exposure treatments (samples) of the species; each entry of the matrix is the expression level of a feature under the corresponding radiation dose; the features in the matrix include genes, transcripts, methylation sites, miRNAs, and proteins;
Step 4: normalize the expression profile matrix from Step 3 to obtain a normalized expression profile matrix;
Step 5: using the gene set $A_i$ obtained in Step 2, extract the radiation response pathway-based expression profile matrix from the normalized expression profile matrix obtained in Step 4;
Step 6: analyze the expression profile matrix obtained in Step 5 by one-way analysis of variance (ANOVA), with the significance level $p$ for between-group differences set to the corresponding threshold, to obtain the differentially expressed feature set $B_i$ and the corresponding differentially expressed feature matrix;
Step 7: analyze the differentially expressed feature set $B_i$ obtained in Step 6 with a protein-protein interaction (PPI) network database and construct a protein interaction network; retain the features whose protein connectivity exceeds a set threshold to obtain the key differentially expressed feature set $C_i$ and its corresponding key differentially expressed feature matrix;
Step 8: perform statistical modeling on the key differentially expressed feature matrix obtained in Step 7 using a biostatistical method or a machine learning algorithm, and establish a multivariate regression model with the features $C_{ij}$ ($j = 1, 2, \ldots, s$) of set $C_i$ as independent variables and the radiation exposure dose $Y_k$ ($k = 1, 2, \ldots, m$) as the dependent variable;
When a machine learning algorithm is used to model the expression feature matrix, the matrix is first divided into a training matrix and a test matrix, either in a 70%:30% ratio or by leave-one-out cross-validation. The choice depends on the sample size: with many samples, the 70%:30% split is used; with few samples, leave-one-out cross-validation is used. Second, the parameters of the machine learning algorithm are optimized by stepwise regression analysis to improve modeling accuracy and determine the optimal execution parameters;
Step 9: calculate the evaluation indices of the models established in Step 8, compare their predictive performance comprehensively, and determine the optimal type of biostatistical method or machine learning algorithm; test the coefficients of the differentially expressed features in the optimal method, generally by a t-test, treating a coefficient as significant when its p value is below a set threshold (0.05, 0.01, or 0.001), to determine the significant differentially expressed feature set;
The resulting significant differentially expressed feature set constitutes the radiation biomarkers screened on the basis of the radiation response biological pathway, and the resulting optimal biostatistical method or machine learning algorithm constitutes the radiation dose prediction method based on these biomarkers.
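Step 5 is essentially a row filter that keeps only the features belonging to $A_i$. The following minimal Python sketch assumes that the normalized matrix is held as a dictionary keyed by a feature identifier that matches the identifiers in $A_i$ (for example, gene symbols); the function name and the toy values are hypothetical. It also reports which pathway genes found no matching row, which is why fewer genes can survive this step than were retrieved in Step 2.

```python
from typing import Dict, List, Set, Tuple


def extract_pathway_matrix(
    expr: Dict[str, List[float]],  # feature ID -> normalized values, one per sample
    pathway_genes: Set[str],       # gene set A_i from Step 2
) -> Tuple[Dict[str, List[float]], Set[str]]:
    """Keep the rows of the normalized matrix whose ID is in the pathway gene set.

    Returns the pathway sub-matrix and the pathway genes with no matching row
    (for example, genes not measured on the expression platform).
    """
    sub = {gene: values for gene, values in expr.items() if gene in pathway_genes}
    unmatched = pathway_genes - set(sub)
    return sub, unmatched


# Toy example with hypothetical values for three samples.
expr = {"Cdk7": [0.2, 0.4, 0.9], "Actb": [0.5, 0.5, 0.5], "Xrcc6": [0.1, 0.3, 0.2]}
sub, missing = extract_pathway_matrix(expr, {"Cdk7", "Xrcc6", "Rad51c"})
print(sorted(sub), sorted(missing))  # ['Cdk7', 'Xrcc6'] ['Rad51c']
```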
Further, in the above technical solution, the Gene Ontology search tool in Step 1 may be the QuickGO database (https://www.ebi.ac.uk/QuickGO/), which can be used to retrieve the GO terms of known radiation response biological pathways repeatedly.
Further, in the above technical solution, species $i$ in Step 2 is a species with known genome information in a genome database, including human, mouse, and rat; the genome database in Step 2 is the animal genome annotation database Ensembl (http://asia.ensembl.org/index.html) and the plant genome annotation database Ensembl Plants (http://plants.ensembl.org/index.html).
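Step 2 can be pictured as a filter over gene-to-GO annotation rows followed by de-duplication, since a gene annotated to several of the chosen terms appears once per term (in the embodiment, 1026 annotated genes shrink to 949 after duplicates are removed). The minimal Python sketch below assumes the annotations have already been exported, for example from Ensembl BioMart, as (gene stable ID, gene name, GO term accession) tuples; it does not query any database itself. The function name and the placeholder gene IDs are hypothetical, while the GO accessions are the five DDR-related terms used in the embodiment.

```python
from typing import Iterable, Set, Tuple

# The five DDR-related GO terms used in step S1 of the embodiment.
DDR_GO_TERMS = {"GO:0006281", "GO:0006915", "GO:0007050", "GO:0006974", "GO:0000723"}


def annotated_gene_set(
    rows: Iterable[Tuple[str, str, str]],  # (gene_stable_id, gene_name, go_accession)
    go_terms: Set[str],
) -> Set[str]:
    """Return the de-duplicated gene stable IDs annotated to any of go_terms."""
    return {gene_id for gene_id, _name, go in rows if go in go_terms}


# Hypothetical export: GENE_A is annotated to two DDR terms, GENE_C to none.
rows = [
    ("GENE_A", "GeneA", "GO:0006281"),
    ("GENE_A", "GeneA", "GO:0006974"),
    ("GENE_B", "GeneB", "GO:0000723"),
    ("GENE_C", "GeneC", "GO:0008150"),
]
hits = [r for r in rows if r[2] in DDR_GO_TERMS]  # raw annotation rows that match
genes = annotated_gene_set(rows, DDR_GO_TERMS)
print(len(hits), sorted(genes))  # 3 ['GENE_A', 'GENE_B']
```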
Further, in the above technical solution, the high-throughput sequencing technology in Step 3 includes gene expression profiling, transcriptome sequencing, methylation sequencing, miRNA expression profiling, and protein expression profiling.
Further, in the above technical solution, the expression profile data of the selected species $i$ in Step 3 may include gene expression profiles, miRNA expression profiles, and protein expression profiles.
Further, in the above technical solution, the normalization method in Step 4 may be Max-Min normalization, standard deviation (z-score) normalization, or logarithmic normalization.
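The three normalization options named above can be sketched in plain Python as follows. The patent does not say whether normalization is applied per feature or per sample, nor which logarithm base or offset is used, so this sketch assumes per-feature (row-wise) scaling, a sample standard deviation, and log2(x + 1); all function names are hypothetical.

```python
import math
from typing import Dict, List


def min_max(values: List[float]) -> List[float]:
    """Max-Min normalization to [0, 1]; a constant row maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values: List[float]) -> List[float]:
    """Standard-deviation (z-score) normalization with the sample standard deviation."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if sd == 0.0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]


def log_transform(values: List[float], pseudocount: float = 1.0) -> List[float]:
    """Logarithmic normalization log2(x + pseudocount); assumes x + pseudocount > 0."""
    return [math.log2(v + pseudocount) for v in values]


def normalize_rows(matrix: Dict[str, List[float]], method=min_max) -> Dict[str, List[float]]:
    """Apply one normalization method to every feature row of an n x p matrix."""
    return {feature: method(values) for feature, values in matrix.items()}


print(min_max([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```

Row-wise scaling keeps each feature comparable across dose groups; a per-sample variant would simply apply the same functions to columns instead.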
Further, in the above technical solution, the corresponding threshold in Step 6 is 0.05, 0.01, or 0.001.
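Step 6 applies a one-way ANOVA to each feature, grouping its samples by radiation dose. The sketch below is a plain-Python illustration, assuming samples are labelled by dose; it computes the F statistic and its upper-tail p value through the regularized incomplete beta function (a continued-fraction evaluation in the style of Numerical Recipes) so that it needs no third-party packages. In practice a tested library routine such as `scipy.stats.f_oneway` would normally be preferred; the function names here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        delta = 1.0
        # Even and odd coefficients of the continued fraction for this m.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log1p(-x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f: float, df1: int, df2: int) -> float:
    """P(F > f) for an F(df1, df2) distribution."""
    return betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def one_way_anova(groups: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """Return (F, p) for a one-way ANOVA across the given dose groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mu - grand) ** 2 for g, mu in zip(groups, means))
    ss_within = sum((v - mu) ** 2 for g, mu in zip(groups, means) for v in g)
    if ss_within == 0.0:
        return (math.inf, 0.0) if ss_between > 0 else (0.0, 1.0)
    f = (ss_between / (k - 1)) / (ss_within / (n - k))
    return f, f_upper_tail(f, k - 1, n - k)


def differential_features(
    matrix: Dict[str, List[float]], doses: List[float], alpha: float = 0.01
) -> Dict[str, float]:
    """Keep features whose between-dose difference has p < alpha; maps feature -> p."""
    levels = sorted(set(doses))
    kept = {}
    for feature, values in matrix.items():
        groups = [[v for v, d in zip(values, doses) if d == level] for level in levels]
        _, p = one_way_anova(groups)
        if p < alpha:
            kept[feature] = p
    return kept


f, p = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(f, 6), round(p, 6))  # 27.0 0.001
```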
Further, in the above technical solution, the PPI network database in Step 7 may be the STRING online network database (https://string-db.org/cgi/input.pl); protein connectivity is based on the interaction score between protein molecules, whose confidence levels are highest (0.90), high (0.70), medium (0.40), and low (0.15); the set threshold is high confidence or above, or it is adjusted according to the number of features and the actual connectivity of the proteins. Specifically, if most of the features' proteins are not connected to one another at the chosen confidence, the threshold can be lowered.
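The connectivity filter of Step 7 can be sketched as follows, assuming the interaction table (for example, one exported from STRING) has been reduced to (protein A, protein B, score) tuples with scores on a 0-1 scale. Edges below the confidence cutoff are discarded, each node's degree is counted once per distinct partner, and nodes with at least `min_degree` partners are retained; the default values 0.70 and 3 mirror step S7 of the embodiment, while the function name and the toy network are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def hub_features(
    edges: Iterable[Tuple[str, str, float]],  # (protein_a, protein_b, score on a 0-1 scale)
    min_score: float = 0.70,  # "high confidence" cutoff
    min_degree: int = 3,      # connectivity cutoff used in step S7
) -> Set[str]:
    """Return nodes with at least min_degree distinct partners among edges scoring >= min_score."""
    partners: Dict[str, Set[str]] = defaultdict(set)
    for a, b, score in edges:
        if a != b and score >= min_score:
            partners[a].add(b)  # sets make duplicate or reversed edges count once
            partners[b].add(a)
    return {node for node, nbrs in partners.items() if len(nbrs) >= min_degree}


# Hypothetical network: HUB has three high-confidence partners; the HUB-P4 edge is too weak.
edges = [
    ("HUB", "P1", 0.95), ("HUB", "P2", 0.80), ("P2", "HUB", 0.80),
    ("HUB", "P3", 0.72), ("HUB", "P4", 0.40), ("P1", "P2", 0.75),
]
print(sorted(hub_features(edges)))                # ['HUB']
print(sorted(hub_features(edges, min_degree=2)))  # ['HUB', 'P1', 'P2']
```

Lowering `min_score` is the adjustment described above for sparse networks: more edges pass, so more features reach the degree cutoff.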
Further, in the above technical solution, the biostatistical method in Step 8 may include multiple linear regression and multiple nonlinear regression; the machine learning algorithm may include neural network, support vector machine, and random forest algorithms.
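For the multiple linear regression option combined with the leave-one-out scheme described in Step 8, a dependency-free sketch is given below. It fits ordinary least squares through the normal equations with Gaussian elimination, which is adequate for a handful of features but less numerically stable than QR-based solvers; with $s$ features it needs at least $s + 2$ samples so that every leave-one-out fit can be solved. Stepwise variable selection, the 70%:30% split, and the nonlinear or machine learning alternatives are not shown, and all names and data are hypothetical.

```python
from typing import List, Sequence


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: collinear features or too few samples")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least squares for Y = b0 + b1*C_1 + ... + bs*C_s via the normal equations."""
    rows = [[1.0] + list(x) for x in X]  # prepend the intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(p)]
    return _solve(xtx, xty)


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted linear model for one sample."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], x))


def loocv_predictions(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Leave-one-out: refit without sample k, then predict sample k."""
    preds = []
    for k in range(len(y)):
        X_train = [x for i, x in enumerate(X) if i != k]
        y_train = [v for i, v in enumerate(y) if i != k]
        preds.append(predict(fit_ols(X_train, y_train), X[k]))
    return preds


# Toy data generated exactly from y = 1 + 2*x1 - x2 (hypothetical).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y = [1, 3, 0, 2, 4]
beta = fit_ols(X, y)  # approximately [1.0, 2.0, -1.0]
preds = loocv_predictions(X, y)
```

The leave-one-out predictions can be passed straight to the evaluation indices of Step 9, so every sample is scored by a model that never saw it.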
Further, in the above technical solution, the evaluation indices in Step 9 include the squared correlation coefficient ($R^2$), the root-mean-square error (RMSE), and the mean absolute error (MAE), calculated as

$$R^2 = \left[\mathrm{cor}(f_k, Y_k)\right]^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{k=1}^{m}\left(f_k - Y_k\right)^2}$$

$$\mathrm{MAE} = \frac{1}{m}\sum_{k=1}^{m}\left|f_k - Y_k\right|$$

where $f_k$ is the model-predicted value of the $k$-th sample, $Y_k$ is the true value of the $k$-th sample, and $m$ is the sample size.
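The three evaluation indices translate directly into code. The sketch below follows the formulas above, with `f` holding the predictions $f_k$ and `y` the true doses $Y_k$; note that this $R^2$ is the squared Pearson correlation, which, unlike $1 - SS_{res}/SS_{tot}$, does not penalize a constant bias or scale error in the predictions. The function names and example predictions are hypothetical.

```python
import math
from typing import Sequence


def r_squared(f: Sequence[float], y: Sequence[float]) -> float:
    """R^2 = [cor(f_k, Y_k)]^2, the squared Pearson correlation coefficient."""
    m = len(y)
    mean_f, mean_y = sum(f) / m, sum(y) / m
    cov = sum((a - mean_f) * (b - mean_y) for a, b in zip(f, y))
    var_f = sum((a - mean_f) ** 2 for a in f)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov ** 2 / (var_f * var_y)


def rmse(f: Sequence[float], y: Sequence[float]) -> float:
    """Root-mean-square error over the m samples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, y)) / len(y))


def mae(f: Sequence[float], y: Sequence[float]) -> float:
    """Mean absolute error over the m samples."""
    return sum(abs(a - b) for a, b in zip(f, y)) / len(y)


# Hypothetical predictions against the four dose levels of the embodiment (Gy).
true_dose = [0.0, 1.1, 2.2, 4.4]
predicted = [0.0, 1.0, 2.5, 4.0]
print(round(mae(predicted, true_dose), 3), round(rmse(predicted, true_dose), 3))  # 0.2 0.255
```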
Further, in the above technical solution, the radiation biomarker screening and radiation dose prediction method is a feature selection and statistical modeling algorithm based on a radiation response biological pathway in a genome database.
According to the technical scheme, the invention has the following beneficial effects:
Because genes usually act within gene regulatory networks, the invention uses bioinformatics methods to retrieve information on the genes involved in a radiation response biological pathway from a Gene Ontology search tool and a genome database, applies one-way ANOVA and protein interaction network analysis to the feature expression matrix to screen the corresponding radiation biomarkers, and then models these biomarkers statistically with a biostatistical method or a machine learning algorithm, thereby establishing a method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway. The invention thus provides a new method for radiation biomarker screening and radiation dose prediction, and a new technical means for nuclear radiation biological dose monitoring, nuclear radiation damage diagnosis, and risk early warning in emergencies.
Drawings
FIG. 1 is a flow chart of an implementation of the method of the present invention;
FIG. 2 is a diagram of a protein interaction network obtained using a PPI network database;
FIG. 3 is a graph comparing the radiation dose predicted by the multiple linear regression model with the actual radiation dose.
Detailed Description
The invention is described in detail below with reference to the accompanying drawings.
FIG. 1 is a flow chart of an implementation of the method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway according to the present invention.
The invention is described in detail below using peripheral blood gene expression profile data from mice after X-ray exposure (similar data are readily available from other relevant experiments or databases).
The gene expression profile data comprise 4 groups: C57BL/6 male mice were irradiated with X-rays at a dose rate of 1.03 Gy/min to radiation exposure doses of 0 Gy, 1.1 Gy, 2.2 Gy, and 4.4 Gy, with 12, 6, 6, and 6 samples respectively (30 samples in total), and peripheral blood expression profiles were collected 24 h after irradiation. The radiation gene expression profile data can be downloaded from the Gene Expression Omnibus (GEO) database of the National Center for Biotechnology Information (NCBI) under accession number GSE62623.
Using the peripheral blood gene expression profiles of mice exposed to different doses, radiation-sensitive markers are screened and a radiation dose prediction method based on these markers is established as follows:
S1: In the Gene Ontology search tool QuickGO (https://www.ebi.ac.uk/QuickGO/), retrieve the GO terms of biological pathways related to the DNA damage response (DDR), which previous studies have shown to be one of the important radiation response pathways, namely 'DNA repair' (GO:0006281), 'apoptotic process' (GO:0006915), 'cell cycle arrest' (GO:0007050), 'cellular response to DNA damage stimulus' (GO:0006974), and 'telomere maintenance' (GO:0000723);
S2: Load the GO terms of the DDR-related radiation response pathways obtained in S1 into the BioMart tool of the animal genome annotation database Ensembl (http://asia.ensembl.org/index.html), and retrieve information on the mouse genes annotated to these GO terms, including Gene stable ID, Gene name, Gene description, and so on. The mouse gene set contains 1026 genes in total (see Table 1); after duplicate genes are removed, 949 genes remain;
TABLE 1 Number of genes annotated to the GO terms of DDR-related pathways, obtained from Ensembl
S3: Obtain from the GEO database the peripheral blood gene expression profile data GSE62623 of C57BL/6 mice after the different X-ray exposure doses; the expression profile is a 44397 × 30 matrix whose rows represent 44397 genes and whose columns represent 30 samples (12 controls and 18 irradiated samples); each entry is the expression level of a gene under the corresponding radiation dose;
S4: Apply Max-Min normalization to the expression profile matrix from S3 to obtain a normalized expression profile matrix;
S5: Using the information on the genes annotated to the 5 DDR-related GO terms obtained in S2, extract the normalized expression profile matrix of this gene set from the normalized matrix obtained in S4; 857 of the 949 genes from S2 matched entries in the expression profile matrix, so the resulting expression profile data form an 857 × 30 matrix;
S6: Perform one-way ANOVA on the expression profile matrix obtained in S5 with the significance level for between-group differences set to p = 0.01, yielding 152 differentially expressed genes and their expression matrix;
S7: Analyze the differentially expressed gene set obtained in S6 with the STRING online database (https://string-db.org/cgi/input.pl), setting the interaction score between protein molecules to 0.70 (high confidence), and construct the protein interaction network shown in FIG. 2; retain the genes with a connectivity (number of interaction partners) of at least 3, yielding 28 key differentially expressed genes and their expression matrix;
S8: Perform statistical modeling on the key differentially expressed gene matrix obtained in S7 using multiple linear regression, with the gene features in the expression matrix as independent variables and the radiation exposure dose as the dependent variable. Optimizing the multiple linear regression model by stepwise regression yields 13 significantly differentially expressed genes (p < 0.05), namely Cdk7, Foxo1, Fzr1, Mcrs1, Nsmce4a, Pold1, Psmd14, Rad51c, Rfc1, Rnf144b, Sirt1, Usp1, and Xrcc6, as detailed in Table 2. These genes can be regarded as radiation biomarkers obtained on the basis of DDR-related radiation response pathways;
TABLE 2 Significantly differentially expressed genes obtained by multiple linear regression analysis
Gene | Gene description | p value
Cdk7 | cyclin-dependent kinase 7 | 5.38e-05
Foxo1 | forkhead box O1 | 6.63e-03
Fzr1 | fizzy and cell division cycle 20 related 1 | 2.42e-05
Mcrs1 | microspherule protein 1 | 4.94e-05
Nsmce4a | NSE4 homolog A, SMC5-SMC6 complex component | 6.81e-06
Pold1 | polymerase (DNA directed), delta 1, catalytic subunit | 2.44e-04
Psmd14 | proteasome (prosome, macropain) 26S subunit, non-ATPase, 14 | 2.27e-05
Rad51c | RAD51 paralog C | 6.13e-06
Rfc1 | replication factor C (activator 1) 1 | 2.40e-05
Rnf144b | ring finger protein 144B | 2.63e-05
Sirt1 | sirtuin 1 | 1.11e-04
Usp1 | ubiquitin specific peptidase 1 | 4.22e-04
Xrcc6 | X-ray repair complementing defective repair in Chinese hamster cells 6 | 1.31e-04
S9: Stepwise regression also yields the multiple linear regression model based on the 13 significantly differentially expressed genes from S8; the resulting biological prediction expression for the radiation dose is

$$Y = 27.58 - 2.37[\mathrm{Cdk7}] + 4.92[\mathrm{Foxo1}] + 6.86[\mathrm{Fzr1}] - 12.64[\mathrm{Mcrs1}] - 18.61[\mathrm{Nsmce4a}] - 4.90[\mathrm{Pold1}] + 9.66[\mathrm{Psmd14}] - 5.65[\mathrm{Rad51c}] + 8.42[\mathrm{Rfc1}] + 3.53[\mathrm{Rnf144b}] + 3.40[\mathrm{Sirt1}] + 8.94[\mathrm{Usp1}] - 9.64[\mathrm{Xrcc6}]$$

The radiation dose predicted by the multiple linear regression model is compared with the true radiation dose in FIG. 3. The squared correlation coefficient ($R^2$), root-mean-square error (RMSE), and mean absolute error (MAE) are calculated as

$$R^2 = \left[\mathrm{cor}(f_k, Y_k)\right]^2$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{k=1}^{m}\left(f_k - Y_k\right)^2}$$

$$\mathrm{MAE} = \frac{1}{m}\sum_{k=1}^{m}\left|f_k - Y_k\right|$$

where $f_k$ is the model-predicted value of the $k$-th sample, $Y_k$ is the true value of the $k$-th sample, and $m$ is the sample size. The evaluation index $R^2$ of the multiple linear regression model obtained in this way shows that the multiple linear regression model based on the 13 gene features has good predictive performance, and it can be regarded as a method for predicting radiation dose based on the radiation biomarkers.
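The fitted model of step S9 can be applied to a new sample as shown below. This is an illustrative sketch, assuming the input values have been Max-Min normalized in the same way as in S4 and reading the intercept as +27.58 as the equation is written; the coefficients are taken from S9, the function name is hypothetical, and predictions for samples unlike the 0-4.4 Gy training data should not be relied upon.

```python
from typing import Dict

# Intercept and coefficients of the S9 multiple linear regression model.
INTERCEPT = 27.58
COEFFICIENTS: Dict[str, float] = {
    "Cdk7": -2.37, "Foxo1": 4.92, "Fzr1": 6.86, "Mcrs1": -12.64,
    "Nsmce4a": -18.61, "Pold1": -4.90, "Psmd14": 9.66, "Rad51c": -5.65,
    "Rfc1": 8.42, "Rnf144b": 3.53, "Sirt1": 3.40, "Usp1": 8.94, "Xrcc6": -9.64,
}


def predict_dose(expression: Dict[str, float]) -> float:
    """Predicted dose (Gy) from Max-Min normalized expression of the 13 biomarker genes."""
    missing = set(COEFFICIENTS) - set(expression)
    if missing:
        raise KeyError(f"missing normalized expression for: {sorted(missing)}")
    return INTERCEPT + sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())
```

Raising on a missing gene, rather than silently treating it as zero, avoids a biased dose estimate when a biomarker was not measured.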
The above is only one embodiment of the present invention. Although it is described specifically and in detail, it should not be construed as limiting the scope of the invention. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed herein, according to the technical solution and inventive concept of the present invention, shall fall within the scope of protection of the present invention.

Claims (10)

1. A method for screening radiation biomarkers and predicting radiation dose based on a radiation response biological pathway, comprising the following steps:
Step 1: retrieving a radiation response biological pathway and obtaining the Gene Ontology (GO) terms of the radiation response biological pathway in a Gene Ontology search tool;
Step 2: using the GO terms of the radiation response biological pathway obtained in Step 1 as the filter condition, searching the genome database of species $i$ to obtain information on the gene set $A_i$ of species $i$ annotated to those GO terms, the information on gene set $A_i$ comprising gene number, gene name, protein number, and protein reference sequence information;
Step 3: selecting any one species $i$, exposing it to different doses of radiation, and acquiring its expression profile data using a high-throughput sequencing technology, wherein the expression profile data form an $n \times p$ matrix, $n$ being the number of features and $p$ the number of radiation exposure treatments of the species in the matrix; each entry of the matrix is the expression level of a feature under the corresponding radiation dose; and the features in the matrix comprise genes, transcripts, methylation sites, miRNAs, and proteins;
Step 4: normalizing the expression profile matrix of Step 3 to obtain a normalized expression profile matrix;
Step 5: extracting, according to $A_i$ obtained in Step 2, the radiation response pathway-based expression profile matrix from the normalized expression profile matrix obtained in Step 4;
Step 6: analyzing the expression profile matrix obtained in Step 5 by ANOVA, with the significance level $p$ of between-group differences set to a corresponding threshold, to obtain the differentially expressed feature set $B_i$ and the corresponding differentially expressed feature matrix;
Step 7: analyzing the differentially expressed feature set $B_i$ obtained in Step 6 with a PPI network database and constructing a protein interaction network; and retaining the features whose protein connectivity is greater than a set threshold to obtain the key differentially expressed feature set $C_i$ and its corresponding key differentially expressed feature matrix;
Step 8: performing statistical modeling on the key differentially expressed feature matrix obtained in Step 7 using a biostatistical method or a machine learning algorithm, and establishing a multivariate regression model with the features $C_{ij}$ ($j = 1, 2, \ldots, s$) of set $C_i$ as independent variables and the radiation exposure dose $Y_k$ ($k = 1, 2, \ldots, m$) as the dependent variable;
wherein, when a machine learning algorithm is used to model the expression feature matrix, the matrix is first divided into a training matrix and a test matrix in a 70%:30% ratio or by leave-one-out cross-validation, and the parameters of the machine learning algorithm are then optimized by stepwise regression analysis to improve the accuracy of the statistical model and determine the optimal execution parameters;
Step 9: calculating evaluation indices of the models established in Step 8, comprehensively comparing their predictive performance, and determining the optimal type of biostatistical method or machine learning algorithm; and testing the coefficients of the differentially expressed features in the optimal biostatistical method or machine learning algorithm by a t-test, a coefficient being determined to be significant when its p value is smaller than a set threshold, to determine the significant differentially expressed feature set, the set threshold values being 0.05, 0.01, and 0.001;
wherein the significant differentially expressed feature set obtained constitutes the radiation biomarkers screened on the basis of the radiation response biological pathway, and the optimal biostatistical method or machine learning algorithm type obtained constitutes the radiation dose prediction method based on the radiation biomarkers.
2. The method of claim 1, wherein: the gene ontology search tool in the step I is a QuickGO database; gene ontology GO semantics of known radiation response biological pathways can be repeatedly retrieved using the QuickGO database.
3. The method of claim 1, wherein the species i in step two are species whose genome information is available in a genome database, including human, mouse, and rat; and the genome database in step two is the animal genome annotation database Ensembl or the plant genome annotation database Ensembl Plants.
4. The method of claim 1, wherein the high-throughput sequencing technology in step three includes gene expression profiling, transcriptome sequencing, methylation sequencing, miRNA expression profiling, and protein expression profiling.
5. The method of claim 1, wherein the expression profile data of any of the species i in step three include gene expression profiles, miRNA expression profiles, and protein expression profiles.
6. The method of claim 1, wherein the standardization method in step four is Max-Min standardization, standard deviation (z-score) standardization, or logarithmic standardization.
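The three standardization options of claim 6 might be applied column by column as in this NumPy sketch, which assumes rows are samples and columns are features. The patent does not define the logarithmic variant, so $\log_2(x + 1)$ with a pseudocount of 1 is an assumed, commonly used choice; the function names are hypothetical.

```python
import numpy as np


def max_min_standardize(M):
    """Rescale each feature (column) to [0, 1] via (x - min) / (max - min)."""
    lo, hi = M.min(axis=0), M.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns map to 0 instead of dividing by 0
    return (M - lo) / span


def std_dev_standardize(M):
    """Z-score each feature: (x - mean) / sample standard deviation."""
    mu = M.mean(axis=0)
    sd = M.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # constant columns stay centered at 0
    return (M - mu) / sd


def log_standardize(M, pseudocount=1.0):
    """Assumed log variant: log2(x + pseudocount), for non-negative expression values."""
    return np.log2(M + pseudocount)
```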
7. The method of claim 1, wherein the corresponding thresholds in step six are 0.05, 0.01, and 0.001.
8. The method of claim 1, wherein the PPI network database in step seven is the STRING online network database; the protein molecule connectivity is the interaction score between protein molecules, graded as highest confidence (0.90), high confidence (0.70), medium confidence (0.40), or low confidence (0.15); and the set threshold is high confidence or above, or is adjusted according to the number of features and the actual protein molecule connectivity.
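The connectivity filter of claim 8 could look like the following sketch, which keeps a feature only if it has at least one interaction with another feature in the set at or above the chosen confidence cutoff. That retention rule, the function name, and the toy identifiers and scores are illustrative assumptions. STRING's downloadable protein link files report the combined score as an integer from 0 to 1000, so such scores would need dividing by 1000 before comparison with these cutoffs.

```python
# STRING confidence levels as listed in claim 8 (scores on a 0-1 scale).
CONFIDENCE = {"highest": 0.90, "high": 0.70, "medium": 0.40, "low": 0.15}


def filter_by_connectivity(features, edges, level="high"):
    """Keep features with at least one interaction at or above the cutoff
    to another feature in the set.

    edges: iterable of (protein_a, protein_b, score) with score in [0, 1].
    """
    cutoff = CONFIDENCE[level]
    candidates = set(features)
    connected = set()
    for a, b, score in edges:
        if score >= cutoff and a in candidates and b in candidates and a != b:
            connected.update((a, b))
    # Preserve the original feature order.
    return [f for f in features if f in connected]


# Hypothetical identifiers and scores, for illustration only.
features = ["P1", "P2", "P3", "P4"]
edges = [("P1", "P2", 0.95), ("P2", "P3", 0.55), ("P3", "P4", 0.20)]
```

Lowering `level` from `"high"` to `"medium"` or `"low"` admits more features, which is one way to realize the claim's option of adjusting the threshold to the number of features.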
9. The method of claim 1, wherein the biostatistical method in step eight includes multiple linear regression and multiple nonlinear regression; and the machine learning algorithm includes neural network, support vector machine, and random forest algorithms.
10. The method of claim 1, wherein the evaluation indices in step nine include the squared correlation coefficient $R^2$, the root mean square error RMSE, and the mean absolute error MAE, calculated as follows:

$$R^2 = \left[\operatorname{cor}(f_k, Y_k)\right]^2, \qquad \mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{k=1}^{m}\left(f_k - Y_k\right)^2}, \qquad \mathrm{MAE} = \frac{1}{m}\sum_{k=1}^{m}\left|f_k - Y_k\right|$$

where $f_k$ is the model prediction for the $k$-th sample, $Y_k$ is the true value for the $k$-th sample, and $m$ is the sample size.
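The evaluation indices of claim 10 can be computed as in this NumPy sketch (the function name and toy values are hypothetical). Here $R^2$ is the squared Pearson correlation between predicted and true doses, as the claim defines it; this can differ from the coefficient of determination $1 - SS_{res}/SS_{tot}$ when predictions are systematically biased, which is why RMSE and MAE are reported alongside it.

```python
import numpy as np


def evaluate_predictions(f, Y):
    """Return R^2 (squared Pearson correlation), RMSE and MAE of predictions f vs true doses Y."""
    f = np.asarray(f, dtype=float)
    Y = np.asarray(Y, dtype=float)
    err = f - Y
    return {
        "R2": float(np.corrcoef(f, Y)[0, 1] ** 2),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAE": float(np.mean(np.abs(err))),
    }


metrics = evaluate_predictions([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
# For this toy example, RMSE = sqrt(1/4) = 0.5 and MAE = 1/4 = 0.25.
```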
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910631911.7A CN110349633B (en) 2019-07-12 2019-07-12 Method for screening radiation biomarkers and predicting radiation dose based on radiation response biological pathway

Publications (2)

Publication Number Publication Date
CN110349633A CN110349633A (en) 2019-10-18
CN110349633B CN110349633B (en) 2021-03-16

Family

ID=68176198

Country Status (1)

Country Link
CN (1) CN110349633B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111524594A (en) * 2020-06-12 2020-08-11 Shandong University System for screening hematological malignancies in a target population
CN113053453B (en) * 2021-03-15 2022-01-04 Institute of Quality Standards and Testing Technology for Agro-Products, Chinese Academy of Agricultural Sciences Method for screening hub genes and key signaling pathways of perfluorooctane sulfonate toxicity using transcriptomics
CN113537280A (en) * 2021-05-21 2021-10-22 Beijing University of Chinese Medicine Big data analysis method for the intelligent manufacturing industry based on feature selection

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104582477A (en) * 2012-06-21 2015-04-29 Samsung Life Public Welfare Foundation Method for preparing patient-specific glioblastoma animal model, and use thereof
CN109584955A (en) * 2018-11-27 2019-04-05 Dalian Maritime University Method for identifying human radiation response biomarkers based on multiple plant genomes
CN109584968A (en) * 2018-11-27 2019-04-05 Dalian Maritime University Method for screening new genes involved in regulating biological processes

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117617B (en) * 2015-08-26 2017-10-24 Dalian Maritime University Method for screening environmentally sensitive biomolecules
EP3678684A1 (en) * 2016-09-06 2020-07-15 Southwick, Graeme A clinical management protocol
CN107766697A (en) * 2017-09-18 2018-03-06 Xidian University Association analysis method for pan-cancer gene expression and methylation

Also Published As

Publication number Publication date
CN110349633A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
CN110349633B (en) Method for screening radiation biomarkers and predicting radiation dose based on radiation response biological pathway
EP3520006B1 (en) Phenotype/disease specific gene ranking using curated, gene library and network based data structures
Ball et al. An integrated approach utilizing artificial neural networks and SELDI mass spectrometry for the classification of human tumours and rapid identification of potential biomarkers
JP5496650B2 (en) System, method and computer program product for analyzing spectroscopic data to identify and quantify individual elements in a sample
Zhan et al. A fast small-sample kernel independence test for microbiome community-level association analysis
Kinney et al. Precise physical models of protein–DNA interaction from high-throughput data
Boldt et al. A frequency-based gene selection method to identify robust biomarkers for radiation dose prediction
IL147349A (en) Method for evaluating an inflammatory condition using calibrated gene expression profiles
CN111524554B (en) Cell activity prediction method based on LINCS-L1000 perturbation signal
CN111653314B (en) Method for analyzing and identifying lymphatic infiltration
Yao et al. Potential role of a three-gene signature in predicting diagnosis in patients with myocardial infarction
CN108920889B (en) Chemical health hazard screening method
Ansari et al. A novel pathway analysis approach based on the unexplained disregulation of genes
Leek et al. A statistical approach to selecting and confirming validation targets in -omics experiments
CN109584955B (en) Method for identifying human radiation response biomarker based on multiple plant genomes
CN109145403B (en) Near infrared spectrum modeling method based on sample consensus
Nongrum et al. Identification and preliminary validation of radiation response protein(s) in human blood for a high-throughput molecular biodosimetry technology for the future
Lyles et al. Likelihood‐based methods for regression analysis with binary exposure status assessed by pooling
Cook et al. Characterizing the extracellular matrix transcriptome of endometriosis
US11435357B2 (en) System and method for discovery of gene-environment interactions
Long et al. Landscape of co-expressed genes between the myocardium and blood in sepsis and ceRNA network construction: a bioinformatic approach
CN112802546B (en) Biological state characterization method, device, equipment and storage medium
Aschenbrenner Moving beyond the single gene: integrative gene set analysis for RNA-seq
Qiao et al. A spatio-temporal model and inference tools for longitudinal count data on multicolor cell growth
Jayanetti Statistical Methods for Meta-Analysis in Large-Scale Genomic Experiments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant