CN114628031A - Multi-modal optimization method for detecting dynamic network biomarkers of cancer individual patients - Google Patents
- Publication number
- CN114628031A (application CN202210126121.5A)
- Authority
- CN
- China
- Prior art keywords
- genes
- module
- gene
- modal
- ppcc
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/30—Detection of binding sites or motifs
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/40—Population genetics; Linkage disequilibrium
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention relates to the technical field of biomarker identification for individual cancer patients, in particular to a multi-modal optimization method for detecting dynamic network biomarkers of individual cancer patients, which comprises the following steps: (1) constructing a personalized gene interaction network (PGIN) from the genome data of the individual patient; (2) designing an optimization objective function; (3) searching for a set of personalized dynamic network biomarkers (PDNBs) using a multi-modal evolutionary algorithm. The multi-modal optimization method can effectively identify the PDNBs of an individual cancer patient, explore the multi-modal nature of network biomarkers, and provide a new perspective for understanding tumor heterogeneity in precision medicine.
Description
Technical Field
The invention relates to the technical field of cancer individual biomarker identification, in particular to a multi-modal optimization method for detecting dynamic network biomarkers of cancer individual patients.
Background
In recent years, as population aging, industrialization and urbanization in China have accelerated, and owing to factors such as unhealthy lifestyles and environmental exposure, cancer has become one of the leading causes of death among Chinese residents.
Currently, many studies use network biomarkers from individual cancer patients as a basis for disease diagnosis in individual patients, and quantify the biomarkers from patient samples to detect critical states in cancer progression. Related research has shown that multi-modal problems are widespread in the biomedical field, and that the existence of multi-modal network biomarkers can give decision makers more choices.
With the development of network science, the wide application of single-sample network techniques provides an effective way to identify personalized biomarkers on a single sample using high-dimensional dynamic data of individual cancer patients. Recently, many scholars have proposed bioinformatics tools for personalized biomarker identification on single-sample networks. According to their main features, these methods can be divided into two categories: (i) personalized network biomarker identification methods, such as network elasticity-based methods and network controllability- and observability-based methods. Compared with traditional molecular biomarker methods, these methods take intergenic associations and network information into account to predict personalized biomarkers reliably. However, they still fail to detect the dynamic changes between the normal state and the critical state, and so may not identify early-warning signals of critical states during cancer development; (ii) dynamic network biomarker (DNB) identification methods. These mainly comprise personalized dynamic network biomarker methods, which identify personalized dynamic network biomarkers from the genome data of an individual patient and use them to predict early-warning signals before acute deterioration.
Notably, DNBs may have multi-modal properties (i.e. multiple equivalent network biomarkers, called multi-modal network biomarkers, with the same predictive performance and the same number of genes, but consisting of different genes or proteins). It is therefore necessary to explore the relationships between genes from the perspective of multi-modal characteristics in order to elucidate the molecular mechanisms of complex biological phenomena. However, most existing methods obtain the network biomarker by clustering or by selecting the molecular module with the highest early-warning signal score, and thereby ignore the existence of multi-modal network biomarkers.
Disclosure of Invention
The invention provides a multi-modal optimization method for detecting dynamic network biomarkers of cancer individual patients, and personalized dynamic network biomarkers of cancer individual patients can be effectively identified through multi-modal optimization.
The multi-modal optimization method for detecting the dynamic network biomarkers of the cancer individual patients comprises the following steps:
(1) constructing a personalized gene interaction network PGIN from genome data of an individual patient;
(2) designing an optimization objective function;
(3) searching for a set of personalized dynamic network biomarkers (PDNBs) using a multi-modal evolutionary algorithm.
Preferably, in step (1), the method for constructing PGIN of individual cancer patients comprises:
first, a P value is calculated for the edge between any two genes in the tumor sample and in the normal sample, the P value being obtained from the Z-score of the difference in the Pearson correlation coefficient (PCC) of the two genes. For genes i and j and individual patient k, the PCC difference ΔPCC_{ij,k} and its Z-score Z_{ij,k} are calculated from PCC_{ij}^{(n)}, the PCC of genes i and j over the n reference samples, and PCC_{ij,k}^{(n+1)}, the PCC of genes i and j over the n+1 samples formed by the reference samples together with the sample of individual patient k. The P value is then obtained from Z_{ij,k} under the standard normal distribution. When the co-expression relationship between the two genes is significant in the tumor sample network but not in the normal sample network, or vice versa, the edge between the two genes is retained to constitute the PGIN.
Furthermore, a personalized PCC (pPCC) score is calculated for each PGIN edge by integrating somatic mutation data from cancer-type-specific data into the PGIN. The score combines a co-mutation term and a co-expression term, wherein pPCC_{ij}^{(k)} represents the personalized Pearson coefficient of genes i and j in the samples of patient k; Jaccard_{ij} represents the Jaccard coefficient of genes i and j; PCC_{ij}^{(T,k)} represents the Pearson correlation coefficient of genes i and j in the tumor sample of individual patient k; PCC_{ij}^{(N,k)} represents the Pearson correlation coefficient of genes i and j in the normal sample of individual patient k; T(i) and T(j) are the sets of tumors in which gene i and gene j, respectively, are mutated according to the somatic mutations in a given cancer data set; and D_{10} denotes the value below which 10% of the data in D lie when D is sorted in ascending order.
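The Jaccard coefficient named above has a standard set-theoretic definition, which a minimal sketch can make concrete. The function name `jaccard`, the tumor identifiers, and the convention of returning 0 for two empty sets are illustrative assumptions, not details taken from the text; how the coefficient is combined with D_10 and the co-expression term is not shown here.

```python
def jaccard(tumors_i: set, tumors_j: set) -> float:
    """Jaccard coefficient |T(i) & T(j)| / |T(i) | T(j)| of two tumor sets."""
    union = tumors_i | tumors_j
    if not union:
        # Assumed convention: two genes mutated in no tumor share no co-mutation.
        return 0.0
    return len(tumors_i & tumors_j) / len(union)

# Hypothetical tumor identifiers, for illustration only.
T_i = {"t1", "t2", "t3"}           # tumors in which gene i is mutated
T_j = {"t2", "t3", "t4", "t5"}     # tumors in which gene j is mutated
print(jaccard(T_i, T_j))           # 2 shared tumors out of 5 distinct ones
```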
Preferably, in step (2), the optimization objective function is:
wherein n represents the number of genes in the PGIN; X = (x_1, x_2, ..., x_n) is a vector of binary decision variables, with x_i = 1 if gene i is selected in a module and x_i = 0 otherwise; pDE_in(X) represents the standard deviation of the pPCC between genes within the PDNB module; |pPCC_in(X)| represents the absolute value of the mean pPCC between genes within the PDNB module; |pPCC_out(X)| represents the absolute value of the mean pPCC between genes inside and genes outside the PDNB module. The goal of the first objective function is to minimize the number of genes in the module, and the goal of the second objective function is to maximize the early-warning signal score of the module, since the higher this score, the more representative the module is of the critical state of the individual cancer patient.
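Read together with the definitions above, the two objectives can plausibly be written as follows. The formula itself is not reproduced in this text, so the names f_1 and f_2 and the form of f_2 are assumptions based on the usual composite index for dynamic network biomarkers (within-module deviation times within-module correlation, divided by the correlation between the module and the rest of the network).

```latex
\begin{aligned}
\min\; f_1(X) &= \sum_{i=1}^{n} x_i, \\
\max\; f_2(X) &= \frac{pDE_{in}(X)\,\lvert pPCC_{in}(X)\rvert}{\lvert pPCC_{out}(X)\rvert},
\qquad x_i \in \{0, 1\}.
\end{aligned}
```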
Preferably, in step (3), the multi-modal evolutionary algorithm is as follows:
(3.1) randomly dividing the PGIN into an initial population P_0 of size N by an improved population initialization algorithm, in which a singular solution correction strategy is added to the original initialization strategy to improve the quality of the initial population;
(3.2) selecting N/2 parents from the current population P_t (t = 0, 1, 2, ...) using a tournament selection strategy based on decision-space niches;
(3.3) generating N offspring Q_t using the crossover operator, generating multi-modal solutions from the offspring Q_t using the mutation operator, and forming a new combined population R_t = Q_t ∪ P_t;
(3.4) selecting N solutions from the combined population, according to the environmental selection mechanism, as the population P_{t+1} of the next iteration;
(3.5) iterating until the termination condition, i.e. the maximum number of function evaluations, is satisfied, and outputting a plurality of PDNBs.
Preferably, in step (3.3), in the crossover operator, two parents X_i and X_j are first randomly selected from the parents. For variables whose values are the same in both parents, the corresponding variables in the offspring take that same value; for decision variables whose values differ, the selection probability p_n of the variable is calculated:
wherein p_n represents the probability that the decision variable takes the value 1, M is the number of parents in the current iteration, and X_{i,n} = 1 when gene n of the i-th parent X_i is a gene in the PDNB and X_{i,n} = 0 otherwise. If a random number rand < p_n, the corresponding variable in the child O is set to 1; otherwise it is set to 0.
Preferably, the mutation operator introduces a new mutation strategy, comprising a singular solution correction mutation strategy and an addition/deletion mutation strategy, to handle the multi-modal optimization problem.
Preferably, in the singular solution correction strategy, the genes selected by the singular solution are first identified and their degrees are calculated, and the gene with the highest degree is taken as the candidate gene; then the degrees of the genes connected to the candidate gene are calculated; finally, the connected gene with the smallest degree is selected as the added gene, and its binary decision variable is set to 1. As a result, the connectivity among genes within the module is strengthened and the connectivity between genes within the module and genes outside the module is weakened.
Preferably, in the addition/deletion mutation strategy, adding one gene to a module can increase or decrease the module's |pPCC_out| without changing |pPCC_in| and pDE_in; deleting an isolated gene from the module achieves the same effect. Furthermore, the addition/deletion mutation strategy can yield multi-modal solutions, i.e. different modules with the same objective function values. The specific operation is as follows: first, the edge weights (pPCC) of the genes outside the module that have degree 1 and of the isolated genes within the module are calculated; then |pPCC_out| of the current module is calculated; finally, one of the following two operations is performed with equal probability: one is to add each degree-1 gene to the module in turn, select the gene whose addition minimizes the module's |pPCC_out|, and set its binary variable to 1; the other is to delete each isolated gene from the module in turn, select the gene whose deletion minimizes the module's |pPCC_out|, and set its binary variable to 0.
The multi-modal optimization method of the invention can effectively identify the personalized dynamic network biomarkers of individual cancer patients, explore the multi-modal nature of network biomarkers, and provide a new perspective for understanding tumor heterogeneity in precision medicine.
Drawings
FIG. 1 is a flow chart of a multi-modal optimization method for detecting dynamic network biomarkers of cancer patients in example 1;
FIG. 2 is an explanatory diagram of the crossover operator in example 1;
FIG. 3 is a diagram illustrating the singular solution correction strategy in the mutation operator in example 1;
FIG. 4 is a diagram illustrating an addition/deletion mutation strategy in mutation operators in example 1;
FIG. 5(a) is a graph showing the degree of cancer driver gene enrichment for PDNB with the greatest early warning score in example 1;
FIG. 5(b) is a graph showing the degree of cancer driver gene enrichment of all PDNBs in individual patients in example 1;
FIG. 6 is a diagram illustrating multi-modal comparison of the four evolutionary algorithms in the example.
Detailed Description
For a further understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings and examples. It is to be understood that the examples are illustrative of the invention and not restrictive.
Example 1
This embodiment provides a Multi-Modal Personalized Dynamic Network Biomarker identification model (MMPDNB), which takes as input the gene expression data of paired samples of an individual cancer patient and outputs the PDNBs of that patient. MMPDNB consists essentially of two parts: (1) constructing the PGIN from the genome data of the individual patient; and (2) designing the optimization objective function and the multi-modal evolutionary strategy used to identify the PDNBs.
As shown in FIG. 1, the present embodiment provides a multi-modal optimization method for detecting dynamic network biomarkers of cancer individual patients, which comprises the following steps:
Step 1: Construction of the PGIN for individual cancer patients
Paired samples of individual cancer patients (i.e., normal and tumor samples from the same patient) were screened and obtained from the TCGA data portal. The PGIN of each individual cancer patient was constructed using a paired single-sample network, according to the following principles. First, a P value is calculated for the edge between any two genes in the tumor sample and in the normal sample; the P value is obtained from the Z-score of the difference in the Pearson correlation coefficient (PCC) of the two genes. For genes i and j and sample k, the PCC difference ΔPCC_{ij,k} and its Z-score Z_{ij,k} are calculated from PCC_{ij}^{(n)}, the PCC of genes i and j over the n reference samples, and PCC_{ij,k}^{(n+1)}, the PCC of genes i and j over the n+1 samples formed by the reference samples together with the sample of individual patient k. The P value is then obtained from Z_{ij,k} under the standard normal distribution. When the co-expression relationship between two genes is significant in the tumor sample network (P value < 0.05) but not significant in the normal sample network (P value > 0.05), or vice versa, the edge between the two genes is retained to constitute the PGIN.
Furthermore, a personalized PCC (pPCC) score is calculated for each PGIN edge by integrating somatic mutation data from cancer-type-specific data into the PGIN. The score combines a co-mutation term and a co-expression term, wherein pPCC_{ij}^{(k)} represents the personalized Pearson coefficient of genes i and j in the samples of patient k; Jaccard_{ij} represents the Jaccard coefficient of genes i and j; PCC_{ij}^{(T,k)} represents the Pearson correlation coefficient of genes i and j in the tumor sample of individual patient k; PCC_{ij}^{(N,k)} represents the Pearson correlation coefficient of genes i and j in the normal sample of individual patient k; T(i) and T(j) are the sets of tumors in which gene i and gene j, respectively, are mutated according to the somatic mutations in a given cancer data set; and D_{10} denotes the value below which 10% of the data in D lie when D is sorted in ascending order.
Thus, rather than simply differentiating the differences in gene expression between normal and tumor samples, the pPCC of an individual patient can reflect the association between genes in the dynamic process of cancer progression by taking into account somatic mutation data.
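The edge test in Step 1 can be sketched as follows. The Z-score expression used here, ΔPCC divided by (1 − PCC_n²)/(n − 1), is the statistic commonly used for single-sample networks and is an assumption, since the formula is not reproduced in this text; the two-sided P value, the function names and the synthetic data are likewise illustrative.

```python
import math
import numpy as np

def pcc(x, y):
    """Pearson correlation coefficient of two expression vectors."""
    return float(np.corrcoef(x, y)[0, 1])

def edge_statistics(ref_i, ref_j, sample_i, sample_j):
    """Return (delta_pcc, z, p) for one gene pair and one added sample."""
    n = len(ref_i)
    pcc_n = pcc(ref_i, ref_j)                       # PCC on n reference samples
    pcc_n1 = pcc(np.append(ref_i, sample_i),        # PCC on n + 1 samples
                 np.append(ref_j, sample_j))
    delta = pcc_n1 - pcc_n
    # Assumed Z-score form; undefined when |pcc_n| == 1.
    z = delta / ((1.0 - pcc_n ** 2) / (n - 1))
    p = math.erfc(abs(z) / math.sqrt(2.0))          # two-sided normal P value
    return delta, z, p

def keep_edge(p_tumor, p_normal, alpha=0.05):
    """Retain the edge when it is significant in exactly one network."""
    return (p_tumor < alpha) != (p_normal < alpha)

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
ref_i = rng.normal(size=20)
ref_j = 0.5 * ref_i + rng.normal(size=20)
delta, z, p = edge_statistics(ref_i, ref_j, sample_i=3.0, sample_j=-3.0)
print(delta, z, p)
```

In a paired setting, `edge_statistics` would be called once with the patient's tumor sample and once with the normal sample, and `keep_edge` applied to the two resulting P values.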
Step 2: designing an optimization objective function
It has been found that the formation of a PDNB, which can serve as an early-warning signal for cancer, does not require a large number of genes. MMPDNB therefore seeks PDNBs that contain fewer genes yet still accurately predict critical states in the development of cancer. MMPDNB identifies PDNBs with two objective functions:
wherein n represents the number of genes in the PGIN. The solution X = (x_1, x_2, ..., x_n) is a vector of binary decision variables, with x_i = 1 if gene i is selected in a module and x_i = 0 otherwise. pDE_in(X) represents the standard deviation of the pPCC between genes within the PDNB module; |pPCC_in(X)| represents the absolute value of the mean pPCC between genes within the PDNB module; |pPCC_out(X)| represents the absolute value of the mean pPCC between genes inside and genes outside the PDNB module. In short, the goal of the first objective function is to minimize the number of genes in the module. The goal of the second objective function is to maximize the early-warning signal score of the module, because the higher this score, the more representative the module is of the critical state of the individual cancer patient.
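A minimal sketch of how the two objectives might be evaluated for one candidate module is given below. It assumes the PGIN is stored as a symmetric matrix `W` of pPCC edge weights with 0 meaning "no edge", that the second objective has the composite form pDE_in · |pPCC_in| / |pPCC_out|, and that a module without internal or external edges scores 0; none of these details is fixed by the text.

```python
import numpy as np

def evaluate(x, W):
    """Return (f1, f2) for binary vector x on a symmetric pPCC matrix W."""
    x = np.asarray(x, dtype=bool)
    f1 = int(x.sum())                                # number of genes in module
    inside = W[np.ix_(x, x)]
    w_in = inside[np.triu_indices_from(inside, k=1)] # each in-module pair once
    w_in = w_in[w_in != 0]                           # keep existing edges only
    w_out = W[np.ix_(x, ~x)].ravel()                 # module-to-outside pairs
    w_out = w_out[w_out != 0]
    if w_in.size == 0 or w_out.size == 0:
        return f1, 0.0                               # assumed: score undefined -> 0
    pcc_out = abs(w_out.mean())
    if pcc_out == 0:
        return f1, 0.0
    # Assumed composite form; population standard deviation (ddof=0).
    f2 = w_in.std() * abs(w_in.mean()) / pcc_out
    return f1, float(f2)

# Toy 4-gene network, for illustration only.
W = np.zeros((4, 4))
for i, j, w in [(0, 1, 0.8), (0, 2, 0.4), (1, 2, 0.6), (2, 3, 0.2)]:
    W[i, j] = W[j, i] = w
f1, f2 = evaluate([1, 1, 1, 0], W)
print(f1, f2)
```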
Step 3: Finding the set of personalized dynamic network biomarkers (PDNBs) using a multi-modal evolutionary algorithm
Based on the NSGA-II (non-dominated sorting genetic algorithm II) framework, a new population generation mechanism, a crossover operator and a mutation operator are introduced to give a new multi-modal evolutionary algorithm. The process is as follows. First, the PGIN is randomly divided into an initial population P_0 of size N by an improved population initialization algorithm, in which a singular solution correction strategy is added to the original initialization strategy to improve the quality of the initial population. Next, a tournament selection strategy based on decision-space niches selects N/2 parents from the current population P_t (t = 0, 1, 2, ...). The crossover operator then produces N offspring Q_t, and the mutation operator generates multi-modal solutions from the offspring Q_t. A new combined population R_t = Q_t ∪ P_t is then formed, and N solutions are selected from it, according to the environmental selection mechanism, as the population P_{t+1} of the next iteration. Finally, once the termination condition (the maximum number of function evaluations) is satisfied, MMPDNB outputs a plurality of PDNBs. The proposed MMPDNB framework is given in Algorithm 1.
Algorithm 1: Framework of MMPDNB
Input: I_D (data of the individual cancer patient), C_D (cancer patient data), N (population size), T (termination condition)
Output: multiple PDNBs of the individual patient
1 PGIN ← paired single-sample method(I_D, C_D); // construct the PGIN of the individual patient
2 P ← population initialization(PGIN, N); // initial population of size N generated by the initialization operator based on the singular solution correction strategy
3 F ← function evaluation(P, PGIN); // compute the two objective functions of each solution
4 while T is not satisfied do
5   Pa ← tournament selection strategy(P, F, N/2); // generate N/2 solutions as parents by tournament selection based on decision-space niches
6   Q ← crossover operator(Pa, N); // generate N offspring
7   MMQ ← addition/deletion mutation strategy(Q, N); // generate offspring containing multi-modal solutions from the offspring Q
8   F ← function evaluation(MMQ, PGIN);
9   [MMQ, F] ← singular solution correction mutation strategy(MMQ, F); // correct the singular solutions in MMQ
10   R ← P ∪ MMQ; // combined population of size 2N
11   P_{t+1} ← environmental selection operator(R, F, N); // select N solutions as the population of the next iteration
12 end
13 multiple PDNBs of the individual patient ← Pareto-optimal PDNBs in the population
14 return multiple PDNBs of the individual patient
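The control flow of Algorithm 1 can be sketched as a generic loop over caller-supplied operators. Everything below the function `run_mmpdnb` is a toy stand-in chosen only so that the skeleton runs: the bit-string problem, the random parent selection, the uniform crossover and the sort-based environmental selection are placeholders, not the niche-based tournament, crossover or NSGA-II selection described in the text.

```python
import random

def run_mmpdnb(ops, N, max_evals):
    """Skeleton of Algorithm 1; `ops` is a dict of caller-supplied operators."""
    P = ops["init"](N)                               # line 2
    F = [ops["evaluate"](x) for x in P]              # line 3
    evals = len(P)
    while evals < max_evals:                         # line 4
        parents = ops["select"](P, F, N // 2)        # line 5
        Q = ops["crossover"](parents, N)             # line 6
        MMQ = ops["mutate"](Q)                       # line 7
        FQ = [ops["evaluate"](x) for x in MMQ]       # line 8
        evals += len(MMQ)
        MMQ, FQ = ops["correct"](MMQ, FQ)            # line 9
        P, F = ops["env_select"](P + MMQ, F + FQ, N) # lines 10-11
    return P, F

# Toy stand-in operators on 8-bit strings (illustration only).
rng = random.Random(0)
L = 8
weights = [rng.random() for _ in range(L)]

def toy_eval(x):
    return (sum(x), sum(w for w, bit in zip(weights, x) if bit))

def toy_mutate(Q):
    out = []
    for x in Q:
        y = list(x)
        k = rng.randrange(L)
        y[k] = 1 - y[k]                              # flip one random bit
        out.append(y)
    return out

def toy_env_select(R, FR, N):
    # Placeholder ranking, not non-dominated sorting.
    order = sorted(range(len(R)), key=lambda k: (-FR[k][1], FR[k][0]))[:N]
    return [R[k] for k in order], [FR[k] for k in order]

toy_ops = {
    "init": lambda N: [[rng.randint(0, 1) for _ in range(L)] for _ in range(N)],
    "evaluate": toy_eval,
    "select": lambda P, F, k: rng.sample(P, k),
    "crossover": lambda Pa, N: [[rng.choice(pair) for pair in zip(*rng.sample(Pa, 2))]
                                for _ in range(N)],
    "mutate": toy_mutate,
    "correct": lambda Q, F: (Q, F),
    "env_select": toy_env_select,
}
P, F = run_mmpdnb(toy_ops, N=10, max_evals=100)
print(len(P), F[0])
```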
Crossover operator: for binary decision variable optimization problems, existing crossover operators select decision variables by comparing the scores of any two decision variables. However, a PDNB module is a set of decision variables, so assigning scores to individual decision variables is not straightforward. To solve this problem, MMPDNB adopts a new crossover operator that introduces the concept of comprehensive learning, so that the multi-modal problem can be handled effectively (FIG. 2). First, two parents X_i and X_j are randomly selected from the parents. For variables whose values are the same in both parents, the corresponding variables in the offspring take that same value; for decision variables whose values differ, the selection probability p_n of the variable is calculated:
wherein p_n represents the probability that the decision variable takes the value 1, M is the number of parents in the current iteration, and X_{i,n} = 1 when gene n of the i-th parent X_i is a gene in the PDNB and X_{i,n} = 0 otherwise. If a random number rand < p_n, the corresponding variable in the child O is set to 1; otherwise it is set to 0.
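The crossover described above can be sketched as follows. The sketch assumes that p_n is the fraction of the M parents in which gene n is selected, which is one natural reading of the definitions but is not stated explicitly in this text; the function name and toy parents are illustrative.

```python
import numpy as np

def crossover(parents, n_offspring, rng):
    """Generate offspring from a binary parent matrix (M parents x n genes)."""
    parents = np.asarray(parents, dtype=int)
    M = parents.shape[0]
    p = parents.sum(axis=0) / M            # assumed p_n: share of parents with gene n
    children = np.empty((n_offspring, parents.shape[1]), dtype=int)
    for c in range(n_offspring):
        i, j = rng.choice(M, size=2, replace=False)   # two distinct random parents
        xi, xj = parents[i], parents[j]
        draw = (rng.random(xi.size) < p).astype(int)  # 1 where rand < p_n
        # Copy agreed values; use the probabilistic draw where the parents differ.
        children[c] = np.where(xi == xj, xi, draw)
    return children

# Toy parents, for illustration only.
rng = np.random.default_rng(0)
parents = [[1, 1, 0, 0],
           [1, 0, 0, 1],
           [1, 1, 0, 0]]
children = crossover(parents, 6, rng)
print(children)
```

Because every parent selects gene 0 and none selects gene 2, all children inherit those two values unchanged; only genes 1 and 3 are subject to the probabilistic draw.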
Mutation operator: it should be noted that most of the existing mutation operators do not make a specific strategy to identify the PDNB, and it is difficult to generate a multi-modal solution to the problem. To overcome this difficulty, a new mutation strategy (fig. 3 and 4) including a singular solution of a modified mutation strategy and an add/delete mutation strategy is introduced into the algorithm to handle the multi-modal optimization problem (equations (5) - (6)).
Singular solution correction strategy: there may be some singular solutions when generating the offspring whose second objective function value is not present or equal to 0. These solutions (e.g., f)1=2,f20) remains in the population all the time under the action of the selection mechanism of the environmental selection strategy, since it is a non-dominant solution. They therefore reduce the diversity of the population, which may lead to the population falling into a local optimum. In this regard, a special mutation strategy will be implemented for these solutions to optimize the value of the second objective function. First, it is necessary to know which genes are selected for these solutions and calculate the degrees of these genes, and the gene with the greatest degree is taken as a candidate gene. A greater degree of a gene means that it interacts with more genes, which provides more opportunities to select the appropriate gene to form the PDNB module. Therefore, the gene with the greatest degree is selected as a candidate gene (e.g., fig. 3). Thereafter, the degree of gene ligation to the candidate gene was calculated. Finally, the least aggressive gene is selected as the added gene (e.g., fig. 3) and its corresponding binary decision variable is set to 1. Finally, the degree of genes in the module is strengthened, and the degree of genes in the module and genes outside the module is weakened, so that the PDNB module is more consistent with the properties of the PDNB module.
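The correction step can be sketched on an adjacency-set representation of the PGIN. Restricting the added gene to neighbours that are not already in the module, and breaking ties by lowest gene index, are assumptions made for this illustration.

```python
def correct_singular(x, adj):
    """Add one gene to a singular solution x (list of 0/1) given adjacency sets."""
    selected = [g for g, bit in enumerate(x) if bit]
    if not selected:
        return list(x)
    # Candidate: the selected gene with the greatest degree in the PGIN.
    candidate = max(selected, key=lambda g: len(adj[g]))
    # Assumed: only neighbours not yet in the module may be added.
    neighbours = [g for g in sorted(adj[candidate]) if not x[g]]
    if not neighbours:
        return list(x)
    # Added gene: the neighbour of the candidate with the smallest degree.
    added = min(neighbours, key=lambda g: len(adj[g]))
    y = list(x)
    y[added] = 1
    return y

# Toy 5-gene network, for illustration only.
adj = {0: {1, 2, 3}, 1: {0}, 2: {0, 3, 4}, 3: {0, 2}, 4: {2}}
x = [1, 0, 0, 0, 1]                 # genes 0 and 4 selected, no in-module edge
print(correct_singular(x, adj))     # gene 0 is the candidate; gene 1 is added
```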
Addition/deletion mutation strategy: adding one gene to a module can increase or decrease the module's |pPCC_out| without changing |pPCC_in| and pDE_in. Likewise, deleting an isolated gene from the module (i.e., a gene not linked to any other gene in the module) achieves the same effect. Furthermore, the addition/deletion mutation strategy has the opportunity to obtain multi-modal solutions, i.e. different modules with the same objective function values. The specific operation is as follows. First, the edge weights (pPCC) of the genes outside the module that have degree 1 and of the isolated genes within the module are calculated. Then |pPCC_out| of the current module is calculated. Finally, one of the following two operations is performed with equal probability. One is to add each degree-1 gene to the module in turn, select the genes whose addition minimizes the module's |pPCC_out| (e.g., g1 and g2 in FIG. 4), and set their binary variables to 1. The other is to delete each isolated gene from the module in turn, select the genes whose deletion minimizes the module's |pPCC_out| (e.g., g3 and g4 in FIG. 4), and set their binary variables to 0.
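One way to read this strategy is sketched below. To keep |pPCC_in| and pDE_in unchanged, the sketch assumes that an addable gene lies outside the module, has degree 1, and has no edge into the module, and that exactly one gene is added or deleted per call; the explicit `op` argument and the treatment of a module without outside edges (score of infinity) are further assumptions for illustration.

```python
import numpy as np

def pcc_out(x, W):
    """|mean pPCC| over edges joining module genes to outside genes."""
    inside = np.asarray(x, dtype=bool)
    w = W[np.ix_(inside, ~inside)].ravel()
    w = w[w != 0]
    return abs(w.mean()) if w.size else np.inf

def candidates(x, W, op):
    """Genes eligible for addition ('add') or deletion ('delete')."""
    inside = np.asarray(x, dtype=bool)
    A = W != 0
    links_to_module = A[:, inside].sum(axis=1)
    if op == "add":      # outside, degree 1, not adjacent to the module (assumed)
        mask = ~inside & (A.sum(axis=1) == 1) & (links_to_module == 0)
    else:                # inside, but isolated from the other module genes
        mask = inside & (links_to_module == 0)
    return np.flatnonzero(mask)

def add_delete_mutation(x, W, rng, op=None):
    """Apply the single addition or deletion that minimises |pPCC_out|."""
    op = op or ("add" if rng.random() < 0.5 else "delete")
    best = None
    for g in candidates(x, W, op):
        y = np.array(x, dtype=int)
        y[g] = 1 if op == "add" else 0
        if best is None or pcc_out(y, W) < pcc_out(best, W):
            best = y
    return best if best is not None else np.array(x, dtype=int)

# Toy 5-gene network, for illustration only.
W = np.zeros((5, 5))
for i, j, w in [(0, 1, 0.9), (1, 2, 0.5), (2, 3, 0.1), (2, 4, 0.7)]:
    W[i, j] = W[j, i] = w
rng = np.random.default_rng(0)
added = add_delete_mutation([1, 1, 0, 0, 0], W, rng, op="add")
deleted = add_delete_mutation([1, 1, 0, 1, 0], W, rng, op="delete")
print(added, deleted)
```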
This example demonstrates the proposed MMPDNB method on three cancer datasets. The data comprise 112 breast invasive carcinoma (BRCA), 49 lung squamous cell carcinoma (LUSC), and 57 lung adenocarcinoma (LUAD) cases, respectively.
Experiments
Parameter setting
Population setting: the population size of the algorithm was set to 300 according to the scale of the problem. Independent runs and termination condition: to reduce the influence of chance on the results, MMPDNB was run independently 30 times on the PGIN of each patient, the results of the 30 independent runs were pooled, and the Pareto optimal solutions of the pool were output. The termination criterion was the number of objective function evaluations, set to 30,000 per independent run.
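Pooling the independent runs and extracting the Pareto optimal solutions can be illustrated with the short Python sketch below. It assumes that each solution is stored as a pair (module, early warning signal score), that the first objective is the module size (minimized) and the second is the score (maximized); the function names and toy data are hypothetical.

```python
def dominates(a, b):
    """True if solution a is no worse than b in both objectives and better in one."""
    (genes_a, score_a), (genes_b, score_b) = a, b
    no_worse = len(genes_a) <= len(genes_b) and score_a >= score_b
    better = len(genes_a) < len(genes_b) or score_a > score_b
    return no_worse and better


def pareto_front(solutions):
    """Keep the non-dominated solutions, dropping exact duplicates."""
    unique = list(dict.fromkeys(solutions))
    return [s for s in unique if not any(dominates(t, s) for t in unique)]


def pool_runs(runs):
    """Merge the outputs of several independent runs and filter them."""
    pooled = [solution for run in runs for solution in run]
    return pareto_front(pooled)


# Two toy "runs"; modules are frozensets so that solutions are hashable.
runs = [
    [(frozenset("ab"), 1.0), (frozenset("abc"), 2.0)],
    [(frozenset("cd"), 1.0), (frozenset("abcd"), 1.5)],
]
front = pool_runs(runs)
for genes, score in front:
    print(sorted(genes), score)
```

Because dominance is checked on objective values only, two different modules with identical objective values (here {a, b} and {c, d}) both survive, which is how multi-modal solutions are retained in this sketch.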
Comparison algorithm
Nine algorithms were selected for comparison with MMPDNB and divided into three groups according to their characteristics. Multi-modal evolutionary algorithms: DN-NSGA-II, MO_Ring_PSO_SCD and MP-MMEA. DNB algorithms: Clustering-DNB and L-DNB. The Clustering-DNB algorithm divides the PGIN into a plurality of sub-networks by hierarchical clustering, calculates the early warning signal score of each sub-network, and selects the sub-network with the highest score as the PDNB. The L-DNB method scores each gene by calculating the early warning signal score of the network formed by that gene and its first-order neighbors; all genes are ranked from high to low by score, and the top 20 genes are selected as the PDNB of the individual patient. Network control algorithms: these algorithms use the controllability and observability of the network to find a minimum set of nodes that can control or reconstruct the state of the whole network.
Results of the experiment
The MMPDNB can effectively detect early warning signals of cancer development
The peaks of the mean PDNB score for BRCA, LUSC and LUAD were detected at stages IIIB, IIB and IB, respectively, and were regarded as the early warning signals of the development of the three cancers. Compared with the other algorithms, MMPDNB not only detects the early warning signal of cancer development more effectively, but its PDNB early warning signal score at the critical stage also differs significantly from the scores at the other stages.
MMPDNB and MP-MMEA detect early warning signals better than the other multi-modal evolutionary algorithms. For MP-MMEA, however, although the critical stages it identifies on the three cancer datasets are similar to those of MMPDNB, its average score is much lower than that of MMPDNB.
Compared with the Clustering-DNB and L-DNB algorithms, MMPDNB, with its multi-modal optimization capability, obtains a larger PDNB early warning signal score, which shows that MMPDNB detects the early warning signals of patients with the three cancers more effectively.
In the results of the network control methods, the average scores of the different stages show no obvious difference, so the critical stage cannot be effectively detected from the early warning signal.
MMPDNB can effectively identify cancer driver genes
Since the output of MMPDNB is multiple PDNBs for each individual patient (i.e., the Pareto optimal solutions), the effectiveness of MMPDNB in identifying cancer driver genes was verified in two ways (fig. 5(a), 5(b)). One is to find, among the PDNBs of each patient, the PDNB with the highest early warning signal score, compute its F-score, and then average the F-scores of these PDNBs. The other is to compute the F-score of every PDNB and average over all PDNBs. The same processing was applied to the other multi-modal evolutionary algorithms. Fig. 5(a) and 5(b) show that MMPDNB achieves a higher degree of cancer driver gene enrichment on the BRCA, LUSC and LUAD datasets.
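The two averaging schemes can be sketched in Python as follows. The text does not define the F-score, so this example assumes the usual harmonic mean of precision and recall of a PDNB against a reference list of known driver genes; the function names, gene labels and scores are hypothetical.

```python
def f_score(pdnb, drivers):
    """Harmonic mean of precision and recall of a gene set against known drivers."""
    pdnb, drivers = set(pdnb), set(drivers)
    hits = len(pdnb & drivers)
    if hits == 0:
        return 0.0
    precision = hits / len(pdnb)
    recall = hits / len(drivers)
    return 2 * precision * recall / (precision + recall)


def mean_best_f_score(patients, drivers):
    """Scheme 1: per patient, take the PDNB with the highest score, then average."""
    values = []
    for pdnbs in patients.values():
        _, best_genes = max(pdnbs, key=lambda item: item[0])
        values.append(f_score(best_genes, drivers))
    return sum(values) / len(values)


def mean_all_f_score(patients, drivers):
    """Scheme 2: average the F-score over every PDNB of every patient."""
    values = [f_score(genes, drivers) for pdnbs in patients.values() for _, genes in pdnbs]
    return sum(values) / len(values)


# Hypothetical reference drivers and per-patient PDNBs as (score, genes) pairs.
drivers = {"g1", "g2", "g5", "g6", "g7", "g8"}
patients = {
    "p1": [(0.8, {"g1", "g2", "g3", "g4"}), (0.5, {"g1", "g9"})],
    "p2": [(0.3, {"g5", "g6"})],
}
print(round(mean_best_f_score(patients, drivers), 3))
print(round(mean_all_f_score(patients, drivers), 3))
```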
MMPDNB can effectively identify multi-modal PDNB
Figure 6 plots the number of individual patients for whom MMPDNB and the other three multi-modal evolutionary algorithms identified multi-modal PDNBs on the three cancer datasets. As shown in fig. 6, MMPDNB found 69, 41 and 47 individual patients with multi-modal PDNBs on the BRCA, LUSC and LUAD datasets, respectively. It can be concluded that MMPDNB has significant advantages over the other multi-modal evolutionary algorithms in detecting multi-modal PDNBs.
The multi-modal PDNBs identified by MMPDNB provide drug targets
The significance of multi-modal PDNBs lies in the fact that their differential genes have different biological functions. The differential genes of multi-modal PDNBs are defined as genes that one of the multi-modal PDNBs lacks and the other multi-modal PDNBs contain. By analyzing the drug information of these differential genes, it was evaluated whether the multi-modal PDNBs found by MMPDNB can provide important drug targets for the early treatment of cancer. Searching drug target information in the iGMDR database showed that the differential genes of the multi-modal PDNBs of early-stage BRCA patients contain 14 drug targets, and that drugs for BRCA can act on these target genes. For example, AKT2 is a recognized oncogene target on which Temsirolimus, a drug that is clinically sensitive in BRCA patients, can act, as shown in table 1. The results for LUSC and LUAD are shown in tables 2-3.
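The definition of differential genes can be illustrated with a few lines of Python. This sketch interprets the definition as "genes present in at least one but not all of a patient's multi-modal PDNBs"; that reading, the function names, the placeholder genes G1 to G3 and the dictionary standing in for a drug-target lookup are assumptions. Only the AKT2 and Temsirolimus pairing comes from the text above.

```python
def differential_genes(pdnbs):
    """Genes found in some, but not all, of the multi-modal PDNBs."""
    sets = [set(p) for p in pdnbs]
    if len(sets) < 2:
        return set()
    return set.union(*sets) - set.intersection(*sets)


def drug_targets(genes, drug_table):
    """Restrict a gene set to the genes that appear in a drug-target table."""
    return {g: drug_table[g] for g in sorted(genes) if g in drug_table}


# Two hypothetical multi-modal PDNBs of one patient.
pdnbs = [{"AKT2", "G1", "G2"}, {"G1", "G2", "G3"}]
# Stand-in for a database lookup; a real analysis would query the database itself.
drug_table = {"AKT2": ["Temsirolimus"]}

diff = differential_genes(pdnbs)
print(sorted(diff))
print(drug_targets(diff, drug_table))
```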
Table 1. Drug target genes and effective drugs obtained by the MMPDNB model in early-stage BRCA
Table 2. Drug target genes and effective drugs obtained by the MMPDNB model in early-stage LUSC
Table 3. Drug target genes and effective drugs obtained by the MMPDNB model in early-stage LUAD
Conclusion
All experimental results show that multi-modal optimization can effectively identify the personalized dynamic network biomarkers of individual cancer patients and explore the multi-modal behavior of network biomarkers, and that MMPDNB can provide a new perspective for understanding tumor heterogeneity in precision medicine.
The present invention and its embodiments have been described above schematically, and the description is not limiting; what is shown in the drawings is only one of the embodiments of the present invention, and the actual structure is not limited thereto. Therefore, if a person of ordinary skill in the art, inspired by this description and without departing from the spirit of the present invention, devises structural modes and embodiments similar to this technical solution without creative effort, they shall fall within the protection scope of the present invention.
Claims (8)
1. A multimodal optimization method for detecting dynamic network biomarkers in cancer individual patients, characterized by: the method comprises the following steps:
(1) constructing a personalized gene interaction network PGIN from genome data of an individual patient;
(2) designing an optimization objective function;
(3) searching for a set of personalized dynamic network biomarkers (PDNBs) by using a multi-modal evolutionary algorithm.
2. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 1, wherein: in step (1), the method for constructing PGIN of individual cancer patients comprises:
firstly, calculating a P value for the edge between any two genes in the tumor sample and the normal sample, wherein the P value is obtained from the Z-score of the difference in the Pearson correlation coefficient (PCC) of the two genes; the PCC difference ΔPCC_ij,k of genes i and j for individual patient k and its Z-score Z_ij,k can be calculated by the following formula:
wherein the two correlation terms are, respectively, the PCC of genes i and j over the n reference samples and the PCC of genes i and j over the n+1 samples for individual patient k; the P value can be calculated from Z_ij,k using the standard normal distribution; when the co-expression relationship between two interacting genes is significant in the tumor sample network but not in the normal sample network, the edge between the two genes is retained to constitute the PGIN, and vice versa; furthermore, the personalized score pPCC of each edge of the PGIN is calculated by integrating the somatic mutation data of the specific cancer type into the PGIN, as follows:
wherein co-mutation denotes the co-mutation term and co-expression denotes the co-expression term; pPCC_ij,k represents the personalized Pearson coefficient of genes i and j in the sample of patient k; Jaccard_ij represents the Jaccard coefficient of genes i and j; the two PCC terms represent the Pearson correlation coefficients of genes i and j in the tumor sample and in the normal sample of individual patient k, respectively; T(i) and T(j) are the sets of tumors in which gene i and gene j, respectively, are mutated, as determined by examining the somatic mutations in the given cancer dataset; and D_10 denotes the value below which 10% of the data in D lie when the data are sorted in ascending order.
3. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 2, wherein: in the step (2), the optimization objective function is as follows:
wherein n represents the number of genes in the PGIN, and X = (x_1, x_2, ..., x_n) is a vector of binary decision variables: if gene i is selected in a module, x_i = 1, otherwise x_i = 0; pDE_in(X) represents the standard deviation of the pPCC between genes within the PDNB module; |pPCC_in(X)| represents the absolute value of the mean pPCC between genes within the PDNB module; |pPCC_out(X)| represents the absolute value of the mean pPCC between genes inside and genes outside the PDNB module; the first objective function aims to minimize the number of genes in the module, and the second objective function aims to maximize the early warning signal score of the module, since the higher the early warning signal score of the module, the better it represents the critical state of the individual cancer patient.
4. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 3, wherein: in the step (3), the multi-modal evolutionary algorithm is as follows:
(3.1) randomly generating, from the PGIN, an initial population P_0 of size N by an improved population initialization algorithm, wherein a singular solution correction strategy is added to the original initialization strategy to improve the quality of the initial population;
(3.2) selecting N/2 parents from the current population P_t (t = 0, 1, 2, ...) by using a tournament selection strategy based on decision-space niching;
(3.3) generating N offspring Q_t by using the crossover operator, generating multi-modal solutions from the offspring Q_t by using the mutation operator, and generating a new combined population R_t = Q_t ∪ P_t;
(3.4) selecting N solutions from the combined population according to an environmental selection mechanism as the population P_{t+1} for the next iteration;
(3.5) outputting a plurality of PDNBs once the termination condition of the iteration, i.e., the maximum number of function evaluations, is satisfied.
5. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 4, wherein: in step (3.3), in the crossover operator, two parents X_i and X_j are first randomly selected from the parents; for variables whose values in the decision vectors of the two parents are the same, the corresponding variable in the offspring takes the same value as in both parents; for decision variables whose values differ, the selection probability p_n of the variable is calculated:
wherein p_n represents the probability that the decision variable takes the value 1, and M is the number of parents in the current iteration; when gene n of the i-th parent X_i is selected as a gene in the PDNB, the corresponding decision variable equals 1, otherwise it equals 0; if the random number rand < p_n, the corresponding variable in the offspring O is set to 1, otherwise it is set to 0.
6. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 5, wherein: in the mutation operator, a new mutation strategy comprising a singular solution correction mutation strategy and an addition/deletion mutation strategy is introduced to process a multi-modal optimization problem.
7. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 6, wherein: in the singular solution correction strategy, firstly, the genes selected by the solution are identified and their degrees are calculated, and the gene with the largest degree is taken as the candidate gene; then, the degrees of the genes connected to the candidate gene are calculated; finally, the connected gene with the smallest degree is selected as the added gene, and its corresponding binary decision variable is set to 1; as a result, the connections among genes within the module are strengthened and the connections between genes inside and outside the module are weakened.
8. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 7, wherein: in the addition/deletion mutation strategy, adding a gene with a degree of 1 to a module can increase or decrease the module's |pPCC_out| without changing |pPCC_in| and pDE_in; meanwhile, deleting isolated genes in the module can achieve the same effect; furthermore, the addition/deletion mutation strategy can yield multi-modal solutions, i.e., different modules with the same objective function values; the specific operation is as follows: firstly, calculating the edge weights pPCC of the genes outside the module with a degree of 1 and of the isolated genes in the module; then, calculating |pPCC_out| of the current module; finally, performing one of the following two operations with the same probability: one is to add each gene with a degree of 1 to the module individually, select the gene that minimizes the module's |pPCC_out|, and set its corresponding binary variable to 1; the other is to delete each isolated gene from the module individually, select the gene that minimizes the module's |pPCC_out|, and set its corresponding binary variable to 0.
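For illustration only, and not as part of the claims, the crossover operator of claim 5 can be sketched in Python. The formula for p_n is not reproduced in the text, so this sketch assumes that p_n is the fraction of the M parents in which gene n is selected; the function names and the toy parents are hypothetical.

```python
import random


def selection_probabilities(parents):
    """Assumed p_n: fraction of the M parents whose n-th variable equals 1."""
    m = len(parents)
    n = len(parents[0])
    return [sum(parent[k] for parent in parents) / m for k in range(n)]


def crossover(parent_a, parent_b, probabilities, rng):
    """Build one offspring from two binary parents."""
    child = []
    for a, b, p in zip(parent_a, parent_b, probabilities):
        if a == b:
            # Shared value: the offspring inherits it unchanged.
            child.append(a)
        else:
            # Differing value: set to 1 with probability p_n, otherwise 0.
            child.append(1 if rng.random() < p else 0)
    return child


# Four toy parents over four genes (binary decision vectors).
parents = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
probabilities = selection_probabilities(parents)
child = crossover(parents[0], parents[1], probabilities, random.Random(0))
print(probabilities)
print(child)
```

Positions on which both parents agree are copied directly, so only the disagreeing positions depend on the random draw.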
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210126121.5A CN114628031B (en) | 2022-02-10 | 2022-02-10 | Multi-mode optimization method for detecting dynamic network biomarkers of cancer individual patients |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114628031A true CN114628031A (en) | 2022-06-14 |
CN114628031B CN114628031B (en) | 2023-06-20 |
Family
ID=81898660
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210126121.5A Active CN114628031B (en) | 2022-02-10 | 2022-02-10 | Multi-mode optimization method for detecting dynamic network biomarkers of cancer individual patients |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114628031B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103237901A (en) * | 2010-03-01 | 2013-08-07 | 卡里斯生命科学卢森堡控股有限责任公司 | Biomarkers for theranostics |
US9495515B1 (en) * | 2009-12-09 | 2016-11-15 | Veracyte, Inc. | Algorithms for disease diagnostics |
CN109493916A (en) * | 2018-06-29 | 2019-03-19 | 北京大学 | A kind of Gene-gene interactions recognition methods based on sparsity factorial analysis |
CN110444248A (en) * | 2019-07-22 | 2019-11-12 | 山东大学 | Cancer Biology molecular marker screening technique and system based on network topology parameters |
CN113256636A (en) * | 2021-07-15 | 2021-08-13 | 北京小蝇科技有限责任公司 | Bottom-up parasite species development stage and image pixel classification method |
Non-Patent Citations (1)
Title |
---|
WEI-FENG GUO 等: "A novel network control model for identifying personalized driver genes in cancer", PLOS COMPUTATIONAL BIOLOGY, pages 1 - 27 * |
Also Published As
Publication number | Publication date |
---|---|
CN114628031B (en) | 2023-06-20 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||