CN114628031A - Multi-modal optimization method for detecting dynamic network biomarkers of cancer individual patients - Google Patents


Publication number
CN114628031A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210126121.5A
Other languages
Chinese (zh)
Other versions
CN114628031B (en)
Inventor
梁静
李宗玮
郭伟峰
程涵
岳彩通
乔康加
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhengzhou University
Original Assignee
Zhengzhou University
Application filed by Zhengzhou University
Priority to CN202210126121.5A
Publication of CN114628031A
Application granted
Publication of CN114628031B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/30 Detection of binding sites or motifs
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/40 Population genetics; Linkage disequilibrium
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Genetics & Genomics (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Public Health (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Evolutionary Biology (AREA)
  • Biotechnology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Molecular Biology (AREA)
  • Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Physiology (AREA)
  • Epidemiology (AREA)
  • Ecology (AREA)
  • Primary Health Care (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the technical field of biomarker identification for individual cancer patients, in particular to a multi-modal optimization method for detecting dynamic network biomarkers of individual cancer patients, which comprises the following steps: (1) constructing a personalized gene interaction network (PGIN) of the individual patient from the genome data of the individual patient; (2) designing the optimization objective functions; (3) searching for the set of personalized dynamic network biomarkers (PDNBs) using a multi-modal evolutionary algorithm. The multi-modal optimization method can effectively identify the PDNBs of an individual cancer patient, explore the multi-modal nature of network biomarkers, and provide a new perspective for understanding tumor heterogeneity in precision medicine.

Description

Multi-modal optimization method for detecting dynamic network biomarkers of cancer individual patients
Technical Field
The invention relates to the technical field of biomarker identification for individual cancer patients, in particular to a multi-modal optimization method for detecting dynamic network biomarkers of individual cancer patients.
Background
In recent years, the accelerating aging, industrialization and urbanization of China's population, together with factors such as unhealthy lifestyles and environmental exposure, have made cancer one of the main causes of death among residents of China.
Currently, many studies use network biomarkers of individual cancer patients as a basis for diagnosing disease in individual patients, and quantify the biomarkers from patient samples to detect critical states during cancer progression. Related research has shown that multi-modal problems are widespread in the biomedical field, and that the existence of multi-modal network biomarkers can give decision makers more choices.
With the development of network science, the wide application of single-sample network techniques provides an effective way to identify personalized biomarkers on a single sample using the high-dimensional dynamic data of individual cancer patients. Recently, many researchers have proposed bioinformatics tools for identifying personalized biomarkers on single-sample networks. According to their main features, these methods fall into two categories: (i) personalized network biomarker identification methods, such as methods based on network elasticity and methods based on network controllability and observability. Compared with traditional molecular biomarker methods, these methods take intergenic associations and network information into account to predict personalized biomarkers reliably. However, they still fail to detect the dynamic changes between the normal and critical states, and therefore may not identify early warning signals of critical states during cancer development; (ii) dynamic network biomarker (DNB) identification methods. These mainly comprise personalized dynamic network biomarker methods, which use the genome data of an individual patient to identify personalized dynamic network biomarkers for predicting early warning signals before acute deterioration.
Notably, DNBs may have multi-modal properties, i.e. there may be multiple equivalent network biomarkers, called multi-modal network biomarkers, that have the same predictive performance and the same number of genes but consist of different genes or proteins. It is therefore necessary to explore the relationships between genes from the perspective of multi-modal characteristics in order to elucidate the molecular mechanisms of complex biological phenomena. However, most existing methods ignore the existence of multi-modal network biomarkers, because they rely on clustering or select only the molecular module with the highest early warning signal score as the network biomarker.
Disclosure of Invention
The invention provides a multi-modal optimization method for detecting dynamic network biomarkers of individual cancer patients; through multi-modal optimization, the personalized dynamic network biomarkers of individual cancer patients can be identified effectively.
The multi-modal optimization method for detecting dynamic network biomarkers of individual cancer patients comprises the following steps:
(1) constructing a personalized gene interaction network (PGIN) from the genome data of an individual patient;
(2) designing the optimization objective functions;
(3) searching for the set of personalized dynamic network biomarkers (PDNBs) using a multi-modal evolutionary algorithm.
Preferably, in step (1), the PGIN of an individual cancer patient is constructed as follows:
First, a P value is calculated for the edge between any two genes in the tumor sample and in the normal sample; the P value is obtained from the Z-score of the difference in the Pearson correlation coefficient (PCC) of the two genes. The PCC difference ΔPCC_{ij,k} of genes i and j for individual patient k and its Z-score Z_{ij,k} can be calculated using the following formula:
[Formula image BDA0003500577170000021]
where PCC_{ij}^{n} is the PCC of genes i and j over the n reference samples, and PCC_{ij,k}^{n+1} is the PCC of genes i and j over the n + 1 samples obtained by adding the sample of individual patient k to the reference samples. The P value can be calculated from Z_{ij,k} using the standard normal distribution. When the co-expression relationship of two interacting genes is significant in the tumor sample network but not in the normal sample network, the edge between the two genes is retained to form the PGIN, and vice versa. Furthermore, the personalized PCC (pPCC) used to score the edges of the PGIN is calculated by integrating somatic mutation data from the cancer-type-specific data into the PGIN, as follows:
[Formula images BDA0003500577170000031, BDA0003500577170000032 and BDA0003500577170000033]
where co-mutation denotes the co-mutation term and co-expression denotes the co-expression term; pPCC_{ij,k} is the personalized Pearson correlation coefficient of genes i and j in the samples of patient k; Jaccard_{ij} is the Jaccard coefficient of genes i and j; PCC_{ij,k}^{T} is the Pearson correlation coefficient of genes i and j in the tumor sample of individual patient k; PCC_{ij,k}^{N} is the Pearson correlation coefficient of genes i and j in the normal sample of individual patient k; T(i) and T(j) are the sets of tumors in which gene i and gene j, respectively, are found to be mutated after examining the somatic mutations in a given cancer data set; and D_10 is the value below which 10% of the data in D lie when D is sorted in ascending order.
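As an illustration of the co-mutation ingredient, the Jaccard coefficient of two genes can be computed from the tumor sets T(i) and T(j). The sketch below uses the standard definition |T(i) ∩ T(j)| / |T(i) ∪ T(j)|; the function name, the toy tumor identifiers and the convention of returning 0 for two empty sets are assumptions made for illustration, and the way the coefficient enters the pPCC formulas is not shown.

```python
def jaccard_coefficient(tumors_i, tumors_j):
    """Standard Jaccard coefficient of two tumor sets T(i) and T(j)."""
    t_i, t_j = set(tumors_i), set(tumors_j)
    union = t_i | t_j
    if not union:
        # Assumed convention: two genes that are never mutated score 0.
        return 0.0
    return len(t_i & t_j) / len(union)


# Hypothetical identifiers of the tumors in which each gene is mutated.
T_i = {"tumor_1", "tumor_2", "tumor_3"}
T_j = {"tumor_2", "tumor_3", "tumor_4"}
jaccard_ij = jaccard_coefficient(T_i, T_j)  # 2 shared tumors out of 4 distinct tumors
```

A coefficient close to 1 means the two genes tend to be mutated in the same tumors.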
Preferably, in step (2), the optimization objective functions are:
[Formula images BDA0003500577170000037 and BDA0003500577170000038]
where n is the number of genes in the PGIN, and X = (x_1, x_2, ..., x_n) is a vector of binary decision variables: x_i = 1 if gene i is selected in the module, otherwise x_i = 0; pDE_in(X) is the standard deviation of the pPCC between genes within the PDNB module; |pPCC_in(X)| is the absolute value of the mean pPCC between genes within the PDNB module; |pPCC_out(X)| is the absolute value of the mean pPCC between genes inside and genes outside the PDNB module. The first objective function minimizes the number of genes in the module, and the second maximizes the early warning signal score of the module, because the higher this score, the better the module represents the critical state of the individual cancer patient.
Preferably, in step (3), the multi-modal evolutionary algorithm is as follows:
(3.1) randomly dividing the PGIN into an initial population P_0 of size N using an improved population initialization algorithm, in which a singular solution correction strategy is added to the original initialization strategy to improve the quality of the initial population;
(3.2) selecting N/2 parents from the current population P_t (t = 0, 1, 2, ...) using a tournament selection strategy based on decision-space niches;
(3.3) generating N offspring Q_t using the crossover operator, generating multi-modal solutions from the offspring Q_t using the mutation operator, and forming a new combined population R_t = Q_t ∪ P_t;
(3.4) selecting N solutions from the combined population according to an environmental selection mechanism as the population P_{t+1} of the next iteration;
(3.5) repeating until the termination condition of the iteration, i.e. the maximum number of function evaluations, is satisfied, and then outputting multiple PDNBs.
Preferably, in step (3.3), the crossover operator first randomly selects two parents X_i and X_j from the parents. For variables that take the same value in both parents, the corresponding variables in the offspring take that value; for decision variables whose values differ, the selection probability p_n of the variable is calculated:
[Formula image BDA0003500577170000041]
where p_n is the probability that the decision variable takes the value 1, and M is the number of parents in the current iteration. When gene n of the i-th parent X_i is a gene in the PDNB, the corresponding binary variable is 1 (formula image BDA0003500577170000042); otherwise it is 0 (formula image BDA0003500577170000043).
If a random number rand < p_n, the corresponding variable in the offspring O is set to 1; otherwise it is set to 0.
Preferably, the mutation operator introduces a new mutation strategy, comprising a singular solution correction mutation strategy and an addition/deletion mutation strategy, to handle the multi-modal optimization problem.
Preferably, in the singular solution correction strategy, the genes selected by the singular solution are first identified and their degrees are calculated, and the gene with the highest degree is taken as the candidate gene; then, the degrees of the genes connected to the candidate gene are calculated; finally, the connected gene with the smallest degree is selected as the gene to add, and its binary decision variable is set to 1. As a result, the connectivity among genes within the module is strengthened and the connectivity between genes inside and outside the module is weakened.
Preferably, in the addition/deletion mutation strategy, adding 1 gene to a module can increase or decrease the module's |pPCC_out| without changing |pPCC_in| and pDE_in; deleting an isolated gene from the module can achieve the same effect. Furthermore, the addition/deletion mutation strategy can yield multi-modal solutions, i.e. different modules that have the same objective function values. The specific operation is as follows: first, the edge weights (pPCC) of the genes outside the module that have a degree of 1 and of the isolated genes inside the module are calculated; then, |pPCC_out| of the current module is calculated; finally, one of the following two operations is performed with equal probability: either the degree-1 genes are added to the module one at a time, the gene whose addition minimizes the module's |pPCC_out| is selected, and its binary variable is set to 1; or the isolated genes are deleted from the module one at a time, the gene whose deletion minimizes the module's |pPCC_out| is selected, and its binary variable is set to 0.
The multi-modal optimization of the invention can effectively identify the personalized dynamic network biomarkers of individual cancer patients, explore the multi-modal nature of network biomarkers, and provide a new perspective for understanding tumor heterogeneity in precision medicine.
Drawings
FIG. 1 is a flow chart of a multi-modal optimization method for detecting dynamic network biomarkers of cancer patients in example 1;
FIG. 2 is an explanatory diagram of the crossover operator in example 1;
FIG. 3 is a diagram illustrating the singular solution correction strategy in the mutation operator in example 1;
FIG. 4 is a diagram illustrating an addition/deletion mutation strategy in mutation operators in example 1;
FIG. 5(a) is a graph showing the degree of cancer driver gene enrichment for PDNB with the greatest early warning score in example 1;
FIG. 5(b) is a graph showing the degree of cancer driver gene enrichment of all PDNBs in individual patients in example 1;
FIG. 6 is a diagram illustrating the multi-modal comparison of the four evolutionary algorithms in example 1.
Detailed Description
For a further understanding of the invention, reference should be made to the following detailed description taken in conjunction with the accompanying drawings and examples. It is to be understood that the examples are illustrative of the invention and not restrictive.
Example 1
This embodiment provides a Multi-Modal Personalized Dynamic Network Biomarker identification model (MMPDNB), whose input is the gene expression data of the paired samples of an individual cancer patient and whose output is the PDNBs of that patient. MMPDNB consists essentially of two parts: one constructs the PGIN from the genome data of the individual patient; the other designs the optimization objective functions and the multi-modal evolutionary strategy used to identify the PDNBs.
As shown in FIG. 1, the present embodiment provides a multi-modal optimization method for detecting dynamic network biomarkers of cancer individual patients, which comprises the following steps:
Step 1: Construction of the PGIN for individual cancer patients
Paired samples of individual cancer patients (i.e., normal and tumor samples from the same patient) were obtained from the TCGA data portal and filtered. The PGIN of each individual cancer patient was constructed using a paired single-sample network, according to the following principles. First, a P value is calculated for the edge between any two genes in the tumor sample and in the normal sample; the P value is obtained from the Z-score of the difference in the Pearson correlation coefficient (PCC) of the two genes. The PCC difference ΔPCC_{ij,k} of genes i and j for sample k and its Z-score Z_{ij,k} can be calculated using the following formula:
[Formula image BDA0003500577170000061]
where PCC_{ij}^{n} is the PCC of genes i and j over the n reference samples, and PCC_{ij,k}^{n+1} is the PCC of genes i and j over the n + 1 samples obtained by adding the sample of individual patient k to the reference samples. The P value can be calculated from Z_{ij,k} using the standard normal distribution. When the co-expression relationship of two interacting genes is significant in the tumor sample network (P value < 0.05) but not significant in the normal sample network (P value > 0.05), the edge between the two genes is retained to form the PGIN, and vice versa. Furthermore, the personalized PCC (pPCC) used to score the edges of the PGIN is calculated by integrating somatic mutation data from the cancer-type-specific data into the PGIN, as follows:
[Formula images BDA0003500577170000071, BDA0003500577170000072 and BDA0003500577170000073]
where co-mutation denotes the co-mutation term and co-expression denotes the co-expression term; pPCC_{ij,k} is the personalized Pearson correlation coefficient of genes i and j in the samples of patient k; Jaccard_{ij} is the Jaccard coefficient of genes i and j; PCC_{ij,k}^{T} is the Pearson correlation coefficient of genes i and j in the tumor sample of individual patient k; PCC_{ij,k}^{N} is the Pearson correlation coefficient of genes i and j in the normal sample of individual patient k; T(i) and T(j) are the sets of tumors in which gene i and gene j, respectively, are found to be mutated after examining the somatic mutations in a given cancer data set; and D_10 is the value below which 10% of the data in D lie when D is sorted in ascending order.
Thus, rather than merely capturing differences in gene expression between normal and tumor samples, the pPCC of an individual patient also takes somatic mutation data into account and can therefore reflect the associations between genes during the dynamic process of cancer progression.
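The edge test of Step 1 can be sketched in a few lines. The formula image is not reproduced in this text, so the Z-score below uses the form common in the single-sample network literature, Z = ΔPCC / ((1 - PCC_n^2) / (n - 1)); that form, the two-sided P value, the function names and the toy data are all assumptions, not the patent's confirmed formula. Reading "and vice versa" as "significant in exactly one of the two networks" is likewise an interpretation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def edge_p_value(ref_i, ref_j, sample_i, sample_j):
    """Return (delta PCC, Z-score, P value) for one sample added to n references."""
    n = len(ref_i)
    pcc_n = pearson(ref_i, ref_j)
    pcc_n1 = pearson(list(ref_i) + [sample_i], list(ref_j) + [sample_j])
    delta_pcc = pcc_n1 - pcc_n
    # ASSUMPTION: Z-score form borrowed from the single-sample network literature.
    z = delta_pcc / ((1.0 - pcc_n ** 2) / (n - 1))
    # ASSUMPTION: two-sided P value under the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return delta_pcc, z, p


def keep_edge(p_tumor, p_normal, alpha=0.05):
    """Keep an edge that is significant in exactly one of the two networks."""
    return (p_tumor < alpha) != (p_normal < alpha)


# Toy data: five reference samples plus one strongly perturbed patient sample.
ref_i = [1.0, 2.0, 3.0, 4.0, 5.0]
ref_j = [1.1, 1.9, 3.2, 3.8, 5.1]
delta_pcc, z, p = edge_p_value(ref_i, ref_j, 6.0, 0.5)
```

In practice the reference correlation must not be exactly ±1, since the assumed denominator would then be zero.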
Step 2: Designing the optimization objective functions
It was found that a PDNB, which can serve as an early warning signal for cancer, does not need to contain a large number of genes. MMPDNB therefore seeks PDNBs that contain fewer genes yet still accurately predict the critical states in cancer development. MMPDNB identifies the PDNBs using two objective functions:
[Formula image BDA0003500577170000077]
[Formula image BDA0003500577170000081]
where n is the number of genes in the PGIN. The solution X = (x_1, x_2, ..., x_n) is a vector of binary decision variables: x_i = 1 if gene i is selected in the module, otherwise x_i = 0. pDE_in(X) is the standard deviation of the pPCC between genes within the PDNB module; |pPCC_in(X)| is the absolute value of the mean pPCC between genes within the PDNB module; |pPCC_out(X)| is the absolute value of the mean pPCC between genes inside and genes outside the PDNB module. In short, the first objective function minimizes the number of genes in the module. The second objective function maximizes the early warning signal score of the module, because the higher this score, the better the module represents the critical state of the individual cancer patient.
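The two objectives can be illustrated with a small evaluation routine. Because the formula images are not reproduced in this text, the composite score f2 = pDE_in · |pPCC_in| / |pPCC_out| used below is an assumption modeled on the classical DNB composite index; the edge-dictionary representation, the use of the population standard deviation and the convention of scoring undefined modules as 0 are also assumptions.

```python
import statistics


def objectives(x, edges):
    """Return (f1, f2) for a binary module vector x.

    edges maps an undirected gene pair (i, j) to its pPCC edge weight.
    """
    f1 = sum(x)  # number of genes selected in the module (to be minimized)
    inside = [w for (i, j), w in edges.items() if x[i] and x[j]]
    crossing = [w for (i, j), w in edges.items() if x[i] != x[j]]
    if len(inside) < 2 or not crossing:
        # Assumed convention: a module whose score is undefined gets 0.
        return f1, 0.0
    pde_in = statistics.pstdev(inside)          # spread of pPCC inside the module
    ppcc_in = abs(statistics.fmean(inside))     # |mean pPCC| inside the module
    ppcc_out = abs(statistics.fmean(crossing))  # |mean pPCC| across the module boundary
    if ppcc_out == 0:
        return f1, 0.0
    # ASSUMPTION: composite form modeled on the classical DNB index.
    f2 = pde_in * ppcc_in / ppcc_out
    return f1, f2


# Hypothetical five-gene PGIN; genes 0, 1 and 2 form the module.
edges = {(0, 1): 0.9, (1, 2): 0.5, (0, 2): 0.7, (2, 3): 0.1, (3, 4): 0.8}
f1, f2 = objectives([1, 1, 1, 0, 0], edges)
```

Under these assumptions a single-gene module evaluates to f2 = 0, which is one way the singular solutions discussed later can arise.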
Step 3: Searching for the set of personalized dynamic network biomarkers (PDNBs) using a multi-modal evolutionary algorithm
Based on the framework of the non-dominated sorting genetic algorithm II (NSGA-II), a new population generation mechanism, crossover operator and mutation operator are introduced to give a new multi-modal evolutionary algorithm. The process is as follows. First, the PGIN is randomly divided into an initial population P_0 of size N by an improved population initialization algorithm, in which a singular solution correction strategy is added to the original initialization strategy to improve the quality of the initial population. Next, N/2 parents are selected from the current population P_t (t = 0, 1, 2, ...) using a tournament selection strategy based on decision-space niches. The crossover operator is then used to produce N offspring Q_t, and the mutation operator generates multi-modal solutions from the offspring Q_t. A new combined population R_t = Q_t ∪ P_t is then formed, and N solutions are selected from it according to an environmental selection mechanism as the population P_{t+1} of the next iteration. Finally, once the termination condition of the iteration (i.e., the maximum number of function evaluations) is satisfied, MMPDNB outputs multiple PDNBs. The proposed MMPDNB framework is given in Algorithm 1.
Algorithm 1: Framework of MMPDNB
Input: I_D (data of the individual cancer patient), C_D (cancer patient data), N (population size), T (termination condition)
Output: multiple PDNBs of the individual patient
PGIN ← paired single-sample method (I_D, C_D); // build the PGIN of the individual patient
P ← population initialization (PGIN, N); // initial population of size N, generated by the initialization operator based on the singular solution correction strategy
F ← function evaluation (P, PGIN); // compute the two objective functions of each solution
while T is not satisfied do // loop until the termination condition is met
    Pa ← tournament selection strategy (P, F, N/2); // select N/2 solutions as parents by tournament selection based on decision-space niches
    Q ← crossover operator (Pa, N); // generate N offspring
    MMQ ← addition/deletion mutation strategy (Q, N); // generate offspring containing multi-modal solutions from the offspring Q
    F ← function evaluation (MMQ, PGIN);
    [MMQ, F] ← singular solution correction mutation strategy (MMQ, F); // correct the singular solutions in MMQ
    R ← P ∪ MMQ; // form the combined population of size 2N
    P_{t+1} ← environmental selection operator (R, F, N); // select N solutions as the population of the next iteration
end
multiple PDNBs of the individual patient ← Pareto optimal PDNBs in the population
return multiple PDNBs of the individual patient
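The final step of Algorithm 1, keeping the Pareto optimal PDNBs, amounts to a non-dominated filter over the pairs (f1, f2), with f1 minimized and f2 maximized. The sketch below is a simple quadratic-time filter written for illustration; it is not the NSGA-II sorting procedure itself, and the function name and sample values are hypothetical.

```python
def pareto_front(solutions):
    """Return the non-dominated (f1, f2) pairs; f1 is minimized, f2 is maximized."""
    def dominates(a, b):
        no_worse = a[0] <= b[0] and a[1] >= b[1]
        strictly_better = a[0] < b[0] or a[1] > b[1]
        return no_worse and strictly_better

    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]


# Hypothetical objective values of five candidate modules.
population = [(3, 1.2), (3, 1.2), (4, 1.0), (5, 2.0), (2, 0.0)]
front = pareto_front(population)
```

Here the two modules with identical values (3, 1.2) both survive, which is how equivalent multi-modal solutions can coexist, and (2, 0.0) survives because nothing dominates it.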
Crossover operator: for optimization problems with binary decision variables, existing crossover operators select decision variables by comparing the scores of any two decision variables. However, a PDNB module is a set of decision variables, so assigning scores to individual decision variables is not trivial. To solve this problem, MMPDNB adopts a new crossover operator that introduces the concept of comprehensive learning, so that the multi-modal problem can be solved effectively (FIG. 2). First, two parents X_i and X_j are randomly selected from the parents. For variables that take the same value in both parents, the corresponding variables in the offspring take that value; for decision variables whose values differ, the selection probability p_n of the variable is calculated:
[Formula image BDA0003500577170000101]
where p_n is the probability that the decision variable takes the value 1, and M is the number of parents in the current iteration. When gene n of the i-th parent X_i is a gene in the PDNB, the corresponding binary variable is 1 (formula image BDA0003500577170000102); otherwise it is 0 (formula image BDA0003500577170000103).
If a random number rand < p_n, the corresponding variable in the offspring O is set to 1; otherwise it is set to 0.
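A minimal sketch of the crossover step follows. Since the formula image for p_n is not reproduced in this text, the code assumes p_n is the fraction of the M parents that select gene n; that reading, the function name and the toy parent set are assumptions.

```python
import random


def crossover(parent_a, parent_b, parents, rng=random):
    """Create one offspring from two parents drawn from the parent set."""
    m = len(parents)
    child = []
    for n, (a, b) in enumerate(zip(parent_a, parent_b)):
        if a == b:
            child.append(a)  # a value shared by both parents is inherited
        else:
            # ASSUMPTION: p_n is the fraction of parents that select gene n.
            p_n = sum(p[n] for p in parents) / m
            child.append(1 if rng.random() < p_n else 0)
    return child


# Hypothetical parent set over four genes.
parents = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
rng = random.Random(0)
parent_a, parent_b = rng.sample(parents, 2)
child = crossover(parent_a, parent_b, parents, rng)
```

Genes on which the two parents agree are copied unchanged, so in this toy set gene 0 is always kept and gene 3 is never selected.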
Mutation operator: most existing mutation operators have no strategy designed specifically for identifying PDNBs, and they have difficulty generating multi-modal solutions to the problem. To overcome this difficulty, a new mutation strategy (FIGS. 3 and 4), comprising a singular solution correction mutation strategy and an addition/deletion mutation strategy, is introduced into the algorithm to handle the multi-modal optimization problem (equations (5)-(6)).
Singular solution correction strategy: when offspring are generated, there may be some singular solutions whose second objective function value does not exist or equals 0. Such solutions (e.g., f_1 = 2, f_2 = 0) always remain in the population under the environmental selection mechanism, because they are non-dominated solutions. They therefore reduce the diversity of the population, which may cause the population to fall into a local optimum. For this reason, a special mutation strategy is applied to these solutions to improve the value of the second objective function. First, the genes selected by the singular solution are identified and their degrees are calculated. A gene with a larger degree interacts with more genes, which provides more opportunities to select a suitable gene to form the PDNB module; therefore, the gene with the largest degree is taken as the candidate gene (e.g., FIG. 3). Then, the degrees of the genes connected to the candidate gene are calculated. Finally, the connected gene with the smallest degree is selected as the gene to add (e.g., FIG. 3), and its binary decision variable is set to 1. As a result, the connectivity among genes within the module is strengthened and the connectivity between genes inside and outside the module is weakened, so that the module better matches the properties of a PDNB module.
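The degree-based rule of the singular solution correction strategy can be sketched as below. The adjacency-dictionary representation, the restriction to neighbors that are not yet in the module and the tie-breaking by iteration order are assumptions made for illustration.

```python
def correct_singular_solution(x, adjacency):
    """Add one gene to a singular module, following the degree-based rule."""
    selected = [g for g, bit in enumerate(x) if bit]
    if not selected:
        return list(x)
    # Candidate gene: the selected gene with the largest degree.
    candidate = max(selected, key=lambda g: len(adjacency[g]))
    # ASSUMPTION: only neighbors that are not yet in the module can be added.
    neighbors = [g for g in adjacency[candidate] if not x[g]]
    if not neighbors:
        return list(x)
    # Added gene: the neighbor of the candidate with the smallest degree.
    added = min(neighbors, key=lambda g: len(adjacency[g]))
    corrected = list(x)
    corrected[added] = 1
    return corrected


# Hypothetical PGIN over five genes, given as gene -> set of neighbors.
adjacency = {0: {1, 2, 3}, 1: {0}, 2: {0, 3}, 3: {0, 2, 4}, 4: {3}}
corrected = correct_singular_solution([1, 0, 0, 0, 1], adjacency)
```

In this toy network gene 0 is the candidate (degree 3) and gene 1 is added because it is the neighbor with the smallest degree.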
Addition/deletion mutation strategy: adding 1 gene to a module can increase or decrease the module's |pPCC_out| without changing |pPCC_in| and pDE_in. Likewise, deleting an isolated gene from the module (i.e., a gene not connected to any other gene in the module) can achieve the same effect. Furthermore, the addition/deletion mutation strategy has the opportunity to obtain multi-modal solutions, i.e. different modules that have the same objective function values. The specific operation is as follows. First, the edge weights (pPCC) of the genes outside the module that have a degree of 1 and of the isolated genes inside the module are calculated. Then, |pPCC_out| of the current module is calculated. Finally, one of the following two operations is performed with equal probability. One is to add the degree-1 genes to the module one at a time, select the gene whose addition minimizes the module's |pPCC_out| (e.g., g1 and g2 in FIG. 4), and set its binary variable to 1. The other is to delete the isolated genes from the module one at a time, select the gene whose deletion minimizes the module's |pPCC_out| (e.g., g3 and g4 in FIG. 4), and set its binary variable to 0.
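The choice between the two operations can be sketched as follows. How the candidate genes are found is left to the caller here: add_candidates, delete_candidates and the scoring callable ppcc_out are hypothetical inputs, and the toy scoring function exists only to make the example runnable.

```python
import random


def add_delete_mutation(x, add_candidates, delete_candidates, ppcc_out, rng=random):
    """Flip one gene so that the module's |pPCC_out| is as small as possible."""
    def flipped(gene, value):
        y = list(x)
        y[gene] = value
        return y

    # Both operations are chosen with the same probability.
    if rng.random() < 0.5:
        trials = [flipped(g, 1) for g in add_candidates]     # add one gene
    else:
        trials = [flipped(g, 0) for g in delete_candidates]  # delete one gene
    if not trials:
        return list(x)
    return min(trials, key=ppcc_out)


def toy_ppcc_out(y):
    """Hypothetical stand-in for evaluating |pPCC_out| of module y."""
    return abs(0.5 - 0.2 * y[2] - 0.4 * y[3] + 0.1 * (1 - y[1]))


module = [1, 1, 0, 0]
mutated = add_delete_mutation(module, [2, 3], [1], toy_ppcc_out, random.Random(0))
```

Each call changes at most one decision variable, so repeated calls can walk between modules that differ by a single gene.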
This example demonstrates the proposed MMPDNB method on three cancer data sets, which cover 112 breast invasive carcinomas (BRCA), 49 lung squamous cell carcinomas (LUSC) and 57 lung adenocarcinomas (LUAD), respectively.
Experiments
Parameter setting
Population setting: the population size of the algorithm was set to 300 according to the scale of the problem. Independent runs and termination condition: to reduce the influence of chance on the results, MMPDNB was run independently 30 times on the PGIN of each patient, the results of the 30 independent runs were pooled, and the Pareto optimal solutions of the pool were output. The termination criterion was the number of objective function evaluations, set to 30,000 per independent run.
Comparison algorithm
9 algorithms were selected for comparison with MMPDNB and divided into 3 groups according to their characteristics.
Multi-modal evolutionary algorithms: DN-NSGA-II, MO_Ring_PSO_SCD and MP-MMEA.
DNB algorithms: Clustering-DNB and L-DNB. The Clustering-DNB algorithm divides the PGIN into several sub-networks by hierarchical clustering, calculates the early warning score of each sub-network, and selects the sub-network with the highest score as the PDNB. The L-DNB method scores each gene by calculating the early warning signal score of the network formed by the gene and its first-order neighbors; all genes are ranked by score from high to low, and the top 20 genes are selected as the PDNB of the individual patient.
Network control algorithms: these algorithms use the controllability and observability of the network to find a minimum set of nodes that can control or reconstruct the state of the whole network.
Results of the experiment
The MMPDNB can effectively detect early warning signals of cancer development
The peaks of the mean PDNB score for BRCA, LUSC and LUAD were detected at stages IIIB, IIB and IB, respectively, and were regarded as early warning signals of the development of the three cancers. Compared with the other algorithms, MMPDNB not only detects the early warning signal of cancer development more effectively, but also yields an early warning signal score at the critical stage that differs significantly from the scores at the other stages.
MMPDNB and MP-MMEA detect early warning signals better than the other multi-modal evolutionary algorithms. For MP-MMEA, however, although the critical stages it identifies on the three cancer data sets are similar to those of MMPDNB, its average score is much lower than that of MMPDNB.
Compared with the Clustering-DNB and L-DNB algorithms, MMPDNB, with its multi-modal optimization capability, obtains a larger PDNB early warning signal score, which shows that MMPDNB detects the early warning signals of patients with the 3 cancers more effectively.
In the results of the network controllability methods, the average scores of the different stages show no obvious difference, so the critical stage cannot be detected effectively through the early warning signal.
MMPDNB can effectively identify cancer driver genes
Since the output of MMPDNB is multiple PDNBs for each individual patient (i.e., the Pareto optimal solutions), the validity of MMPDNB in identifying cancer driver genes was verified from two aspects (Fig. 5(a), 5(b)). The first is to find, for each patient, the PDNB with the highest early warning signal score, compute its F-score, and then average these F-scores. The second is to compute the F-score of every PDNB and average over all of them. The same processing was applied to the other multi-modal evolutionary algorithms. Fig. 5(a) and 5(b) show that MMPDNB achieves a higher degree of cancer driver gene enrichment on the BRCA, LUSC and LUAD datasets.
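The two evaluation views can be sketched for a single patient as follows. The sketch assumes the F-score is the usual harmonic mean of precision and recall against a reference list of driver genes; the function names and the gene labels are hypothetical.

```python
from typing import List, Set, Tuple

def f_score(predicted: Set[str], drivers: Set[str]) -> float:
    """Harmonic mean of precision and recall (assumed definition)."""
    tp = len(predicted & drivers)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(drivers)
    return 2 * precision * recall / (precision + recall)

def patient_f_scores(pdnbs: List[Tuple[Set[str], float]],
                     drivers: Set[str]) -> Tuple[float, float]:
    """Return (F-score of the highest-scoring PDNB, mean F-score of all PDNBs).
    Each PDNB is given as (gene_set, early_warning_score)."""
    best_genes, _ = max(pdnbs, key=lambda p: p[1])
    best_f = f_score(best_genes, drivers)
    mean_f = sum(f_score(genes, drivers) for genes, _ in pdnbs) / len(pdnbs)
    return best_f, mean_f

reference = {"G1", "G2", "G3"}                      # hypothetical driver list
modules = [({"G1", "G2"}, 3.0), ({"G1", "G9"}, 2.0)]  # hypothetical PDNBs
best_f, mean_f = patient_f_scores(modules, reference)
```

Averaging `best_f` over patients gives the first view, and averaging `mean_f` over patients gives the second.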
MMPDNB can effectively identify multi-modal PDNB
Figure 6 plots the number of individual patients for whom MMPDNB and the other three multi-modal evolutionary algorithms identified multi-modal PDNBs on the three cancer datasets. As shown in Fig. 6, MMPDNB found 69, 41 and 47 individual patients with multi-modal PDNBs on the BRCA, LUSC and LUAD datasets, respectively. It can be concluded that MMPDNB has a clear advantage over the other multi-modal evolutionary algorithms in detecting multi-modal PDNBs.
The multi-modal PDNBs identified by MMPDNB provide drug targets
The significance of multi-modal PDNBs is that their differential genes have different biological functions. The differential genes of multi-modal PDNBs are defined as the genes that one of the multi-modal PDNBs lacks and the other multi-modal PDNBs contain. By analyzing the drug information of these differential genes, it is evaluated whether the multi-modal PDNBs found by MMPDNB can provide important drug targets for the early treatment of cancer. Drug target information was retrieved from the iGMDR database. The experiments show that the differential genes of the multi-modal PDNBs of early-stage BRCA patients contain 14 drug targets, and that drugs used for BRCA act on these target genes. For example, as shown in Table 1, AKT2 is a recognized oncogene target, and Temsirolimus, to which BRCA patients are clinically sensitive, acts on it. The results for LUSC and LUAD are shown in Tables 2 and 3.
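A minimal sketch of the differential-gene analysis follows. It interprets the definition as "genes present in some but not all of the multi-modal PDNBs" (union minus intersection), which is an assumption; the `drug_targets` dictionary stands in for a lookup table exported from a drug database and is hypothetical apart from the AKT2 and Temsirolimus pair mentioned in the text.

```python
from typing import Dict, List, Set

def differential_genes(pdnbs: List[Set[str]]) -> Set[str]:
    """Genes contained in at least one, but not all, of the given PDNBs."""
    union = set().union(*pdnbs)
    common = set.intersection(*(set(p) for p in pdnbs))
    return union - common

def targeted(genes: Set[str], drug_targets: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the differential genes that appear in the drug-target table."""
    return {g: drug_targets[g] for g in sorted(genes) if g in drug_targets}

# Two hypothetical multi-modal PDNBs of one patient.
modes = [{"GENE_A", "GENE_B", "AKT2"}, {"GENE_A", "GENE_B", "GENE_C"}]
drug_targets = {"AKT2": ["Temsirolimus"]}
diff = differential_genes(modes)
hits = targeted(diff, drug_targets)
```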
TABLE 1 drug target genes and effective drugs obtained in early stage of BRCA in MMPDNB model
Figure BDA0003500577170000131
TABLE 2 drug target genes and effective drugs obtained in early LUSC by MMPDNB model
Figure BDA0003500577170000141
TABLE 3 drug target genes and effective drugs obtained in early LUAD by MMPDNB model
Figure BDA0003500577170000151
Conclusion
All experimental results show that multi-modal optimization can effectively identify the personalized dynamic network biomarkers of individual cancer patients and explore the multi-modal behavior of these network biomarkers, and that MMPDNB can provide a new perspective for understanding tumor heterogeneity in precision medicine.
The present invention and its embodiments have been described above schematically, and the description is not limiting; what is shown in the drawings is only one of the embodiments of the present invention, and the actual structure is not limited thereto. Therefore, structures and embodiments similar to this technical solution that a person of ordinary skill in the art, inspired by this disclosure, designs without creative effort and without departing from the spirit of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A multimodal optimization method for detecting dynamic network biomarkers in cancer individual patients, characterized by: the method comprises the following steps:
(1) constructing a personalized gene interaction network PGIN from genome data of an individual patient;
(2) designing an optimization objective function;
(3) searching for a set of personalized dynamic network biomarkers (PDNBs) using a multi-modal evolutionary algorithm.
2. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 1, wherein: in step (1), the method for constructing PGIN of individual cancer patients comprises:
first, calculating the P value of the edge between any two genes in the tumor sample and in the normal sample, wherein the P value is obtained from the Z-score of the difference in the Pearson correlation coefficient (PCC) of the two genes; the PCC difference $\Delta PCC_{ij,k}$ of genes i and j for individual patient k and its Z-score $Z_{ij,k}$ can be calculated by the following formula:
Figure FDA0003500577160000011
wherein
Figure FDA0003500577160000012
is the PCC of genes i and j over the n reference samples, and
Figure FDA0003500577160000013
is the PCC of genes i and j over the n+1 samples obtained by adding the sample of individual patient k to the n reference samples; the P value can be calculated from $Z_{ij,k}$ using the standard normal distribution; when the co-expression relationship between two interacting genes is significant in the tumor sample network but not in the normal sample network, or vice versa, the edge between the two genes is retained to constitute the PGIN; furthermore, a personalized edge score pPCC is calculated for each edge of the PGIN by integrating cancer-type-specific somatic mutation data into the PGIN, as follows:
Figure FDA0003500577160000014
Figure FDA0003500577160000015
Figure FDA0003500577160000016
wherein co-mutation denotes the co-mutation term, co-expression denotes the co-expression term,
Figure FDA0003500577160000021
represents the personalized Pearson coefficient of genes i and j in the sample of patient k, $Jaccard_{ij}$ represents the Jaccard coefficient of genes i and j,
Figure FDA0003500577160000022
represents the Pearson correlation coefficient of genes i and j in the tumor sample of individual patient k,
Figure FDA0003500577160000023
represents the Pearson correlation coefficient of genes i and j in the normal sample of individual patient k, T(i) and T(j) are the sets of tumors in which gene i and gene j, respectively, are found to be mutated when the somatic mutations of the given cancer data set are examined, and $D_{10}$ denotes the value below which 10% of the data in D lie when D is sorted in ascending order.
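The edge test and the Jaccard coefficient can be illustrated with a short sketch. Because the formulas of this claim are available here only as image placeholders, the sketch assumes the widely used single-sample-network form, in which the Z-score is the PCC difference divided by (1 - PCC_n^2)/(n - 1), together with a two-sided P value; the function names and toy data are hypothetical.

```python
import math
from typing import List, Set, Tuple

def pearson(x: List[float], y: List[float]) -> float:
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def edge_test(ref_i: List[float], ref_j: List[float],
              pat_i: float, pat_j: float) -> Tuple[float, float, float]:
    """Return (delta PCC, Z-score, two-sided P value) for one gene pair.
    Assumed form: Z = delta / ((1 - PCC_n^2) / (n - 1)); undefined if |PCC_n| = 1."""
    n = len(ref_i)
    pcc_n = pearson(ref_i, ref_j)                        # n reference samples
    pcc_n1 = pearson(ref_i + [pat_i], ref_j + [pat_j])   # reference + patient k
    delta = pcc_n1 - pcc_n
    z = delta / ((1.0 - pcc_n ** 2) / (n - 1))
    p = math.erfc(abs(z) / math.sqrt(2.0))               # standard normal tail
    return delta, z, p

def jaccard(t_i: Set[str], t_j: Set[str]) -> float:
    """Jaccard coefficient of the tumor sets in which genes i and j are mutated."""
    union = t_i | t_j
    return len(t_i & t_j) / len(union) if union else 0.0

delta, z, p = edge_test([1, 2, 3, 4, 5], [2, 1, 4, 3, 5], 6.0, 0.0)
jac = jaccard({"s1", "s2", "s3"}, {"s2", "s3", "s4"})
```

An edge would then be retained when its P value is below a chosen significance threshold in one of the two networks (tumor or normal) but not in the other; the threshold itself is not stated in this section.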
3. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 2, wherein: in the step (2), the optimization objective function is as follows:
Figure FDA0003500577160000024
Figure FDA0003500577160000025
wherein n represents the number of genes in the PGIN, and $X = (x_1, x_2, \ldots, x_n)$ is a binary decision vector: if gene i is selected in the module, $x_i = 1$, otherwise $x_i = 0$; $pDE_{in}(X)$ represents the standard deviation of the pPCC between genes within the PDNB module; $|pPCC_{in}(X)|$ represents the absolute value of the mean pPCC between genes within the PDNB module; $|pPCC_{out}(X)|$ represents the absolute value of the mean pPCC between genes inside and genes outside the PDNB module; the goal of the first objective function is to minimize the number of genes in the module, and the goal of the second objective function is to maximize the early warning signal score of the module, since the higher the early warning signal score of the module, the better it represents the critical state of the individual cancer patient.
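The two objectives can be sketched as follows. Since the objective formulas appear here only as image placeholders, the sketch assumes the standard DNB composite index, pDE_in * |pPCC_in| / |pPCC_out|, as the early warning signal score, and follows the wording above in computing pDE_in as the standard deviation of the intra-module edge weights; the function name and the edge dictionary layout are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple

def objectives(x: List[int], w: Dict[Tuple[int, int], float]) -> Tuple[int, float]:
    """Return (number of selected genes, early warning signal score).
    x is the binary decision vector; w maps an edge (i, j) to its pPCC weight.
    The score form pDE_in * |pPCC_in| / |pPCC_out| is an assumption."""
    inside = {i for i, v in enumerate(x) if v == 1}
    w_in = [v for (i, j), v in w.items() if i in inside and j in inside]
    w_out = [v for (i, j), v in w.items() if (i in inside) != (j in inside)]
    f1 = len(inside)
    if len(w_in) < 2 or not w_out:
        return f1, 0.0                     # score undefined for degenerate modules
    pde_in = statistics.pstdev(w_in)       # spread of intra-module weights
    ppcc_in = abs(statistics.fmean(w_in))
    ppcc_out = abs(statistics.fmean(w_out))
    score = pde_in * ppcc_in / ppcc_out if ppcc_out > 0 else float("inf")
    return f1, score

edges = {(0, 1): 0.8, (0, 2): 0.6, (1, 2): 0.4, (2, 3): 0.2}
size, score = objectives([1, 1, 1, 0], edges)
```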
4. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 3, wherein: in the step (3), the multi-modal evolutionary algorithm is as follows:
(3.1) randomly dividing the PGIN into an initial population $P_0$ of size N using an improved population initialization algorithm, in which a singular solution correction strategy is added to the original initialization strategy to improve the quality of the initial population;
(3.2) selecting N/2 parents from the current population $P_t$ ($t = 0, 1, 2, \ldots$) using a tournament selection strategy based on decision-space niching;
(3.3) generating N offspring $Q_t$ using the crossover operator, generating multi-modal solutions from the offspring $Q_t$ using the mutation operator, and forming a new combined population $R_t = Q_t \cup P_t$;
(3.4) selecting N solutions from the combined population according to an environmental selection mechanism to form the population $P_{t+1}$ for the next iteration;
(3.5) outputting a plurality of PDNBs once the termination condition of the iteration, i.e. the maximum number of function evaluations, is satisfied.
5. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 4, wherein: in step (3.3), in the crossover operator, two parents $X_i$ and $X_j$ are first randomly selected from the parent population; for decision variables that have the same value in the two parents, the corresponding variables in the child take the value shared by both parents; for decision variables that have different values, the selection probability $p_n$ of the variable is calculated:
Figure FDA0003500577160000031
wherein $p_n$ represents the probability that the decision variable takes the value 1, M is the number of parents in the current iteration, and when gene n of the i-th parent X is a gene in the PDNB, then
Figure FDA0003500577160000032
otherwise
Figure FDA0003500577160000033
If the random number rand < $p_n$, the corresponding variable in the child O is set to 1, otherwise it is set to 0.
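The crossover operator can be sketched as follows. Because the formula for $p_n$ is available here only as an image placeholder, the sketch assumes that $p_n$ is the fraction of the M parents in which gene n is selected; the function name and the `rng` parameter are illustrative.

```python
import random
from typing import List

def crossover(xi: List[int], xj: List[int],
              parents: List[List[int]], rng=random) -> List[int]:
    """Build one child from parents xi and xj.
    Where the parents agree the child copies the shared value; where they
    differ, the child takes 1 with probability p_n, assumed here to be the
    fraction of all M parents that select gene n."""
    m = len(parents)
    child = []
    for n, (a, b) in enumerate(zip(xi, xj)):
        if a == b:
            child.append(a)
        else:
            p_n = sum(p[n] for p in parents) / m
            child.append(1 if rng.random() < p_n else 0)
    return child

pool = [[1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 0]]
child = crossover(pool[0], pool[1], pool, rng=random.Random(0))
```

Biasing the disputed positions toward genes that are frequent among the parents keeps the child close to modules that already performed well, while the agreed positions are inherited unchanged.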
6. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 5, wherein: in the mutation operator, a new mutation strategy comprising a singular solution correction mutation strategy and an addition/deletion mutation strategy is introduced to process a multi-modal optimization problem.
7. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 6, wherein: in the singular solution correction strategy, the genes selected in the solution are first identified and their degrees are calculated, and the gene with the highest degree is taken as the candidate gene; then the degrees of the genes connected to the candidate gene are calculated; finally, the gene with the minimum degree is selected as the gene to be added, and the binary decision variable corresponding to that gene is set to 1; as a result, the degree among genes within the module is increased and the degree between genes within the module and genes outside the module is reduced.
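The correction step can be sketched as follows. The sketch assumes that "degree" means the degree of a gene in the whole PGIN and that only neighbors not yet in the module are considered for addition; both points, the function name, and the tie-breaking by gene index are assumptions.

```python
from typing import Dict, List, Set

def correct_singular(x: List[int], adj: Dict[int, Set[int]]) -> List[int]:
    """Add one gene to the module encoded by the binary vector x.
    adj maps each gene index to the set of its neighbors in the PGIN."""
    selected = [i for i, v in enumerate(x) if v == 1]
    if not selected:
        return list(x)
    # Candidate: the selected gene with the highest degree.
    candidate = max(selected, key=lambda g: (len(adj[g]), -g))
    # Neighbors of the candidate that are not yet in the module (assumption).
    neighbours = [g for g in adj[candidate] if x[g] == 0]
    if not neighbours:
        return list(x)
    # Add the neighbor with the minimum degree.
    added = min(neighbours, key=lambda g: (len(adj[g]), g))
    y = list(x)
    y[added] = 1
    return y

network = {0: {1, 2, 3}, 1: {0}, 2: {0, 3}, 3: {0, 2}}
fixed = correct_singular([1, 0, 0, 0], network)
```

Choosing the lowest-degree neighbor adds an intra-module edge while introducing as few new edges to the outside as possible.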
8. The multimodal optimization method for detecting dynamic network biomarkers in cancer patients according to claim 7, wherein: in the addition/deletion mutation strategy, adding a gene whose degree with respect to the module is 1 can increase or decrease $|pPCC_{out}|$ of the module without changing $|pPCC_{in}|$ and $pDE_{in}$; deleting an isolated gene within the module achieves the same effect; furthermore, the addition/deletion mutation strategy can produce multi-modal solutions, i.e. different modules that have the same objective function values; the specific operation is as follows: first, calculating the edge weights pPCC of the genes outside the module whose degree with respect to the module is 1 and of the isolated genes within the module; then, calculating $|pPCC_{out}|$ of the current module; finally, performing one of the following two operations with equal probability: one is to add each gene with degree 1 to the module individually, select the gene that minimizes $|pPCC_{out}|$ of the module after the addition, and set its corresponding binary variable to 1; the other is to delete each isolated gene within the module individually, select the gene that minimizes $|pPCC_{out}|$ of the module after the deletion, and set its corresponding binary variable to 0.
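The addition/deletion mutation can be sketched as follows. The sketch interprets a "gene with degree 1" as a gene outside the module with exactly one edge into the module, and an "isolated gene" as a module gene with no edge to any other module gene; these readings, the function names, and the `rng` parameter are assumptions.

```python
import random
from typing import Dict, List, Set, Tuple

Edges = Dict[Tuple[int, int], float]

def abs_mean_out(inside: Set[int], w: Edges) -> float:
    """|mean pPCC| over edges with exactly one endpoint inside the module."""
    vals = [v for (i, j), v in w.items() if (i in inside) != (j in inside)]
    return abs(sum(vals) / len(vals)) if vals else 0.0

def add_delete_mutation(x: List[int], w: Edges, rng=random) -> List[int]:
    inside = {i for i, v in enumerate(x) if v}

    def links_to_module(g: int) -> int:
        # Number of edges joining g to (other) genes of the module.
        return sum(1 for (i, j) in w
                   if (i == g and j in inside) or (j == g and i in inside))

    outside_deg1 = [g for g in range(len(x))
                    if g not in inside and links_to_module(g) == 1]
    isolated = [g for g in inside if links_to_module(g) == 0]
    y = list(x)
    if rng.random() < 0.5:       # addition branch
        if outside_deg1:
            best = min(outside_deg1,
                       key=lambda g: (abs_mean_out(inside | {g}, w), g))
            y[best] = 1
    elif isolated:               # deletion branch
        best = min(isolated,
                   key=lambda g: (abs_mean_out(inside - {g}, w), g))
        y[best] = 0
    return y

class FixedRng:
    """Deterministic stand-in for the random generator, used only for the demo."""
    def __init__(self, value: float):
        self.value = value
    def random(self) -> float:
        return self.value

weights = {(0, 1): 0.9, (1, 2): 0.5, (0, 3): 0.1, (2, 3): 0.3}
module = [1, 1, 0, 0, 1]
after_add = add_delete_mutation(module, weights, FixedRng(0.0))
after_delete = add_delete_mutation(module, weights, FixedRng(0.9))
```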
CN202210126121.5A 2022-02-10 2022-02-10 Multi-mode optimization method for detecting dynamic network biomarkers of cancer individual patients Active CN114628031B (en)

Publications (2)

Publication Number Publication Date
CN114628031A true CN114628031A (en) 2022-06-14
CN114628031B CN114628031B (en) 2023-06-20

Family

ID=81898660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210126121.5A Active CN114628031B (en) 2022-02-10 2022-02-10 Multi-mode optimization method for detecting dynamic network biomarkers of cancer individual patients

Country Status (1)

Country Link
CN (1) CN114628031B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103237901A (en) * 2010-03-01 2013-08-07 卡里斯生命科学卢森堡控股有限责任公司 Biomarkers for theranostics
US9495515B1 (en) * 2009-12-09 2016-11-15 Veracyte, Inc. Algorithms for disease diagnostics
CN109493916A (en) * 2018-06-29 2019-03-19 北京大学 A kind of Gene-gene interactions recognition methods based on sparsity factorial analysis
CN110444248A (en) * 2019-07-22 2019-11-12 山东大学 Cancer Biology molecular marker screening technique and system based on network topology parameters
CN113256636A (en) * 2021-07-15 2021-08-13 北京小蝇科技有限责任公司 Bottom-up parasite species development stage and image pixel classification method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WEI-FENG GUO 等: "A novel network control model for identifying personalized driver genes in cancer", PLOS COMPUTATIONAL BIOLOGY, pages 1 - 27 *

Also Published As

Publication number Publication date
CN114628031B (en) 2023-06-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant