CN115620808B - Cancer gene prognosis screening method and system based on improved Cox model - Google Patents
Cancer gene prognosis screening method and system based on improved Cox model
- Publication number
- CN115620808B (application CN202211631423.4A)
- Authority
- CN
- China
- Prior art keywords
- matrix
- cox
- patient
- regression
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B35/00—ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
- G16B35/20—Screening of libraries
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B40/20—Supervised data analysis
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Theoretical Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Medical Informatics (AREA)
- Evolutionary Biology (AREA)
- General Physics & Mathematics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Biotechnology (AREA)
- Biophysics (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Optimization (AREA)
- Software Systems (AREA)
- Molecular Biology (AREA)
- Library & Information Science (AREA)
- Mathematical Physics (AREA)
- Pure & Applied Mathematics (AREA)
- Mathematical Analysis (AREA)
- Chemical & Material Sciences (AREA)
- Computational Mathematics (AREA)
- Databases & Information Systems (AREA)
- Bioethics (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Public Health (AREA)
- Epidemiology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Genetics & Genomics (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Operations Research (AREA)
- Probability & Statistics with Applications (AREA)
- Analytical Chemistry (AREA)
- Algebra (AREA)
- Biochemistry (AREA)
- General Engineering & Computer Science (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a cancer gene prognosis screening method and system based on an improved Cox model, comprising the following steps: S1, collecting the expression levels of different genes in the cancer cells of cancer patients, collecting the patients' survival data, organizing the gene expression levels and patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix; S2, inputting the survival data and the second matrix into a preset Cox regression model and solving it to obtain regression coefficients; S3, evaluating, according to the patient risk function, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic gene set corresponding to high patient risk; and S4, using the screened prognostic gene set, interpreted through biological theory, to provide guidance information for predicting prognosis, recurrence, and metastasis. Compared with conventional techniques, the regression step gains accuracy from the added prior and the automatic updating of its parameters, and guidance information is provided for predicting prognosis, recurrence, and metastasis.
Description
Technical Field
The invention relates to the technical field of Cox model regression for survival analysis, and in particular to a cancer gene prognosis screening method and system based on an improved Cox model.
Background
DNA microarray technology makes it possible to monitor the expression levels of thousands of genes simultaneously and thereby study the effects of particular treatments, diseases, and developmental stages on gene expression. A common scenario is as follows: the gene expression of cancer cells is measured for a number of cancer patients, the patients' survival data are obtained through follow-up, the collected data are analyzed statistically with survival analysis methods, and the genes relevant to prognosis are finally screened out. Studying the relationship between prognostic genes and tumors can provide information for predicting prognosis, recurrence, and metastasis, and even for guiding treatment; the ultimate aim is to support individualized treatment of patients and to open up breakthroughs in cancer therapy.
The collected survival data and gene expression levels must undergo systematic survival analysis, in which a dozen or so key prognostic genes are screened from tens of thousands of genes. This step is an indispensable link in the whole prognostic analysis: the gene set formed by these genes can be used to evaluate the risk of cancer patients and thus provide more information for treatment.
The Cox regression model is widely used in medical follow-up studies and is so far the most frequently used multivariate analysis method in survival analysis. It is a semiparametric model based on a linear combination of covariates that takes the survival outcome and the survival time as dependent variables. It can analyze the influence of several factors on survival time simultaneously, can handle data with censored survival times, and does not require the type of survival distribution to be estimated. These properties make it very important in the screening of cancer prognostic genes.
According to the published literature, the most commonly used way of solving the Cox regression model is the coordinate descent method proposed by Noah Simon et al., which fits the model along a regularization path with warm starts, using a combination of the $\ell_1$ norm and the $\ell_2$ norm as the penalty term. However, the penalty coefficient is determined by cross-validation, so it cannot be solved automatically and accurately. Moreover, because the fit is computed by an optimization method, it is a point estimate: no posterior distribution is obtained, so the prior parameters (i.e., the penalty coefficient) cannot be solved automatically in combination with the expectation-maximization (EM) algorithm. As a result, the prognostic genes finally screened by that algorithm may not be well associated with the cancer.
Cox regression is a survival analysis method and an important link in prognostic gene screening. The regression coefficients solved from the Cox regression model weight the risk contributed by each corresponding gene, and the subsequent risk calculation for each patient is accurate only if the regression coefficients are accurate. A method for solving the Cox regression model with higher accuracy is therefore required.
To this end, in combination with the above needs and deficiencies of the prior art, the present application proposes a method and system for cancer gene prognosis screening based on an improved Cox model.
Disclosure of Invention
The invention provides a cancer gene prognosis screening method and system based on an improved Cox model. Regression accuracy is improved in the regression step by adding a prior and automatically updating its parameters; the genes corresponding to regression coefficients with large absolute values are screened out as prognostic genes, providing information for subsequent prediction of prognosis, recurrence, and metastasis, and even for guiding treatment.
The primary objective of the present invention is to solve the above technical problems, and the technical solution of the present invention is as follows:
The first aspect of the invention provides a cancer gene prognosis screening method based on an improved Cox model, which comprises the following steps:
S1, collecting the expression levels of different genes in the cancer cells of cancer patients, collecting the patients' survival data, organizing the gene expression levels and patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix $X$.
S2, inputting the survival data obtained in step S1 and the second matrix $X$ into a preset Cox regression model, and solving it to obtain the regression coefficients.
S3, evaluating, according to the patient risk function, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic gene set corresponding to high patient risk.
S4, using the screened prognostic gene set, interpreted through biological theory, to provide guidance information for predicting prognosis, recurrence, and metastasis.
In the first matrix, the rows represent patients and the columns represent gene segments of the cancer cells; each element of the first matrix is the expression level of the gene of the corresponding column in the patient of the corresponding row.
The survival data comprise the covariate matrix (i.e., the second matrix) $X$, the survival time $y$, and the censoring indicator $c$.
Genes whose regression coefficients have larger absolute values have a greater influence on patient survival time, so the prognostic gene set corresponding to high patient risk can be screened out by evaluating the regression coefficients.
The preprocessing in step S1 specifically comprises: removing irrelevant genes by bioinformatics statistical methods to obtain a second matrix $X$ with fewer columns.
Further, in step S2, a third matrix formed by combining the raw data with the second matrix is first input into the preset Cox regression model. The third matrix is denoted $[X, y, c]$, where $X$ is the covariate matrix (i.e., the second matrix), $y$ is the survival time, and $c$ is the censoring indicator; the survival data of the $i$-th patient are $(x_i, y_i, c_i)$.
Further, the risk function of the $i$-th patient is specifically:

$$h_i(t) = h_0(t)\,\exp\left(x_i^{T}\beta\right)$$

where $h_0(t)$ is the shared baseline risk function, $\beta$ is the regression coefficient vector obtained by solving the Cox regression model, and $x_i$ denotes the gene expression levels of the $i$-th patient.
Once the regression coefficient vector $\beta$ has been fitted with the Cox regression model, patient risk can be assessed from the patient's gene expression levels $x_i$. The components of $\beta$ with larger absolute values have a greater influence on patient survival time, and the genes corresponding to these components form the prognostic gene set to be screened out.
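By way of illustration only, the risk evaluation and gene screening described above can be sketched in Python. The function names `relative_risk` and `screen_genes`, the toy data, and the choice of keeping a fixed number `k` of genes are assumptions made for this sketch and are not taken from the source.

```python
import numpy as np

def relative_risk(X, beta):
    """Relative hazard exp(x_i^T beta) of each patient (one row of X per patient)."""
    return np.exp(X @ beta)

def screen_genes(beta, k):
    """Column indices of the k coefficients with the largest absolute value."""
    order = np.argsort(-np.abs(beta), kind="stable")
    return order[:k]

# Toy data (hypothetical): 2 patients, 3 genes.
X = np.array([[0.5, -1.0, 0.0],
              [1.5,  0.2, 0.3]])
beta = np.array([0.8, 0.0, -1.2])

risk = relative_risk(X, beta)   # baseline hazard h_0(t) cancels in comparisons
top = screen_genes(beta, k=2)   # genes 2 and 0 carry the largest |beta|
```

Because the baseline risk function is shared by all patients, comparing `risk` values between patients does not require estimating it.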
Further, the step S2 of solving the Cox regression model to obtain the regression coefficient specifically includes the following steps:
S21, combining the available survival data into a third matrix, sorting it by survival time, constructing the Cox regression model from the sorted data, and initializing the prior parameters and the message passing parameters.
S22, according to the determinant vector factor graph of the Cox regression model, using the expectation propagation algorithm to project the high-dimensional messages onto independent Gaussian distributions through the moment matching rule, solving the model by cyclic iteration, and outputting the regression coefficients and the approximate posterior probability.
S23, inputting the regression coefficients and the approximate posterior probability into the expectation-maximization algorithm and updating the prior parameters.
S24, judging whether the regression coefficients meet a preset iteration end condition; if so, outputting the regression coefficients obtained in the current iteration; if not, returning to step S22 for the next iteration.
The third matrix is $[X, y, c]$, where $X$ is the covariate matrix, $y$ is the survival time, and $c$ is the censoring indicator.
The method treats regression coefficient estimation with a fully Bayesian analysis, converting penalized maximum likelihood estimation into minimum mean square error estimation from a Bayesian perspective. Using the factor graph as a tool, it computes the messages passed between nodes with a message passing method based on expectation propagation and obtains the approximate posterior probability of the regression coefficients, which is in essence an approximately inferred probability distribution that the regression coefficients obey.
Further, the prior parameters include the mean, the variance, and the sparsity ratio; the message passing parameters include the mean and variance of the forward message. Step S21 is specifically: normalizing the covariate matrix $X$; sorting the third matrix $[X, y, c]$ in descending order of the survival time $y$; substituting the sorted third matrix $[X, y, c]$ into the Cox partial likelihood function; and initializing the prior parameters and the message passing function.
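The sorting part of step S21 can be sketched as follows. This is a minimal illustration; the function name `sort_by_survival_desc` and the toy values are hypothetical.

```python
import numpy as np

def sort_by_survival_desc(X, y, c):
    """Reorder the rows of [X, y, c] so that survival time y is descending."""
    order = np.argsort(-y, kind="stable")  # stable sort keeps tied rows in input order
    return X[order], y[order], c[order]

# Toy third matrix (hypothetical): 3 patients, 2 genes.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([2.0, 9.0, 5.0])
c = np.array([1, 0, 1])

Xs, ys, cs = sort_by_survival_desc(X, y, c)
```

One practical consequence of the descending order is that the patients still at risk at the survival time of row `i` are exactly rows `0..i`, so sums over the risk set reduce to cumulative sums.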
The regression coefficients follow a Gaussian-Bernoulli prior distribution, governed by the prior parameters, and are therefore sparse.
The projection operation at the likelihood function node is approximated and simplified with the Laplace method and the moment generating function, which reduces the computational complexity and yields more accurate regression coefficients with little loss.
Further, normalizing the covariate matrix $X$ is specifically:

$$X \leftarrow \frac{X - \operatorname{mean}(X)}{\sqrt{\operatorname{var}(X)}}$$

where $\operatorname{mean}(X)$ is the mean of all elements of the matrix $X$ and $\operatorname{var}(X)$ is the variance of all elements of the matrix $X$.
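A minimal sketch of normalization with the mean and variance taken over all elements of the matrix is given below. It assumes that the centered matrix is divided by the square root of var(X); the function name `normalize_global` is hypothetical.

```python
import numpy as np

def normalize_global(X):
    """Center and scale X with the mean and variance of ALL its elements."""
    mean_all = X.mean()   # scalar mean over every element
    var_all = X.var()     # scalar variance over every element
    return (X - mean_all) / np.sqrt(var_all)

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Xn = normalize_global(X)
```

Unlike per-column standardization, this keeps the relative scale between genes, since a single mean and a single variance are applied to the whole matrix.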
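The Cox partial likelihood function is specifically:

$$p(y \mid z) \propto \prod_{i=1}^{n} \left[ \frac{\exp(z_i)}{\sum_{j=1}^{i} \exp(z_j)} \right]^{c_i}$$

where $p(y \mid z)$ denotes the transition probability from $z$ to $y$, normalized with respect to $y$; the symbol $\propto$ denotes proportionality, since the Cox partial likelihood function is not normalized; $n$ is the number of patients; and the function takes $z$ as its variable, with $i$-th element $z_i = x_i^{T}\beta$, where $x_i^{T}$ is the $i$-th row of $X$. Because the data are sorted in descending order of survival time, the sum in the denominator runs over the patients $j \le i$ who are still at risk at time $y_i$, and only patients with $c_i = 1$ contribute a factor.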
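As a minimal sketch, the logarithm of a Cox partial likelihood can be evaluated as below. The sketch assumes the Breslow form without special handling of tied survival times, rows already sorted in descending order of survival time, and an event flag equal to 1 for an observed event and 0 for a censored patient; this flag convention and the function name are assumptions of the sketch.

```python
import numpy as np

def cox_log_partial_likelihood(z, event):
    """Log partial likelihood for linear predictors z (rows sorted by descending
    survival time) and event flags (1 = event observed, 0 = censored)."""
    z = np.asarray(z, dtype=float)
    event = np.asarray(event, dtype=float)
    # log(sum_{j<=i} exp(z_j)) for every i, computed stably as a running logaddexp
    log_risk = np.logaddexp.accumulate(z)
    return float(np.sum(event * (z - log_risk)))

ll_all = cox_log_partial_likelihood([0.0, 0.0, 0.0], [1, 1, 1])
ll_cens = cox_log_partial_likelihood([0.0, 0.0, 0.0], [1, 0, 1])
```

The running `logaddexp` avoids overflow when some linear predictors are large, which matters with tens of thousands of gene covariates.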
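The initialization of the prior parameters is specifically as follows. The regression coefficients follow a Gaussian-Bernoulli distribution, whose mathematical expression is:

$$p(\beta) = \prod_{j=1}^{p} \left[ (1-\rho)\,\delta(\beta_j) + \rho\,\mathcal{N}(\beta_j;\, \mu, \sigma^2) \right]$$

where $\delta(\cdot)$ denotes the Dirac delta function; $\mathcal{N}(\cdot\,;\, \mu, \sigma^2)$ denotes a Gaussian distribution with mean $\mu$ and variance $\sigma^2$; $\rho$ is the sparsity ratio; $p$ is the number of regression coefficients; and the function takes $\beta$ as its variable. The prior parameters $\mu$, $\sigma^2$, and $\rho$ are then initialized.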
The initialization of the message passing function is specifically: initializing the message passing function of the forward message, whose mathematical expression is as follows:
where $\mathbf{0}_n$ is an $n$-dimensional column vector whose elements are all 0; $\mathbf{1}_n$ is an $n$-dimensional column vector whose elements are all 1, the subscript denoting the dimension of the vector; and the message variable is a random variable obeying a multidimensional Gaussian distribution with independent components of equal variance. The parameters of the message are then initialized.
In the determinant vector factor graph of the Cox regression model, four multidimensional random variables represent the messages passed on the factor graph; that is, each message is regarded as a multidimensional Gaussian probability density function, and the moment matching process requires the messages to obey the following distributions:
where each message variable is a random variable obeying a multidimensional Gaussian distribution with independent components of equal variance; $\mathbf{1}_n$ is an $n$-dimensional column vector whose elements are all 1 and $\mathbf{1}_p$ is a $p$-dimensional column vector whose elements are all 1, the subscripts denoting the vector dimensions. When the elements of a multidimensional Gaussian random variable are mutually independent, the off-diagonal elements of the covariance matrix are 0, so the diagonal covariance matrix can be represented by a vector.
Further, step S22 is specifically to pass messages on the determinant vector factor graph of the Cox regression model based on the moment matching rule, and includes the following steps:
S221, updating the first of the four messages according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
At the node, the two factors are multiplied and the product is projected onto a multidimensional Gaussian distribution with independent covariance; the projection result is then divided by the message arriving at the node to obtain the updated message.
Here the projection operation evaluates the mean vector and the variance vector of the product with respect to its variable; because the target is a multidimensional Gaussian with independent covariance, the elements of the variance vector are equal and the off-diagonal elements of the covariance are 0.
S222, updating the second message according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
At the node, the two factors are multiplied, the variable is integrated out, and the result is projected onto a multidimensional Gaussian distribution with independent covariance; the projection result is then divided by the message arriving at the node to obtain the updated message. Here $\delta(\cdot)$ is the Dirac delta function.
S223, updating the third message according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
At the node, the two factors are multiplied and the product is projected onto a multidimensional Gaussian distribution with independent covariance; the projection result is then divided by the message arriving at the node to obtain the updated message. The mean obtained by this projection operation is the Cox regression coefficient vector, which is the output result.
S224, updating the fourth message according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
At the node, the two factors are multiplied, the variable is integrated out, and the result is projected onto a multidimensional Gaussian distribution with independent covariance; the projection result is then divided by the message arriving at the node to obtain the updated message.
Because the Cox partial likelihood function has an extremely complex form, the cumulant generating function and the Laplace method are used in place of a direct projection operation at the likelihood node.
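The division that closes each of the updates above can be illustrated for Gaussian messages with diagonal covariance, where dividing two Gaussian densities amounts to subtracting their natural parameters. This is a minimal sketch; the function name `gaussian_divide` and the argument names are hypothetical, and the sketch assumes that the projected variance is smaller than the incoming variance so that the quotient is a proper Gaussian.

```python
import numpy as np

def gaussian_divide(m_post, v_post, m_in, v_in):
    """Element-wise quotient N(m_post, v_post) / N(m_in, v_in).

    Returns the mean and variance of the resulting (unnormalized) Gaussian.
    Assumes v_post < v_in element-wise, so the resulting precision is positive.
    """
    precision = 1.0 / v_post - 1.0 / v_in
    v_out = 1.0 / precision
    m_out = v_out * (m_post / v_post - m_in / v_in)
    return m_out, v_out

# Check on a case with a known answer: N(1, 2) * N(3, 4) has mean 5/3 and
# variance 4/3, so dividing that product by N(3, 4) should give back N(1, 2).
m_out, v_out = gaussian_divide(np.array([5.0 / 3.0]), np.array([4.0 / 3.0]),
                               np.array([3.0]), np.array([4.0]))
```

In practice the precision can become non-positive when the projection is only approximate, so implementations often clip or damp the result; that safeguard is omitted here for brevity.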
Further, in step S223, the projection operation specifically includes:
where the projected distribution represents the approximate posterior probability of the regression coefficients, and the mean obtained by projection is the Cox regression coefficient vector output by the model.
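For a Gaussian-Bernoulli prior combined with a Gaussian message, the moments required by such a projection have a closed form. The sketch below computes them element-wise; the symbols `r` and `tau` (mean and variance of the incoming Gaussian message) and `rho`, `mu`, `sigma2` (sparsity ratio, mean, and variance of the prior) are names assumed for this illustration.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def project_bernoulli_gaussian(r, tau, rho, mu, sigma2):
    """Posterior moments of b under prior (1-rho)*delta(b) + rho*N(b; mu, sigma2)
    and Gaussian message N(b; r, tau)."""
    nu = 1.0 / (1.0 / sigma2 + 1.0 / tau)        # variance of the nonzero component
    gamma = nu * (mu / sigma2 + r / tau)         # mean of the nonzero component
    w_slab = rho * normal_pdf(r, mu, sigma2 + tau)
    w_spike = (1.0 - rho) * normal_pdf(r, 0.0, tau)
    pi = w_slab / (w_slab + w_spike)             # posterior probability of b != 0
    mean = pi * gamma
    var = pi * (gamma ** 2 + nu) - mean ** 2
    return mean, var, pi, gamma, nu

# Dense prior (rho = 1): the result reduces to an ordinary Gaussian product.
mean_d, var_d, pi_d, _, _ = project_bernoulli_gaussian(2.0, 1.0, 1.0, 0.0, 1.0)
# Sparse prior with a message centered at zero: the posterior mean stays at zero.
mean_s, var_s, pi_s, _, _ = project_bernoulli_gaussian(0.0, 1.0, 0.5, 0.0, 1.0)
```

The posterior mean is the minimum mean square error estimate of the coefficient under these assumptions, which is why the projected mean can serve directly as the coefficient estimate.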
Further, step S23 is specifically: combining the regression coefficients and the approximate posterior probability output by step S22 with the expectation-maximization algorithm to update the prior parameters automatically; the update expressions are specifically:
where the two operators denote element-wise vector division and element-wise vector multiplication, respectively.
The prior parameters are self-learned: they are updated automatically as the whole algorithm iterates, without manual tuning, which avoids the uncertainty of cross-validation.
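One common form of expectation-maximization update for a Gaussian-Bernoulli prior is sketched below for illustration; it may differ in detail from the update expressions of the source. The inputs are assumed to be per-coefficient posterior quantities: `pi` (probability that the coefficient is nonzero) and `gamma`, `nu` (mean and variance of the nonzero component). All names are hypothetical.

```python
import numpy as np

def em_update_prior(pi, gamma, nu):
    """Joint M-step for the sparsity ratio, mean, and variance of the prior.

    Assumes sum(pi) > 0, i.e. at least one coefficient has nonzero support.
    """
    pi = np.asarray(pi, dtype=float)
    gamma = np.asarray(gamma, dtype=float)
    nu = np.asarray(nu, dtype=float)
    total = np.sum(pi)
    rho_new = total / pi.size                                  # average support probability
    mu_new = np.sum(pi * gamma) / total                        # support-weighted mean
    sigma2_new = np.sum(pi * ((gamma - mu_new) ** 2 + nu)) / total
    return rho_new, mu_new, sigma2_new

rho_new, mu_new, sigma2_new = em_update_prior(
    pi=[1.0, 1.0, 0.0], gamma=[1.0, 3.0, 10.0], nu=[0.5, 0.5, 0.5])
```

All three updates are element-wise products followed by sums, so they cost only linear time in the number of genes per iteration.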
Further, the preset iteration ending condition in step S24 is specifically:
Whether to end the iteration is determined by judging whether the Crit value starts to rise: if it does, the iteration is stopped and the regression coefficients of the final iteration are output; if it does not, the iteration continues. Here $\|\cdot\|$ denotes a norm.
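The stopping rule can be sketched as follows. The definition of Crit used here, the squared norm of the change between successive iterates divided by the squared norm of the new iterate, is an assumption made for this sketch, as are the function names; `step` stands for one full pass of steps S22 and S23.

```python
import numpy as np

def crit(beta_new, beta_old):
    """Hypothetical criterion: squared relative change between successive iterates."""
    denom = max(float(np.sum(beta_new ** 2)), 1e-12)  # guard against division by zero
    return float(np.sum((beta_new - beta_old) ** 2)) / denom

def iterate_until_crit_rises(step, beta0, max_iter=100):
    """Apply `step` repeatedly and stop as soon as Crit exceeds its previous value."""
    beta = np.asarray(beta0, dtype=float)
    prev = np.inf
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        beta_new = np.asarray(step(beta), dtype=float)
        current = crit(beta_new, beta)
        beta = beta_new            # the coefficient of the final iteration is kept
        if current > prev:
            break
        prev = current
    return beta, n_iter

# Toy sequence of iterates (hypothetical) whose change shrinks and then grows.
iterates = iter([np.array([1.5, 0.0]), np.array([1.6, 0.0]),
                 np.array([2.0, 0.0]), np.array([9.0, 0.0])])
beta_final, n_iter = iterate_until_crit_rises(lambda b: next(iterates),
                                              np.array([1.0, 0.0]))
```

Returning the iterate from the step before Crit rose would be an alternative design choice; the sketch follows the wording above and returns the coefficient of the final iteration.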
The second aspect of the present invention provides a cancer gene prognosis screening system based on an improved Cox model, which comprises a memory and a processor, wherein the memory includes a cancer gene prognosis screening program based on the improved Cox model, and the processor executes the following steps:
S1, collecting the expression levels of different genes in the cancer cells of cancer patients, collecting the patients' survival data, organizing the gene expression levels and patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix $X$.
S2, inputting the survival data obtained in step S1 and the second matrix $X$ into a preset Cox regression model, and solving it to obtain the regression coefficients.
S3, evaluating, according to the patient risk function, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic gene set corresponding to high patient risk.
S4, using the screened prognostic gene set, interpreted through biological theory, to provide guidance information for predicting prognosis, recurrence, and metastasis.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
The invention provides a cancer gene prognosis screening method and system based on an improved Cox model. Using a factor graph as a tool, the approximate posterior probability of the Cox regression coefficients is inferred by a moment-matching message passing method based on expectation propagation. Minimum mean square error estimation is adopted to estimate the regression coefficients accurately. The prior parameters are solved automatically by the expectation-maximization algorithm, which removes the need for cross-validation and makes the regression coefficient estimates more accurate. In the implementation, the Laplace method and the cumulant generating function are used to simplify the projection of the product of the complex-form likelihood and a Gaussian, so that the iteration can proceed. The regression accuracy problem is thereby addressed, the genes corresponding to regression coefficients with large absolute values are screened out as prognostic genes, and information is provided for subsequent prediction of prognosis, recurrence, and metastasis, and even for guiding treatment.
Drawings
FIG. 1 is a flow chart of the cancer gene prognosis screening method based on the improved Cox model of the present invention.
FIG. 2 is a flow chart of solving a Cox model in the cancer gene prognosis screening method based on the improved Cox model.
FIG. 3 is a flow chart of an embodiment of the invention for solving the Cox model.
FIG. 4 is a diagram of a determinant vector factor graph in an embodiment of the present invention.
FIG. 5 is a schematic diagram of the moment-matching message passing method based on expectation propagation in an embodiment of the present invention.
FIG. 6 is a graph illustrating performance of regression performed on simulated data in an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of a cancer gene prognosis screening system based on an improved Cox model according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example 1
As shown in FIG. 1, the present invention provides a method for screening cancer gene prognosis based on an improved Cox model, which comprises the following steps:
S1, collecting the expression levels of different genes in the cancer cells of cancer patients, collecting the patients' survival data, organizing the gene expression levels and patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix $X$.
S2, inputting the survival data obtained in step S1 and the second matrix $X$ into a preset Cox regression model, and solving it to obtain the regression coefficients.
S3, evaluating, according to the patient risk function, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic gene set corresponding to high patient risk.
S4, using the screened prognostic gene set, interpreted through biological theory, to provide guidance information for predicting prognosis, recurrence, and metastasis.
In the first matrix, the rows represent patients and the columns represent gene segments of the cancer cells; each element of the first matrix is the expression level of the gene of the corresponding column in the patient of the corresponding row.
The survival data comprise the covariate matrix (i.e., the second matrix) $X$, the survival time $y$, and the censoring indicator $c$.
Genes whose regression coefficients have larger absolute values have a greater influence on patient survival time, so the prognostic gene set corresponding to high patient risk can be screened out by evaluating the regression coefficients.
The preprocessing in step S1 specifically comprises: removing irrelevant genes by bioinformatics statistical methods to obtain a second matrix $X$ with fewer columns.
Further, in step S2, a third matrix formed by combining the raw data with the second matrix is first input into the preset Cox regression model. The third matrix is denoted $[X, y, c]$, where $X$ is the covariate matrix (i.e., the second matrix), $y$ is the survival time, and $c$ is the censoring indicator; the survival data of the $i$-th patient are $(x_i, y_i, c_i)$.
Further, the risk function of the $i$-th patient is specifically:

$$h_i(t) = h_0(t)\,\exp\left(x_i^{T}\beta\right)$$

where $h_0(t)$ is the shared baseline risk function, $\beta$ is the regression coefficient vector obtained by solving the Cox regression model, and $x_i$ denotes the gene expression levels of the $i$-th patient.
Once the regression coefficient vector $\beta$ has been fitted with the Cox regression model, patient risk can be assessed from the patient's gene expression levels $x_i$. The components of $\beta$ with larger absolute values have a greater influence on patient survival time, and the genes corresponding to these components form the prognostic gene set to be screened out.
Further, in step S2, solving the Cox regression model to obtain a regression coefficient, as shown in fig. 2, specifically includes the following steps:
S21, combining the available survival data into a third matrix, sorting it by survival time, constructing the Cox regression model from the sorted data, and initializing the prior parameters and the message passing parameters.
S22, according to the determinant vector factor graph of the Cox regression model, using the expectation propagation algorithm to project the high-dimensional messages onto independent Gaussian distributions through the moment matching rule, solving the model by cyclic iteration, and outputting the regression coefficients and the approximate posterior probability.
S23, inputting the regression coefficients and the approximate posterior probability into the expectation-maximization algorithm and updating the prior parameters.
S24, judging whether the regression coefficients meet a preset iteration end condition; if so, outputting the regression coefficients obtained in the current iteration; if not, returning to step S22 for the next iteration.
The third matrix is $[X, y, c]$, where $X$ is the covariate matrix, $y$ is the survival time, and $c$ is the censoring indicator.
The method treats regression coefficient estimation with a fully Bayesian analysis, converting penalized maximum likelihood estimation into minimum mean square error estimation from a Bayesian perspective. Using the factor graph as a tool, it computes the messages passed between nodes with a message passing method based on expectation propagation and obtains the approximate posterior probability of the regression coefficients, which is in essence an approximately inferred probability distribution that the regression coefficients obey.
Further, the prior parameters include the mean, the variance, and the sparsity ratio; the message passing parameters include the mean and variance of the forward message. Step S21 is specifically: normalizing the covariate matrix $X$; sorting the third matrix $[X, y, c]$ in descending order of the survival time $y$; substituting the sorted third matrix $[X, y, c]$ into the Cox partial likelihood function; and initializing the prior parameters and the message passing function.
In a specific embodiment, the covariate matrix can be a gene expression matrix in which each row represents a different patient, each column represents a different gene, and each element represents the expression level of one gene in one patient.
The regression coefficients are assigned a Gaussian-Bernoulli prior and are therefore sparse.
The projection operation at the likelihood function node is approximated using the Laplace method and the moment generating function, which simplifies an otherwise complex calculation and yields more accurate regression coefficients with little loss.
Further, the normalization of the covariate matrix X is specifically:
where mean(X) is the mean of all elements of the matrix X, and var(X) is the variance of all elements of the matrix X.
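As a non-limiting illustration, the normalization can be sketched as follows. The formula itself is not legible in this text, so the sketch assumes the usual standardization, subtracting mean(X) and dividing by the square root of var(X), with both statistics taken over all elements of X as the definitions above state. The function name `normalize_covariates` is hypothetical.

```python
import numpy as np

def normalize_covariates(X):
    """Standardize X using the mean and variance of ALL its elements.

    Assumption: the normalization is (X - mean(X)) / sqrt(var(X)).
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean()) / np.sqrt(X.var())

# Small usage example on a 2 x 2 matrix.
X = np.array([[1.0, 2.0], [3.0, 6.0]])
X_norm = normalize_covariates(X)
```

After this step the whole matrix, rather than each gene column separately, has zero mean and unit variance.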
The Cox partial likelihood function is specifically:
where the left-hand side is the transition probability from the input variable to the output variable of the likelihood node, which is normalized with respect to the output variable; the right-hand side is the Cox partial likelihood function, which is not normalized, so the two sides are related by direct proportionality; the function takes the input variable as its argument, and the subscript i denotes the i-th element of the corresponding vector.
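As a non-limiting illustration, the logarithm of a Cox partial likelihood can be evaluated as in the sketch below. It assumes the standard no-ties form, assumes that the rows are already sorted in descending order of survival time (so the risk set of row i is rows 0 to i), and assumes that c = 1 marks an observed event and c = 0 a censored one. The names `z` (linear predictor, one value per patient) and `cox_log_partial_likelihood` are hypothetical.

```python
import numpy as np

def cox_log_partial_likelihood(z, c):
    """Log Cox partial likelihood, no ties, rows sorted by DESCENDING time.

    z : linear predictor of each patient (assumed notation)
    c : 1 = event observed, 0 = censored (assumed coding)
    """
    z = np.asarray(z, dtype=float)
    c = np.asarray(c, dtype=float)
    # log of the running sum exp(z_0) + ... + exp(z_i), computed stably
    log_risk = np.logaddexp.accumulate(z)
    return float(np.sum(c * (z - log_risk)))

value = cox_log_partial_likelihood([0.0, 0.0, 0.0], [1, 1, 1])
```

Sorting in descending order is what allows each risk-set sum to be obtained with a single cumulative pass.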
The initialization of the prior parameter specifically comprises: the regression coefficients are subjected to Gaussian-Bernoulli distribution, and the mathematical expression is as follows:
where δ denotes the Dirac delta function and N denotes a Gaussian distribution with the stated mean and variance; the function takes the regression coefficient as its variable. The three prior parameters (sparsity ratio, mean, and variance) are then initialized.
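As a non-limiting illustration, a Gaussian-Bernoulli (spike-and-slab) prior can be sampled as sketched below. The sketch assumes the common form in which each coefficient is exactly zero with probability 1 - rho and is drawn from a Gaussian with mean mu and variance var with probability rho. The names `rho`, `mu`, `var`, and `sample_gauss_bernoulli` are hypothetical stand-ins for symbols that are not legible in this text.

```python
import numpy as np

def sample_gauss_bernoulli(p, rho, mu, var, rng):
    """Draw p coefficients: zero w.p. 1 - rho, Gaussian(mu, var) w.p. rho."""
    active = rng.random(p) < rho                       # Bernoulli part
    values = rng.normal(mu, np.sqrt(var), size=p)      # Gaussian part
    return np.where(active, values, 0.0)

rng = np.random.default_rng(0)
beta = sample_gauss_bernoulli(10000, 0.1, 0.0, 1.0, rng)
```

Most sampled coefficients are exactly zero, which is the sparsity that the prior is meant to express.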
The initialization of the message transfer function specifically includes: initializing a message transfer function of a positive direction message, wherein the mathematical expression of the message transfer function is as follows:
where the all-zeros vector is an n-dimensional column vector whose elements are all 0; the all-ones vector is an n-dimensional column vector whose elements are all 1; and the message is a random variable following a multidimensional Gaussian distribution with independent components of equal variance. The mean and variance of the message are then initialized.
In a specific embodiment, the determinant vector factor graph of the Cox regression model is shown in fig. 4.
In the determinant vector factor graph of the Cox regression model, as shown in fig. 5, four multidimensional random variables are used to represent the messages passed on the factor graph, i.e., each message is regarded as a multidimensional Gaussian probability density function, and the moment matching process requires that the messages obey the following distributions:
where each message is a random variable following a multidimensional Gaussian distribution with independent components of equal variance; the all-ones vectors are an n-dimensional and a p-dimensional column vector whose elements are all 1, with the subscript denoting the vector dimension. When the elements of a multidimensional Gaussian random variable are mutually independent, i.e., when the off-diagonal elements of the covariance matrix are 0, the diagonal covariance matrix can be represented by a vector.
In a specific embodiment, initial values are set for the prior parameters of the prior distribution (the sparsity parameter, the mean parameter, and the variance parameter), and the prior parameters are then updated automatically by the expectation-maximization algorithm.
Further, step S22 specifically performs message passing on the determinant vector factor graph of the Cox regression model based on the moment matching rule, and includes the following steps:
S221, updating the corresponding message according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
at the node, the two incoming terms are multiplied, the product is projected onto a multidimensional Gaussian distribution with independent covariance, and the projection is divided by the incoming message to obtain the updated message.
Here the projection operation computes the mean vector and the variance vector of the product with respect to its variable. Because the target is a multidimensional Gaussian with independent covariance, the elements of the variance vector are equal and the off-diagonal elements of the covariance matrix are 0; the projection outputs this Gaussian.
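As a non-limiting illustration, the division of one Gaussian message by another, used after each projection, can be sketched as follows for the diagonal case. Dividing Gaussian densities subtracts their precisions and their precision-weighted means. The function name `gaussian_divide` and the argument names are hypothetical.

```python
import numpy as np

def gaussian_divide(m_post, v_post, m_in, v_in):
    """Element-wise division N(m_post, v_post) / N(m_in, v_in).

    Returns the mean and variance of the resulting (unnormalized) Gaussian.
    Assumes v_post < v_in so that the resulting variance is positive.
    """
    precision = 1.0 / v_post - 1.0 / v_in
    v_out = 1.0 / precision
    m_out = v_out * (m_post / v_post - m_in / v_in)
    return m_out, v_out

# N(1, 2) * N(3, 4) is proportional to N(5/3, 4/3); dividing by N(3, 4) recovers N(1, 2).
m_out, v_out = gaussian_divide(np.array([5.0 / 3.0]), np.array([4.0 / 3.0]),
                               np.array([3.0]), np.array([4.0]))
```

In practice a guard against non-positive precision is often added, since the subtraction can become negative when the projection is inexact.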
S222, updating the corresponding message according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
at the node, the two incoming terms are multiplied, the variable is integrated out, the result is projected onto a multidimensional Gaussian distribution with independent covariance, and the projection is divided by the incoming message to obtain the updated message; the factor at this node is the Dirac delta function.
S223, updating the corresponding message according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
at the node, the two incoming terms are multiplied, the product is projected onto a multidimensional Gaussian distribution with independent covariance, and the projection is divided by the incoming message to obtain the updated message; the mean obtained by this projection operation is the vector of Cox regression coefficients that is output as the result.
S224, updating the corresponding message according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
at the node, the two incoming terms are multiplied, the variable is integrated out, the result is projected onto a multidimensional Gaussian distribution with independent covariance, and the projection is divided by the incoming message to obtain the updated message.
Because the factor involved here has an extremely complex form, the cumulant generating function and the Laplace method are used to carry out the projection operation approximately.
Further, in step S223, the projection operation specifically includes:
where the result represents the approximate posterior probability of the regression coefficients; the projected mean is the vector of Cox regression coefficients output by the model.
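As a non-limiting illustration, the moment-matching projection at a Gaussian-Bernoulli prior node can be sketched as follows for a single coefficient. The sketch assumes the prior (1 - rho) * delta(beta) + rho * N(beta; mu, var) and an incoming Gaussian message N(beta; r, tau); these symbols and the function names are hypothetical, since the original expressions are not legible in this text.

```python
import numpy as np

def norm_pdf(x, mean, var):
    """Density of a scalar Gaussian with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def project_gauss_bernoulli(r, tau, rho, mu, var):
    """Posterior mean and variance of beta under the assumed prior and message."""
    v_slab = 1.0 / (1.0 / tau + 1.0 / var)          # Gaussian-branch variance
    m_slab = v_slab * (r / tau + mu / var)          # Gaussian-branch mean
    w_slab = rho * norm_pdf(r, mu, tau + var)       # evidence of the Gaussian branch
    w_spike = (1.0 - rho) * norm_pdf(r, 0.0, tau)   # evidence of the zero branch
    pi = w_slab / (w_slab + w_spike)                # posterior probability of nonzero
    mean = pi * m_slab
    variance = pi * (v_slab + m_slab ** 2) - mean ** 2
    return mean, variance, pi

mean, variance, pi = project_gauss_bernoulli(2.0, 1.0, 1.0, 0.0, 1.0)
```

The returned mean plays the role of the regression coefficient estimate, and `pi` indicates how strongly each coefficient is believed to be nonzero.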
Further, step S23 is specifically: based on the regression coefficients and the approximate posterior probability output by step S22, the prior parameters are updated automatically using the expectation-maximization algorithm; the update expressions are specifically:
where the two auxiliary quantities are both functions of the same variable and are expressed as follows:
where the two operators denote element-wise vector division and element-wise vector multiplication, respectively.
The prior parameters are learned automatically: they are updated with each iteration of the overall algorithm and need no manual tuning, which avoids the uncertainty introduced by cross-validation.
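As a non-limiting illustration, an expectation-maximization update of Gaussian-Bernoulli prior parameters can be sketched as follows. The update expressions of this embodiment are not legible in this text, so the sketch uses a commonly used closed form and is an assumption: `pi` is the posterior probability that each coefficient is nonzero, and `m_slab`, `v_slab` are the posterior mean and variance of its Gaussian branch. All names are hypothetical.

```python
import numpy as np

def em_update_prior(pi, m_slab, v_slab):
    """One EM update of (sparsity ratio, mean, variance) from posterior statistics."""
    pi = np.asarray(pi, dtype=float)
    m_slab = np.asarray(m_slab, dtype=float)
    v_slab = np.asarray(v_slab, dtype=float)
    rho = pi.mean()                                   # expected fraction of nonzeros
    mu = np.sum(pi * m_slab) / np.sum(pi)             # weighted mean of nonzeros
    var = np.sum(pi * ((m_slab - mu) ** 2 + v_slab)) / np.sum(pi)
    return rho, mu, var

rho, mu, var = em_update_prior([1, 1, 0, 0], [1.0, 3.0, 0.0, 0.0], [0.5, 0.5, 1.0, 1.0])
```

Because the three values are re-estimated from the current posterior in every outer iteration, no grid of penalty values has to be searched.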
Further, the preset iteration ending condition in step S24 is specifically:
whether to end the iteration is determined by judging whether the Crit value starts to rise: if the Crit value starts to rise, the iterative process is stopped and the regression coefficients of the final iteration are output; if the Crit value does not start to rise, iteration continues. Here the double-bar notation denotes a norm.
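As a non-limiting illustration, the stopping rule can be sketched as follows. The expression for Crit is not legible in this text, so the sketch assumes that Crit is the norm of the change between successive coefficient vectors, and it returns the iterate obtained just before Crit rose; both choices are assumptions. The names `iterate_until_crit_rises` and `step` are hypothetical.

```python
import numpy as np

def iterate_until_crit_rises(step, beta0, max_iter=100):
    """Repeat beta <- step(beta) until the assumed Crit value starts to rise."""
    beta = np.asarray(beta0, dtype=float)
    prev_crit = np.inf
    for _ in range(max_iter):
        new_beta = step(beta)
        crit = np.linalg.norm(new_beta - beta)   # assumed definition of Crit
        if crit > prev_crit:
            break                                 # Crit rose: stop, keep last iterate
        beta, prev_crit = new_beta, crit
    return beta

converged = iterate_until_crit_rises(lambda b: 0.5 * b, np.array([1.0]))
diverging = iterate_until_crit_rises(lambda b: 2.0 * b, np.array([1.0]))
```

Stopping at the first rise, rather than at a fixed tolerance, guards against the oscillation that message passing can show after its best point.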
In a specific embodiment, the performance of regression on simulated data in a single experiment is shown in FIG. 6, where the black line is the true value and the asterisk is the estimated value.
The simulated data are generated as follows:
The censoring indicator of each sample is drawn independently from the binomial distribution B(1, 0.8), so that the censoring rate is 0.2.
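As a non-limiting illustration, simulated survival data of this kind can be generated as sketched below. Only the censoring indicator, drawn from B(1, 0.8), follows the text; the Gaussian covariates, the sparse true coefficients, and the exponential survival times are assumptions added to make the example self-contained, and the function name `simulate_survival_data` is hypothetical.

```python
import numpy as np

def simulate_survival_data(n, p, n_active, rng):
    """Generate (X, y, c, beta) for n patients and p genes."""
    X = rng.normal(size=(n, p))                      # assumption: Gaussian covariates
    beta = np.zeros(p)
    beta[:n_active] = rng.normal(size=n_active)      # assumption: sparse true coefficients
    y = rng.exponential(np.exp(-(X @ beta)))         # assumption: exponential survival times
    c = rng.binomial(1, 0.8, size=n)                 # from the text: B(1, 0.8)
    return X, y, c, beta

rng = np.random.default_rng(0)
X, y, c, beta = simulate_survival_data(2000, 10, 3, rng)
```

With c = 1 marking an observed event, about 20 percent of the samples are censored.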
Example 2
Based on the above embodiment 1, with reference to fig. 3, this embodiment describes in detail a specific process of solving the Cox model in the present invention.
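In a specific embodiment, as shown in FIG. 3, the known data are the covariate matrix, the survival times, and the censoring indicators, and the quantity to be solved is the regression coefficient vector.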
Step 1:
S1.1: Initialization of X:
where mean(X) is the mean of all elements of the matrix X, and var(X) is the variance of all elements of the matrix X.
S1.2: Merge the existing survival data (covariate matrix X, survival time y, censoring indicator c) into a matrix [X, y, c] and sort it in descending order of y;
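As a non-limiting illustration, the merging and sorting of the survival data can be sketched as follows. The function name `sort_by_survival_desc` is hypothetical.

```python
import numpy as np

def sort_by_survival_desc(X, y, c):
    """Sort patients by descending survival time and build the matrix [X, y, c]."""
    X, y, c = np.asarray(X), np.asarray(y), np.asarray(c)
    order = np.argsort(-y, kind="stable")            # indices of y from largest to smallest
    Xs, ys, cs = X[order], y[order], c[order]
    third = np.column_stack([Xs, ys, cs])            # the sorted third matrix [X, y, c]
    return Xs, ys, cs, third

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([2.0, 5.0, 1.0])
c = np.array([1, 0, 1])
Xs, ys, cs, third = sort_by_survival_desc(X, y, c)
```

After sorting, the patients still at risk at the time of row i are exactly rows 0 to i, which simplifies the partial likelihood.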
s1.3: substituting the sorted [ X, y, c ] into a Cox partial likelihood function:
indicates that the function is->Is transferred to->Which implies a->About>Is normalized (characteristic of the probability density function), and->The partial likelihood function is a Cox partial likelihood function, which is not normalized, so the partial likelihood function is in a direct proportion relation; the function is based on>Is a variable, the firstiElement/element->,/>Is->To (1) aiAnd (4) each element.
S1.4: it is assumed that the prior obeys a gaussian-bernoulli distribution:
S1.5: initializing a positive direction message:
wherein initialization is carried out,/>,/>;/>Is an n-dimensional column vector with elements all 0; />An n-dimensional column vector with an element of 1, the subscripts denote the dimension of the vector.
Step 2: message passing on factor graph based on moment matching rule-expectation propagation algorithm (expectation propagation)
S2.1: updating: in or on>On a node, willAnd->Multiply and project onto a multidimensional Gaussian distribution of independent covariance and then remove->The message of (2):
wherein the content of the first and second substances,is a projection operation, i.e. evaluating &>About>Is based on the mean vector->And the variance vector pick>(diagonal of covariance matrix) because it is an independent covariance multi-dimensional Gaussian, so the vector &>Is equal and the off-diagonal element is 0, and outputs ≥>。
whereinI.e. based on>Is greater than or equal to>,/>Is composed ofIs detected (#) and>is paired and/or matched>Second order gradient of).
The meanings are as follows: when/is>Takes out its diagonal when it is a matrix, when->When the vector is a vector, the vector is stretched into a diagonal matrix.
Is to average the vector>In the form of a vector point divide, device for combining or screening>Is a vector dot product.
Wherein, the first and the second end of the pipe are connected with each other,by taking pairs>And (3) solving by using a coordinate ascending algorithm after quadratic approximation:
wherein the content of the first and second substances,is->In or on>At a gradient of->Is->In or on>A black plug matrix of (a). After rewriting, the following are obtained:
wherein, the first and the second end of the pipe are connected with each other,will eventually>The method is simplified into the following steps:
wherein the content of the first and second substances,is->To (1) aiElement, then apply Coordinate Ascent algorithm (Coordinate Ascent):
S2.1.3: updatingIs at>Is closed by a black plug matrix>For>To (1) akLine ofkColumn element(for accelerated calculations, only diagonal elements are kept to approximate the entire matrix):
If the change is still large, the iteration is continued by returning to S2.1.2.
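As a non-limiting illustration, a Laplace approximation that keeps only the diagonal of the Hessian can be sketched as follows. The sketch assumes a no-ties Cox log partial likelihood in the linear predictor z with rows sorted by descending survival time, an incoming Gaussian message with mean `m` and variance `v`, and a Newton-type update applied to all coordinates at once; it is not the exact update sequence of this embodiment. All names are hypothetical.

```python
import numpy as np

def cox_grad_hess_diag(z, c):
    """Gradient and diagonal Hessian of the Cox log partial likelihood in z."""
    e = np.exp(z)
    S = np.cumsum(e)                              # risk-set sums for rows 0..i
    A = np.cumsum((c / S)[::-1])[::-1]            # sum over i >= k of c_i / S_i
    B = np.cumsum((c / S ** 2)[::-1])[::-1]       # sum over i >= k of c_i / S_i^2
    grad = c - e * A
    hess = -(e * A - e ** 2 * B)                  # always <= 0
    return grad, hess

def laplace_mode(m, v, c, n_iter=100):
    """Mode and diagonal-Hessian variance of likelihood(z) * N(z; m, v)."""
    m = np.asarray(m, dtype=float)
    v = np.asarray(v, dtype=float)
    c = np.asarray(c, dtype=float)
    z = m.copy()
    for _ in range(n_iter):
        g, h = cox_grad_hess_diag(z, c)
        z = z - (g - (z - m) / v) / (h - 1.0 / v)  # Newton step, diagonal Hessian only
    _, h = cox_grad_hess_diag(z, c)
    return z, 1.0 / (1.0 / v - h)

z_hat, var_hat = laplace_mode(np.zeros(4), np.ones(4), np.array([1.0, 0.0, 1.0, 1.0]))
```

Keeping only the diagonal avoids forming and inverting an n by n matrix in every iteration, at the cost of ignoring correlations between patients.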
S2.2: updating: is at>On a node, willAnd &>Multiply and then accumulate variablesProjected onto a multidimensional Gaussian distribution of independent covariance and then eliminated->The message of (2):
wherein the content of the first and second substances,the n-dimensional column vector with the element of 1 is represented by subscript, wherein the dimension of the vector is represented by the subscript; />The meaning is as follows: when +>If it is a matrix, its diagonal is taken out, when->Opens it into a diagonal matrix if it is a vector, then holds it>Is to calculate the average value of vector;means to determine->In relation to->Mean value vector>And the variance vector pick>And output;/>Finger matrix inversion, and/or on/off>Refers to matrix transposition.
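As a non-limiting illustration, the Gaussian computation at a linear constraint node can be sketched as follows. The sketch assumes that the node ties a vector z to the coefficients through z = X beta, that a diagonal Gaussian message arrives on each side, and that the resulting variances are averaged to a single value, as the equal-variance convention of the text suggests. The function and argument names are hypothetical.

```python
import numpy as np

def linear_node_posterior(X, m_b, v_b, m_z, v_z):
    """Posterior mean and averaged variance of beta given messages on beta and z = X beta."""
    precision = np.diag(1.0 / v_b) + X.T @ (X / v_z[:, None])   # uses the matrix transpose
    cov = np.linalg.inv(precision)                              # uses the matrix inverse
    mean_b = cov @ (m_b / v_b + X.T @ (m_z / v_z))
    var_b = np.full(len(m_b), np.diag(cov).mean())              # equal-variance convention
    return mean_b, var_b

X = np.eye(2)
mean_b, var_b = linear_node_posterior(X, np.zeros(2), np.ones(2),
                                      np.array([2.0, 2.0]), np.ones(2))
```

For a gene expression matrix with many more genes than patients, an equivalent form that inverts an n by n matrix instead is usually cheaper.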
S2.3: updating: in or on>On a node, willAnd &>The result of the multiplication is projected onto a multidimensional Gaussian distribution of independent covariance and then removed>The message of (2):
Wherein the approximation of the regression coefficients is a posteriori as follows:
and mean value obtained by projection operationIt is the Cox regression coefficients that are to be output. />
S 2.4:Updating: in or on>On a node, willAnd &>Multiply and then accumulate a variable>Projected onto a multidimensional Gaussian distribution of independent covariance and then eliminated->The message of (2):
wherein, the first and the second end of the pipe are connected with each other,calculating to obtain:
Step 3: output of approximate posterior probability according to S2.3In conjunction with an expectation maximization algorithm (expectelationrecommendation), a prior parameter is combined>And carrying out automatic updating.
Step 4: judging whether a preset iteration end condition is reached:
the end conditions are as follows:
determine whether it starts to rise, if soStarting to rise, stopping the iteration process and outputting the regression coefficient->(in S2.3). Wherein->Is a norm.
Example 3
Based on the above example 1 and example 2, and with reference to fig. 7, this example illustrates a cancer gene prognosis screening system based on an improved Cox model in the second aspect of the present invention.
In a specific embodiment, as shown in fig. 7, the present invention further provides a cancer gene prognosis screening system based on an improved Cox model, which includes a memory and a processor, wherein the memory includes a cancer gene prognosis screening program based on the improved Cox model, and the cancer gene prognosis screening program based on the improved Cox model implements the following steps when executed by the processor:
S1, collecting the expression levels of different genes in the cancer cells of cancer patients, collecting survival data of the patients, collating the expression levels of the different genes of the cancer cells and the patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix X.
S2, inputting the survival data obtained in step S1 and the second matrix X into a preset Cox regression model, and solving to obtain the regression coefficients.
S3, evaluating, according to the risk function of the patient, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic gene set corresponding to high patient risk.
S4, using the screened prognostic gene set, in combination with biological theory, to provide guidance for predicting prognosis, recurrence, and metastasis.
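As a non-limiting illustration, the screening of step S3 can be sketched as follows. The sketch assumes the usual Cox form in which the risk of a patient is proportional to the exponential of the weighted sum of gene expression levels, so a gene with a positive coefficient raises the risk as its expression increases. The threshold and the names `screen_high_risk_genes` and `gene_names` are hypothetical.

```python
import numpy as np

def screen_high_risk_genes(beta, gene_names, threshold=0.0):
    """Return (gene, coefficient) pairs with coefficient above threshold, largest first."""
    beta = np.asarray(beta, dtype=float)
    order = np.argsort(-beta)                     # indices from largest coefficient down
    return [(gene_names[i], float(beta[i])) for i in order if beta[i] > threshold]

selected = screen_high_risk_genes([0.0, 0.8, -0.3, 0.2], ["G1", "G2", "G3", "G4"])
```

Because the estimated coefficients are sparse, most genes receive a coefficient of zero and drop out of the list automatically.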
The drawings depicting the positional relationship of the structures are for illustrative purposes only and are not to be construed as limiting the present patent.
It should be understood that the above-described embodiments of the present invention are merely examples given to illustrate the present invention clearly and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to list all embodiments exhaustively here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.
Claims (9)
1. A cancer gene prognosis screening method based on an improved Cox model is characterized by comprising the following steps:
S1, collecting the expression levels of different genes in the cancer cells of cancer patients, collecting survival data of the patients, collating the expression levels of the different genes of the cancer cells and the patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix;
S2, inputting the survival data obtained in step S1 and the second matrix into a preset Cox regression model, and solving to obtain the regression coefficients; the specific solving method is as follows:
S21, combining the existing survival data into a third matrix, sorting it by survival time, constructing the Cox regression model from the sorted data, and initializing the prior parameters and the message passing parameters;
S22, using an expectation propagation algorithm on the determinant vector factor graph of the Cox regression model, projecting the high-dimensional messages onto independent Gaussian distributions through the moment matching rule, iterating cyclically to solve the model, and outputting the regression coefficients and the approximate posterior probability;
S23, inputting the regression coefficients and the approximate posterior probability into an expectation-maximization algorithm, and updating the prior parameters;
S24, judging whether the regression coefficients reach the preset iteration end condition; if the condition is reached, outputting the regression coefficients obtained in the current iteration; if not, returning to step S22 for the next iteration;
S3, evaluating, according to the risk function of the patient, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic gene set corresponding to high patient risk;
S4, using the screened prognostic gene set, in combination with biological theory, to provide guidance for predicting prognosis, recurrence, and metastasis.
2. The method of claim 1, wherein in step S2, the survival data and the second matrix are combined to form a third matrix, and the third matrix is input into the preset Cox regression model; wherein the third matrix is denoted as [X, y, c], X represents the covariate matrix, namely the second matrix, y represents the survival time, and c represents the censoring indicator; wherein the survival data of the i-th patient is.
3. The method of claim 2, wherein the risk function of the i-th patient is specifically:
4. The method of claim 1, wherein the prior parameters comprise a mean, a variance, and a sparsity ratio; the message passing parameters comprise the mean and variance of the positive-direction messages; and step S21 is specifically: normalizing the covariate matrix X, sorting the third matrix [X, y, c] in descending order of the survival time y, substituting the sorted third matrix [X, y, c] into the Cox partial likelihood function, and initializing the prior parameters and the message transfer function.
5. The method of claim 1, wherein the normalization of the covariate matrix X is specifically:
where mean(X) is the mean of all elements of the matrix X, and var(X) is the variance of all elements of the matrix X;
the Cox partial likelihood function is specifically:
the transition probability needs to be normalized and uses the partial likelihood function, specifically:
where the left-hand side is the transition probability from the input variable to the output variable of the likelihood node, which is normalized with respect to the output variable; the right-hand side is the Cox partial likelihood function, which is not normalized; the proportionality sign indicates a direct proportion relation; the function takes the input variable as its argument, and the subscript i denotes the i-th element of the corresponding vector;
the initialization of the prior parameters is specifically: the regression coefficients obey a Gaussian-Bernoulli distribution, whose mathematical expression is as follows:
where δ denotes the Dirac delta function and N denotes a Gaussian distribution with the stated mean and variance; the function takes the regression coefficient as its variable; the three prior parameters are then initialized;
the initialization of the message transfer function is specifically: initializing the message transfer function of the positive-direction message, whose mathematical expression is as follows:
where, among the initialization parameters, the all-zeros vectors are an n-dimensional and a p-dimensional column vector whose elements are all 0, and the all-ones vectors are an n-dimensional and a p-dimensional column vector whose elements are all 1;
the message transfer function of the negative-direction message is specifically:
where each message is a random variable following a multidimensional Gaussian distribution with independent components of equal variance; the tilde means "is distributed as"; N denotes a multidimensional Gaussian distribution with the stated mean vector and a diagonal covariance matrix with the stated diagonal elements; and each of the four messages is characterized by its own mean and its own variance.
6. The method of claim 5, wherein step S22 specifically performs message passing on the determinant vector factor graph of the Cox regression model based on the moment matching rule, comprising the following steps:
S221, updating the corresponding message according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
at the node, the two incoming terms are multiplied, the product is projected onto a multidimensional Gaussian distribution with independent covariance, and the projection is divided by the incoming message to obtain the updated message;
S222, updating the corresponding message according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
at the node, the two incoming terms are multiplied, the variable is integrated out, the result is projected onto a multidimensional Gaussian distribution with independent covariance, and the projection is divided by the incoming message to obtain the updated message; wherein the factor at this node is the Dirac delta function;
S223, updating the corresponding message according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
at the node, the two incoming terms are multiplied, the product is projected onto a multidimensional Gaussian distribution with independent covariance, and the projection is divided by the incoming message to obtain the updated message; wherein the mean obtained in S223 is output as the regression coefficient;
S224, updating the corresponding message according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
7. The method of claim 1, wherein step S23 is specifically: based on the regression coefficients and the approximate posterior probability output by step S22, the prior parameters are updated automatically using the expectation-maximization algorithm; the update expressions are specifically:
8. The method for screening cancer gene prognosis based on the improved Cox model according to any one of claims 1 to 7, wherein the iteration end condition preset in step S24 is specifically:
whether to end the iteration is determined by judging whether the Crit value starts to rise: if the Crit value starts to rise, the iterative process is stopped and the regression coefficients of the final iteration are output; if the Crit value does not start to rise, iteration continues; wherein the double-bar notation denotes a norm.
9. A cancer gene prognosis screening system based on an improved Cox model comprises a memory and a processor, wherein the memory comprises a cancer gene prognosis screening program based on the improved Cox model, and the cancer gene prognosis screening program based on the improved Cox model realizes the following steps when being executed by the processor:
S1, collecting the expression levels of different genes in the cancer cells of cancer patients, collecting survival data of the patients, collating the expression levels of the different genes of the cancer cells and the patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix;
S2, inputting the survival data obtained in step S1 and the second matrix into a preset Cox regression model, and solving to obtain the regression coefficients; the specific solving method is as follows:
S21, combining the existing survival data into a third matrix, sorting it by survival time, constructing the Cox regression model from the sorted data, and initializing the prior parameters and the message passing parameters;
S22, using an expectation propagation algorithm on the determinant vector factor graph of the Cox regression model, projecting the high-dimensional messages onto independent Gaussian distributions through the moment matching rule, iterating cyclically to solve the model, and outputting the regression coefficients and the approximate posterior probability;
S23, inputting the regression coefficients and the approximate posterior probability into an expectation-maximization algorithm, and updating the prior parameters;
S24, judging whether the regression coefficients reach the preset iteration end condition; if the condition is reached, outputting the regression coefficients obtained in the current iteration; if not, returning to step S22 for the next iteration;
S3, evaluating, according to the risk function of the patient, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic gene set corresponding to high patient risk;
S4, using the screened prognostic gene set, in combination with biological theory, to provide guidance for predicting prognosis, recurrence, and metastasis.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211631423.4A CN115620808B (en) | 2022-12-19 | 2022-12-19 | Cancer gene prognosis screening method and system based on improved Cox model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211631423.4A CN115620808B (en) | 2022-12-19 | 2022-12-19 | Cancer gene prognosis screening method and system based on improved Cox model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115620808A CN115620808A (en) | 2023-01-17 |
CN115620808B true CN115620808B (en) | 2023-03-31 |
Family
ID=84879866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211631423.4A Active CN115620808B (en) | 2022-12-19 | 2022-12-19 | Cancer gene prognosis screening method and system based on improved Cox model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115620808B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116321620B (en) * | 2023-05-11 | 2023-08-11 | 杭州行至云起科技有限公司 | Intelligent lighting switch control system and method thereof |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2022048071A1 (en) * | 2020-09-03 | 2022-03-10 | 中国科学院深圳先进技术研究院 | Tumor risk grading method and system, terminal, and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110320390A1 (en) * | 2009-03-10 | 2011-12-29 | Kuznetsov Vladimir A | Method for identification, prediction and prognosis of cancer aggressiveness |
AU2015101194A4 (en) * | 2015-07-26 | 2015-10-08 | Macau University Of Science And Technology | Semi-Supervised Learning Framework based on Cox and AFT Models with L1/2 Regularization for Patient’s Survival Prediction |
CN106407689A (en) * | 2016-09-27 | 2017-02-15 | 牟合(上海)生物科技有限公司 | Stomach cancer prognostic marker screening and classifying method based on gene expression profile |
CN113409946A (en) * | 2021-07-02 | 2021-09-17 | 中山大学 | System and method for predicting cancer prognosis risk under high-dimensional deletion data |
Also Published As
Publication number | Publication date |
---|---|
CN115620808A (en) | 2023-01-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||