CN115620808B - Cancer gene prognosis screening method and system based on improved Cox model - Google Patents

Info

Publication number: CN115620808B
Authority: CN (China)
Prior art keywords: matrix, cox, patient, regression, model
Legal status: Active (an assumption, not a legal conclusion)
Application number: CN202211631423.4A
Other languages: Chinese (zh)
Other versions: CN115620808A
Inventors: 张善书, 张浩川
Current Assignee: Guangdong University of Technology
Original Assignee: Guangdong University of Technology

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B35/00 ICT specially adapted for in silico combinatorial libraries of nucleic acids, proteins or peptides
    • G16B35/20 Screening of libraries
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00 ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/20 Supervised data analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses a cancer gene prognosis screening method and system based on an improved Cox model. The method comprises the following steps: S1, collecting the expression levels of different genes in the cancer cells of cancer patients, collecting the survival data of the patients, collating the gene expression levels and the patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix; S2, inputting the survival data and the second matrix into a preset Cox regression model and solving it to obtain regression coefficients; S3, evaluating, according to the patient risk function, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic gene set corresponding to high patient risk; and S4, using the screened prognostic gene set, in combination with biological theory, to provide guidance information for predicting prognosis, recurrence, and metastasis. Compared with conventional techniques, the invention improves the accuracy of the regression step by adding a prior and automatically updating its parameters, and thereby provides guidance information for predicting prognosis, recurrence, and metastasis.

Description

Cancer gene prognosis screening method and system based on improved Cox model
Technical Field
The invention relates to the technical field of Cox model regression in survival analysis, and in particular to a cancer gene prognosis screening method and system based on an improved Cox model.
Background
With the rise and development of DNA microarray technology, the expression levels of thousands of genes can be monitored simultaneously to study the effects of treatments, diseases, and developmental stages on gene expression. A common scenario is as follows: the gene expression of cancer cells is measured for a number of cancer patients, the survival data of the patients are obtained through follow-up, the collected data are analyzed statistically by survival analysis, and the genes relevant to prognosis are finally screened out. Studying the relationship between prognostic genes and tumors can provide information for predicting prognosis, recurrence, and metastasis, and even for guiding treatment; the ultimate purpose is to support individualized treatment of patients and, further, to provide breakthroughs in the treatment of cancer.
The collected survival data and gene expression levels need to undergo systematic survival analysis, in which a dozen or so key prognostic genes are screened out from tens of thousands of genes. This step is an indispensable link in the overall prognostic analysis: the risk of a cancer patient can be evaluated through the gene set formed by these dozen or so genes, which provides more information for treatment.
The Cox regression model is widely used in medical follow-up studies and is, to date, the most frequently used multivariate analysis method in survival analysis. It is a semi-parametric model based on a linear combination of covariates that takes the survival outcome and the survival time as dependent variables. It can analyze the influence of multiple factors on survival time simultaneously, can handle data with censored survival times, and does not require estimating the type of survival distribution of the data. Because of these properties, it plays an important role in the screening of cancer prognostic genes.
According to the open literature, the most commonly used method for solving the Cox regression model is coordinate descent, proposed by Noah Simon et al., which fits the model along a regularization path using warm starts, with the $\ell_1$ norm and the $\ell_2$ norm as penalty terms. However, the penalty coefficient is determined through cross-validation, so it cannot be solved for automatically and accurately. Moreover, because the fit is computed by an optimization method, it is a point estimate: no posterior distribution is obtained, and the prior parameters (i.e., the penalty coefficient) therefore cannot be solved for automatically by combining the fit with an expectation-maximization (EM) algorithm. As a result, the prognostic genes finally screened out by this algorithm are not well associated with the cancer.
Cox regression is a survival analysis method that forms one link in prognostic gene screening and plays an important role in it. The regression coefficients solved from the Cox regression model weight the risk contributed by each corresponding gene; only when the regression coefficients are accurate will the subsequent risk calculation for each patient be accurate. A more accurate method for solving the Cox regression model is therefore required.
To this end, in view of the above needs and the deficiencies of the prior art, the present application proposes a cancer gene prognosis screening method and system based on an improved Cox model.
Disclosure of Invention
The invention provides a cancer gene prognosis screening method and system based on an improved Cox model, which improve regression accuracy by adding a prior and automatically updating its parameters, screen out the genes corresponding to regression coefficients with large absolute values as prognostic genes, and provide information for subsequently predicting prognosis, recurrence, and metastasis, and even for guiding treatment.
The primary objective of the present invention is to solve the above technical problems. The technical solution of the present invention is as follows:
A first aspect of the invention provides a cancer gene prognosis screening method based on an improved Cox model, which comprises the following steps:
S1. Collecting the expression levels of different genes in the cancer cells of cancer patients, collecting the survival data of the patients, collating the gene expression levels and the patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix $X$.
S2. Inputting the survival data obtained in step S1 and the second matrix $X$ into a preset Cox regression model, and solving it to obtain the regression coefficients.
S3. Evaluating, according to the patient risk function, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic gene set corresponding to high patient risk.
S4. Using the screened prognostic gene set, in combination with biological theory, to provide guidance information for predicting prognosis, recurrence, and metastasis.
In the first matrix, the rows represent patients and the columns represent genes of the cancer cells; each element of the first matrix indicates the expression level of the gene of the corresponding column in the patient of the corresponding row.
The survival data comprise the covariates (the second matrix $X$), the survival time $y$, and the censoring indicator $c$.
The genes corresponding to the components of the regression coefficients with larger absolute values have a greater influence on patient survival time, so the prognostic gene set corresponding to high patient risk can be screened out by evaluating the regression coefficients.
The preprocessing in step S1 is specifically: removing irrelevant genes by bioinformatic statistical means to obtain a second matrix $X$ with fewer columns.
Further, in step S2, a third matrix formed by combining the survival data with the second matrix is first input into the preset Cox regression model. The third matrix is denoted $[X, y, c]$, where $X$ is the covariate matrix (i.e., the second matrix), $y$ is the survival time, and $c$ is the censoring indicator. The survival data of the $i$-th patient are $(\mathbf{x}_i, y_i, c_i)$.
Further, the risk function of the $i$-th patient is specifically:
$$h_i(t) = h_0(t)\exp\left(\mathbf{x}_i^{T}\boldsymbol{\beta}\right)$$
where $h_0(t)$ is the shared baseline risk function, $\boldsymbol{\beta}$ is the vector of regression coefficients obtained by solving the Cox regression model, and $\mathbf{x}_i$ is the vector of gene expression levels of the $i$-th patient.
Once the regression coefficients $\boldsymbol{\beta}$ have been fitted with the Cox regression model, the risk of a patient can be assessed from that patient's gene expression levels $\mathbf{x}_i$. The components of $\boldsymbol{\beta}$ with larger absolute values have a greater influence on patient survival time, and the genes corresponding to these components form the prognostic gene set to be screened out.
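The screening idea described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the gene labels, and the `top_k` cutoff are hypothetical, and the baseline risk function is omitted so that only the relative risk $\exp(\mathbf{x}_i^{T}\boldsymbol{\beta})$ is computed.

```python
import math

def relative_risk(x, beta):
    """Relative risk exp(x^T beta) of one patient (baseline risk omitted)."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

def screen_genes(beta, gene_names, top_k):
    """Rank genes by |beta_j| and keep the top_k (top_k is an assumed cutoff)."""
    ranked = sorted(zip(gene_names, beta), key=lambda pair: abs(pair[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Hypothetical coefficients for four genes.
beta = [0.0, 1.2, -0.8, 0.05]
genes = ["G1", "G2", "G3", "G4"]

selected = screen_genes(beta, genes, top_k=2)
risk = relative_risk([1.0, 0.5, 0.0, 0.0], beta)
```

Ranking by absolute value keeps both risk-increasing genes (positive coefficients) and protective genes (negative coefficients) in the candidate set.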
Further, solving the Cox regression model in step S2 to obtain the regression coefficients specifically includes the following steps:
S21. Combining the existing survival data into a third matrix, sorting it by survival time, constructing the Cox regression model from the sorted data, and initializing the prior parameters and the message passing parameters.
S22. According to the determinant vector factor graph of the Cox regression model, using the expectation propagation algorithm to project the high-dimensional messages onto independent Gaussian distributions through the moment matching rule, iteratively solving the model, and outputting the regression coefficients and the approximate posterior probability.
S23. Inputting the regression coefficients and the approximate posterior probability into the expectation-maximization algorithm and updating the prior parameters.
S24. Judging whether the regression coefficients satisfy a preset iteration termination condition; if so, outputting the regression coefficients obtained in the current iteration; if not, returning to step S22 for the next iteration.
The third matrix is $[X, y, c]$, where $X$ is the covariate matrix, $y$ is the survival time, and $c$ is the censoring indicator.
The problem of estimating the regression coefficients is solved by a fully Bayesian analysis: the penalized maximum likelihood estimation is converted into a minimum mean square error estimation from the Bayesian perspective; a factor graph is used as the tool; the messages passed between nodes are computed by a message passing method based on expectation propagation; and the approximate posterior probability of the regression coefficients is obtained. The approximate posterior probability is, in essence, an approximately inferred probability distribution that the regression coefficients follow.
Further, the prior parameters include a mean, a variance, and a sparsity ratio; the message passing parameters include the mean and the variance of the forward message. Step S21 is specifically: normalizing the covariate matrix $X$; sorting the third matrix $[X, y, c]$ in descending order of the survival time $y$; substituting the sorted third matrix $[X, y, c]$ into the Cox partial likelihood function; and initializing the prior parameters and the message passing function.
The regression coefficients follow a Gaussian-Bernoulli prior distribution and are therefore sparse.
The projection operation at the likelihood function node is approximated using the Laplace method and the moment generating function, which simplifies the otherwise complex computation and yields more accurate regression coefficients at little loss.
Further, the covariate matrix $X$ is normalized as:
$$X \leftarrow \frac{X - \mathrm{mean}(X)}{\sqrt{\mathrm{var}(X)}}$$
where $\mathrm{mean}(X)$ is the mean of all elements of the matrix $X$ and $\mathrm{var}(X)$ is the variance of all elements of the matrix $X$.
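The preparation in step S21 can be sketched in a few lines. The sketch assumes the common form of normalization (subtract the mean of all elements, divide by the standard deviation of all elements); the function names are hypothetical and the matrix is a plain list of rows.

```python
import math

def normalize(X):
    """Normalize with the mean and variance taken over ALL elements of X.

    Assumption: division by the standard deviation sqrt(var(X)).
    """
    flat = [v for row in X for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    std = math.sqrt(var)
    return [[(v - mean) / std for v in row] for row in X]

def sort_by_survival_desc(X, y, c):
    """Reorder the rows of [X, y, c] by descending survival time y."""
    order = sorted(range(len(y)), key=lambda i: y[i], reverse=True)
    return [X[i] for i in order], [y[i] for i in order], [c[i] for i in order]
```

A usage example: `sort_by_survival_desc([[1], [2], [3]], [5.0, 9.0, 2.0], [1, 0, 1])` moves the patient with survival time 9.0 to the first row, so that every later row belongs to a patient who left the study no later than the rows above it.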
The Cox partial likelihood function is specifically:
$$p(\mathbf{y}\mid\mathbf{z}) \propto \prod_{i=1}^{n}\left[\frac{\exp(z_i)}{\sum_{j:\,y_j \ge y_i}\exp(z_j)}\right]^{c_i}$$
where $p(\mathbf{y}\mid\mathbf{z})$ denotes the transition probability from $\mathbf{z}$ to $\mathbf{y}$, normalized with respect to $\mathbf{y}$; $\propto$ indicates that the Cox partial likelihood function is not normalized and holds only up to proportionality; the function takes $\mathbf{z}$ as its variable, whose $i$-th element is $z_i = \mathbf{x}_i^{T}\boldsymbol{\beta}$, and $\mathbf{x}_i^{T}$ is the $i$-th row of $X$.
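For orientation, the logarithm of the standard Cox partial likelihood can be evaluated as below. This is a sketch under stated assumptions: the rows are already sorted by descending survival time, there are no tied survival times, and `c[i] == 1` marks an observed event while `c[i] == 0` marks a censored patient. The function name is hypothetical.

```python
import math

def cox_log_partial_likelihood(z, c):
    """Log partial likelihood for linear predictors z = X @ beta.

    Assumes rows sorted by DESCENDING survival time, so the risk set of
    patient i is exactly patients 0..i, and a running sum suffices.
    """
    total = 0.0
    risk_sum = 0.0
    for z_i, c_i in zip(z, c):
        risk_sum += math.exp(z_i)      # patient i joins the risk set
        if c_i == 1:                   # only observed events contribute
            total += z_i - math.log(risk_sum)
    return total
```

The descending sort is what makes the running sum valid; without it, each risk set would have to be recomputed from scratch. For large values of `z`, a log-sum-exp formulation would be needed to avoid overflow.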
The initialization of the prior parameters is specifically as follows. The regression coefficients follow a Gaussian-Bernoulli distribution, whose mathematical expression is:
$$p(\beta_j) = (1-\rho)\,\delta(\beta_j) + \rho\,\mathcal{N}(\beta_j;\,\mu,\,\sigma^2)$$
where $\delta(\cdot)$ is the Dirac delta function; $\mathcal{N}(\beta_j;\,\mu,\,\sigma^2)$ is a Gaussian distribution with mean $\mu$ and variance $\sigma^2$; $\rho$ is the sparsity ratio; and the function takes the regression coefficient $\beta_j$ as its variable. The prior parameters $\mu$, $\sigma^2$, and $\rho$ are then initialized.
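To make the sparsity of a Gaussian-Bernoulli prior concrete, the sketch below draws coefficients from such a distribution. The parameter names `rho` (probability that a coefficient is nonzero), `mu`, and `sigma2` are illustrative assumptions, as is the convention that the sparsity ratio weights the Gaussian component.

```python
import random

def sample_gauss_bernoulli(p, rho, mu, sigma2, seed=0):
    """Draw p coefficients: exactly 0 with probability 1 - rho,
    otherwise Gaussian with mean mu and variance sigma2."""
    rng = random.Random(seed)
    std = sigma2 ** 0.5
    return [rng.gauss(mu, std) if rng.random() < rho else 0.0 for _ in range(p)]

beta = sample_gauss_bernoulli(p=1000, rho=0.1, mu=0.0, sigma2=1.0)
nonzero = sum(1 for b in beta if b != 0.0)
```

With `rho=0.1`, roughly one coefficient in ten is nonzero, which matches the setting in which only a small set of genes out of many carries prognostic information.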
The initialization of the message passing function is specifically as follows. The message passing function of the forward message is initialized as a multidimensional Gaussian distribution whose components are independent and share the same variance. In its expression, $\mathbf{0}_n$ is an $n$-dimensional column vector whose elements are all 0, and $\mathbf{1}_n$ is an $n$-dimensional column vector whose elements are all 1; the subscript denotes the dimension of the vector. The mean and the variance of the forward message are then initialized.
In the determinant vector factor graph of the Cox regression model, four multidimensional random variables are used to represent the messages passed on the factor graph; that is, each message is regarded as a multidimensional Gaussian probability density function, and the moment matching process requires each message to follow a multidimensional Gaussian distribution with independent components of equal variance. In the expressions of these distributions, $\mathbf{1}_n$ is an $n$-dimensional column vector whose elements are all 1 and $\mathbf{1}_p$ is a $p$-dimensional column vector whose elements are all 1; the subscript denotes the dimension of the vector. When the elements of a multidimensional Gaussian random variable are mutually independent, i.e., the off-diagonal elements of the covariance matrix are 0, the diagonal covariance matrix can be represented by a vector.
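Because every message is a Gaussian with a diagonal covariance stored as a vector, the multiplications and divisions used in the update steps reduce to element-wise arithmetic on precisions (inverse variances). The sketch below shows these standard Gaussian identities; the function names are hypothetical, and averaging the variances is only one simple way to enforce the equal-variance form.

```python
def gaussian_multiply(m1, v1, m2, v2):
    """Element-wise product of N(m1, v1) and N(m2, v2): precisions add."""
    v = [1.0 / (1.0 / a + 1.0 / b) for a, b in zip(v1, v2)]
    m = [vi * (x / a + y / b) for vi, x, a, y, b in zip(v, m1, v1, m2, v2)]
    return m, v

def gaussian_divide(m1, v1, m2, v2):
    """Element-wise quotient N(m1, v1) / N(m2, v2): precisions subtract.

    Requires v1 < v2 element-wise so that the result has positive variance.
    """
    v = [1.0 / (1.0 / a - 1.0 / b) for a, b in zip(v1, v2)]
    m = [vi * (x / a - y / b) for vi, x, a, y, b in zip(v, m1, v1, m2, v2)]
    return m, v

def to_equal_variance(v):
    """Replace a variance vector by its average (assumed equal-variance form)."""
    avg = sum(v) / len(v)
    return [avg] * len(v)
```

Dividing a product by one of its factors recovers the other factor, which is the property the "multiply, project, divide" pattern of expectation propagation relies on.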
Further, step S22 performs message passing on the determinant vector factor graph of the Cox regression model based on the moment matching rule, and includes the following steps:
S221. According to the moment matching rule of the determinant vector factor graph of the Cox regression model, the message leaving the likelihood function node is updated. At this node, the node function is multiplied by the incoming message, the product is projected onto a multidimensional Gaussian distribution with independent covariance, and the projected result is divided by the incoming message to obtain the updated outgoing message.
Here the projection operation evaluates the mean vector and the variance vector of the product with respect to its variable. Because the target is a multidimensional Gaussian distribution with independent covariance, the elements of the variance vector are equal and the off-diagonal elements of the covariance matrix are 0; the projection outputs the resulting Gaussian distribution.
S222. According to the moment matching rule of the determinant vector factor graph of the Cox regression model, the message passed from the Dirac delta node toward the regression coefficients is updated. At this node, the incoming message is multiplied by the node function, the intermediate variable is integrated out, and the result is projected onto a multidimensional Gaussian distribution with independent covariance; the projected result is then divided by the message arriving from the opposite direction (the message updated in step S223) to obtain the updated message. The node function is the Dirac delta function.
S223. According to the moment matching rule of the determinant vector factor graph of the Cox regression model, the message leaving the prior node of the regression coefficients is updated. At this node, the node function is multiplied by the incoming message, the product is projected onto a multidimensional Gaussian distribution with independent covariance, and the projected result is divided by the incoming message to obtain the updated message. The mean obtained by the projection operation is the vector of Cox regression coefficients that serves as the output result.
S224. According to the moment matching rule of the determinant vector factor graph of the Cox regression model, the message passed from the Dirac delta node toward the likelihood function node is updated. At this node, the incoming message is multiplied by the node function, the intermediate variable is integrated out, and the result is projected onto a multidimensional Gaussian distribution with independent covariance; the projected result is then divided by the message arriving from the opposite direction to obtain the updated message.
Because the likelihood function has an extremely complex form, the cumulant generating function and the Laplace method are used in its place when carrying out the projection operation.
Further, in step S223, the projection operation yields the approximate posterior probability of the regression coefficients; the mean obtained by the projection is the vector of Cox regression coefficients output by the model.
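When the prior is Gaussian-Bernoulli and the incoming message is Gaussian, the moments needed for such a projection have a closed form. The sketch below computes them for one coefficient using the standard spike-and-slab result; it is not taken from the patent, and the names `r`, `tau` (mean and variance of the incoming message), `rho`, `mu`, and `sigma2` are assumptions.

```python
import math

def normal_pdf(x, mean, var):
    """Density of N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gauss_bernoulli_posterior(r, tau, rho, mu, sigma2):
    """Posterior moments of beta under the prior
    (1 - rho) * delta(beta) + rho * N(beta; mu, sigma2)
    combined with the Gaussian message N(beta; r, tau)."""
    spike = (1.0 - rho) * normal_pdf(r, 0.0, tau)      # evidence for beta == 0
    slab = rho * normal_pdf(r, mu, sigma2 + tau)       # evidence for beta != 0
    pi = slab / (spike + slab)                         # posterior support probability
    nu = 1.0 / (1.0 / sigma2 + 1.0 / tau)              # variance of the slab component
    gamma = nu * (mu / sigma2 + r / tau)               # mean of the slab component
    mean = pi * gamma
    var = pi * (nu + gamma ** 2) - mean ** 2
    return mean, var, pi, gamma, nu
```

The posterior mean `pi * gamma` shrinks small inputs toward zero, which is how the sparse prior suppresses coefficients of genes with little evidence.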
Further, step S23 is specifically: the regression coefficients and the approximate posterior probability output by step S22 are combined with the expectation-maximization algorithm to update the prior parameters (the mean, the variance, and the sparsity ratio) automatically, with one update expression for each parameter. The update expressions are written in terms of intermediate quantities that are computed using element-wise vector division and element-wise vector multiplication.
The prior parameters are self-learned: they are updated automatically as the whole algorithm iterates and need no manual tuning, which avoids the uncertainty introduced by cross-validation.
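One common form of such expectation-maximization updates for a Gaussian-Bernoulli prior is sketched below. It is an assumption for illustration, not the patent's expressions: the inputs are hypothetical per-coefficient posterior quantities, namely the support probability `pi`, and the mean `gamma` and variance `nu` of the Gaussian component.

```python
def em_update(pi, gamma, nu):
    """M-step for a Gaussian-Bernoulli prior (assumed standard form).

    pi[j]    : posterior probability that coefficient j is nonzero
    gamma[j] : posterior mean of coefficient j given that it is nonzero
    nu[j]    : posterior variance of coefficient j given that it is nonzero
    Assumes sum(pi) > 0.
    """
    total = sum(pi)
    rho = total / len(pi)                                    # new sparsity ratio
    mu = sum(p * g for p, g in zip(pi, gamma)) / total       # new prior mean
    sigma2 = sum(p * ((g - mu) ** 2 + v)                     # new prior variance
                 for p, g, v in zip(pi, gamma, nu)) / total
    return rho, mu, sigma2
```

For example, `em_update([1.0, 1.0, 0.0, 0.0], [1.0, 3.0, 0.0, 0.0], [0.5, 0.5, 0.1, 0.1])` gives a sparsity ratio of 0.5, a mean of 2.0, and a variance of 1.5, since only the two coefficients with nonzero support probability contribute to the mean and variance.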
Further, the preset iteration termination condition in step S24 is specifically as follows. A criterion value Crit is computed in each iteration, and whether to end the iteration is decided by judging whether the Crit value starts to rise: if it does, the iteration stops and the regression coefficients of the final iteration are output; if it does not, the iteration continues. In the expression for Crit, $\lVert\cdot\rVert$ denotes a norm.
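The "stop when Crit starts to rise" rule can be sketched as a generic loop. Two things here are assumptions: the criterion is taken to be the relative change between successive coefficient vectors in the Euclidean norm, and the loop returns the last estimate accepted before the rise. `step` is a hypothetical stand-in for one full pass of steps S22 and S23.

```python
import math

def relative_change(beta_new, beta_old):
    """Assumed criterion: ||beta_new - beta_old|| / ||beta_new||."""
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(beta_new, beta_old)))
    den = math.sqrt(sum(a ** 2 for a in beta_new))
    return num / den if den > 0 else 0.0

def iterate_until_crit_rises(step, beta0, max_iter=100):
    """Run step() repeatedly; stop as soon as the criterion increases."""
    beta, prev_crit = beta0, float("inf")
    for _ in range(max_iter):
        candidate = step(beta)
        crit = relative_change(candidate, beta)
        if crit > prev_crit:        # Crit started to rise: stop iterating
            break
        beta, prev_crit = candidate, crit
    return beta

# Toy update with fixed point 2.0, used only to exercise the loop.
result = iterate_until_crit_rises(lambda b: [0.5 * x + 1.0 for x in b], [0.0])
```

Stopping on the first rise rather than on a fixed tolerance avoids having to choose a threshold, at the cost of possibly stopping early if the criterion is noisy.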
A second aspect of the present invention provides a cancer gene prognosis screening system based on an improved Cox model. The system comprises a memory and a processor; the memory stores a cancer gene prognosis screening program based on the improved Cox model, and when the processor executes the program, the following steps are performed:
S1. Collecting the expression levels of different genes in the cancer cells of cancer patients, collecting the survival data of the patients, collating the gene expression levels and the patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix $X$.
S2. Inputting the survival data obtained in step S1 and the second matrix $X$ into a preset Cox regression model, and solving it to obtain the regression coefficients.
S3. Evaluating, according to the patient risk function, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic gene set corresponding to high patient risk.
S4. Using the screened prognostic gene set, in combination with biological theory, to provide guidance information for predicting prognosis, recurrence, and metastasis.
Compared with the prior art, the technical solution of the invention has the following beneficial effects:
The invention provides a cancer gene prognosis screening method and system based on an improved Cox model. A factor graph is used as the tool, and the approximate posterior probability of the Cox regression coefficients is inferred by a moment matching message passing method based on expectation propagation. Minimum mean square error estimation is adopted to estimate the regression coefficients accurately. The prior parameters are solved automatically by the expectation-maximization algorithm, which dispenses with cross-validation and makes the regression coefficient estimates more accurate. In the implementation, the Laplace method and the cumulant generating function are used to simplify the complex form of the likelihood function, so that the projection and the Gaussian multiplication in each iteration can be carried out. This addresses the problem of regression accuracy, allows the genes corresponding to regression coefficients with large absolute values to be screened out as prognostic genes, and provides information for subsequently predicting prognosis, recurrence, and metastasis, and even for guiding treatment.
Drawings
FIG. 1 is a flow chart of the cancer gene prognosis screening method based on the improved Cox model of the present invention.
FIG. 2 is a flow chart of solving a Cox model in the cancer gene prognosis screening method based on the improved Cox model.
FIG. 3 is a flow chart of an embodiment of the invention for solving the Cox model.
FIG. 4 is a diagram of a determinant vector factor graph in an embodiment of the present invention.
FIG. 5 is a diagram illustrating the moment matching message passing method based on expectation propagation in an embodiment of the present invention.
FIG. 6 is a graph illustrating performance of regression performed on simulated data in an embodiment of the present invention.
FIG. 7 is a schematic structural diagram of a cancer gene prognosis screening system based on an improved Cox model according to the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited by the specific embodiments disclosed below.
Example 1
As shown in FIG. 1, the present invention provides a method for screening cancer gene prognosis based on an improved Cox model, which comprises the following steps:
S1, collecting the expression levels of different genes in the cancer cells of a cancer patient, collecting the survival data of the patient, and organizing the gene expression levels and the patient information into a first matrix
Figure 796508DEST_PATH_IMAGE097
The first matrix
Figure 501159DEST_PATH_IMAGE097
is then preprocessed to obtain a second matrix
Figure 42998DEST_PATH_IMAGE005
S2, inputting the survival data obtained in step S1 and the second matrix X into a preset Cox regression model, and solving the model to obtain the regression coefficients.
S3, evaluating patient risk from the regression coefficients of the corresponding genes according to the risk function of the patient, and screening out the prognostic gene set corresponding to high patient risk.
S4, using the screened prognostic gene set, interpreted through biological theory, to provide guidance information for predicting prognosis, recurrence, and metastasis.
In the first matrix
Figure 558425DEST_PATH_IMAGE006
each row represents a patient and each column represents a gene segment of the cancer cells; each element of the first matrix
Figure 185715DEST_PATH_IMAGE006
indicates the expression level of the gene of the corresponding column in the patient of the corresponding row.
The survival data comprise the covariate matrix X (i.e., the second matrix), the survival time y, and the censoring indicator c (also rendered in this text as the erasure index or deletion index).
The genes corresponding to the components with larger absolute values in the regression coefficients have larger influence on the survival time of the patient, and the prognostic gene set corresponding to high patient risk can be screened out by evaluating the regression coefficients.
The preprocessing in step S1 specifically comprises: removing irrelevant genes by bioinformatic statistical means to obtain a second matrix with fewer columns
Figure 162898DEST_PATH_IMAGE005
Further, in step S2, a third matrix formed by combining the survival data and the second matrix is first input into the preset Cox regression model. The third matrix is denoted [X, y, c], where X represents the covariate matrix, i.e., the second matrix, y represents the survival time, and c represents the censoring indicator. The survival data of the i-th patient are
Figure 824824DEST_PATH_IMAGE007
Further, the risk function of the i-th patient is specifically:
Figure 245572DEST_PATH_IMAGE098
where
Figure 360159DEST_PATH_IMAGE099
is the shared baseline risk function;
Figure 344295DEST_PATH_IMAGE100
is the regression coefficient vector obtained by solving the Cox regression model; and
Figure 657465DEST_PATH_IMAGE101
denotes the gene expression levels of the i-th patient.
Once the Cox regression model has been used to fit the regression coefficients
Figure 701644DEST_PATH_IMAGE102
the risk of a patient can be assessed from that patient's gene expression levels
Figure 54259DEST_PATH_IMAGE103
In addition, the components of the regression coefficients
Figure 638824DEST_PATH_IMAGE102
with larger absolute values have a larger influence on the survival time of the patient, and the genes corresponding to these components form the prognostic gene set to be screened out.
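By way of non-limiting illustration, the screening described above can be sketched in Python as follows. The sketch assumes the standard Cox proportional-hazards form, in which the risk of patient i equals the shared baseline risk multiplied by the exponential of the inner product of that patient's expression vector and the regression coefficients. The function names `rank_prognostic_genes` and `relative_risk`, the `top_k` cutoff, and the gene labels are hypothetical and do not appear in the original disclosure.

```python
import numpy as np


def rank_prognostic_genes(beta, gene_names, top_k):
    """Return the top_k (gene, coefficient) pairs ordered by decreasing |coefficient|."""
    beta = np.asarray(beta, dtype=float)
    order = np.argsort(-np.abs(beta))[:top_k]
    return [(gene_names[j], float(beta[j])) for j in order]


def relative_risk(X, beta):
    """Risk of each patient relative to the baseline, assuming the form exp(x_i . beta)."""
    X = np.asarray(X, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return np.exp(X @ beta)
```

A patient whose expression vector is all zeros has relative risk 1, i.e., the baseline risk; choosing `top_k` is a design decision left open here, since the disclosure only states that components with larger absolute values are selected.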
Further, in step S2, solving the Cox regression model to obtain a regression coefficient, as shown in fig. 2, specifically includes the following steps:
S21, combining the existing survival data into a third matrix, sorting it by survival time, constructing the Cox regression model from the sorted data, and initializing the prior parameters and the message passing parameters.
S22, according to the determinant vector factor graph of the Cox regression model, using the expectation propagation algorithm to project the high-dimensional messages onto independent Gaussian distributions through the moment matching rule, iterating to solve the model, and outputting the regression coefficients and the approximate posterior probability.
S23, inputting the regression coefficients and the approximate posterior probability into the expectation maximization algorithm, and updating the prior parameters.
S24, judging whether the regression coefficient reaches a preset iteration ending condition or not; if the preset iteration ending condition is reached, outputting a regression coefficient obtained by the current iteration; if the preset iteration end condition is not reached, the process returns to step S22 to perform the next iteration.
The third matrix is [X, y, c], where X represents the covariate matrix, y represents the survival time, and c represents the censoring indicator.
The regression coefficient estimation problem is solved by a fully Bayesian approach: penalized maximum likelihood estimation is recast as minimum mean square error estimation from the Bayesian perspective, a factor graph is used as the tool, the messages passed between nodes are computed by a message passing method based on expectation propagation, and the approximate posterior probability of the regression coefficients is obtained. This approximate posterior is in essence the probability distribution that the regression coefficients are inferred to follow.
Further, the prior parameters include the mean, the variance, and the sparsity ratio, denoted respectively by
Figure 275342DEST_PATH_IMAGE104
Figure 287160DEST_PATH_IMAGE105
Figure 858563DEST_PATH_IMAGE106
The message passing parameters include the mean and variance of the positive-direction (forward) messages. Step S21 is specifically: normalizing the covariate matrix X; sorting the third matrix [X, y, c] in descending order of the survival time y; substituting the sorted third matrix [X, y, c] into the Cox partial likelihood function; and initializing the prior parameters and the message passing function.
In a specific embodiment, the covariate matrix can be a gene expression matrix, wherein each row represents a different patient, each column represents a different gene, and an element in the matrix represents the expression of a gene of a person.
Wherein, the prior parameter and the regression coefficient both obey Gaussian-Bernoulli distribution and have sparsity.
The projection operation of the likelihood function nodes is approximately simplified by adopting a Laplace method and a moment generating function, so that the complex calculation is simplified, and a more accurate regression coefficient is solved under the condition of less loss.
Further, the normalizing the covariate matrix X specifically includes:
Figure 246819DEST_PATH_IMAGE015
where mean(X) is the mean of all elements of the matrix X, and var(X) is the variance of all elements of the matrix X.
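The normalization and the descending sort of step S21 can be sketched as follows. Because the normalization formula survives only as an image, the sketch assumes the common form in which the global mean is subtracted and the result is divided by the square root of the global variance; that choice, and the function name `normalize_and_sort`, are assumptions rather than a statement of the original formula.

```python
import numpy as np


def normalize_and_sort(X, y, c):
    """Normalize X with its global mean/variance, then sort [X, y, c] by descending y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.asarray(c)
    # Assumption: standardize by the square root of the variance of all elements of X.
    X_norm = (X - X.mean()) / np.sqrt(X.var())
    # Descending survival time; a stable sort keeps the original order among ties.
    order = np.argsort(-y, kind="stable")
    return X_norm[order], y[order], c[order]
```

After this sort, every patient in rows 0 to i has a survival time at least as long as that of patient i, which is what makes the risk sets of the Cox partial likelihood a simple running prefix.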
The Cox partial likelihood function is specifically:
Figure 472264DEST_PATH_IMAGE016
where
Figure 654983DEST_PATH_IMAGE017
denotes the transition probability from
Figure 982190DEST_PATH_IMAGE107
to
Figure 908558DEST_PATH_IMAGE108
which means that
Figure 722930DEST_PATH_IMAGE017
is normalized with respect to
Figure 342131DEST_PATH_IMAGE108
whereas
Figure 891055DEST_PATH_IMAGE020
is the Cox partial likelihood function, which is not normalized, so the relation is one of proportionality. The function takes
Figure 621113DEST_PATH_IMAGE107
as its variable, whose i-th element is
Figure 821150DEST_PATH_IMAGE021
and
Figure 611252DEST_PATH_IMAGE022
is the i-th element of
Figure 381893DEST_PATH_IMAGE023
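For orientation, a log partial likelihood of the usual Cox form can be evaluated as sketched below. The sketch assumes that the samples are already sorted by descending survival time, that the indicator equals 1 when the event was observed and 0 when the sample is censored, and that ties are ignored; the exact expression of the original formula is not recoverable from this text, so the names `cox_log_partial_likelihood`, `z`, and `c` and these conventions are assumptions.

```python
import numpy as np


def cox_log_partial_likelihood(z, c):
    """Log partial likelihood for linear predictors z sorted by descending survival time.

    z[i] is the linear predictor of sample i; c[i] is 1 for an observed event
    and 0 for a censored sample (assumed convention).
    """
    z = np.asarray(z, dtype=float)
    c = np.asarray(c, dtype=float)
    # After a descending sort, the risk set of sample i is samples 0..i, so the
    # log of the risk-set sum is a running log-sum-exp over the prefix.
    log_risk = np.logaddexp.accumulate(z)
    return float(np.sum(c * (z - log_risk)))
```

With three samples whose linear predictors are all zero and whose events are all observed, the value is the negative logarithm of 6, since the risk sets contain one, two, and three samples respectively.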
The initialization of the prior parameters is specifically as follows. The regression coefficients are assumed to follow a Gaussian-Bernoulli distribution, whose mathematical expression is:
Figure 915642DEST_PATH_IMAGE109
where
Figure 970186DEST_PATH_IMAGE025
denotes the Dirac delta function;
Figure 665610DEST_PATH_IMAGE026
denotes a Gaussian distribution with mean
Figure 641656DEST_PATH_IMAGE027
and variance
Figure 455461DEST_PATH_IMAGE110
and the function takes
Figure 364511DEST_PATH_IMAGE111
as its variable. The prior parameters are initialized as
Figure 496415DEST_PATH_IMAGE030
Figure 490916DEST_PATH_IMAGE031
Figure 117200DEST_PATH_IMAGE032
The initialization of the message passing function is specifically: initializing the message passing function of the positive-direction (forward) message, whose mathematical expression is:
Figure 146336DEST_PATH_IMAGE033
where
Figure 183562DEST_PATH_IMAGE034
is an n-dimensional column vector whose elements are all 0;
Figure 930938DEST_PATH_IMAGE035
is an n-dimensional column vector whose elements are all 1; and
Figure 360914DEST_PATH_IMAGE036
is a random variable following a multidimensional Gaussian distribution with independent components of equal variance. The initial values are
Figure 452684DEST_PATH_IMAGE112
Figure 421777DEST_PATH_IMAGE113
Figure 389864DEST_PATH_IMAGE114
In a specific embodiment, the determinant vector factor graph of the Cox regression model is shown in fig. 4.
In the determinant vector factor graph of the Cox regression model, as shown in fig. 5, four multidimensional random variables are used to represent messages passing through the factor graph, i.e., the messages are regarded as a multidimensional gaussian probability density function, and the moment matching process requires that the messages obey the following distribution:
Figure 128013DEST_PATH_IMAGE033
where
Figure 772621DEST_PATH_IMAGE041
is a random variable following a multidimensional Gaussian distribution with independent components of equal variance;
Figure 229010DEST_PATH_IMAGE115
is an n-dimensional column vector whose elements are all 1, the subscript denoting the dimension of the vector; and
Figure 732279DEST_PATH_IMAGE043
is a p-dimensional column vector whose elements are all 1, the subscript denoting the dimension of the vector. When the elements of a multidimensional Gaussian random variable are mutually independent, i.e., when the off-diagonal elements of its covariance matrix are 0, the diagonal covariance matrix can be represented by a vector.
In a specific embodiment, the parameters of the prior distribution
Figure 590513DEST_PATH_IMAGE116
are
Figure 406023DEST_PATH_IMAGE117
(the sparsity parameter),
Figure 287391DEST_PATH_IMAGE118
(the mean parameter), and
Figure 846548DEST_PATH_IMAGE119
(the variance parameter). Their initial values are set to
Figure 310022DEST_PATH_IMAGE030
Figure 296432DEST_PATH_IMAGE031
Figure 727414DEST_PATH_IMAGE032
and they are then updated automatically by the expectation maximization algorithm.
Further, the step S22 is specifically to perform message transmission on the determinant vector factor graph of the Cox regression model based on the moment matching rule, and includes the following steps:
S221, according to the moment matching rule of the determinant vector factor graph of the Cox regression model, updating
Figure 824683DEST_PATH_IMAGE120
specifically:
Figure 142663DEST_PATH_IMAGE121
At the node
Figure 565554DEST_PATH_IMAGE122
the quantities
Figure 218252DEST_PATH_IMAGE123
and
Figure 119212DEST_PATH_IMAGE124
are multiplied, the product is projected onto a multidimensional Gaussian distribution with independent covariance, and the projection is divided by the message
Figure 557278DEST_PATH_IMAGE049
to obtain the message
Figure 885491DEST_PATH_IMAGE120
where
Figure 25485DEST_PATH_IMAGE125
is the projection operation, i.e., computing, for
Figure 464557DEST_PATH_IMAGE052
with respect to
Figure 494479DEST_PATH_IMAGE053
the mean vector
Figure 993594DEST_PATH_IMAGE126
and the variance vector
Figure 620884DEST_PATH_IMAGE055
Because the target is a multidimensional Gaussian with independent covariance, the elements of the vector
Figure 863647DEST_PATH_IMAGE055
are all equal and the off-diagonal elements are 0; the output is
Figure 885491DEST_PATH_IMAGE056
S222, according to the moment matching rule of the determinant vector factor graph of the Cox regression model, updating
Figure 946320DEST_PATH_IMAGE057
specifically:
Figure 795328DEST_PATH_IMAGE127
At the node
Figure 841781DEST_PATH_IMAGE128
the quantities
Figure 108946DEST_PATH_IMAGE129
and
Figure 949863DEST_PATH_IMAGE061
are multiplied, the variable
Figure 82904DEST_PATH_IMAGE062
is integrated out, the result is projected onto a multidimensional Gaussian distribution with independent covariance, and the projection is divided by the message
Figure 418201DEST_PATH_IMAGE063
to obtain the message
Figure 789140DEST_PATH_IMAGE130
Here
Figure 800958DEST_PATH_IMAGE065
is the Dirac delta function.
S223, according to the moment matching rule of the determinant vector factor graph of the Cox regression model, updating
Figure 890137DEST_PATH_IMAGE063
specifically:
Figure 760616DEST_PATH_IMAGE131
At the node
Figure 251640DEST_PATH_IMAGE067
the quantities
Figure 434360DEST_PATH_IMAGE132
and
Figure 745256DEST_PATH_IMAGE069
are multiplied, the product is projected onto a multidimensional Gaussian distribution with independent covariance, and the projection is divided by the message
Figure 422356DEST_PATH_IMAGE133
to obtain the message
Figure 767887DEST_PATH_IMAGE063
The mean obtained by the projection operation,
Figure 121508DEST_PATH_IMAGE071
is the Cox regression coefficient vector that is output as the result.
S224, according to the moment matching rule of the determinant vector factor graph of the Cox regression model, updating
Figure 919699DEST_PATH_IMAGE072
specifically:
Figure 400490DEST_PATH_IMAGE134
At the node
Figure 600527DEST_PATH_IMAGE074
the quantities
Figure 328312DEST_PATH_IMAGE075
and
Figure 410538DEST_PATH_IMAGE074
are multiplied, the variable
Figure 695019DEST_PATH_IMAGE076
is integrated out, the result is projected onto a multidimensional Gaussian distribution with independent covariance, and the projection is divided by the message
Figure 749563DEST_PATH_IMAGE135
to obtain the message
Figure 648249DEST_PATH_IMAGE072
Because
Figure 421033DEST_PATH_IMAGE136
has an extremely complex form, the cumulant generating function and the Laplace method are used in place of
Figure 492894DEST_PATH_IMAGE136
when carrying out the projection operation.
Further, in step S223, the projection operation specifically includes:
Figure 979825DEST_PATH_IMAGE079
where
Figure 111729DEST_PATH_IMAGE080
denotes the approximate posterior probability of the regression coefficients; the projected mean
Figure 902968DEST_PATH_IMAGE137
is the Cox regression coefficient vector output by the model.
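To make the projection at the prior node concrete, the sketch below computes the mean and variance obtained when a Gaussian-Bernoulli prior is multiplied by an incoming Gaussian message, element by element. This is the textbook moment computation for such a prior; the symbol names (`r` and `tau` for the mean and variance of the incoming message, `rho` for the probability of a non-zero coefficient, `mu` and `sigma2` for the Gaussian component) are assumptions, since the original formula is not recoverable here.

```python
import numpy as np


def _normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def bernoulli_gaussian_moments(r, tau, rho, mu, sigma2):
    """Posterior mean/variance for prior (1-rho)*delta(b) + rho*N(b; mu, sigma2)
    combined with an incoming Gaussian message N(b; r, tau)."""
    r = np.asarray(r, dtype=float)
    # Evidence of the "exactly zero" and "Gaussian" components under the message.
    z_zero = (1.0 - rho) * _normal_pdf(0.0, r, tau)
    z_slab = rho * _normal_pdf(r, mu, sigma2 + tau)
    pi = z_slab / (z_zero + z_slab)          # posterior probability of non-zero
    # Posterior of the Gaussian component alone (product of two Gaussians).
    v = 1.0 / (1.0 / sigma2 + 1.0 / tau)
    m = v * (mu / sigma2 + r / tau)
    mean = pi * m
    var = pi * (v + m ** 2) - mean ** 2
    return mean, var, pi
```

The densities are evaluated in the linear domain for brevity; a production implementation would normally work with logarithms to avoid underflow for extreme inputs.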
Further, step S23 is specifically: using the regression coefficients output by step S22,
Figure 981782DEST_PATH_IMAGE138
and the approximate posterior probability
Figure 496071DEST_PATH_IMAGE139
together with the expectation maximization algorithm, the prior parameters
Figure 798877DEST_PATH_IMAGE084
are updated automatically. The update expressions are specifically:
Figure 280674DEST_PATH_IMAGE085
Figure 694337DEST_PATH_IMAGE086
Figure 328712DEST_PATH_IMAGE087
where
Figure 802419DEST_PATH_IMAGE140
and
Figure 037091DEST_PATH_IMAGE141
are both functions of
Figure 988867DEST_PATH_IMAGE142
expressed as follows:
Figure 743327DEST_PATH_IMAGE143
where
Figure 387935DEST_PATH_IMAGE144
denotes element-wise vector division and
Figure 844324DEST_PATH_IMAGE093
denotes the element-wise vector product.
The prior parameters are self-learned, and are automatically updated along with iteration of the whole algorithm without manual adjustment, so that the uncertainty of cross validation can be further avoided.
Further, the preset iteration ending condition in step S24 is specifically:
Figure 599791DEST_PATH_IMAGE145
Whether to end the iteration is determined by checking whether the Crit value has started to rise. If the Crit value has started to rise, the iteration stops and the regression coefficients of the final iteration are output:
Figure 205828DEST_PATH_IMAGE146
If the Crit value has not started to rise, the iteration continues. Here
Figure 755758DEST_PATH_IMAGE096
denotes a norm.
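The stopping rule can be sketched generically as follows. Because the expression for Crit is not recoverable from this text, the criterion is passed in as a caller-supplied function; the name `iterate_until_criterion_rises`, the `max_iter` safeguard, and the choice to return the last state accepted before the rise are all assumptions made for illustration.

```python
def iterate_until_criterion_rises(step, criterion, state, max_iter=100):
    """Repeat state = step(state) until criterion(new, old) starts to rise.

    step:      function mapping the current state to the next state
    criterion: function (new_state, old_state) -> float, e.g. a norm of the change
    Returns the last state accepted before the criterion increased.
    """
    previous_crit = float("inf")
    for _ in range(max_iter):
        candidate = step(state)
        crit = criterion(candidate, state)
        if crit > previous_crit:
            break  # the criterion started to rise: stop and keep the accepted state
        state, previous_crit = candidate, crit
    return state
```

In the method described here, `step` would correspond to one pass of steps S22 and S23, and `state` would hold the regression coefficients together with the prior and message parameters.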
In a specific embodiment, the performance of regression on simulated data in a single experiment is shown in FIG. 6, where the black line is the true value and the asterisk is the estimated value.
The simulated data are generated as follows:
The following is generated by independent sampling from the standard normal distribution:
Figure 699443DEST_PATH_IMAGE147
For
Figure 258600DEST_PATH_IMAGE148
independent samples are drawn from the binomial distribution B(1, 0.8):
Figure 722074DEST_PATH_IMAGE149
so that the censoring rate is 0.2.
The following is generated by sampling from a Laplace-Bernoulli distribution:
Figure 911747DEST_PATH_IMAGE150
where the sparsity ratio is 0.2.
When
Figure 873887DEST_PATH_IMAGE151
and the i-th sample is not censored:
Figure 971156DEST_PATH_IMAGE152
where
Figure 554715DEST_PATH_IMAGE153
is sampled independently from U(0, 1). When
Figure 915289DEST_PATH_IMAGE154
and the i-th sample is censored:
Figure 567987DEST_PATH_IMAGE155
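A simulation in the spirit of this description can be sketched as below. The distributions of the covariates, the indicator, and the coefficients follow the text; the two survival-time formulas are not recoverable, so the sketch substitutes a common choice, namely inverse-transform sampling from an exponential baseline for uncensored samples and a uniformly shortened time for censored samples. Those two formulas, the reading of the sparsity ratio as the fraction of non-zero coefficients, and the function name `simulate_cox_data` are assumptions.

```python
import numpy as np


def simulate_cox_data(n, p, rng, event_rate=0.8, nonzero_rate=0.2):
    """Draw (X, y, c, beta) for n samples and p covariates."""
    X = rng.standard_normal((n, p))                # independent standard normal covariates
    c = rng.binomial(1, event_rate, size=n)        # 1 = observed, 0 = censored (rate 0.2)
    support = rng.random(p) < nonzero_rate         # Bernoulli support of the coefficients
    beta = np.where(support, rng.laplace(0.0, 1.0, size=p), 0.0)
    u = rng.uniform(size=n)                        # u in [0, 1), so 1 - u is in (0, 1]
    # Assumption: exponential baseline, event time by inverse-transform sampling.
    t_event = -np.log(1.0 - u) / np.exp(X @ beta)
    # Assumption: a censored sample is observed for a random fraction of its event time.
    y = np.where(c == 1, t_event, t_event * rng.uniform(size=n))
    return X, y, c, beta
```

A call such as `simulate_cox_data(200, 50, np.random.default_rng(0))` yields one data set; passing the generator explicitly keeps the experiment reproducible.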
example 2
Based on the above embodiment 1, with reference to fig. 3, this embodiment describes in detail a specific process of solving the Cox model in the present invention.
In a specific embodiment, as shown in FIG. 3, the known data are
Figure 468947DEST_PATH_IMAGE156
Figure 156280DEST_PATH_IMAGE157
Figure 235226DEST_PATH_IMAGE158
and the regression coefficients are
Figure 375220DEST_PATH_IMAGE159
Step 1:
S1.1: Initializing X by normalization:
Figure 79871DEST_PATH_IMAGE015
where mean(X) is the mean of all elements of the matrix X, and var(X) is the variance of all elements of the matrix X.
S1.2: Merging the existing survival data (covariate matrix X, survival time y, censoring indicator c) into a matrix [X, y, c] and sorting it in descending order of y;
S1.3: Substituting the sorted [X, y, c] into the Cox partial likelihood function:
Figure 621711DEST_PATH_IMAGE016
Figure 874488DEST_PATH_IMAGE017
denotes the transition probability from
Figure 501778DEST_PATH_IMAGE107
to
Figure 744540DEST_PATH_IMAGE108
which implies that
Figure 140887DEST_PATH_IMAGE017
is normalized with respect to
Figure 561635DEST_PATH_IMAGE108
(a property of a probability density function), whereas
Figure 879484DEST_PATH_IMAGE020
is the Cox partial likelihood function, which is not normalized, so the relation is one of proportionality. The function takes
Figure 925937DEST_PATH_IMAGE107
as its variable, whose i-th element is
Figure 176790DEST_PATH_IMAGE021
and
Figure 17707DEST_PATH_IMAGE022
is the i-th element of
Figure 370322DEST_PATH_IMAGE023
S1.4: The prior is assumed to follow a Gaussian-Bernoulli distribution:
Figure 954887DEST_PATH_IMAGE160
The function takes
Figure 591405DEST_PATH_IMAGE161
as its variable. The prior parameters are initialized as
Figure 337644DEST_PATH_IMAGE030
Figure 443134DEST_PATH_IMAGE031
Figure 565811DEST_PATH_IMAGE162
S1.5: initializing a positive direction message:
Figure 791256DEST_PATH_IMAGE033
wherein initialization is carried out
Figure 239555DEST_PATH_IMAGE163
,/>
Figure 753713DEST_PATH_IMAGE164
,/>
Figure 427883DEST_PATH_IMAGE165
;/>
Figure 773414DEST_PATH_IMAGE034
Is an n-dimensional column vector with elements all 0; />
Figure 127035DEST_PATH_IMAGE035
An n-dimensional column vector with an element of 1, the subscripts denote the dimension of the vector.
Step 2: Message passing on the factor graph based on the moment matching rule, i.e., the expectation propagation algorithm.
S2.1: Updating
Figure 925227DEST_PATH_IMAGE166
At the node
Figure 406018DEST_PATH_IMAGE167
the quantities
Figure 871634DEST_PATH_IMAGE166
and
Figure 396156DEST_PATH_IMAGE122
are multiplied, the product is projected onto a multidimensional Gaussian distribution with independent covariance, and the message
Figure 681644DEST_PATH_IMAGE168
is then divided out:
Figure 700547DEST_PATH_IMAGE121
where
Figure 755091DEST_PATH_IMAGE169
is the projection operation, i.e., computing, for
Figure 716093DEST_PATH_IMAGE052
with respect to
Figure 488877DEST_PATH_IMAGE053
the mean vector
Figure 498422DEST_PATH_IMAGE126
and the variance vector
Figure 423783DEST_PATH_IMAGE055
(the diagonal of the covariance matrix). Because the target is a multidimensional Gaussian with independent covariance, the elements of the vector
Figure 290108DEST_PATH_IMAGE055
are all equal and the off-diagonal elements are 0; the output is
Figure 550188DEST_PATH_IMAGE056
Using the Laplace method and the moment generating function,
Figure 160161DEST_PATH_IMAGE170
is simplified, finally giving:
Figure 931240DEST_PATH_IMAGE171
wherein
Figure 234046DEST_PATH_IMAGE172
I.e. based on>
Figure 715843DEST_PATH_IMAGE173
Is greater than or equal to>
Figure 395086DEST_PATH_IMAGE174
,/>
Figure 29461DEST_PATH_IMAGE175
Is composed of
Figure 503167DEST_PATH_IMAGE176
Is detected (#) and>
Figure 472260DEST_PATH_IMAGE176
is paired and/or matched>
Figure 424036DEST_PATH_IMAGE177
Second order gradient of).
Figure 178496DEST_PATH_IMAGE178
has the following meaning: when
Figure 823104DEST_PATH_IMAGE179
is a matrix, its diagonal is extracted; when
Figure 279493DEST_PATH_IMAGE179
is a vector, it is expanded into a diagonal matrix.
Figure 34960DEST_PATH_IMAGE180
denotes taking the mean of a vector,
Figure 830877DEST_PATH_IMAGE181
denotes element-wise vector division, and
Figure 397119DEST_PATH_IMAGE182
denotes the element-wise vector product.
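The operators described above have direct NumPy counterparts, shown in the brief sketch below for orientation. `numpy.diag` happens to have the same dual behavior as the diagonal operator of the text: it extracts the diagonal of a matrix and expands a vector into a diagonal matrix. The variable names are illustrative only.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([2.0, 4.0, 6.0])

M = np.diag(v)        # vector -> 3x3 diagonal matrix
d = np.diag(M)        # matrix -> its diagonal, as a vector
avg = v.mean()        # mean of the elements of a vector
ratio = v / w         # element-wise division
prod = v * w          # element-wise product
```

Representing a diagonal covariance matrix by the vector of its diagonal entries, as the text does, keeps every message update at a cost linear in the vector length.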
where
Figure 340804DEST_PATH_IMAGE183
is obtained by applying a quadratic approximation to
Figure 899962DEST_PATH_IMAGE184
and then solving with the coordinate ascent algorithm.
First, a Taylor expansion is applied to
Figure 612703DEST_PATH_IMAGE185
giving:
Figure 81337DEST_PATH_IMAGE186
where
Figure 512318DEST_PATH_IMAGE187
is the gradient of
Figure 875166DEST_PATH_IMAGE188
at
Figure 707993DEST_PATH_IMAGE189
and
Figure 350458DEST_PATH_IMAGE190
is the Hessian matrix of
Figure 471998DEST_PATH_IMAGE191
at
Figure 169696DEST_PATH_IMAGE189
Rewriting gives:
Figure 794712DEST_PATH_IMAGE192
where
Figure 670395DEST_PATH_IMAGE193
Finally,
Figure 810390DEST_PATH_IMAGE194
simplifies to:
Figure 515040DEST_PATH_IMAGE195
where
Figure 56880DEST_PATH_IMAGE196
is the i-th element of
Figure 759257DEST_PATH_IMAGE197
The coordinate ascent algorithm is then applied:
s2.1.1: initialization
Figure 137280DEST_PATH_IMAGE198
S2.1.2: updating
Figure 380042DEST_PATH_IMAGE199
Is at>
Figure 776389DEST_PATH_IMAGE200
Is taken to a gradient->
Figure 446404DEST_PATH_IMAGE201
For>
Figure 314653DEST_PATH_IMAGE201
To (1)kElement/element->
Figure 95527DEST_PATH_IMAGE202
Figure 877538DEST_PATH_IMAGE203
S2.1.3: updating
Figure 718455DEST_PATH_IMAGE204
Is at>
Figure 71070DEST_PATH_IMAGE189
Is closed by a black plug matrix>
Figure 655636DEST_PATH_IMAGE190
For>
Figure 26574DEST_PATH_IMAGE190
To (1) akLine ofkColumn element
Figure 38392DEST_PATH_IMAGE205
(for accelerated calculations, only diagonal elements are kept to approximate the entire matrix):
Figure 612724DEST_PATH_IMAGE206
s2.1.4: updating
Figure 204243DEST_PATH_IMAGE207
Figure 226425DEST_PATH_IMAGE208
S2.1.5: updating
Figure 409145DEST_PATH_IMAGE209
Figure 736352DEST_PATH_IMAGE210
S2.1.6: updating
Figure 662720DEST_PATH_IMAGE209
Is output if the change is small to a certain extent>
Figure 945934DEST_PATH_IMAGE209
Figure 565134DEST_PATH_IMAGE211
If the change is still large, the iteration is continued by returning to S2.1.2.
Finally, calculating the division part and outputting
Figure 363326DEST_PATH_IMAGE212
Figure 841187DEST_PATH_IMAGE213
Figure 775645DEST_PATH_IMAGE214
S2.2: Updating
Figure 565746DEST_PATH_IMAGE215
At the node
Figure 116813DEST_PATH_IMAGE216
the quantities
Figure 135716DEST_PATH_IMAGE217
and
Figure 190260DEST_PATH_IMAGE216
are multiplied, the variable
Figure 151263DEST_PATH_IMAGE218
is integrated out, the result is projected onto a multidimensional Gaussian distribution with independent covariance, and the message
Figure 658467DEST_PATH_IMAGE219
is then divided out:
Figure 481061DEST_PATH_IMAGE127
where
Figure 858953DEST_PATH_IMAGE220
calculating to obtain:
Figure 725278DEST_PATH_IMAGE221
Figure 985358DEST_PATH_IMAGE222
Figure 595330DEST_PATH_IMAGE223
where
Figure 375199DEST_PATH_IMAGE224
is an n-dimensional column vector whose elements are all 1, the subscript denoting the dimension of the vector;
Figure 412425DEST_PATH_IMAGE225
has the following meaning: when
Figure 425380DEST_PATH_IMAGE226
is a matrix, its diagonal is extracted, and when
Figure 839044DEST_PATH_IMAGE227
is a vector, it is expanded into a diagonal matrix;
Figure 488067DEST_PATH_IMAGE228
denotes taking the mean of a vector;
Figure 961774DEST_PATH_IMAGE229
means computing, for
Figure 930867DEST_PATH_IMAGE230
with respect to
Figure 882643DEST_PATH_IMAGE227
the mean vector
Figure 89633DEST_PATH_IMAGE231
and the variance vector
Figure 219394DEST_PATH_IMAGE232
and outputting
Figure 675783DEST_PATH_IMAGE233
Further,
Figure 431250DEST_PATH_IMAGE234
denotes matrix inversion, and
Figure 289484DEST_PATH_IMAGE235
denotes matrix transposition.
Finally, calculating the division part and outputting
Figure 855726DEST_PATH_IMAGE236
Figure 64990DEST_PATH_IMAGE237
/>
Figure 358568DEST_PATH_IMAGE238
S2.3: Updating
Figure 71310DEST_PATH_IMAGE239
At the node
Figure 808452DEST_PATH_IMAGE240
the quantities
Figure 239434DEST_PATH_IMAGE241
and
Figure 336703DEST_PATH_IMAGE240
are multiplied, the product is projected onto a multidimensional Gaussian distribution with independent covariance, and the message
Figure 107213DEST_PATH_IMAGE242
is then divided out:
Figure 264525DEST_PATH_IMAGE243
where
Figure 930605DEST_PATH_IMAGE244
the following calculation results:
Figure 565985DEST_PATH_IMAGE245
Figure 253319DEST_PATH_IMAGE246
where
Figure 581532DEST_PATH_IMAGE247
and
Figure 737838DEST_PATH_IMAGE248
are both functions of
Figure 442489DEST_PATH_IMAGE249
expressed as follows:
Figure 984328DEST_PATH_IMAGE250
finally, calculating the division part and outputting
Figure 483443DEST_PATH_IMAGE251
Figure 861466DEST_PATH_IMAGE252
Figure 838649DEST_PATH_IMAGE253
The approximate posterior of the regression coefficients is as follows:
Figure 500575DEST_PATH_IMAGE254
and the mean obtained by the projection operation,
Figure 170590DEST_PATH_IMAGE255
is the Cox regression coefficient vector to be output.
S2.4: Updating
Figure 35909DEST_PATH_IMAGE256
At the node
Figure 20046DEST_PATH_IMAGE257
the quantities
Figure 536478DEST_PATH_IMAGE258
and
Figure 174132DEST_PATH_IMAGE257
are multiplied, the variable
Figure 529677DEST_PATH_IMAGE259
is integrated out, the result is projected onto a multidimensional Gaussian distribution with independent covariance, and the message
Figure 114242DEST_PATH_IMAGE260
is then divided out:
Figure 688443DEST_PATH_IMAGE261
where
Figure 496999DEST_PATH_IMAGE262
calculating to obtain:
Figure 71331DEST_PATH_IMAGE263
Figure 459587DEST_PATH_IMAGE264
Figure 685032DEST_PATH_IMAGE265
finally, calculating the division part and outputting
Figure 867752DEST_PATH_IMAGE266
Figure 194959DEST_PATH_IMAGE267
Figure 121327DEST_PATH_IMAGE268
Step 3: Using the approximate posterior probability output by S2.3,
Figure 201278DEST_PATH_IMAGE269
together with the expectation maximization algorithm, the prior parameters
Figure 820478DEST_PATH_IMAGE270
are updated automatically.
S3.1: updating
Figure 369402DEST_PATH_IMAGE271
Figure 833882DEST_PATH_IMAGE272
S3.2: updating
Figure 237181DEST_PATH_IMAGE273
Figure 27283DEST_PATH_IMAGE274
S3.3: updating
Figure 47191DEST_PATH_IMAGE275
:/>
Figure 328744DEST_PATH_IMAGE276
Step 4: Judging whether the preset iteration end condition is reached.
The end condition is as follows:
Figure 383287DEST_PATH_IMAGE277
It is determined whether the value has started to rise: if
Figure 78711DEST_PATH_IMAGE278
has started to rise, the iteration stops and the regression coefficients
Figure 851495DEST_PATH_IMAGE279
(obtained in S2.3) are output. Here
Figure 674088DEST_PATH_IMAGE280
denotes a norm.
Example 3
Based on the above example 1 and example 2, and with reference to fig. 7, this example illustrates a cancer gene prognosis screening system based on an improved Cox model in the second aspect of the present invention.
In a specific embodiment, as shown in fig. 7, the present invention further provides a cancer gene prognosis screening system based on an improved Cox model, which includes a memory and a processor, wherein the memory includes a cancer gene prognosis screening program based on the improved Cox model, and the cancer gene prognosis screening program based on the improved Cox model implements the following steps when executed by the processor:
S1, collecting the expression levels of different genes in the cancer cells of a cancer patient, collecting the survival data of the patient, and organizing the gene expression levels and the patient information into a first matrix
Figure 583139DEST_PATH_IMAGE281
The first matrix
Figure 715043DEST_PATH_IMAGE281
is then preprocessed to obtain a second matrix
Figure 709544DEST_PATH_IMAGE282
S2, inputting the survival data obtained in step S1 and the second matrix X into a preset Cox regression model, and solving the model to obtain the regression coefficients.
S3, evaluating patient risk from the regression coefficients of the corresponding genes according to the risk function of the patient, and screening out the prognostic gene set corresponding to high patient risk.
S4, using the screened prognostic gene set, interpreted through biological theory, to provide guidance information for predicting prognosis, recurrence, and metastasis.
The drawings depicting the positional relationship of the structures are for illustrative purposes only and are not to be construed as limiting the present patent.
It should be understood that the above-described embodiments of the present invention are merely examples given to illustrate the present invention clearly, and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (9)

1. A cancer gene prognosis screening method based on an improved Cox model is characterized by comprising the following steps:
S1, collecting the expression levels of different genes in the cancer cells of a cancer patient, collecting the survival data of the patient, organizing the gene expression levels and the patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix;
S2, inputting the survival data obtained in step S1 and the second matrix into a preset Cox regression model, and solving the model to obtain the regression coefficients; the specific solving method is as follows:
S21, combining the existing survival data into a third matrix, sorting it by survival time, constructing the Cox regression model from the sorted data, and initializing the prior parameters and the message passing parameters;
S22, according to the determinant vector factor graph of the Cox regression model, using the expectation propagation algorithm to project the high-dimensional messages onto independent Gaussian distributions through the moment matching rule, iterating to solve the model, and outputting the regression coefficients and the approximate posterior probability;
S23, inputting the regression coefficients and the approximate posterior probability into the expectation maximization algorithm, and updating the prior parameters;
S24, judging whether the regression coefficients reach a preset iteration end condition; if the preset iteration end condition is reached, outputting the regression coefficients obtained by the current iteration; if the preset iteration end condition is not reached, returning to step S22 for the next iteration;
S3, evaluating patient risk from the regression coefficients of the corresponding genes according to the risk function of the patient, and screening out the prognostic gene set corresponding to high patient risk;
S4, using the screened prognostic gene set, interpreted through biological theory, to provide guidance information for predicting prognosis, recurrence, and metastasis.
2. The method of claim 1, wherein in step S2 the survival data and the second matrix are combined to form a third matrix, and the third matrix is input into the preset Cox regression model; wherein the third matrix is denoted [X, y, c], X represents the covariate matrix, namely the second matrix, y represents the survival time, and c represents the censoring indicator; and wherein the survival data of the i-th patient is
Figure QLYQS_1
3. The cancer gene prognosis screening method based on the improved Cox model of claim 2, wherein the risk function of the i-th patient is specifically:
Figure QLYQS_2
wherein [Figure QLYQS_3] is the shared baseline risk function; [Figure QLYQS_4] is the regression coefficient obtained by solving the Cox regression model; and [Figure QLYQS_5] is the gene expression level of the i-th patient.
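The risk function of claim 3 combines a shared baseline risk with a patient-specific term built from the regression coefficients and the gene expression levels. A minimal Python sketch follows, assuming the standard Cox proportional hazards form h_i(t) = h_0(t) exp(x_i^T beta); the symbols h_0, x_i and beta, the function names, and the screening threshold are illustrative assumptions, because the patent's own expression is contained in a figure that is not reproduced in this text.

```python
import numpy as np

def relative_risk(X, beta):
    """Relative risk exp(x_i^T beta) of each patient.

    The shared baseline risk h_0(t) is omitted because it is common to all
    patients and cancels when patients are compared (assumed Cox form).
    """
    return np.exp(X @ beta)

def screen_genes(beta, threshold=1e-6):
    """Indices of genes whose |coefficient| exceeds a hypothetical threshold,
    ordered from the largest to the smallest magnitude."""
    idx = np.flatnonzero(np.abs(beta) > threshold)
    return idx[np.argsort(-np.abs(beta[idx]))]

# Toy data: 2 patients, 3 genes (values are illustrative only).
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 0.0]])
beta = np.array([0.5, 0.0, -1.0])

risk = relative_risk(X, beta)
selected = screen_genes(beta)
```

Under this assumed form, a positive coefficient raises the risk as the expression level of the gene increases, and a coefficient of zero removes the gene from the model, which is what makes a sparse coefficient vector usable for gene screening.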
4. The method of claim 1, wherein the prior parameters comprise: the mean [Figure QLYQS_6], the variance [Figure QLYQS_7] and the sparsity ratio [Figure QLYQS_8]; the message-passing parameters comprise: the mean and variance of the positive-direction messages; and step S21 is specifically: normalizing the covariate matrix X, sorting the third matrix [X, y, c] in descending order of the survival time y, substituting the sorted third matrix [X, y, c] into the Cox partial likelihood function, and initializing the prior parameters and the message transfer function.
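The assembly and descending sort of the third matrix [X, y, c] described in step S21 can be sketched as follows. The function name and the toy values are hypothetical, and the sketch assumes that y and c are one-dimensional arrays with one entry per patient.

```python
import numpy as np

def build_sorted_third_matrix(X, y, c):
    """Stack [X, y, c] column-wise and sort the rows by descending survival time y."""
    order = np.argsort(-y, kind="stable")  # stable sort keeps tied patients in input order
    third = np.column_stack([X[order], y[order], c[order]])
    return third, order

# Toy data: 3 patients, 2 genes (values are illustrative only).
X = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])
y = np.array([5.0, 12.0, 8.0])   # survival times
c = np.array([1.0, 0.0, 1.0])    # censoring indicator

third, order = build_sorted_third_matrix(X, y, c)
```

With the rows in descending order of survival time, the patients still at risk at the time of patient i are exactly rows 0 to i, which simplifies the evaluation of the Cox partial likelihood.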
5. The method of claim 1, wherein the covariate matrix X is normalized as follows:
Figure QLYQS_9
wherein mean(X) is the mean of all elements of the matrix X, and var(X) is the variance of all elements of the matrix X;
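The normalization above uses a single mean and a single variance computed over all elements of X, rather than per-gene statistics. A minimal sketch, assuming that the expression in the figure subtracts mean(X) and divides by the square root of var(X) (the figure itself is not reproduced here, so this form is an assumption):

```python
import numpy as np

def normalize_global(X):
    """Standardize X with the mean and variance taken over all of its elements."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean()) / np.sqrt(X.var())

# Toy expression matrix (values are illustrative only).
X_raw = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 9.0]])
X_norm = normalize_global(X_raw)
```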
the Cox partial likelihood function is specifically:
Figure QLYQS_10
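For orientation, the textbook Cox log partial likelihood can be evaluated as sketched below. This is not necessarily the exact expression of the figure: the sketch assumes that the rows are already sorted by descending survival time, that c_i = 1 marks an observed event and c_i = 0 a censored patient, and that there are no tied survival times. The function name is hypothetical.

```python
import numpy as np

def cox_log_partial_likelihood(X, c, beta):
    """Log partial likelihood for rows sorted by descending survival time.

    The risk set of row i is rows 0..i, so the log of the risk-set sum is a
    cumulative log-sum-exp of the linear predictor.
    """
    eta = X @ beta                            # linear predictor x_i^T beta
    log_risk = np.logaddexp.accumulate(eta)   # log sum_{j <= i} exp(eta_j), numerically stable
    return float(np.sum(c * (eta - log_risk)))

# Toy data, already sorted by descending survival time.
X = np.array([[0.2], [-0.1], [0.4]])
c = np.array([1.0, 0.0, 1.0])
beta = np.array([1.0])

ll = cox_log_partial_likelihood(X, c, beta)
```

The cumulative form is the reason the descending sort of step S21 is convenient: every risk-set sum is obtained in a single pass over the data.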
the transition probability is obtained by normalizing the partial likelihood function, specifically:
Figure QLYQS_11
wherein [Figure QLYQS_13] denotes the transition probability of the function from [Figure QLYQS_16] to [Figure QLYQS_19]; [Figure QLYQS_14] is normalized with respect to [Figure QLYQS_15]; [Figure QLYQS_20] is the Cox partial likelihood function, which is not normalized; [Figure QLYQS_22] is the proportionality sign and denotes a proportional relationship; the function takes [Figure QLYQS_12] as its variable, whose i-th element is [Figure QLYQS_17], and [Figure QLYQS_18] is the i-th element of [Figure QLYQS_21];
the initialization of the prior parameters is specifically: the regression coefficients follow a Gaussian-Bernoulli distribution, whose mathematical expression is:
Figure QLYQS_23
wherein [Figure QLYQS_24] denotes the Dirac delta function; [Figure QLYQS_28] denotes a Gaussian distribution with mean [Figure QLYQS_30] and variance [Figure QLYQS_26]; the function takes [Figure QLYQS_27] as its variable; and the prior parameters are initialized as [Figure QLYQS_29], [Figure QLYQS_31], [Figure QLYQS_25];
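A Gaussian-Bernoulli (spike-and-slab) prior of this kind is commonly written as p(beta_j) = (1 - rho) delta(beta_j) + rho N(beta_j; mu, sigma^2). The sketch below draws coefficients from that form to show why it encodes sparsity. The symbols rho, mu and sigma^2, and the reading of the sparsity ratio rho as the probability that a coefficient is non-zero, are assumptions, since the patent's expression and initial values are in figures not reproduced here.

```python
import numpy as np

def sample_gaussian_bernoulli(p, rho, mu, sigma2, rng):
    """Draw p coefficients: exactly 0 with probability 1 - rho, else N(mu, sigma2)."""
    active = rng.random(p) < rho                       # Bernoulli support indicator
    slab = rng.normal(mu, np.sqrt(sigma2), size=p)     # Gaussian component
    return np.where(active, slab, 0.0)

rng = np.random.default_rng(0)
beta = sample_gaussian_bernoulli(p=10000, rho=0.1, mu=0.0, sigma2=1.0, rng=rng)
frac_nonzero = float(np.mean(beta != 0.0))
```

With rho = 0.1, roughly one coefficient in ten is non-zero, which mirrors the modelling assumption that only a small subset of genes is related to prognosis.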
the initialization of the message transfer function is specifically: initializing the message transfer function of the positive-direction messages, whose mathematical expression is:
Figure QLYQS_32
with the parameters initialized as [Figure QLYQS_33], wherein [Figure QLYQS_34] is an n-dimensional column vector whose elements are all 0; [Figure QLYQS_35] is a p-dimensional column vector whose elements are all 0; [Figure QLYQS_36] is an n-dimensional column vector whose elements are all 1; and [Figure QLYQS_37] is a p-dimensional column vector whose elements are all 1;
the message transfer function of the negative-direction messages is specifically:
Figure QLYQS_38
wherein [Figure QLYQS_53] is a random variable that follows an independent, equal-variance multidimensional Gaussian distribution; the symbol ~ means "is distributed as"; [Figure QLYQS_45] denotes a multidimensional Gaussian distribution whose mean vector is [Figure QLYQS_49] and whose covariance matrix is a diagonal matrix with diagonal elements [Figure QLYQS_47]; [Figure QLYQS_51] is the mean of [Figure QLYQS_50], and [Figure QLYQS_54] is the variance of [Figure QLYQS_48]; [Figure QLYQS_52] is the mean of [Figure QLYQS_39], and [Figure QLYQS_44] is the variance of [Figure QLYQS_55]; [Figure QLYQS_57] is the mean of [Figure QLYQS_56], and [Figure QLYQS_58] is the variance of [Figure QLYQS_41]; [Figure QLYQS_46] is the mean of [Figure QLYQS_42], and [Figure QLYQS_43] is the variance of [Figure QLYQS_40].
6. The cancer gene prognosis screening method based on the improved Cox model as claimed in claim 5, wherein step S22 specifically performs message passing on the determinant vector factor graph of the Cox regression model based on the moment matching rule, and comprises the following steps:
S221, updating [Figure QLYQS_59] according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
Figure QLYQS_60
at the node [Figure QLYQS_61], multiplying [Figure QLYQS_62] by [Figure QLYQS_63], projecting the result onto a multidimensional Gaussian distribution with independent covariance, and dividing the projected result by [Figure QLYQS_64] to obtain the message [Figure QLYQS_65];
S222, updating [Figure QLYQS_66] according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
Figure QLYQS_67
at the node [Figure QLYQS_68], multiplying [Figure QLYQS_69] by [Figure QLYQS_70], integrating over the variable [Figure QLYQS_71], projecting the result onto a multidimensional Gaussian distribution with independent covariance, and dividing the projected result by [Figure QLYQS_72] to obtain the message [Figure QLYQS_73]; wherein [Figure QLYQS_74] is the Dirac delta function;
S223, updating [Figure QLYQS_75] according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
Figure QLYQS_76
at the node [Figure QLYQS_77], multiplying [Figure QLYQS_78] by [Figure QLYQS_79], projecting the result onto a multidimensional Gaussian distribution with independent covariance, and dividing the projected result by [Figure QLYQS_80] to obtain the message [Figure QLYQS_81]; wherein the result [Figure QLYQS_82] obtained in S223 is output as the regression coefficient [Figure QLYQS_83];
S224, updating [Figure QLYQS_84] according to the moment matching rule of the determinant vector factor graph of the Cox regression model, specifically:
Figure QLYQS_85
at the node [Figure QLYQS_86], multiplying [Figure QLYQS_87] by [Figure QLYQS_88], integrating over the variable [Figure QLYQS_89], projecting the result onto a multidimensional Gaussian distribution with independent covariance, and dividing the projected result by [Figure QLYQS_90] to obtain the message [Figure QLYQS_91].
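Each of steps S221 to S224 follows the same expectation propagation pattern: multiply the incoming messages at a node, project the product onto a Gaussian with independent components, then divide by the incoming message to obtain the outgoing message. The sketch below shows only the Gaussian multiply and divide operations, element-wise on mean and variance vectors. The function names are hypothetical, and the node-specific projections (which depend on the figures not reproduced here) are not implemented.

```python
import numpy as np

def gauss_multiply(m1, v1, m2, v2):
    """Product of N(m1, v1) and N(m2, v2), up to scale: precisions add."""
    v = 1.0 / (1.0 / v1 + 1.0 / v2)
    m = v * (m1 / v1 + m2 / v2)
    return m, v

def gauss_divide(m_num, v_num, m_den, v_den):
    """Quotient N(m_num, v_num) / N(m_den, v_den), up to scale: precisions subtract.

    Only a valid Gaussian when v_num < v_den element-wise; practical EP
    implementations usually guard or damp this step.
    """
    v = 1.0 / (1.0 / v_num - 1.0 / v_den)
    m = v * (m_num / v_num - m_den / v_den)
    return m, v

# Incoming message and a second Gaussian factor (toy values).
m_in, v_in = np.array([3.0]), np.array([4.0])
m_f, v_f = np.array([1.0]), np.array([2.0])

m_post, v_post = gauss_multiply(m_f, v_f, m_in, v_in)      # combined belief
m_out, v_out = gauss_divide(m_post, v_post, m_in, v_in)    # outgoing (extrinsic) message
```

When both factors are Gaussian, the projection is exact and the division simply recovers the other factor, as in this toy case. The projection only changes the result at nodes with non-Gaussian factors, such as the partial likelihood or the Gaussian-Bernoulli prior.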
7. The method of claim 1, wherein step S23 is specifically: combining the regression coefficient [Figure QLYQS_92] and the approximate posterior probability [Figure QLYQS_93] output from step S22 with the expectation-maximization algorithm to update the prior parameters [Figure QLYQS_94] automatically; the update expressions are specifically:
Figure QLYQS_95
Figure QLYQS_96
Figure QLYQS_97
wherein [Figure QLYQS_98] and [Figure QLYQS_99] are both defined with respect to [Figure QLYQS_100], as follows:
Figure QLYQS_101
wherein [Figure QLYQS_102] denotes element-wise vector division and [Figure QLYQS_103] denotes element-wise vector multiplication.
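The update expressions of claim 7 are contained in figures not reproduced here. As an illustration only, the sketch below shows the textbook expectation-maximization update for a Gaussian-Bernoulli prior (1 - rho) delta(beta) + rho N(beta; mu, sigma^2), given a Gaussian message N(beta; r, tau) for each coefficient. All symbols and function names are assumptions, and the patent's own expressions may differ in detail.

```python
import numpy as np

def bg_posterior(r, tau, rho, mu, sigma2):
    """Per-coefficient posterior under the assumed prior and message N(beta; r, tau).

    Returns the posterior probability pi that the coefficient is non-zero,
    and the mean gamma and variance nu of its Gaussian (non-zero) component.
    """
    slab = rho * np.exp(-0.5 * (r - mu) ** 2 / (tau + sigma2)) / np.sqrt(2 * np.pi * (tau + sigma2))
    spike = (1.0 - rho) * np.exp(-0.5 * r ** 2 / tau) / np.sqrt(2 * np.pi * tau)
    pi = slab / (slab + spike)
    nu = 1.0 / (1.0 / tau + 1.0 / sigma2)
    gamma = nu * (r / tau + mu / sigma2)
    return pi, gamma, nu

def em_update(pi, gamma, nu):
    """Joint M-step for (rho, mu, sigma2) from the posterior quantities."""
    rho_new = pi.mean()
    mu_new = np.sum(pi * gamma) / np.sum(pi)
    sigma2_new = np.sum(pi * ((gamma - mu_new) ** 2 + nu)) / np.sum(pi)
    return rho_new, mu_new, sigma2_new

# Toy messages: one coefficient far from zero, one at zero.
r = np.array([5.0, 0.0])
tau = np.array([0.1, 0.1])
pi, gamma, nu = bg_posterior(r, tau, rho=0.5, mu=0.0, sigma2=10.0)
rho_new, mu_new, sigma2_new = em_update(pi, gamma, nu)
```

Updating the prior parameters from the data in this way removes the need to tune the sparsity level by hand, which appears to be the purpose of the automatic update in step S23.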
8. The cancer gene prognosis screening method based on the improved Cox model according to any one of claims 1 to 7, wherein the iteration end condition preset in step S24 is specifically:
Figure QLYQS_104
whether to end the iteration is determined by whether the Crit value starts to rise: if the Crit value starts to rise, the iteration is stopped and the regression coefficient [Figure QLYQS_105] of the final iteration is output; if the Crit value has not started to rise, the iteration continues; wherein [Figure QLYQS_106] denotes a norm.
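The stopping rule of claim 8 can be sketched as a loop that tracks the criterion and stops as soon as it rises. The exact Crit expression is in a figure not reproduced here; the sketch assumes a relative change between successive coefficient vectors measured with the Euclidean norm, and the step function, the iteration cap and all names are hypothetical.

```python
import numpy as np

def crit(beta_new, beta_old):
    """Assumed criterion: relative change between successive iterates."""
    return np.linalg.norm(beta_new - beta_old) / max(np.linalg.norm(beta_new), 1e-12)

def iterate_until_crit_rises(step, beta0, max_iter=100):
    """Repeat beta <- step(beta) until the criterion starts to rise or max_iter is hit.

    Returns the coefficients of the final iteration performed. Returning the
    previous iterate instead is an equally plausible reading of the claim.
    """
    beta_prev, crit_prev = beta0, np.inf
    for _ in range(max_iter):
        beta_next = step(beta_prev)
        crit_next = crit(beta_next, beta_prev)
        if crit_next > crit_prev:
            return beta_next
        beta_prev, crit_prev = beta_next, crit_next
    return beta_prev

# Case 1: a contracting update whose criterion keeps falling (converges to 2).
converged = iterate_until_crit_rises(lambda b: 0.5 * b + 1.0, np.array([0.0]))

# Case 2: a scripted sequence whose criterion rises at the third iterate.
seq = iter([np.array([1.0]), np.array([1.5]), np.array([3.0]), np.array([3.1])])
stopped = iterate_until_crit_rises(lambda b: next(seq), np.array([0.0]))
```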
9. A cancer gene prognosis screening system based on an improved Cox model, comprising a memory and a processor, wherein the memory contains a cancer gene prognosis screening program based on the improved Cox model, and the program, when executed by the processor, implements the following steps:
S1, collecting the expression levels of different genes in the cancer cells of cancer patients, collecting the survival data of the patients, collating the expression levels of the different genes and the patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix;
S2, inputting the survival data obtained in step S1 and the second matrix into a preset Cox regression model, and solving the model to obtain the regression coefficients; the specific solving method is as follows:
S21, combining the existing survival data into a third matrix, sorting it by survival time, constructing a Cox regression model from the sorted data, and initializing the prior parameters and the message-passing parameters;
S22, according to the determinant vector factor graph of the Cox regression model, using an expectation propagation algorithm to project the high-dimensional messages onto independent Gaussian distributions via the moment matching rule, solving the model iteratively, and outputting the regression coefficients and the approximate posterior probability;
S23, inputting the regression coefficients and the approximate posterior probability into an expectation-maximization algorithm, and updating the prior parameters;
S24, judging whether the regression coefficients satisfy a preset iteration end condition; if the condition is satisfied, outputting the regression coefficients obtained in the current iteration; if it is not satisfied, returning to step S22 for the next iteration;
S3, evaluating, according to the patient's risk function, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic genome corresponding to high patient risk;
S4, using the screened prognostic genome, in combination with biological theory, to provide guidance for predicting prognosis, relapse and metastasis.
CN202211631423.4A 2022-12-19 2022-12-19 Cancer gene prognosis screening method and system based on improved Cox model Active CN115620808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211631423.4A CN115620808B (en) 2022-12-19 2022-12-19 Cancer gene prognosis screening method and system based on improved Cox model

Publications (2)

Publication Number Publication Date
CN115620808A CN115620808A (en) 2023-01-17
CN115620808B true CN115620808B (en) 2023-03-31

Family

ID=84879866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211631423.4A Active CN115620808B (en) 2022-12-19 2022-12-19 Cancer gene prognosis screening method and system based on improved Cox model

Country Status (1)

Country Link
CN (1) CN115620808B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116321620B (en) * 2023-05-11 2023-08-11 杭州行至云起科技有限公司 Intelligent lighting switch control system and method thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022048071A1 (en) * 2020-09-03 2022-03-10 中国科学院深圳先进技术研究院 Tumor risk grading method and system, terminal, and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110320390A1 (en) * 2009-03-10 2011-12-29 Kuznetsov Vladimir A Method for identification, prediction and prognosis of cancer aggressiveness
AU2015101194A4 (en) * 2015-07-26 2015-10-08 Macau University Of Science And Technology Semi-Supervised Learning Framework based on Cox and AFT Models with L1/2 Regularization for Patient’s Survival Prediction
CN106407689A (en) * 2016-09-27 2017-02-15 牟合(上海)生物科技有限公司 Stomach cancer prognostic marker screening and classifying method based on gene expression profile
CN113409946A (en) * 2021-07-02 2021-09-17 中山大学 System and method for predicting cancer prognosis risk under high-dimensional deletion data

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022048071A1 (en) * 2020-09-03 2022-03-10 中国科学院深圳先进技术研究院 Tumor risk grading method and system, terminal, and storage medium

Also Published As

Publication number Publication date
CN115620808A (en) 2023-01-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant