CN115620808A

CN115620808A - Cancer gene prognosis screening method and system based on improved Cox model

Info

Publication number: CN115620808A
Application number: CN202211631423.4A
Authority: CN
Inventors: 张善书; 张浩川
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2022-12-19
Filing date: 2022-12-19
Publication date: 2023-01-17
Anticipated expiration: 2042-12-19
Also published as: CN115620808B

Abstract

The invention discloses a cancer gene prognosis screening method and system based on an improved Cox model. The method comprises the following steps: S1, collecting the expression levels of different genes in the cancer cells of cancer patients, collecting the patients' survival data, collating the gene expression levels and patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix; S2, inputting the survival data and the second matrix into a preset Cox regression model and solving it to obtain regression coefficients; S3, evaluating, according to the patient hazard function, the patient risk associated with the gene corresponding to each regression coefficient, and screening out the prognostic gene set corresponding to high patient risk; and S4, using the screened prognostic gene set, together with biological theory, to provide guidance for predicting prognosis, recurrence and metastasis. Compared with conventional techniques, the regression accuracy is improved by adding a prior and updating its parameters automatically, and guidance is provided for predicting prognosis, recurrence and metastasis.

Description

Cancer gene prognosis screening method and system based on improved Cox model

Technical field

The present invention relates to the technical field of survival analysis by Cox model regression, and more specifically to a cancer gene prognosis screening method and system based on an improved Cox model.

Background

DNA microarray technology can monitor the expression levels of thousands of genes simultaneously, which makes it possible to study how particular treatments, diseases and developmental stages affect gene expression. A common scenario is to measure gene expression in the cancerous cells of a number of cancer patients, obtain the patients' survival data through follow-up, analyze the collected data statistically with survival analysis methods, and finally screen out the genes related to prognosis. Studying the relationship between prognostic genes and tumors provides information for predicting prognosis, recurrence and metastasis, and even for guiding treatment; the ultimate aim is to support individualized treatment and, further, to open the way to breakthroughs in cancer treatment.

The collected survival data and gene expression levels must undergo systematic survival analysis in order to screen out a dozen or so key prognostic genes from tens of thousands of genes. This step is an indispensable part of the whole prognostic analysis: the gene set formed by these genes can be used to assess the risk of cancer patients and to provide more information for treatment.

The Cox regression model is widely used in medical follow-up studies and is so far the most widely applied multivariate method in survival analysis. It is a semi-parametric model based on a linear combination of covariates. Taking survival outcome and survival time as the dependent variables, it can analyze the effect of many factors on survival time simultaneously, can handle data with censored survival times, and does not require the type of survival distribution to be estimated. Because of these properties, the model plays a pivotal role in screening cancer prognostic genes.

According to the published literature, the most commonly used method for solving the Cox regression model is the one proposed by Noah Simon et al., which fits the model by coordinate descent with warm starts along a regularization path, using the $\ell_1$ norm and the $\ell_2$ norm as penalty terms. However, the penalty coefficients are determined by cross-validation, so they cannot be solved for automatically and precisely. Because the fit is computed by optimization, it is a point estimate: no posterior distribution is obtained, so the prior parameters (that is, the penalty coefficients) cannot be solved for automatically in combination with the expectation-maximization (EM) algorithm. As a result, the prognostic genes finally screened out by that algorithm are not well associated with the cancer.

Cox regression is a survival analysis method that forms an important part of prognostic gene screening. The regression coefficients obtained by solving the Cox regression model act as risk weights for the corresponding genes, so the subsequent risk calculation for each patient is accurate only if the regression coefficients are accurate. A more accurate method for solving the Cox regression model is therefore needed.

In view of the above needs and the shortcomings of the prior art, the present application proposes a cancer gene prognosis screening method and system based on an improved Cox model.

Summary of the invention

The present invention provides a cancer gene prognosis screening method and system based on an improved Cox model. In the regression part, regression accuracy is improved by adding a prior and updating its parameters automatically; the genes whose regression coefficients have large absolute values are screened out as prognostic genes, providing information for subsequent prediction of prognosis, recurrence and metastasis, and even for guiding treatment.

To solve the above technical problems, the technical solution of the present invention is as follows:

A first aspect of the present invention provides a cancer gene prognosis screening method based on an improved Cox model, the method comprising the following steps:

S1. Collect the expression levels of different genes in the cancer cells of cancer patients, collect the patients' survival data, collate the gene expression levels and patient information into a first matrix, and preprocess the first matrix to obtain a second matrix $X$.

S2. Input the survival data obtained in step S1 and the second matrix $X$ into a preset Cox regression model and solve it to obtain the regression coefficients.

S3. Evaluate, according to the patient hazard function, the patient risk associated with the gene corresponding to each regression coefficient, and screen out the prognostic gene set corresponding to high patient risk.

S4. Use the screened prognostic gene set, together with biological theory, to provide guidance for predicting prognosis, recurrence and metastasis.

In the first matrix, the rows represent patients and the columns represent gene fragments of the cancer cells; each element of the first matrix is the expression level, in the patient of the corresponding row, of the gene of the corresponding column.

The survival data comprise the covariate matrix (that is, the second matrix $X$), the survival time $y$ and the censoring indicator $c$.

The genes corresponding to the components of the regression coefficients with larger absolute values have a greater influence on patient survival time, so the prognostic gene set corresponding to high patient risk can be screened out by evaluating the regression coefficients.

The preprocessing in step S1 is specifically: irrelevant genes are removed by bioinformatics statistical methods, giving a second matrix $X$ with fewer columns.
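The text does not say which statistical method removes irrelevant genes. As a minimal sketch of what such a column-reducing preprocessing step can look like, the following keeps only the highest-variance columns; the variance criterion and the name `drop_low_variance_genes` are assumptions made for illustration, not the method of the patent.

```python
import numpy as np

def drop_low_variance_genes(first_matrix, keep):
    """Keep the `keep` columns (genes) with the largest variance across patients.

    Assumption: variance filtering stands in for the unspecified
    bioinformatics screening step.
    """
    variances = first_matrix.var(axis=0)
    # Indices of the highest-variance columns, returned in their original order.
    top = np.sort(np.argsort(variances)[::-1][:keep])
    return first_matrix[:, top], top

rng = np.random.default_rng(0)
demo = rng.normal(size=(6, 5))   # 6 patients, 5 genes (synthetic)
demo[:, 2] = 1.0                 # a constant column carries no information
second_matrix, kept = drop_low_variance_genes(demo, keep=4)
```

The returned index array makes it possible to map the columns of the reduced matrix back to gene identifiers.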

Further, in step S2, the survival data and the second matrix are first combined into a third matrix, and the third matrix is input into the preset Cox regression model. The third matrix is denoted $[X, y, c]$, where $X$ is the covariate matrix (the second matrix), $y$ is the survival time and $c$ is the censoring indicator; the survival data of the $i$-th patient are $(x_i, y_i, c_i)$.

Further, the hazard function of the $i$-th patient is specifically:

$$h_i(t) = h_0(t)\exp\left(x_i^{\top}\beta\right)$$

where $h_0(t)$ is the shared baseline hazard function, $\beta$ is the vector of regression coefficients obtained by solving the Cox regression model, and $x_i$ is the gene expression level of the $i$-th patient.
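Because the baseline hazard is shared by all patients, patients can be compared through the relative risk $\exp(x_i^{\top}\beta)$ alone. The sketch below illustrates this under the standard Cox proportional-hazards form; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def relative_risk(X, beta):
    """Relative risk exp(x_i^T beta) for every patient (row) of X."""
    return np.exp(X @ beta)

# Toy data: 3 patients, 2 genes (illustrative values only).
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
beta = np.array([0.5, -0.5])
risk = relative_risk(X, beta)
```

A positive coefficient raises the risk of patients who express that gene strongly, and a negative coefficient lowers it; the third patient, in whom the two effects cancel, has relative risk 1.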

Once the regression coefficients have been fitted with the Cox regression model, patient risk can be assessed from each patient's gene expression levels. The components of the regression coefficients with larger absolute values have a greater influence on patient survival time, and the genes corresponding to these components form the prognostic gene set to be screened out.
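The screening rule can be sketched as a ranking by absolute coefficient value. The cut-off `top_k` and the gene names are hypothetical; the text does not state how many genes are retained or how the threshold is chosen.

```python
import numpy as np

def select_prognostic_genes(beta, gene_names, top_k):
    """Return the top_k genes ranked by |coefficient|, largest first."""
    order = np.argsort(-np.abs(beta))[:top_k]
    return [(gene_names[j], float(beta[j])) for j in order]

beta = np.array([0.02, -1.3, 0.0, 0.7])   # illustrative coefficients
genes = ["G1", "G2", "G3", "G4"]          # placeholder gene names
selected = select_prognostic_genes(beta, genes, top_k=2)
```

The sign of each retained coefficient is kept because it distinguishes genes associated with higher risk from those associated with lower risk.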

Further, solving the Cox regression model in step S2 to obtain the regression coefficients specifically comprises the following steps:

S21. Merge the existing survival data into the third matrix and sort it by survival time, construct the Cox regression model from the sorted data, and initialize the prior parameters and the message-passing parameters.

S22. On the split vector factor graph of the Cox regression model, use the expectation propagation algorithm to project the high-dimensional messages onto independent Gaussian distributions by the moment-matching rule, solve the model iteratively, and output the regression coefficients and the approximate posterior probability.

S23. Input the regression coefficients and the approximate posterior probability into the expectation-maximization algorithm and update the prior parameters.

S24. Determine whether the regression coefficients satisfy the preset stopping condition. If they do, output the regression coefficients obtained in the current iteration; if they do not, return to step S22 for the next iteration.

The third matrix is $[X, y, c]$, where $X$ is the covariate matrix, $y$ is the survival time and $c$ is the censoring indicator.

The regression coefficients are estimated by a fully Bayesian analysis: the penalized maximum likelihood estimate is replaced by the minimum mean square error estimate of the Bayesian viewpoint. With the factor graph as the tool, the messages passed between nodes are computed by an expectation-propagation-based message-passing method to obtain the approximate posterior probability of the regression coefficients; in essence, the probability distribution followed by the regression coefficients is inferred approximately.
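The contrast between the two estimators can be written out as follows. This is a sketch in assumed notation ($L$ for the Cox partial likelihood, $\lambda\,\mathrm{pen}(\beta)$ for a generic penalty, $p(\beta)$ for the prior); none of these symbols is taken from the original formulas, which are not reproduced in this text.

```latex
% Penalized maximum likelihood: a point estimate obtained by optimization
\hat{\beta}_{\mathrm{pen}}
  = \arg\max_{\beta}\,\bigl[\log L(\beta) - \lambda\,\mathrm{pen}(\beta)\bigr]

% Bayesian minimum mean square error estimate: the posterior mean
\hat{\beta}_{\mathrm{MMSE}}
  = \mathbb{E}\bigl[\beta \mid X, y, c\bigr]
  = \int \beta\, p(\beta \mid X, y, c)\,\mathrm{d}\beta,
\qquad
p(\beta \mid X, y, c) \;\propto\; L(\beta)\,p(\beta)
```

The posterior mean requires the posterior distribution itself, which is why an approximate inference scheme such as expectation propagation is needed in place of a pure optimizer.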

Further, the prior parameters comprise the mean, the variance and the sparsity rate, and the message-passing parameters comprise the means and variances of the forward messages. Step S21 is specifically: normalize the covariate matrix $X$, sort the third matrix $[X, y, c]$ in descending order of survival time $y$, substitute the sorted third matrix $[X, y, c]$ into the Cox partial likelihood function, and initialize the prior parameters and the message-passing function.
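Sorting the rows of $[X, y, c]$ jointly by descending survival time can be sketched as below. The function name is hypothetical, and a stable sort is chosen so that tied survival times keep their original order, which the text does not specify.

```python
import numpy as np

def sort_by_survival_time_desc(X, y, c):
    """Reorder X, y and c together so that y is in descending order."""
    order = np.argsort(-y, kind="stable")
    return X[order], y[order], c[order]

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([5.0, 9.0, 2.0])
c = np.array([1, 0, 1])
Xs, ys, cs = sort_by_survival_time_desc(X, y, c)
```

After this reordering, the patients still at risk at the time of sample $i$ are exactly the samples that precede it, which simplifies evaluation of the partial likelihood.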

The prior placed on the regression coefficients is a Gauss-Bernoulli distribution, which makes the coefficients sparse.

The Laplace method and the moment generating function are used to approximate and simplify the projection operation at the likelihood function node, so that the otherwise complex computation becomes tractable and reasonably accurate regression coefficients are obtained at a small loss.

Further, the covariate matrix $X$ is normalized using mean($X$), the mean of all elements of the matrix $X$, and var($X$), the variance of all elements of the matrix $X$.
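The normalization formula itself is not reproduced in this text. A common reading, assumed here, is to subtract the global mean and divide by the global standard deviation (the square root of var($X$)); whether the original divides by the standard deviation or by the variance cannot be confirmed from the surviving text.

```python
import numpy as np

def normalize_global(X):
    """Standardize X with the mean and variance taken over ALL elements.

    Assumption: division by sqrt(var(X)); the source formula is missing.
    """
    return (X - X.mean()) / np.sqrt(X.var())

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
Xn = normalize_global(X)
```

Unlike per-column standardization, this keeps the relative scale between genes, since a single mean and a single variance are used for the whole matrix.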

The Cox partial likelihood function, written in its standard form, is:

$$p(y \mid z) \propto \prod_{i:\,c_i = 1} \frac{\exp(z_i)}{\sum_{j:\,y_j \ge y_i} \exp(z_j)}$$

where $p(y \mid z)$ denotes the transition probability from $z$ to $y$, which indicates that $p(y \mid z)$ is normalized with respect to $y$; the right-hand side is the Cox partial likelihood function, which is not normalized, hence the proportionality sign. The function takes $z$ as its variable, whose $i$-th element is $z_i = x_i^{\top}\beta$, where $x_i^{\top}$ is the $i$-th row of $X$.
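With the data sorted by descending survival time, the logarithm of the standard Cox partial likelihood can be evaluated with a running log-sum-exp. The sketch assumes that $c_i = 1$ marks an observed event, that there are no tied survival times, and that `z` holds the linear predictors $x_i^{\top}\beta$; these conventions are assumptions, since the original formula is not reproduced in this text.

```python
import numpy as np

def cox_log_partial_likelihood(z, c):
    """Log partial likelihood for linear predictors z sorted by descending time.

    Assumptions: c[i] == 1 means the event was observed; no tied times.
    """
    # With descending times, the risk set of sample i is samples 0..i,
    # so the log of its summed exp(z) is a cumulative log-sum-exp.
    log_risk_set = np.logaddexp.accumulate(z)
    return float(np.sum(c * (z - log_risk_set)))

z = np.zeros(3)
full = cox_log_partial_likelihood(z, np.array([1, 1, 1]))
partial = cox_log_partial_likelihood(z, np.array([1, 0, 1]))
```

Censored samples contribute no term of their own but still appear in the risk sets of samples with shorter survival times.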

The prior parameters are initialized as follows. The regression coefficients are assumed to follow a Gauss-Bernoulli distribution; writing the sparsity rate as $\rho$, the mean as $\mu$ and the variance as $\sigma^2$, its standard form is:

$$p(\beta) = \prod_{j=1}^{p}\left[(1-\rho)\,\delta(\beta_j) + \rho\,\mathcal{N}(\beta_j;\mu,\sigma^2)\right]$$

where $\delta(\cdot)$ denotes the Dirac delta function and $\mathcal{N}(\cdot\,;\mu,\sigma^2)$ denotes a Gaussian distribution with mean $\mu$ and variance $\sigma^2$. The function takes $\beta$ as its variable. The three prior parameters are each given an initial value.

The message-passing function is initialized by initializing the forward messages. Each forward message is a random variable that follows an independent, homoscedastic (equal-variance) multidimensional Gaussian distribution. Here $\mathbf{0}_n$ denotes the $n$-dimensional column vector whose elements are all 0 and $\mathbf{1}_n$ denotes the $n$-dimensional column vector whose elements are all 1, the subscript giving the dimension of the vector. The means and variances of the forward messages are given their initial values at this stage.

In the split vector factor graph of the Cox regression model, four multidimensional random variables are used to represent the messages passed on the graph; that is, each message is treated as a multidimensional Gaussian probability density function. The moment-matching process requires each message to follow an independent, homoscedastic multidimensional Gaussian distribution, where $\mathbf{1}_n$ is the $n$-dimensional column vector of ones and $\mathbf{1}_p$ is the $p$-dimensional column vector of ones, the subscript giving the dimension of the vector. When the elements of a multidimensional Gaussian random variable are mutually independent, that is, when the off-diagonal elements of the covariance matrix are 0, the diagonal covariance matrix can be represented by a vector.

Further, step S22 is specifically: messages are passed on the split vector factor graph of the Cox regression model according to the moment-matching rule, comprising the following steps:

S221. Update the first message according to the moment-matching rule of the split vector factor graph of the Cox regression model, specifically:

At the corresponding factor node, the incoming message is multiplied by the node factor and the product is projected onto an independent, homoscedastic multidimensional Gaussian distribution; the projected result is then divided by the incoming message to give the outgoing message.

The projection operation computes the mean vector and the variance vector of its argument. Because the target is an independent, homoscedastic multidimensional Gaussian, all elements of the variance vector are equal and the off-diagonal elements of the covariance are 0; the projection outputs this mean and variance.
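The multiply, project and divide pattern is convenient to carry out in precision (inverse-variance) form, where multiplying Gaussian densities adds precisions and dividing subtracts them. The following is a minimal sketch for equal-variance Gaussian messages; the function names are hypothetical and the projection step itself is omitted.

```python
import numpy as np

def gaussian_multiply(m1, v1, m2, v2):
    """Mean and variance of the (unnormalized) product of two Gaussians."""
    v = 1.0 / (1.0 / v1 + 1.0 / v2)
    m = v * (m1 / v1 + m2 / v2)
    return m, v

def gaussian_divide(m_num, v_num, m_den, v_den):
    """Mean and variance of the ratio of two Gaussians (needs v_num < v_den)."""
    v = 1.0 / (1.0 / v_num - 1.0 / v_den)
    m = v * (m_num / v_num - m_den / v_den)
    return m, v

m_in, v_in = np.array([0.0, 1.0]), 2.0     # incoming message
m_f, v_f = np.array([1.0, -1.0]), 0.5      # Gaussian stand-in for the node factor
m_post, v_post = gaussian_multiply(m_in, v_in, m_f, v_f)
m_out, v_out = gaussian_divide(m_post, v_post, m_in, v_in)
```

When the node factor is itself Gaussian, the division recovers it exactly, as in this example; for a non-Gaussian factor the projection changes the product first, and the division then yields a Gaussian approximation of the factor's contribution.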

S222. Update the second message according to the moment-matching rule of the split vector factor graph of the Cox regression model, specifically:

At the corresponding factor node, the incoming message is multiplied by the node factor and one of the node's variables is integrated out; the result is projected onto an independent, homoscedastic multidimensional Gaussian distribution, and the projected result is then divided by the incoming message to give the outgoing message. The node factor here is the Dirac delta function.

S223. Update the third message according to the moment-matching rule of the split vector factor graph of the Cox regression model, specifically:

At the corresponding factor node, the incoming message is multiplied by the node factor and the product is projected onto an independent, homoscedastic multidimensional Gaussian distribution; the projected result is divided by the incoming message to give the outgoing message. The mean obtained by the projection operation is the Cox regression coefficient vector that is output as the result.

S224. Update the fourth message according to the moment-matching rule of the split vector factor graph of the Cox regression model, specifically:

At the corresponding factor node, the incoming message is multiplied by the node factor and one of the node's variables is integrated out; the result is projected onto an independent, homoscedastic multidimensional Gaussian distribution, and the projected result is then divided by the incoming message to give the outgoing message.
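For a delta-function node that ties a linear predictor $z$ to $X\beta$, integrating out $\beta$ under a Gaussian message has a closed form. The sketch below assumes the relation $z = X\beta$ and an equal-variance Gaussian message on $\beta$; it shows only the projection part of the update, and both the relation and the function name are assumptions rather than the formulas of the original.

```python
import numpy as np

def push_through_linear_map(X, m_beta, v_beta):
    """Project the law of z = X @ beta onto an equal-variance Gaussian.

    Assumes beta ~ N(m_beta, v_beta * I). The exact covariance of z is
    v_beta * X X^T; moment matching to a single variance keeps its
    average diagonal entry.
    """
    n = X.shape[0]
    m_z = X @ m_beta
    v_z = v_beta * np.sum(X ** 2) / n   # trace(X X^T) = sum of squared entries
    return m_z, v_z

X = np.array([[1.0, 0.0],
              [0.0, 2.0]])
m_z, v_z = push_through_linear_map(X, m_beta=np.array([1.0, 1.0]), v_beta=0.5)
```

Averaging the diagonal is what allows a single scalar variance to stand in for the full covariance matrix, at the cost of ignoring correlations between the elements of $z$.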

Because the Cox partial likelihood has an extremely complex form, the cumulant generating function and the Laplace method are used in place of the exact computation to carry out the projection operation.
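The Laplace method replaces an intractable density by a Gaussian centered at its mode, with variance given by the negative inverse curvature of the log-density there. The one-dimensional sketch below uses a toy log-density with an exponential term multiplied by a Gaussian; the toy function and the fixed iteration count are assumptions for illustration, not the multidimensional computation of the patent.

```python
import numpy as np

def laplace_approximation(grad, hess, x0, iters=50):
    """Newton search for the mode of log f, then a Gaussian fit at the mode.

    Returns (mode, variance) with variance = -1 / hess(mode).
    Assumes log f is strictly concave so that Newton's method converges.
    """
    x = x0
    for _ in range(iters):
        x = x - grad(x) / hess(x)
    return x, -1.0 / hess(x)

# Toy log-density: log f(x) = 2x - exp(x) - x^2 / 2
grad = lambda x: 2.0 - np.exp(x) - x
hess = lambda x: -np.exp(x) - 1.0
mode, var = laplace_approximation(grad, hess, x0=0.0)
```

The pair `(mode, var)` then plays the role of the projected mean and variance in the moment-matching step.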

Further, the projection operation in step S223 yields the approximate posterior probability of the regression coefficients; the mean obtained by the projection is the Cox regression coefficient vector output by the model.
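For a Gauss-Bernoulli prior combined with a Gaussian incoming message, the posterior mean and variance of each coefficient have a closed form. The sketch assumes the standard spike-and-slab parameterization with sparsity rate `rho` (probability of a non-zero coefficient), slab mean `mu` and slab variance `sigma2`, and an incoming message N(r, v); these names are hypothetical.

```python
import numpy as np

def gauss_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def bernoulli_gauss_posterior(r, v, rho, mu, sigma2):
    """Posterior mean, variance and support probability per coefficient."""
    spike = (1.0 - rho) * gauss_pdf(0.0, r, v)      # evidence for beta_j = 0
    slab = rho * gauss_pdf(r, mu, sigma2 + v)       # evidence for beta_j != 0
    pi = slab / (spike + slab)                      # posterior support probability
    s = 1.0 / (1.0 / v + 1.0 / sigma2)              # slab posterior variance
    m = s * (r / v + mu / sigma2)                   # slab posterior mean
    mean = pi * m
    var = pi * (s + m ** 2) - mean ** 2
    return mean, var, pi

r = np.array([2.0, 0.0])
mean, var, pi = bernoulli_gauss_posterior(r, v=1.0, rho=1.0, mu=0.0, sigma2=1.0)
```

Averaging `var` over all coefficients gives the single variance of the equal-variance Gaussian onto which the result is projected, and `mean` is the quantity reported as the coefficient estimate.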

Further, step S23 is specifically: the regression coefficients and the approximate posterior probability output by step S22 are used, together with the expectation-maximization algorithm, to update the prior parameters automatically. The update expressions are built from two auxiliary quantities, both of which are functions of the message parameters, and use element-wise vector division and element-wise vector multiplication.
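The update expressions are not reproduced in this text. The sketch below shows the form such updates commonly take for a Gauss-Bernoulli prior in expectation-maximization schemes; it is offered as an assumed illustration, with hypothetical names, and may differ from the expressions of the patent.

```python
import numpy as np

def em_update_prior(pi, m, s):
    """One EM update of (sparsity rate, slab mean, slab variance).

    pi: posterior support probabilities of the coefficients
    m, s: per-coefficient slab posterior means and variances
    Assumption: textbook updates for a spike-and-slab prior.
    """
    rho = float(np.mean(pi))
    mu = float(np.sum(pi * m) / np.sum(pi))
    sigma2 = float(np.sum(pi * ((m - mu) ** 2 + s)) / np.sum(pi))
    return rho, mu, sigma2

pi = np.array([1.0, 1.0, 0.0, 0.0])
m = np.array([1.0, 3.0, 0.0, 0.0])
s = np.array([0.5, 0.5, 1.0, 1.0])
rho, mu, sigma2 = em_update_prior(pi, m, s)
```

Each update is a weighted average in which coefficients that are probably zero (small `pi`) contribute little to the slab mean and variance.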

Because the prior parameters are self-learned, they are updated automatically as the overall algorithm iterates and need no manual tuning, which also avoids the uncertainty of cross-validation.

Further, the preset stopping condition in step S24 is defined through a criterion value Crit. Whether to end the iteration is decided by checking whether the Crit value has started to rise: if it has, the iteration stops and the regression coefficients of the final iteration are output; if it has not, the iteration continues. In the definition of Crit, $\|\cdot\|_1$ denotes the 1-norm.
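The definition of Crit is not reproduced in this text. The sketch below assumes a relative 1-norm change between successive coefficient estimates and stops as soon as that value rises; both the definition and the function names are assumptions.

```python
import numpy as np

def relative_l1_change(beta_new, beta_old):
    """Assumed Crit: ||beta_new - beta_old||_1 / ||beta_new||_1.

    Not defined when beta_new is all zeros.
    """
    return float(np.sum(np.abs(beta_new - beta_old)) / np.sum(np.abs(beta_new)))

def should_stop(crit_history):
    """Stop once the latest Crit value exceeds the previous one."""
    return len(crit_history) >= 2 and crit_history[-1] > crit_history[-2]

crit = relative_l1_change(np.array([1.0, 1.0]), np.array([1.0, 0.0]))
```

Stopping at the first rise treats an increase in the criterion as the onset of divergence, so the iteration ends before the estimates deteriorate.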

A second aspect of the present invention provides a cancer gene prognosis screening system based on an improved Cox model, comprising a memory and a processor. The memory stores a cancer gene prognosis screening program based on the improved Cox model which, when executed by the processor, implements steps S1 to S4 of the method of the first aspect, beginning with the preprocessing of the first matrix to obtain the second matrix.

Compared with the prior art, the beneficial effects of the technical solution of the present invention are as follows:

The present invention provides a cancer gene prognosis screening method and system based on an improved Cox model. With the factor graph as the tool, the approximate posterior probability of the Cox regression coefficients is inferred by an expectation-propagation-based moment-matching message-passing method, and minimum mean square error estimation gives an accurate estimate of the regression coefficients. The prior parameters are solved for automatically by the expectation-maximization algorithm, which removes the need for cross-validation and makes the coefficient estimates more precise. In the implementation, simplification by the Laplace method and the cumulant generating function makes it possible to project the product of the complex-form likelihood and a Gaussian, so that the iteration can proceed. The problem of regression accuracy is thereby addressed, and the genes whose regression coefficients have large absolute values are screened out as prognostic genes, providing information for subsequent prediction of prognosis, recurrence and metastasis, and even for guiding treatment.

Brief description of the drawings

Fig. 1 is a flow chart of the cancer gene prognosis screening method based on an improved Cox model according to the present invention.

Fig. 2 is a flow chart of solving the Cox model in the cancer gene prognosis screening method based on an improved Cox model according to the present invention.

Fig. 3 is a flow chart of one embodiment of solving the Cox model according to the present invention.

Fig. 4 is a schematic diagram of the split vector factor graph in one embodiment of the present invention.

Fig. 5 is a schematic diagram of the expectation-propagation-based moment-matching message-passing method in one embodiment of the present invention.

Fig. 6 shows the regression performance on simulated data in one embodiment of the present invention.

Fig. 7 is a schematic structural diagram of the cancer gene prognosis screening system based on an improved Cox model according to the present invention.

Detailed description

So that the above objects, features and advantages of the present invention can be understood more clearly, the present invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present application and the features of the embodiments may be combined with one another.

Many specific details are set out in the following description to allow a full understanding of the present invention. However, the present invention may also be implemented in ways other than those described here, and its scope of protection is therefore not limited by the specific embodiments disclosed below.

Embodiment 1

As shown in Fig. 1, the present invention provides a cancer gene prognosis screening method based on an improved Cox model. The method comprises steps S1 to S4 as set out in the Summary of the invention: the gene expression levels and patient information are collated into the first matrix, whose rows represent patients and whose columns represent genes; the first matrix is preprocessed to obtain the second matrix; the Cox regression model is solved for the regression coefficients; and the hazard function of the $i$-th patient combines the shared baseline hazard function with the regression coefficients and that patient's gene expression levels. Once the regression coefficients have been fitted, patient risk is assessed from the gene expression levels, and the genes corresponding to the coefficients with larger absolute values form the prognostic gene set.

Further, in step S2 the Cox regression model is solved to obtain the regression coefficients as shown in Fig. 2, following steps S21 to S24 set out in the Summary of the invention. The prior parameters comprise the mean, the variance and the sparsity rate.

In a specific embodiment, the covariate matrix may be a gene expression matrix in which each row represents a different patient, each column represents a different gene, and each element is the expression level of a particular gene in a particular patient.

The Cox partial likelihood function, the Gauss-Bernoulli prior with its initial prior parameters, and the initialization of the forward messages are as defined in the Summary of the invention.

In a specific embodiment, the split vector factor graph of the Cox regression model is shown in Fig. 4.

As shown in Fig. 5, four multidimensional random variables are used to represent the messages passed on the split vector factor graph of the Cox regression model; that is, each message is treated as a multidimensional Gaussian probability density function, and the moment-matching process requires each message to follow an independent, homoscedastic multidimensional Gaussian distribution.

In a specific embodiment, the prior parameters of the prior distribution, namely the sparsity parameter, the mean parameter and the variance parameter, are each given an initial value and are subsequently updated automatically by the expectation-maximization algorithm.

The message updates of steps S221 to S224, the projection operation of step S223, the expectation-maximization update of step S23 and the stopping condition of step S24, under which the iteration continues as long as the Crit value has not started to rise, are carried out as set out in the Summary of the invention.

In a specific embodiment, the regression performance on simulated data in a single experiment is shown in Figure 6, where the black line is the true value and the asterisks are the estimated values.

The simulated data are generated as follows:

[…] is generated by independent standard normal sampling.

For […], samples are drawn independently from the binomial distribution B(1, 0.8), […], which gives a censoring rate of 0.2.

[…] is generated by Laplace-Bernoulli sampling with a sparsity rate of 0.2.

When […] and sample i is not censored:

[…]

where […] is sampled independently from U(0, 1); when […] and sample i is censored:

[…]
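The data generation described above can be sketched in NumPy. This is only an illustration under stated assumptions: the function name `simulate_survival`, the sample sizes, the unit exponential baseline hazard, the rule used for censored times, and the reading of "sparsity rate 0.2" as the fraction of non-zero coefficients are all assumptions, because the source formulas for the survival times are not reproduced in the text.

```python
import numpy as np

def simulate_survival(n=200, p=50, keep_prob=0.8, sparsity=0.2, seed=0):
    """Hypothetical generator for simulated Cox data (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Covariates: independent standard normal samples.
    X = rng.standard_normal((n, p))
    # Indicator drawn from B(1, 0.8): 1 = event observed, 0 = censored,
    # which gives a censoring rate of about 0.2.
    c = rng.binomial(1, keep_prob, size=n)
    # Laplace-Bernoulli coefficients. Assumption: "sparsity rate" is the
    # probability that a coefficient is non-zero.
    active = rng.random(p) < sparsity
    beta = np.where(active, rng.laplace(0.0, 1.0, size=p), 0.0)
    # Assumption: unit exponential baseline hazard, sampled by inverse
    # transform from a uniform variable in (0, 1].
    u = 1.0 - rng.random(n)
    t_event = -np.log(u) / np.exp(X @ beta)
    # Assumption: a censored sample is observed at a uniformly earlier time.
    y = np.where(c == 1, t_event, t_event * rng.random(n))
    return X, y, c, beta

X, y, c, beta = simulate_survival()
```

The returned triple (X, y, c) has the same layout as the survival data used in the embodiments, so it can be fed to any routine that expects a covariate matrix, survival times and a censoring index.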

Example 2

Based on Example 1 above and with reference to FIG. 3, this example describes in detail the specific process of solving the Cox model in the present invention.

In a specific embodiment, as shown in FIG. 3, the known data are […], […] and […], and the coefficients to be regressed are […].

Step 1:

S1.1: Initialize X.

S1.2: Merge the existing survival data (covariate matrix X, survival time y, censoring index c) into one matrix [X, y, c] and sort its rows in descending order of y.

S1.3: Substitute the sorted [X, y, c] into the Cox partial likelihood function:

[…]

Here […] denotes that the function is the transition probability from […] to […], which implies that […] is normalized with respect to […] (a property of probability density functions), whereas […] is the Cox partial likelihood function, which is not normalized, hence the proportionality relation. The function takes […] as its variable, whose i-th element is […], where […] is the i-th element of […].
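Steps S1.2 and S1.3 can be sketched as follows. The sketch uses the standard Cox partial log-likelihood for data without tied survival times; the function names are hypothetical, and it is an assumption that the patent's formula (not reproduced in the text) matches this standard form. Here `z` stands for the linear predictor X·β and `c` equals 1 for an observed event.

```python
import numpy as np

def sort_descending_by_time(X, y, c):
    """Sort the rows of [X, y, c] in descending order of survival time y."""
    order = np.argsort(-y, kind="stable")
    return X[order], y[order], c[order]

def cox_partial_loglik(z, c):
    """Standard Cox partial log-likelihood (no ties assumed).

    With rows sorted by descending y, the risk set of row i is rows 0..i,
    so the log of the risk-set sum is a running log-sum-exp of z.
    """
    log_risk = np.logaddexp.accumulate(z)
    return float(np.sum(c * (z - log_risk)))

# Small usage example with made-up numbers.
X = np.array([[0.1], [0.2], [0.3]])
y = np.array([2.0, 5.0, 3.0])
c = np.array([1, 0, 1])
Xs, ys, cs = sort_descending_by_time(X, y, c)
value = cox_partial_loglik(Xs @ np.array([1.0]), cs)
```

Sorting in descending order is what turns each risk-set sum into a prefix sum, which is why the whole likelihood can be evaluated in a single pass.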

S1.4: Assume that the prior follows a Gauss-Bernoulli distribution:

[…]

The function takes […] as its variable. Initialize the prior parameters […], […] and […].

S1.5: Initialize the forward-direction messages:

[…]

where […], […] and […] are initialized; […] is an n-dimensional column vector whose elements are all 0, and […] is an n-dimensional column vector whose elements are all 1, with the subscript indicating the dimension of the vector.

Step 2: Message passing on the factor graph based on the moment-matching rule (expectation propagation algorithm)

S2.1: Update […]: at node […], the message of […] is multiplied by […] and the product is projected onto a multidimensional Gaussian distribution with independent components of equal variance; the message of […] is then divided out:

[…]

where […] is the projection operation, i.e. it computes the mean vector […] and the variance vector […] (the diagonal of the covariance matrix) of […] with respect to […]. Because the target is a multidimensional Gaussian with independent components of equal variance, all elements of the vector […] are equal and the off-diagonal elements of the covariance matrix are 0. The operation outputs […].

Applying the Laplace method and the moment generating function to […] and simplifying finally gives:

[…]

where […] is the variance of […], […], and […] is the Hessian matrix of […] (the second-order derivative of […] with respect to […]).

[…] has the following meaning: when […] is a matrix, its diagonal is extracted; when […] is a vector, it is expanded into a diagonal matrix.

[…] takes the mean of a vector, […] denotes element-wise vector division, and […] denotes element-wise vector multiplication.
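The operators described above map directly onto NumPy primitives. The following short sketch shows the correspondence; the helper name `diag_op` is hypothetical, since the operator symbols of the source are not reproduced in the text.

```python
import numpy as np

def diag_op(a):
    """Matrix -> its diagonal as a vector; vector -> diagonal matrix."""
    return np.diag(np.asarray(a))

H = np.array([[2.0, 0.5],
              [0.5, 4.0]])
v = diag_op(H)                     # diagonal of the matrix
D = diag_op(v)                     # vector expanded into a diagonal matrix
avg = v.mean()                     # mean over the elements of a vector
ratio = v / np.array([1.0, 2.0])   # element-wise vector division
prod = v * np.array([3.0, 0.5])    # element-wise vector multiplication
```

Note that the element-wise product is not the inner product of the two vectors: it returns a vector of the same length.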

Here, […] is obtained by applying a quadratic approximation to […] and then solving with the coordinate ascent algorithm.

First, take the Taylor expansion of […]:

[…]

where […] is the gradient of […] at […], and […] is the Hessian matrix of […] at […]. Rewriting gives:

[…]

where […]; finally, […] simplifies to:

[…]

where […] is the i-th element of […]. The coordinate ascent algorithm is then applied:

S2.1.1: Initialize […];

S2.1.2: Update the gradient […] of […] at […]; for the k-th element […] of […]:

[…]

S2.1.3: Update the Hessian matrix […] of […] at […]; for the element […] in row k and column k of […] (to speed up the calculation, only the diagonal elements are kept to approximate the whole matrix):

[…]

S2.1.4: Update […]:

[…]

S2.1.5: Update […]:

[…]

S2.1.6: Compute the change in […]; if the change is sufficiently small, output […];

if the change is still large, return to S2.1.2 and continue iterating.
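The gradient and diagonal Hessian used in S2.1.2 and S2.1.3 can be illustrated for the standard Cox partial log-likelihood. This is a sketch under assumptions: it differentiates the standard partial log-likelihood (no ties, rows sorted by descending survival time) with respect to the linear predictor `z`, and it omits the Gaussian message term that the expectation propagation step would add. The function names are hypothetical.

```python
import numpy as np

def cox_partial_loglik(z, c):
    # Rows sorted by descending time: risk set of row i is rows 0..i.
    return float(np.sum(c * (z - np.logaddexp.accumulate(z))))

def cox_grad_and_diag_hess(z, c):
    """Gradient and diagonal of the Hessian of the partial log-likelihood."""
    w = np.exp(z)
    S = np.cumsum(w)                         # S[i] = sum_{j <= i} exp(z_j)
    A = np.cumsum((c / S)[::-1])[::-1]       # A[k] = sum_{i >= k} c_i / S_i
    B = np.cumsum((c / S**2)[::-1])[::-1]    # B[k] = sum_{i >= k} c_i / S_i^2
    grad = c - w * A
    hess_diag = -w * A + w**2 * B            # only diagonal entries are kept
    return grad, hess_diag

z = np.array([0.3, -0.2, 0.1, 0.5, -0.4, 0.0])
c = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
grad, hess_diag = cox_grad_and_diag_hess(z, c)
```

Every diagonal entry is non-positive, so the partial log-likelihood is concave along each coordinate. Adding the quadratic term of a Gaussian message makes the curvature strictly negative, which is what allows a per-coordinate update of the form gradient divided by curvature.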

Finally, compute the division part and output […]:
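The "division part" of an expectation propagation update divides one Gaussian message by another, which amounts to subtracting natural parameters. The sketch below shows this for element-wise Gaussians; the function names are hypothetical, and the source's own expression is not reproduced in the text, so the correspondence is an assumption.

```python
import numpy as np

def gaussian_multiply(m1, v1, m2, v2):
    """Product of two Gaussian messages (up to a constant factor)."""
    var = 1.0 / (1.0 / v1 + 1.0 / v2)
    mean = var * (m1 / v1 + m2 / v2)
    return mean, var

def gaussian_divide(m_num, v_num, m_den, v_den):
    """Quotient of two Gaussian messages (up to a constant factor).

    The caller must ensure v_num < v_den; otherwise the resulting
    precision is not positive and the quotient is not a valid Gaussian.
    """
    prec = 1.0 / v_num - 1.0 / v_den
    var = 1.0 / prec
    mean = var * (m_num / v_num - m_den / v_den)
    return mean, var

m1, v1 = np.array([1.0, -2.0]), 0.5
m2, v2 = np.array([0.3, 0.7]), 2.0
m_prod, v_prod = gaussian_multiply(m1, v1, m2, v2)
m_back, v_back = gaussian_divide(m_prod, v_prod, m2, v2)
```

Dividing the product by one factor recovers the other factor, which is the property the message update relies on to remove the incoming message from the projected result.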

S2.2: Update […]: at node […], […] is multiplied by […], the variable […] is integrated out, and the result is projected onto a multidimensional Gaussian distribution with independent components of equal variance; the message of […] is then divided out:

[…]

where […] is computed as:

[…]

where […] is an n-dimensional column vector whose elements are all 1, with the subscript indicating the dimension of the vector; […] means: when […] is a matrix, its diagonal is extracted, and when […] is a vector, it is expanded into a diagonal matrix; […] takes the mean of a vector; […] computes the mean vector […] and the variance vector […] of […] with respect to […] and outputs […]; […] denotes matrix inversion and […] denotes matrix transposition.

Finally, compute the division part and output […]:
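When two isotropic Gaussian messages are tied by a linear constraint z = Xβ (the role a Dirac delta factor plays in a factor graph), the moments before projection follow the standard linear-Gaussian posterior. The sketch below illustrates that computation and the projection to a single shared variance. It is an assumption that the patent's expression (not reproduced in the text) has this form; the function and argument names are hypothetical.

```python
import numpy as np

def linear_node_posterior(X, m_z, v_z, m_b, v_b):
    """Posterior moments of beta under z = X @ beta.

    m_z, v_z: mean vector and scalar variance of the Gaussian message on z.
    m_b, v_b: mean vector and scalar variance of the Gaussian message on beta.
    Returns the posterior mean of beta and the averaged (isotropic) variance.
    """
    p = X.shape[1]
    cov = np.linalg.inv(X.T @ X / v_z + np.eye(p) / v_b)   # matrix inversion
    mean = cov @ (X.T @ m_z / v_z + m_b / v_b)
    var_iso = np.mean(np.diag(cov))   # project: average the diagonal
    return mean, var_iso

X = np.eye(3)
m_z = np.array([1.0, 2.0, 3.0])
mean, var_iso = linear_node_posterior(X, m_z, 0.5, np.zeros(3), 2.0)
```

Averaging the diagonal of the covariance is what the projection onto a Gaussian with independent components of equal variance reduces to once the full covariance is known.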

S2.3: Update […]: at node […], the message of […] is multiplied by […], and the product is projected onto a multidimensional Gaussian distribution with independent components of equal variance; the message of […] is then divided out:

[…]

where […] is computed as:

[…]

where […] and […] are both functions of […], with the following expressions:

[…]

Finally, compute the division part and output […]:

[…]

The approximate posterior of the regression coefficients is:

[…]

and the mean […] obtained by the projection operation is exactly the Cox regression coefficient vector to be output.
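For a Gauss-Bernoulli prior combined with a Gaussian incoming message, the posterior mean and variance needed by the projection have a closed form. The sketch below uses the standard spike-and-slab result; the parameter names (`rho` for the sparsity rate, `mu0` and `tau0` for the prior mean and variance) and the reading of `rho` as the probability of a non-zero coefficient are assumptions, since the patent's expressions are not reproduced in the text.

```python
import numpy as np

def gauss_bernoulli_moments(r, v, rho, mu0, tau0):
    """Posterior moments for prior (1-rho)*delta(b) + rho*N(b; mu0, tau0)
    and Gaussian message N(b; r, v). Requires 0 < rho <= 1."""
    # Posterior of the Gaussian ("slab") component.
    s = 1.0 / (1.0 / v + 1.0 / tau0)
    m = s * (r / v + mu0 / tau0)
    # Log evidence of each mixture component at the message mean r.
    log_slab = -0.5 * np.log(2 * np.pi * (v + tau0)) - 0.5 * (r - mu0) ** 2 / (v + tau0)
    log_spike = -0.5 * np.log(2 * np.pi * v) - 0.5 * r ** 2 / v
    # Posterior probability that the coefficient is non-zero.
    pi = 1.0 / (1.0 + (1.0 - rho) / rho * np.exp(log_spike - log_slab))
    mean = pi * m
    var = pi * (m ** 2 + s) - mean ** 2
    return mean, var, pi, m, s

mean, var, pi, m, s = gauss_bernoulli_moments(np.array([0.0]), 1.0, 0.5, 0.0, 3.0)
```

The quantities `pi` and `m` are functions of the incoming message only, and the posterior mean `pi * m` shrinks weakly supported coefficients towards zero, which is the mechanism that produces a sparse coefficient vector.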

S2.4: Update […]: at node […], […] is multiplied by […], the variable […] is integrated out, and the result is projected onto a multidimensional Gaussian distribution with independent components of equal variance; the message of […] is then divided out:

[…]

where […] is computed as:

[…]

Finally, compute the division part and output […]:

Step 3: Using the approximate posterior probability […] output by S2.3 together with the expectation-maximization algorithm, update the prior parameters […] automatically.

S3.1: Update […]:

[…]

S3.2: Update […]:

[…]

S3.3: Update […]:
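An expectation-maximization update of Gauss-Bernoulli prior parameters can be sketched as below. The formulas are the usual closed-form EM updates for a Bernoulli-Gaussian mixture and are an assumption here, because the patent's update expressions are not reproduced in the text. The inputs are hypothetical names for per-coefficient posterior quantities: `pi` (probability of being non-zero), `m` and `s` (mean and variance of the Gaussian component).

```python
import numpy as np

def em_update_prior(pi, m, s):
    """One EM update of (sparsity rate, prior mean, prior variance)."""
    rho = np.mean(pi)                                   # sparsity rate
    mu0 = np.sum(pi * m) / np.sum(pi)                   # prior mean
    tau0 = np.sum(pi * ((m - mu0) ** 2 + s)) / np.sum(pi)   # prior variance
    return rho, mu0, tau0

pi = np.array([1.0, 1.0, 0.0, 0.0])
m = np.array([1.0, 3.0, 0.0, 0.0])
s = np.full(4, 0.5)
rho, mu0, tau0 = em_update_prior(pi, m, s)
```

Because the update uses only posterior quantities that the message-passing step already computes, the prior can be re-estimated in every outer iteration without manual tuning.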

Step 4: Determine whether the preset iteration end condition has been reached.

The end condition is:

[…]

Determine whether it has started to rise; if […] starts to rise, stop the iteration process and output the final regression coefficients […] (from S2.3), where […] denotes the 1-norm.
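The stopping rule can be sketched as a loop that halts as soon as the criterion increases. The exact definition of Crit is not reproduced in the text, so the relative 1-norm change used below is an assumption, as are the function names and the choice to return the most recent iterate.

```python
import numpy as np

def crit_value(beta_new, beta_old):
    """Assumed criterion: relative change of the coefficients in 1-norm."""
    denom = max(np.linalg.norm(beta_old, 1), 1e-12)
    return np.linalg.norm(beta_new - beta_old, 1) / denom

def run_until_crit_rises(step, beta0, max_iter=100):
    """Iterate `step` until the criterion starts to rise or max_iter is hit."""
    beta = np.asarray(beta0, dtype=float)
    prev_crit = np.inf
    for _ in range(max_iter):
        beta_new = step(beta)
        crit = crit_value(beta_new, beta)
        beta = beta_new
        if crit > prev_crit:   # criterion started to rise: stop iterating
            break
        prev_crit = crit
    return beta

# Usage with a toy contraction whose fixed point is 2.
result = run_until_crit_rises(lambda b: 0.5 * b + 1.0, np.array([0.0]), max_iter=30)
```

In the method described here, `step` would stand for one pass of Step 2 followed by Step 3.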

Example 3

Based on Example 1 and Example 2 above and with reference to FIG. 7, this example describes in detail the second aspect of the present invention, a cancer gene prognosis screening system based on an improved Cox model.

In a specific embodiment, as shown in FIG. 7, the present invention further provides a cancer gene prognosis screening system based on an improved Cox model, comprising a memory and a processor. The memory stores a cancer gene prognosis screening program based on the improved Cox model, and when the program is executed by the processor, the following steps are implemented:

[…], the first matrix […] is preprocessed to obtain the second matrix […].

The icons describing structural positional relationships in the drawings are for illustrative purposes only and shall not be construed as limiting this patent.

Obviously, the above embodiments of the present invention are merely examples given to illustrate the present invention clearly, and do not limit its implementations. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to list all implementations exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims

1. A cancer gene prognosis screening method based on an improved Cox model, characterized by comprising the following steps:

S1, collecting the expression levels of different genes of the cancer cells of a cancer patient, collecting survival data of the patient, collating the expression levels of the different genes of the cancer cells and the patient information into a first matrix, and preprocessing the first matrix to obtain a second matrix;

S2, inputting the survival data obtained in step S1 and the second matrix into a preset Cox regression model, and solving to obtain regression coefficients;

S3, evaluating, according to the risk function of the patient, the patient risk of the gene corresponding to each regression coefficient, and screening out a prognostic genome corresponding to high patient risk;

S4, using the screened prognostic genome, through biological theory, to provide guidance information for predicting prognosis, recurrence and metastasis.

2. The method of claim 1, wherein in step S2 the survival data and the second matrix are combined to form a third matrix, and the third matrix is input into the preset Cox regression model; wherein the third matrix is denoted as [X, y, c], X represents the covariate matrix, i.e. the second matrix, y represents the survival time, and c represents the censoring index; and wherein the survival data of the i-th patient is […].

3. The method of claim 2, wherein the risk function of the i-th patient is specifically:

[…]

wherein […] is the shared baseline risk function; […] is the regression coefficient obtained by solving the Cox regression model; and […] represents the gene expression levels of the i-th patient.

4. The method of claim 3, wherein solving the Cox regression model in step S2 to obtain the regression coefficients comprises the following steps:

S21, combining the existing survival data into the third matrix, sorting it by survival time, constructing the Cox regression model from the sorted data, and initializing the prior parameters and the message-passing parameters;

S22, according to the split vector factor graph of the Cox regression model, using the expectation propagation algorithm to project high-dimensional messages onto independent Gaussian distributions through the moment-matching rule, iterating to solve the model, and outputting the regression coefficients and the approximate posterior probability;

S23, inputting the regression coefficients and the approximate posterior probability into the expectation-maximization algorithm, and updating the prior parameters;

S24, judging whether the regression coefficients satisfy a preset iteration end condition; if the preset iteration end condition is satisfied, outputting the regression coefficients obtained in the current iteration; and if it is not satisfied, returning to step S22 for the next iteration.

5. The method of claim 4, wherein the prior parameters include: the mean […], the variance […] and the sparsity rate […]; the message-passing parameters include: the mean and the variance of the forward-direction messages; and step S21 is specifically: normalizing the covariate matrix X, sorting the third matrix [X, y, c] in descending order of the survival time y, substituting the sorted third matrix [X, y, c] into the Cox partial likelihood function, and initializing the prior parameters and the message-passing function.

6. The method of claim 4, wherein the covariate matrix X is normalized as follows:

[…]

wherein mean(X) is the mean of all elements of the matrix X, and var(X) is the variance of all elements of the matrix X;

the Cox partial likelihood function is specifically:

[…]

wherein […] denotes that the function is the transition probability from […] to […], which indicates that […] is normalized with respect to […]; the Cox partial likelihood function is not normalized, hence the proportionality relation; the function takes […] as its variable, whose i-th element is […], where […] is the i-th element of […];

the initialization of the prior parameters is specifically: the regression coefficients follow a Gauss-Bernoulli distribution, with the mathematical expression:

[…]

wherein […] represents the Dirac delta function; […] represents a Gaussian distribution with mean […] and variance […]; the function takes […] as its variable; and the prior parameters […], […] and […] are initialized;

the initialization of the message-passing function is specifically: initializing the message-passing function of the forward-direction messages, with the mathematical expression:

[…]

wherein […] is an n-dimensional column vector whose elements are all 0; […] is an n-dimensional column vector whose elements are all 1; […] is a random variable following a multidimensional Gaussian distribution with independent components of equal variance; […] is an n-dimensional column vector whose elements are all 1; and […], […] and […] are initialized.

7. The method of claim 6, wherein step S22 performs message passing on the split vector factor graph of the Cox regression model based on the moment-matching rule, and comprises the following steps:

S221, according to the moment-matching rule of the split vector factor graph of the Cox regression model, updating […], specifically: at node […], multiplying the message of […] by […], projecting the product onto a multidimensional Gaussian distribution with independent components of equal variance, and dividing the projection by the message of […] to obtain the message of […];

S222, according to the moment-matching rule of the split vector factor graph of the Cox regression model, updating […], specifically: at node […], multiplying the message of […] by […], integrating out the variable […], projecting the result onto a multidimensional Gaussian distribution with independent components of equal variance, and dividing the projection by the message of […] to obtain the message of […]; wherein […] is the Dirac delta function;

S223, according to the moment-matching rule of the split vector factor graph of the Cox regression model, updating […], specifically: at node […], multiplying the message of […] by […], projecting the product onto a multidimensional Gaussian distribution with independent components of equal variance, and dividing the projection by the message of […] to obtain the message of […]; wherein the mean […] obtained by the projection operation is the Cox regression coefficient vector that is output as the result;

S224, according to the moment-matching rule of the split vector factor graph of the Cox regression model, updating […], specifically: at node […], multiplying the message of […] by […], integrating out the variable […], projecting the result onto a multidimensional Gaussian distribution with independent components of equal variance, and dividing the projection by the message of […] to obtain the message of […].

8. The method of claim 4, wherein step S23 is specifically: using the regression coefficients […] and the approximate posterior probability […] output by step S22 together with the expectation-maximization algorithm to update the prior parameters […] automatically; the update expressions are specifically:

[…]

wherein […] and […] are both functions of […], with the following expressions:

[…]

wherein […] denotes element-wise vector division and […] denotes element-wise vector multiplication.

9. The method of any one of claims 4-8, wherein the iteration end condition preset in step S24 is specifically:

[…]

whether to end the iteration is determined by judging whether the Crit value has started to rise; if the Crit value starts to rise, the iteration process is stopped and the regression coefficients […] of the final iteration are output; if the Crit value has not started to rise, the iteration continues; wherein […] denotes a norm.

10. A cancer gene prognosis screening system based on an improved Cox model, comprising a memory and a processor, wherein the memory stores a cancer gene prognosis screening program based on the improved Cox model, and the program, when executed by the processor, implements the following steps: