CN105678104A - Method for analyzing health data of old people on basis of Cox regression model - Google Patents

Method for analyzing health data of old people on basis of Cox regression model

Info

Publication number
CN105678104A
Authority
CN
China
Prior art keywords
risk factor
risk
factor
old man
health data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610209336.8A
Other languages
Chinese (zh)
Inventor
饶云波
刘伟
陆川
廖丹
张明
范柏江
葛丰
王诗琪
徐凡超
李慧
邓建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHENGDU RESEARCH INSTITUTE OF UESTC
Original Assignee
CHENGDU RESEARCH INSTITUTE OF UESTC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CHENGDU RESEARCH INSTITUTE OF UESTC filed Critical CHENGDU RESEARCH INSTITUTE OF UESTC
Priority to CN201610209336.8A priority Critical patent/CN105678104A/en
Publication of CN105678104A publication Critical patent/CN105678104A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention belongs to the technical field of big data analysis and relates to a method for analyzing the health data of the elderly based on a Cox regression model. The method mainly comprises the following steps: (a) collecting the health data of multiple elderly people as a sample and estimating the regression coefficients of the risk factors, a risk factor being a factor that influences the health of the elderly; (b) analyzing the relative risk RR of each risk factor from the regression coefficient estimates obtained in step a; (c) obtaining, from the relative risks obtained in step b, the set of risk factors associated with the health data of the elderly; and (d) using the Cox survival function to predict the morbidity of an individual elderly person. The beneficial effect of the invention is that, starting from the health data of the elderly, big data is used to analyze the relationship between an elderly person's own factors and disease and to build a model for judging the onset of disease; by monitoring the vital signs of the elderly, disease can then be predicted and smart elderly care realized.

Description

A method for analyzing the health data of the elderly based on a Cox regression model
Technical field
The invention belongs to the technical field of big data analysis and relates to a method for analyzing the health data of the elderly based on a Cox regression model.
Background art
Smart elderly care, also called "smart home care", is an elderly-care concept that has recently become popular. It was first proposed by a British life trust fund, which called it the "fully intelligent elderly-care system". Under this model, elderly people are not constrained by time or geography in their daily lives and can enjoy a high quality of life in welfare homes, apartments for the elderly, homes for the aged, geriatric rehabilitation centers, and nursing institutions such as hotel-style retirement apartments and large retirement communities.
Health data analysis for the elderly is an important part of smart elderly care. Its main content is health risk assessment, which is the first step of epidemiological prevention. Such research collects health data from the elderly, analyzes the quantitative relationship between disease and risk factors such as lifestyle, environment and heredity, builds a model, and uses the model to predict the probability that an elderly person develops a specified disease, or dies of it, within a given period. Risk assessment models built by different methods are evaluated and compared on three aspects: forecasting accuracy, goodness of fit, and reliability.
Current analysis of health data is mostly based on examining and measuring the subjects' height, weight, vital capacity, physique, nutritional status, blood pressure, blood glucose and so on, analyzing the resulting data, and then combining visits, questionnaires and similar methods to identify problems in the data and propose countermeasures. This approach can only find existing problems; it cannot predict the health status of the study population. Because it relies on visits and questionnaires, it also has a long research cycle and a heavy workload, and so cannot meet the demands of the fast pace of modern life.
Summary of the invention
The object of the invention is to overcome the above problems by proposing a method for analyzing the health data of the elderly, based on a Cox regression model, that predicts individual health status from an analysis of population data.
The technical solution of the invention is a method for analyzing the health data of the elderly based on a Cox regression model, characterized in that the Cox regression model is set as:
$$h(t, X) = h_0(t)\exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)$$
where $X = (x_1, x_2, \dots, x_n)'$ is an n-dimensional vector holding the observed values of n variables for one individual; these are the n covariates of the hazard function, i.e. the n factors under study that affect the health of the elderly. $h(t, X)$ is the hazard rate function at time t of an individual with covariate vector X, and $h_0(t)$ is the baseline hazard rate function at time t, i.e. the hazard rate function when the covariate vector X is 0, so $h_0(t)$ depends only on time. $\beta_i$ is the regression coefficient of covariate $x_i$, and $\beta$ is the regression coefficient vector.
The survival function corresponding to the Cox regression model is set as:
$$S(t; X) = S_0(t)^{\exp(X'\beta)}$$
where $S_0(t)$ is the average survival function at time t, i.e. the survival function when the risk factors are at their average level.
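The two model equations above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names and the numeric values of `beta`, `x`, the baseline hazard and the baseline survival are hypothetical and do not come from the patent.

```python
import math

def linear_predictor(x, beta):
    """Return X'beta = beta_1*x_1 + ... + beta_n*x_n."""
    return sum(b * xi for b, xi in zip(beta, x))

def hazard(h0_t, x, beta):
    """h(t, X) = h0(t) * exp(X'beta), given the baseline hazard value h0(t)."""
    return h0_t * math.exp(linear_predictor(x, beta))

def survival(s0_t, x, beta):
    """S(t; X) = S0(t) ** exp(X'beta), given the baseline survival value S0(t)."""
    return s0_t ** math.exp(linear_predictor(x, beta))

# Hypothetical coefficients and covariates, for illustration only.
beta = [0.5, -0.2]
x = [1.0, 2.0]
print(hazard(0.01, x, beta))   # hazard at a time where h0(t) = 0.01
print(survival(0.9, x, beta))  # survival at a time where S0(t) = 0.9
```

When every covariate is 0 the exponential factor is 1, so the functions return the baseline values unchanged, which matches the definition of $h_0(t)$ given above.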
The analysis method comprises the following steps:
a. collecting the health data of multiple elderly people as a sample and estimating the regression coefficients of the risk factors, a risk factor being a factor that affects the health of the elderly;
b. analyzing the relative risk RR of each risk factor from the regression coefficient estimates obtained in step a;
c. obtaining, from the relative risks obtained in step b, the set of risk factors associated with the health data of the elderly;
d. using the Cox survival function to predict the morbidity of an individual elderly person.
Further, step a specifically comprises the following steps:
a1. Suppose the sample consists of n elderly people. Starting from the beginning of observation, k distinct healthy-time values and n-k random censorings are obtained up to time t, a random censoring being an individual who drops out during the observation period. The k distinct observations are written $t_1 < t_2 < \dots < t_k$. Let $R_i = \{j : t_j \ge t_i\}$ denote the risk set at time $t_i$, that is, the individuals who have neither fallen ill nor been censored before $t_i$.
a2. The likelihood function
$$L(\beta) = \prod_{i=1}^{k} \left[ \frac{\exp(X_{(i)}\beta)}{\sum_{j \in D_i} \exp(X_j\beta)} \right]$$
is adopted, from which the estimates of the regression coefficients $\beta_i$ are obtained.
In practice, however, continuous data often contain many ties, that is, observations that share the same value. If there are many ties, a discrete model can be considered. If there are relatively few, the formula can be simplified slightly so that the ties are taken into account, which gives the following optimized likelihood function:
$$L(\beta) = \prod_{i=1}^{k} \left[ \frac{\exp(S_{(i)}\beta)}{\left( \sum_{j \in D_i} \exp(X_j\beta) \right)^{d_i}} \right]$$
from which the estimate of the regression coefficients $\beta$ is obtained. Here k is the number of distinct observations; $d_i$ is the number of elderly people whose healthy time equals $t_i$; $D_i$ is the set of elderly people who fall ill at time $t_i$; and $S_{(i)}$ is the sum of the covariate vectors of these $d_i$ individuals, namely $S_{(i)} = \sum_{j \in D_i} X_j$.
Further, step b is specifically:
b1. By the formula
$$RR = \exp(\beta)$$
the relative risk RR of a risk factor is obtained. The relative risk RR is the relative risk of exposure versus non-exposure to risk factor $x_i$. The larger the RR, the greater the effect of the exposure.
b2. The risk factors are classified according to the RR values obtained, specifically:
Class 1: RR of 0.9~1 or 1.0~1.1 indicates no association between the exposure factor and the disease;
Class 2: RR of 0.7~0.8 or 1.2~1.4 indicates a weak association between the exposure factor and the disease;
Class 3: RR of 0.4~0.6 or 1.5~2.9 indicates a moderate association between the exposure factor and the disease;
Class 4: RR of 0.1~0.3 or 3.0~9.9 indicates a strong association between the exposure factor and the disease;
Class 5: RR below 0.1 or above 10 indicates a very strong association between the exposure factor and the disease.
Further, step c is specifically:
c1. Following the classification of step b2, each class of risk factors is treated as an independent group and a covariate test is performed on it, specifically:
Suppose a group has m coefficients, for example $(\beta_1, \dots, \beta_m)$. The null hypothesis is $H_0: \beta_1 = \beta_2 = \dots = \beta_m = 0$, with $1 < m < P$, where P is the number of risk factors initially studied; the alternative is $H_1$: at least one $\beta_i$ in $(\beta_1, \dots, \beta_m)$ is not 0. The logarithm of the maximized partial likelihood function at [symbol missing in source] is denoted [symbol missing in source]; the significance level α of the test is set to 0.05.
The value of $\chi^2$ is: [formula missing in source]
It can be shown that, when $H_0$ holds, this statistic follows a chi-square distribution with m degrees of freedom. By the chi-square test, if the statistic falls within the acceptance region, so that $H_0$ is not rejected, the risk factors corresponding to this group of regression coefficients are only weakly associated with the health status of the elderly and can be ignored; otherwise they cannot be ignored, and the risk factors of this group are added to the risk factor set.
c2. The regression coefficient hypothesis tests are completed in turn for the 5 classes of risk factors; the factors unrelated to the health of the elderly are excluded, leaving the set X of factors strongly associated with it.
Further, step d is specifically:
Let the set of health risk factors obtained in step c2 be $X = (x_1, x_2, \dots, x_t)'$. The risk factor data $(x_1, x_2, \dots, x_t)$ of a single elderly person are collected, and the Cox survival function is adjusted with the average risk factor levels and the average incidence rate of the population, so that the person's morbidity over the next 4 years can be calculated. The morbidity risk P is computed as:
$$P = 1 - S_0(t)^{\exp(f(x, M))}$$
where $f(x, M) = \beta_1(x_1 - M_1) + \beta_2(x_2 - M_2) + \dots + \beta_t(x_t - M_t)$; $\beta_1, \dots, \beta_t$ are the partial regression coefficients for the different strata of each risk factor in the set; $x_1, \dots, x_t$ are the individual's levels of each risk factor; and $M_1, \dots, M_t$ are the population's average levels of each risk factor.
The beneficial effect of the invention is that, starting from the health data of the elderly, big data is used to analyze the relationship between an elderly person's own factors and the occurrence of disease and to build a model for judging the onset of disease; by monitoring the vital signs of the elderly, disease is then predicted and smart elderly care is realized.
Detailed description of the invention
The technical solution of the invention is described in detail below.
The main idea of the health data analysis of the invention is to fit a Cox regression model to the health data of an elderly population, analyze the risk factors that affect the health of that population together with their risk coefficients, and then predict disease for individuals. Chronic diseases such as hypertension, coronary heart disease and osteoporosis are common among the elderly and interfere with their normal daily activities. More seriously, if an acute episode of such a disease is not treated in time, it is likely to endanger the person's life and cause irreparable loss. Risk assessment of the physical condition of the elderly therefore has great practical value. The health data analysis algorithm proposed here builds a Cox regression model on factors that affect the health of the elderly, such as age, sex, body mass index, blood pressure, body temperature, sterol level and smoking. The Cox survival function is adjusted with the average risk factor levels and the average incidence rate of the population, so that the health status of an elderly person is predicted and the probability of illness is estimated at the same time. The health level of an elderly person is graded as healthy, sub-healthy or unhealthy. If the system rates the current health level as "sub-healthy", it prompts the elderly person to go to hospital for an examination. If the system rates it as "unhealthy", a relatively serious health problem is likely; the system then reminds the elderly person to seek medical care and also reports the relevant information to caregivers and medical staff so that a timely examination and treatment can be arranged.
Selecting the correct risk factors as the objects of study is the basis of the health data analysis. Study shows that the risk factors of several diseases common among the elderly are very similar, so the invention takes two common diseases of the elderly, coronary heart disease and hypertension, as examples.
The high-risk factors for coronary heart disease include, besides clinically diagnosed coronary heart disease, symptomatic carotid artery disease, peripheral arterial disease, abdominal aortic aneurysm and diabetes. For a person with any one of these high-risk factors, the probability of heart disease or of its recurrence within the next 4 years exceeds 20%, i.e. 4-year cardiac risk >20%. The major risk factors for coronary heart disease in the elderly include smoking, hypertension (blood pressure >140/90 mmHg, or currently receiving antihypertensive treatment), HDL cholesterol <40 mg/dl, and a family history of cardiovascular disease. A person with 0~1 major risk factors has a 10-year cardiac risk >10%. For a person with 2 or more major risk factors, the risk of heart disease increases by 20%. Other risk factors include obesity, lack of exercise, a diet high in unsaturated fatty acids and cholesterol, and elevated homocysteine and lipoprotein levels.
Hypertension is another disease common among the elderly, and its major risk factors are very similar to those of coronary heart disease. Besides the coronary heart disease risk factors above, they include vascular aging, rising blood pressure, a melancholic temperament and mental stress, excessive salt intake, excessive intake of high-carbohydrate food, and drinking.
The invention builds health records for the elderly population in welfare homes, apartments for the elderly, homes for the aged, geriatric rehabilitation centers, and nursing institutions such as hotel-style retirement apartments and large retirement communities. It analyzes the risk factors that affect the health of a given population, calculates the relative risk (RR) of each factor, and uses hypothesis testing to find the set of factors strongly related to the health status of the elderly. Finally, the health data of an individual elderly person are collected in real time, the Cox survival function is adjusted with the average risk factor levels and the average incidence rate of the population, and the person's incidence rate is calculated.
The Cox proportional hazards model is a semi-parametric model: as the name suggests, it is a regression method between parametric and non-parametric. Because it places no restriction on the baseline function and estimates the effect of predictive factors on health using only the partial likelihood, it combines the advantages of parametric and non-parametric models and is a multi-factor survival analysis method. It can analyze data with censored survival times, can analyze the effects of many factors on survival time simultaneously, and does not require the distribution type of the survival function to be estimated. The invention adopts the Cox regression model as the method for analyzing the relationship between risk factors and the health status of the elderly.
The basic form of the Cox regression model is:
$$h(t, X) = h_0(t)\exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n) \qquad (1)$$
$X = (x_1, x_2, \dots, x_n)'$ is an n-dimensional vector holding the observed values of n variables for one individual, which are also the n covariates of the hazard function. $h(t, X)$ is the hazard rate function at time t of an individual with covariate vector X, and $h_0(t)$ is the baseline hazard rate function at time t, i.e. the hazard rate function when the covariate vector X is 0, so $h_0(t)$ depends only on time. $\beta = (\beta_1, \beta_2, \dots, \beta_n)'$ is the vector of regression coefficients of the corresponding covariates.
The survival function corresponding to this model is:
$$S(t; X) = S_0(t)^{\exp(X'\beta)} \qquad (2)$$
where $S_0(t)$ is the average survival function at time t, i.e. the survival function over time when the risk factors are at their average level. From a statistical point of view, the Cox regression model rests on two basic assumptions:
Proportional hazards assumption: different individuals have proportional hazard functions. That is, for two individuals with covariates $X_1 = (X_{11}, X_{12}, \dots, X_{1n})'$ and $X_2 = (X_{21}, X_{22}, \dots, X_{2n})'$, the ratio $h(t; X_1)/h(t; X_2)$ does not change with time t.
Log-linear assumption: formula (1) can be rewritten as
$$\ln\left[ h(t; X)/h_0(t) \right] = X'\beta$$
that is, for the continuous variables in the model, the log relative hazard of any individual i is linear in the covariates.
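The proportional hazards assumption can be checked numerically on a toy model: because $h_0(t)$ cancels in the ratio, the hazard ratio of two individuals is the same at every time. The sketch below assumes a made-up baseline hazard `baseline` and made-up coefficients; none of these values come from the patent.

```python
import math

def baseline(t):
    """Hypothetical baseline hazard h0(t); any positive function of t works."""
    return 0.01 * (1.0 + t)

def hazard(t, x, beta):
    """h(t; X) = h0(t) * exp(X'beta), as in formula (1)."""
    return baseline(t) * math.exp(sum(b * xi for b, xi in zip(beta, x)))

def hazard_ratio(x1, x2, beta):
    """h(t; X1) / h(t; X2) = exp((X1 - X2)'beta); t does not appear."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, x1, x2)))

beta = [0.3, 0.8]   # hypothetical coefficients
x1 = [70.0, 1.0]    # hypothetical individual 1
x2 = [65.0, 0.0]    # hypothetical individual 2

# The ratio evaluated at several times equals the closed-form hazard ratio.
ratios = [hazard(t, x1, beta) / hazard(t, x2, beta) for t in (0.0, 1.0, 5.0, 10.0)]
print(ratios, hazard_ratio(x1, x2, beta))
```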
Step A: The invention uses the health data of the elderly in a specific home for the aged as training data, giving a random sample of n elderly people (for example n = 100). Starting from the beginning of observation (t = 0), k distinct healthy-time values and n-k random censorings are collected up to time t (a random censoring is an individual who, for whatever reason, drops out of observation during the observation period). The k distinct observations can be written as order statistics: $t_1 < t_2 < \dots < t_k$. Let $R_i = \{j : t_j \ge t_i\}$ denote the risk set at time $t_i$, that is, the set of individuals who have neither fallen ill nor been censored before $t_i$. Because the baseline hazard is left unspecified, the regression coefficients in model (1) are usually estimated with the following partial likelihood:
$$L(\beta) = \prod_{i=1}^{k} \left[ \frac{\exp(X_{(i)}\beta)}{\sum_{j \in D_i} \exp(X_j\beta)} \right] \qquad (3)$$
To infer the regression coefficients $\beta$, $L(\beta)$ is treated as an ordinary likelihood function, which yields the estimate of $\beta$. In particular, under suitable conditions the estimate $\hat{\beta}$ obtained by maximizing (3) is asymptotically normal, and its covariance matrix is estimated in the usual way from the matrix of second derivatives of $\ln L(\beta)$.
Continuous data often contain many ties (a tie occurs when observations in the data share the same value). If there are many ties, a discrete model can be considered. If there are relatively few, formula (3) can be simplified slightly so that the ties are taken into account. Formula (3) is then replaced by:
$$L(\beta) = \prod_{i=1}^{k} \left[ \frac{\exp(S_{(i)}\beta)}{\left( \sum_{j \in D_i} \exp(X_j\beta) \right)^{d_i}} \right] \qquad (4)$$
where $d_i$ is the number of elderly people whose healthy time equals $t_i$, $D_i$ is the set of elderly people who develop disease (including death) at time $t_i$, and $S_{(i)}$ is the sum of the covariate vectors of these $d_i$ individuals, namely $S_{(i)} = \sum_{j \in D_i} X_j$. If there are no ties, every $d_i$ in formula (4) equals 1 and the formula reduces to formula (3). Formula (4) is proposed to handle data with a large number of ties; otherwise the two formulas have essentially the same effect. The derivations involving the likelihood function in this scheme are based on formula (4). With the Cox proportional hazards regression model established, the parameter value at which the partial likelihood function reaches its maximum is the maximum likelihood estimate of the regression coefficients $\beta$.
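As an illustration of estimating $\beta$ from a tie-adjusted partial likelihood, the sketch below evaluates the logarithm of a likelihood of the form of formula (4) and maximizes it for a single covariate. Two points are assumptions rather than statements of the patent: the denominator sum is taken over the risk set $R_i$ defined in Step A, as in the standard Breslow approximation, and the maximization uses a simple ternary search, which is valid here because the log partial likelihood is concave. The toy data are invented.

```python
import math

def breslow_log_partial_likelihood(times, events, X, beta):
    """ln L(beta) = sum_i [ S_(i)'beta - d_i * ln( sum_{j in R_i} exp(X_j'beta) ) ]."""
    n = len(times)
    eta = [sum(b * v for b, v in zip(beta, row)) for row in X]
    event_times = sorted({times[j] for j in range(n) if events[j]})
    loglik = 0.0
    for ti in event_times:
        d_idx = [j for j in range(n) if events[j] and times[j] == ti]  # D_i, with d_i = len(d_idx)
        r_idx = [j for j in range(n) if times[j] >= ti]                # risk set R_i (assumed)
        s_eta = sum(eta[j] for j in d_idx)                             # S_(i)'beta
        denom = sum(math.exp(eta[j]) for j in r_idx)
        loglik += s_eta - len(d_idx) * math.log(denom)
    return loglik

def fit_single_covariate(times, events, x, lo=-5.0, hi=5.0, iters=100):
    """Maximize the concave log partial likelihood over one coefficient by ternary search."""
    X = [[v] for v in x]
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if (breslow_log_partial_likelihood(times, events, X, [m1])
                < breslow_log_partial_likelihood(times, events, X, [m2])):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0

# Toy data (hypothetical): healthy time, event flag (1 = onset, 0 = censored), one exposure.
times = [2, 3, 3, 5, 6, 8]
events = [1, 1, 1, 0, 1, 0]
x = [1, 1, 0, 0, 1, 0]
beta_hat = fit_single_covariate(times, events, x)
print(beta_hat)
```

For several covariates a Newton-type optimizer would normally replace the ternary search; the one-dimensional version is used only to keep the sketch short.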
Step B: After the maximum likelihood estimate of the regression coefficient $\beta_i$ has been obtained, the relative risk RR of exposure versus non-exposure to risk factor $x_i$ is computed as:
$$RR = \exp(\beta) \qquad (5)$$
The relative risk RR states how many times the incidence rate of the exposed group is that of the control group, i.e. the onset risk of the exposed group as a multiple of that of the unexposed group. The larger the RR, the greater the effect of the exposure and the stronger the association between exposure and outcome. In general, its numerical meaning can be summarized as:
Class 1: RR of 0.9~1 or 1.0~1.1 indicates no association between the exposure factor and the disease;
Class 2: RR of 0.7~0.8 or 1.2~1.4 indicates a weak association between the exposure factor and the disease;
Class 3: RR of 0.4~0.6 or 1.5~2.9 indicates a moderate association between the exposure factor and the disease;
Class 4: RR of 0.1~0.3 or 3.0~9.9 indicates a strong association between the exposure factor and the disease;
Class 5: RR below 0.1 or above 10 indicates a very strong association between the exposure factor and the disease.
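The classification above can be sketched as a small lookup. The printed bands are given to one decimal place and leave small gaps (for example between 1.1 and 1.2); the sketch assumes each band extends up to the start of the next one, which is an interpretation and not something the text states. The function names are hypothetical.

```python
import math

def relative_risk(beta_i):
    """RR = exp(beta), formula (5)."""
    return math.exp(beta_i)

def classify_rr(rr):
    """Map an RR value to one of the five association classes.

    Assumption: gaps between the printed bands are closed by extending
    each band up to the lower bound of the next band.
    """
    if rr >= 1.0:
        if rr < 1.2:
            return "none"
        if rr < 1.5:
            return "weak"
        if rr < 3.0:
            return "moderate"
        if rr < 10.0:
            return "strong"
        return "very strong"
    if rr >= 0.9:
        return "none"
    if rr >= 0.7:
        return "weak"
    if rr >= 0.4:
        return "moderate"
    if rr >= 0.1:
        return "strong"
    return "very strong"

# Hypothetical coefficient for one risk factor.
rr = relative_risk(0.7)
print(rr, classify_rr(rr))
```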
Step C: The RR value gives a rough indication of the relationship between a risk factor and the health status of the elderly. However, an RR in the range 0.9~1.1 does not prove that the risk factor is truly unrelated to health status, and, because of sampling variation, an RR below 0.1 or above 10 does not prove that the association is necessarily very strong. To improve prediction accuracy, after the Cox proportional hazards regression model has been established the invention obtains the subset of variables that are clearly associated with health status; that is, hypothesis tests are carried out on the risk factors with different RR values to determine their effect on the health status of the elderly.
The invention divides the risk factors into the five classes given by the RR classification above, and each class is treated as an independent group on which a covariate test is performed.
Suppose a group has m coefficients, for example $(\beta_1, \dots, \beta_m)$. The null hypothesis is $H_0: \beta_1 = \beta_2 = \dots = \beta_m = 0$, with $1 < m < P$ (where P is the number of risk factors initially studied); the alternative is $H_1$: at least one $\beta_i$ in $(\beta_1, \dots, \beta_m)$ is not 0. Here the logarithm of the maximized partial likelihood function at [symbol missing in source] is denoted [symbol missing in source]. The invention sets the significance level α of the test to 0.05.
The value of $\chi^2$ is: [formula missing in source]
It can be shown that, when $H_0$ holds, the statistic follows a chi-square distribution with m degrees of freedom.
By the chi-square test, if the statistic falls within the acceptance region, so that $H_0$ is not rejected, the risk factors corresponding to this group of regression coefficients are only weakly associated with the health status of the elderly and can be ignored; otherwise they cannot be ignored, and the risk factors of this group are added to the risk factor set.
After the hypothesis tests on all five groups of regression coefficients, the factors unrelated to the health of the elderly are excluded, leaving the set X of factors strongly associated with it.
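The group test can be sketched as follows. The formula for the statistic is not reproduced in the text, so the sketch assumes the usual likelihood-ratio form, twice the difference between the maximized log partial likelihood of the full model and that of the model with the group's m coefficients fixed at 0. The function name and the log-likelihood values in the example are hypothetical; the critical values are the standard upper 5% points of the chi-square distribution.

```python
# Upper 5% critical values of the chi-square distribution, indexed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def lr_group_test(loglik_full, loglik_null, m):
    """Likelihood-ratio test of H0: beta_1 = ... = beta_m = 0 at alpha = 0.05.

    loglik_full: maximized log partial likelihood with the group's coefficients free.
    loglik_null: maximized log partial likelihood with the group's coefficients set to 0.
    Returns (statistic, keep_group); keep_group is True when H0 is rejected,
    i.e. the group is added to the risk factor set.
    Assumption: the statistic has the likelihood-ratio form described in the lead-in.
    """
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, stat > CHI2_CRIT_05[m]

# Hypothetical log-likelihood values for a group of m = 2 risk factors.
stat, keep_group = lr_group_test(-100.0, -103.0, 2)
print(stat, keep_group)
```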
Step D: The steps above yield, from the tests on the population health data, the set of associated risk factors affecting the health of the elderly, $X = (x_1, x_2, \dots, x_t)'$, and the regression coefficient vector of these risk factors, $\beta = (\beta_1, \dots, \beta_t)$. The main task of this step is to collect the associated risk factor data of a single elderly person and adjust the Cox survival function with the average risk factor levels and the average incidence rate of the population, so that the Cox regression model is finally used to predict the health status of the individual. The morbidity risk P of an elderly person over the next 4 years is computed as:
$$P = 1 - S_0(t)^{\exp(f(x, M))} \qquad (7)$$
where $f(x, M) = \beta_1(x_1 - M_1) + \beta_2(x_2 - M_2) + \dots + \beta_t(x_t - M_t)$; $\beta_1, \dots, \beta_t$ are the partial regression coefficients for the different strata of each risk factor in the set; $x_1, \dots, x_t$ are the individual's levels of each risk factor; and $M_1, \dots, M_t$ are the population's average levels of each risk factor.
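Formula (7) translates directly into code. In the sketch below the coefficients, population means, individual values and the 4-year average survival `s0_4y` are all made-up numbers chosen only to show the calculation; the function name is hypothetical.

```python
import math

def morbidity_risk(s0_t, x, means, beta):
    """P = 1 - S0(t) ** exp(f(x, M)), with f(x, M) = sum_i beta_i * (x_i - M_i)."""
    f = sum(b * (xi - mi) for b, xi, mi in zip(beta, x, means))
    return 1.0 - s0_t ** math.exp(f)

# Hypothetical values: two risk factors (systolic blood pressure, smoker flag).
beta = [0.02, 0.6]      # regression coefficients (assumed)
means = [135.0, 0.3]    # population average levels M_i (assumed)
s0_4y = 0.90            # average 4-year survival S0(t) (assumed)

average_person = morbidity_risk(s0_4y, means, means, beta)
high_risk_person = morbidity_risk(s0_4y, [160.0, 1.0], means, beta)
print(average_person, high_risk_person)
```

A person whose risk factors all sit at the population averages has $f(x, M) = 0$, so the predicted risk equals $1 - S_0(t)$, the average incidence; this is the sense in which the survival function is adjusted by the population averages.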

Claims (6)

1. A method for analyzing the health data of the elderly based on a Cox regression model, characterized in that the Cox regression model is set as:
$$h(t, X) = h_0(t)\exp(\beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n)$$
where $X = (x_1, x_2, \dots, x_n)'$ is an n-dimensional vector holding the observed values of n variables for one individual, which are also the n covariates of the hazard function; $h(t, X)$ is the hazard rate function at time t of an individual with covariate vector X; $h_0(t)$ is the baseline hazard rate function at time t, i.e. the hazard rate function when the covariate vector X is 0, so $h_0(t)$ depends only on time; $\beta_i$ is the regression coefficient of the corresponding covariate $x_i$; and $\beta$ is the regression coefficient vector;
the survival function corresponding to the Cox regression model is set as:
$$S(t; X) = S_0(t)^{\exp(X'\beta)}$$
where $S_0(t)$ is the average survival function at time t, i.e. the survival function when the risk factors are at their average level;
and the analysis method comprises the following steps:
a. collecting the health data of multiple elderly people as a sample and estimating the regression coefficients of the risk factors, a risk factor being a factor that affects the health of the elderly;
b. analyzing the relative risk RR of each risk factor from the regression coefficient estimates obtained in step a;
c. obtaining, from the relative risks obtained in step b, the set of risk factors associated with the health data of the elderly;
d. using the Cox survival function to predict the morbidity of an individual elderly person.
2. The method for analyzing the health data of the elderly based on a Cox regression model according to claim 1, characterized in that step a comprises:
a1. supposing the sample consists of n elderly people, obtaining, from the beginning of observation up to time t, k distinct healthy-time values and n-k random censorings, a random censoring being an individual who drops out during the observation period; the k distinct observations are written $t_1 < t_2 < \dots < t_k$; $R_i = \{j : t_j \ge t_i\}$ denotes the risk set at time $t_i$, that is, the individuals who have neither fallen ill nor been censored before $t_i$;
a2. adopting the likelihood function
$$L(\beta) = \prod_{i=1}^{k} \left[ \frac{\exp(X_{(i)}\beta)}{\sum_{j \in D_i} \exp(X_j\beta)} \right]$$
to obtain the estimate of the regression coefficients $\beta$.
3. a kind of aged health data analysing method based on Cox regression model according to claim 2, it is characterised in that include a further comprising the steps of:
A3. assuming to be formed sample by n old man, then need from observation time, obtain k different healthy time period statistical value and n-k different Random censorship at moment t, described Random censorship is the individuality exited in observation time section; The k obtained a different observation is expressed as: t1< t2< ... < tk; If Ri={ j:tj≥tiRepresent tiThe risk set in moment, its implication is at tiDo not occur old man sick before moment and do not occur deleting the individuality of mistake;
a4. using the improved likelihood function
$$L(\beta) = \prod_{i=1}^{k} \left[ \frac{\exp(S_{(i)}\beta)}{\left( \sum_{j \in R_i} \exp(X_j\beta) \right)^{d_i}} \right]$$
to obtain the estimate of the regression coefficient β; where $d_i$ is the number of elderly people whose healthy time equals $t_i$; $D_i$ denotes the set of elderly people who fall ill at time $t_i$; and $S_{(i)}$ is the sum of the covariates of these $d_i$ sample individuals, namely $S_{(i)} = \sum_{j \in D_i} X_j$.
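The improved likelihood of step a4 has the form commonly known as the Breslow approximation for tied event times. The sketch below evaluates its logarithm in Python; the function name and the data layout (lists of times, event flags, and covariate rows) are assumptions made for illustration.

```python
import math
from collections import defaultdict

def breslow_log_partial_likelihood(times, events, X, beta):
    """Log of the tie-adjusted partial likelihood (Breslow form)."""
    def eta(row):
        return sum(b * v for b, v in zip(beta, row))

    # D_i: indices of the individuals who fall ill at each distinct time t_i
    ill_at = defaultdict(list)
    for j, (t_j, event_j) in enumerate(zip(times, events)):
        if event_j:
            ill_at[t_j].append(j)

    total = 0.0
    for t_i, D_i in ill_at.items():
        d_i = len(D_i)
        # S_(i): covariate sum over the d_i individuals in D_i
        S_i = [sum(X[j][p] for j in D_i) for p in range(len(beta))]
        # Sum over the risk set R_i = {j : t_j >= t_i}
        denom = sum(math.exp(eta(X[j]))
                    for j, t_j in enumerate(times) if t_j >= t_i)
        total += eta(S_i) - d_i * math.log(denom)
    return total
```

When every $d_i$ equals 1, this reduces to the untied log partial likelihood of step a2.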
4. The method for analyzing health data of the elderly based on a Cox regression model according to claim 3, characterized in that step b further comprises:
b1. obtaining the relative risk coefficient RR of a risk factor by the formula
RR = exp(β)
where the relative risk coefficient RR is the relative risk of being exposed versus unexposed to risk factor $x_i$; the larger the RR value, the greater the effect of the exposure;
b2. classifying the risk factors according to the RR values obtained, specifically:
Class 1: RR of 0.9~1.0 or 1.0~1.1, indicating no association between the exposure factor and disease;
Class 2: RR of 0.7~0.8 or 1.2~1.4, indicating a weak association between the exposure factor and disease;
Class 3: RR of 0.4~0.6 or 1.5~2.9, indicating a moderate association between the exposure factor and disease;
Class 4: RR of 0.1~0.3 or 3.0~9.9, indicating a strong association between the exposure factor and disease;
Class 5: RR less than 0.1 or greater than 10, indicating a very strong association between the exposure factor and disease.
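Steps b1 and b2 can be sketched in Python as follows. The bands in the claim are stated to one decimal place, so this sketch rounds RR to one decimal before classifying; that rounding, the treatment of a rounded value of exactly 10.0 as Class 5, and the function names are assumptions rather than part of the claim.

```python
import math

def relative_risk(beta):
    """RR = exp(beta) for one regression coefficient."""
    return math.exp(beta)

def rr_class(rr):
    """Map an RR value to association classes 1 (none) through 5 (very strong)."""
    tenths = round(rr * 10)  # RR rounded to one decimal, expressed in tenths
    if 9 <= tenths <= 11:
        return 1
    if 7 <= tenths <= 8 or 12 <= tenths <= 14:
        return 2
    if 4 <= tenths <= 6 or 15 <= tenths <= 29:
        return 3
    if 1 <= tenths <= 3 or 30 <= tenths <= 99:
        return 4
    return 5  # below 0.1, or 10 and above (assumed boundary handling)
```

For instance, a coefficient of β = ln 2 gives RR = 2, which falls in Class 3 (moderate association).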
5. The method for analyzing health data of the elderly based on a Cox regression model according to claim 4, characterized in that step c further comprises:
c1. according to the classification of step b2, treating each class of risk factors as a separate group and performing a covariate test on it, specifically:
suppose a group has m coefficients, for example $(\beta_1, \dots, \beta_m)$; the null hypothesis is $H_0: \beta_1 = \beta_2 = \dots = \beta_m = 0$, with $1 < m < P$, where P is the number of risk factors in the preliminary study; the alternative hypothesis is $H_1$: at least one $\beta_i$ in $(\beta_1, \dots, \beta_m)$ is not 0; the logarithm of the maximized partial likelihood function is denoted $\ln L$, and the significance level α of the likelihood test is set to 0.05;
the $\chi^2$ statistic is then:
$$\chi^2 = 2\left[ \ln L(\text{all } x) - \ln L(\text{all } x \text{ except } x_1, \dots, x_m) \right]$$
it can be shown that, when $H_0$ holds, this statistic follows a chi-square distribution with m degrees of freedom; by the chi-square test, if the result falls within the acceptance region (that is, $H_0$ is not rejected), the risk factors corresponding to this group of regression coefficients are only weakly associated with the health of the elderly and can be ignored; otherwise they cannot be ignored, and the risk factors of this group are added to the risk factor set;
c2. completing the regression coefficient hypothesis tests for the 5 classes of risk factors in turn, excluding the factors unrelated to the health of the elderly, and obtaining the set X of factors strongly associated with it.
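The group test of step c1 is a likelihood ratio test, sketched below in Python using only the standard library. The two log-likelihood inputs are assumed to come from fitting the model with all risk factors and with the group removed; the function names and the closed-form chi-square tail (valid for integer degrees of freedom) are illustrative choices, not part of the claim.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # exact tail for df = 1
    # Each pass raises the degrees of freedom by 2 via the standard recurrence
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(loglik_full, loglik_reduced, m, alpha=0.05):
    """Test H0: the m coefficients of one group are all zero.

    Returns (statistic, p_value, keep_group); keep_group is True when H0
    is rejected, i.e. the group's risk factors join the risk factor set.
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    p_value = chi2_sf(stat, m)
    return stat, p_value, p_value < alpha
```

With hypothetical log-likelihoods of -100.0 (full model) and -104.0 (group of m = 2 factors removed), the statistic is 8.0 and the p-value is exp(-4), about 0.018, so the group would be kept at α = 0.05.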
6. The method for analyzing health data of the elderly based on a Cox regression model according to claim 5, characterized in that step d further comprises:
letting the set X obtained in step c2 be $X = (x_1, x_2, \dots, x_t)$, and adjusting the Cox survival function using the population's average risk factor levels and average incidence rate, so that the Cox regression model is used to predict the physical condition of the elderly; the disease risk P of an elderly person over the next 4 years is computed as follows:
$$P = 1 - S_0(t)^{\exp(f(x, M))}$$
where $f(x, M) = \beta_1(x_1 - M_1) + \beta_2(x_2 - M_2) + \dots + \beta_t(x_t - M_t)$; $\beta_1, \dots, \beta_t$ are the partial regression coefficients of the risk factors in the set, for their respective strata; $x_1, \dots, x_t$ are the individual's levels of each risk factor; and $M_1, \dots, M_t$ are the average levels of each risk factor in the population.
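The prediction formula of step d can be sketched in Python as below. It assumes that $S_0(t)$ is the survival probability over the prediction horizon for an individual at the population-average risk factor levels, which the claim does not state explicitly; the function and parameter names are hypothetical.

```python
import math

def predicted_risk(s0, beta, x, mean_x):
    """P = 1 - S0(t) ** exp(f(x, M)), with f = sum of beta_i * (x_i - M_i)."""
    f = sum(b * (xi - mi) for b, xi, mi in zip(beta, x, mean_x))
    return 1.0 - s0 ** math.exp(f)
```

An individual whose risk factors all equal the population averages has f = 0 and therefore P = 1 - S0(t); with a hypothetical S0(t) = 0.9, a single coefficient β = ln 2, and a factor one unit above average, P = 1 - 0.9² = 0.19.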
CN201610209336.8A 2016-04-06 2016-04-06 Method for analyzing health data of old people on basis of Cox regression model Pending CN105678104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610209336.8A CN105678104A (en) 2016-04-06 2016-04-06 Method for analyzing health data of old people on basis of Cox regression model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610209336.8A CN105678104A (en) 2016-04-06 2016-04-06 Method for analyzing health data of old people on basis of Cox regression model

Publications (1)

Publication Number Publication Date
CN105678104A true CN105678104A (en) 2016-06-15

Family

ID=56309347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610209336.8A Pending CN105678104A (en) 2016-04-06 2016-04-06 Method for analyzing health data of old people on basis of Cox regression model

Country Status (1)

Country Link
CN (1) CN105678104A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407643A (en) * 2016-08-03 2017-02-15 无锡金世纪国民体质与健康研究有限公司 Method for establishing health risk assessment system
CN106407498A (en) * 2016-08-03 2017-02-15 无锡金世纪国民体质与健康研究有限公司 Falling risk model establishment method
CN106570320A (en) * 2016-10-25 2017-04-19 兰州大学 Prediction system for old people behavior health
CN107122587A (en) * 2017-03-22 2017-09-01 上海商保通健康科技有限公司 Layer-stepping personalized health trend evaluation system based on big data
CN108198628A (en) * 2017-12-29 2018-06-22 创业软件股份有限公司 A kind of epidemic disease based on motion bracelet big data analysis propagates analysis method
CN108682457A (en) * 2018-04-17 2018-10-19 中国医学科学院阜外医院 Patient's long-term prognosis quantitative forecast and interfering system and method
CN109036555A (en) * 2018-08-16 2018-12-18 芜湖云枫信息技术有限公司 The exercise risk appraisal procedure of cardiovascular diseases risk population
CN109448846A (en) * 2018-09-07 2019-03-08 北京大学 A kind of analysis method for calculating rare sick disease incidence based on medical insurance big data
CN109544919A (en) * 2018-11-22 2019-03-29 北京交通大学 Mixed row website section car transit time estimation method based on reliability model
CN109712716A (en) * 2018-12-25 2019-05-03 广州天鹏计算机科技有限公司 Sickness influence factor determines method, system and computer equipment
CN110767313A (en) * 2019-10-23 2020-02-07 苏州大学 Hypertension risk assessment device based on multi-level Bayesian model
CN110797120A (en) * 2019-10-23 2020-02-14 苏州大学 Ischemic stroke bad outcome risk prediction device integrating epigenetic factors
CN111127225A (en) * 2019-11-25 2020-05-08 泰康保险集团股份有限公司 System, method, apparatus and computer readable medium for insurance underwriting
CN111312393A (en) * 2020-01-14 2020-06-19 之江实验室 Time sequence deep survival analysis system combined with active learning
CN111798984A (en) * 2020-07-07 2020-10-20 章越新 Disease prediction scheme based on Fourier transform
CN113704857A (en) * 2021-09-02 2021-11-26 华中科技大学 Automatic generation method and system for space layout of old-fit house
CN114974598A (en) * 2022-06-29 2022-08-30 山东大学 Lung cancer prognosis prediction model construction method and lung cancer prognosis prediction system
CN115481835A (en) * 2021-05-31 2022-12-16 四川大学 Atmospheric pollutant hazard assessment method based on continuous exposure generalized accurate matching

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106407498A (en) * 2016-08-03 2017-02-15 无锡金世纪国民体质与健康研究有限公司 Falling risk model establishment method
CN106407643A (en) * 2016-08-03 2017-02-15 无锡金世纪国民体质与健康研究有限公司 Method for establishing health risk assessment system
CN106570320A (en) * 2016-10-25 2017-04-19 兰州大学 Prediction system for old people behavior health
CN107122587A (en) * 2017-03-22 2017-09-01 上海商保通健康科技有限公司 Layer-stepping personalized health trend evaluation system based on big data
CN108198628B (en) * 2017-12-29 2021-10-22 创业慧康科技股份有限公司 Epidemic disease propagation analysis method based on big data analysis of sports bracelet
CN108198628A (en) * 2017-12-29 2018-06-22 创业软件股份有限公司 A kind of epidemic disease based on motion bracelet big data analysis propagates analysis method
CN108682457A (en) * 2018-04-17 2018-10-19 中国医学科学院阜外医院 Patient's long-term prognosis quantitative forecast and interfering system and method
CN109036555A (en) * 2018-08-16 2018-12-18 芜湖云枫信息技术有限公司 The exercise risk appraisal procedure of cardiovascular diseases risk population
CN109448846A (en) * 2018-09-07 2019-03-08 北京大学 A kind of analysis method for calculating rare sick disease incidence based on medical insurance big data
CN109544919A (en) * 2018-11-22 2019-03-29 北京交通大学 Mixed row website section car transit time estimation method based on reliability model
CN109544919B (en) * 2018-11-22 2020-11-10 北京交通大学 Mixed-driving station road section car passing time estimation method based on reliability model
CN109712716A (en) * 2018-12-25 2019-05-03 广州天鹏计算机科技有限公司 Sickness influence factor determines method, system and computer equipment
CN109712716B (en) * 2018-12-25 2021-08-31 广州医科大学附属第一医院 Disease influence factor determination method, system and computer equipment
CN110767313A (en) * 2019-10-23 2020-02-07 苏州大学 Hypertension risk assessment device based on multi-level Bayesian model
CN110797120A (en) * 2019-10-23 2020-02-14 苏州大学 Ischemic stroke bad outcome risk prediction device integrating epigenetic factors
CN111127225A (en) * 2019-11-25 2020-05-08 泰康保险集团股份有限公司 System, method, apparatus and computer readable medium for insurance underwriting
WO2021143774A1 (en) * 2020-01-14 2021-07-22 之江实验室 Time series deep survival analysis system in combination with active learning
CN111312393A (en) * 2020-01-14 2020-06-19 之江实验室 Time sequence deep survival analysis system combined with active learning
CN111312393B (en) * 2020-01-14 2022-02-22 之江实验室 Time sequence deep survival analysis system combined with active learning
US11461658B2 (en) 2020-01-14 2022-10-04 Zhejiang Lab Time series deep survival analysis system in combination with active learning
CN111798984A (en) * 2020-07-07 2020-10-20 章越新 Disease prediction scheme based on Fourier transform
CN115481835A (en) * 2021-05-31 2022-12-16 四川大学 Atmospheric pollutant hazard assessment method based on continuous exposure generalized accurate matching
CN115481835B (en) * 2021-05-31 2024-02-02 四川大学 Atmospheric pollutant hazard assessment method based on continuous exposure generalized exact match
CN113704857A (en) * 2021-09-02 2021-11-26 华中科技大学 Automatic generation method and system for space layout of old-fit house
CN113704857B (en) * 2021-09-02 2024-04-16 华中科技大学 Automatic generation method and system for space layout of suitable old residence
CN114974598A (en) * 2022-06-29 2022-08-30 山东大学 Lung cancer prognosis prediction model construction method and lung cancer prognosis prediction system
CN114974598B (en) * 2022-06-29 2024-04-16 山东大学 Method for constructing lung cancer prognosis prediction model and lung cancer prognosis prediction system

Similar Documents

Publication Publication Date Title
CN105678104A (en) Method for analyzing health data of old people on basis of Cox regression model
CN110957015B (en) Missing value filling method for electronic medical record data
Escarce et al. Admission source to the medical intensive care unit predicts hospital death independent of APACHE II score
CN110827993A (en) Early death risk assessment model establishing method and device based on ensemble learning
Rafiei et al. SSP: Early prediction of sepsis using fully connected LSTM-CNN model
Politano et al. Predicting the need for urgent intubation in a surgical/trauma intensive care unit
JP2013536971A5 (en)
Liu et al. Natural language processing of clinical notes for improved early prediction of septic shock in the ICU
CN113362954A (en) Postoperative infection complication risk early warning model for old patients and establishment method thereof
CN107066798A (en) A kind of health of heart quality pre-alert system and its method for early warning
CN112786203A (en) Machine learning diabetic retinopathy morbidity risk prediction method and application
KR102169637B1 (en) Method for predicting of mortality risk and device for predicting of mortality risk using the same
RU2352258C2 (en) Method of estimation of total risk of development of cardiovascular diseases, specific for russian population
CN110770848A (en) Risk assessment of disseminated intravascular coagulation
CN116564521A (en) Chronic disease risk assessment model establishment method, medium and system
RU2692667C1 (en) Method for prediction of relapsing myocardial infarction following recurrent myocardial infarction in men younger than 60 years old
Dewi et al. Pediatric logistic organ dysfunction score as a predictive tool of dengue shock syndrome outcomes
CN116451129A (en) Pulse classification and identification method and system
Cutter et al. Methodological issues in weight cycling
Chatchumni et al. Performance of the Simple Clinical Score (SCS) and the Rapid Emergency Medicine Score (REMS) to predict severity level and mortality rate among patients with sepsis in the emergency department
Dunitz et al. Predicting hyperlactatemia in the MIMIC II database
Sancar et al. Body mass index estimation by using an adaptive neuro fuzzy inference system
Umut et al. Prediction of sepsis disease by Artificial Neural Networks
Mui Projecting coronary heart disease incidence and cost in Australia: results from the incidence module of the Cardiovascular Disease Policy Model
RU2650212C1 (en) Method of estimation of overdiagnosis of myocardial infarction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160615