CN113257421A

CN113257421A - Method and system for constructing hypertension prediction model

Info

Publication number: CN113257421A
Application number: CN202110606139.0A
Authority: CN
Inventors: 李平; 陈伯怀
Original assignee: Wuzheng Intelligent Technology Beijing Co ltd
Current assignee: Wuzheng Intelligent Technology Beijing Co ltd
Priority date: 2021-05-31
Filing date: 2021-05-31
Publication date: 2021-08-13
Anticipated expiration: 2041-05-31
Also published as: CN113257421B

Abstract

The invention relates to a method and a system for constructing a hypertension prediction model. The method comprises three steps. First, the characteristic variables of the hypertension prediction model are screened and determined by a statistical method. Second, taking the characteristic variables as factors, the regression coefficient and corresponding score of each characteristic variable are determined. Third, a multi-factor Logistic (log-odds) regression model for hypertension prediction is constructed from these regression coefficients and scores; the model is a function that gives the predicted hypertension risk probability in terms of the characteristic variables. By analyzing the main factors influencing hypertension, the Logistic regression model establishes a quantitative relationship between multiple risk factors and the future occurrence of hypertension, and predicts an individual's probability of developing hypertension from the levels of those risk factors.

Description

Method and system for constructing hypertension prediction model

Technical Field

The invention relates to the technical field of health management, in particular to a method and a system for constructing a hypertension prediction model.

Background

Hypertension is one of the chronic diseases with the largest number of patients. It is also the most important risk factor for cardiovascular and cerebrovascular deaths among urban and rural residents, yet its awareness, treatment and control rates remain low overall.

At present, hypertension is diagnosed from a patient's clinical manifestations, so the best window for prevention is easily missed. Preventive measures are therefore important. However, the prior art cannot predict whether hypertension will occur in the future, so timely treatment and prevention cannot be achieved.

Disclosure of Invention

In view of these technical problems in the prior art, the invention provides a method and a system for constructing a hypertension prediction model. They analyze the main factors influencing hypertension and use a Logistic regression model to establish a quantitative relationship between multiple risk factors and the future occurrence of hypertension. From the levels of those risk factors, they predict an individual's future probability of developing hypertension.

According to a first aspect of the present invention, there is provided a method for constructing a hypertension prediction model, including:

screening and determining characteristic variables of the hypertension prediction model based on a statistical method;

determining regression coefficients and corresponding scores of the characteristic variables by taking the characteristic variables as factors;

and constructing a multi-factor Logistic regression model for hypertension prediction according to the regression coefficient and the corresponding score of each characteristic variable, wherein the multi-factor Logistic regression model is a hypertension risk prediction probability value function corresponding to each characteristic variable.

On the basis of the technical scheme, the invention can be improved as follows.

Optionally, the process of screening and determining the characteristic variables of the hypertension prediction model includes three steps. First, the variables that influence hypertension are determined. Second, a correlation analysis with binary Logistic regression gives the P value (significance level) of each variable. Third, the variables whose P value is below a set threshold are selected as the characteristic variables.

Optionally, the feature variables include: age, gender, smoking, exercise, family history of hypertension, BMI, diabetes, systolic and diastolic blood pressure.

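Optionally, the hypertension risk prediction probability value function is:

$$y = \frac{1}{1 + \exp\left[-\left(\beta_0 + \sum_{i=1}^{N} \beta_i W_{ij}\right)\right]}$$

wherein i and N respectively denote the index and the total number of the characteristic variables. The exponent can be written in terms of the total score as

$$\beta_0 + \sum_{i=1}^{N} \beta_i W_{ij} \approx \beta_0 + \sum_{i=1}^{N} \beta_i W_{i\mathrm{REF}} + B \cdot S$$

The two sides are equal before the individual scores are rounded. In these formulas:

- y is the predicted probability of hypertension;
- β0 is the intercept of the model;
- βi is the regression coefficient of the i-th characteristic variable;
- Wij is the reference value determined from the value of the i-th characteristic variable;
- WiREF is the base risk reference value of the i-th characteristic variable;
- B is a constant set according to the regression coefficient and the rate of change of the reference value;
- S is the sum of the scores of the characteristic variables.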
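The identity between the two forms of the exponent can be checked numerically. The Python sketch below assumes the standard logistic link and uses a hypothetical two-variable model. Only the age coefficient 0.0575 and the convention of one point per 5 years of age come from this document. The intercept, the diabetes coefficient, and all function names are made-up illustration values.

```python
import math

def logistic(eta: float) -> float:
    """Standard logistic function: converts log odds eta into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_odds_direct(beta0, betas, w):
    """beta_0 + sum_i beta_i * W_ij, using the individual's group reference values."""
    return beta0 + sum(b * x for b, x in zip(betas, w))

def log_odds_from_points(beta0, betas, w_ref, B, S):
    """beta_0 + sum_i beta_i * W_iREF + B * S, the points-based form."""
    return beta0 + sum(b * r for b, r in zip(betas, w_ref)) + B * S

# Hypothetical two-variable model: age and diabetes (0/1).
beta0 = -5.0                 # hypothetical intercept
betas = [0.0575, 0.6]        # age coefficient from the text; diabetes coefficient hypothetical
w_ref = [49.5, 0]            # base risk reference values W_iREF (age 45-54 midpoint, no diabetes)
w = [59.5, 1]                # one individual's group reference values W_ij
B = 5 * betas[0]             # one point per 5 years of age

# Unrounded points (W_ij - W_iREF) * beta_i / B, summed over the variables
S = sum((x - r) * b / B for x, r, b in zip(w, w_ref, betas))

eta_direct = log_odds_direct(beta0, betas, w)
eta_points = log_odds_from_points(beta0, betas, w_ref, B, S)
y = logistic(eta_direct)     # predicted probability of hypertension
```

With unrounded points the two exponents agree to floating-point precision. Once each score is rounded to an integer, the points-based form becomes an approximation.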

Optionally, the method for determining the reference value Wij includes:

grouping the values of the characteristic variables;

when the characteristic variable is numerical, groups covering successive ranges are set according to its numerical range, and the middle value of each group is selected as that group's reference value Wij;

and when the characteristic variable is categorical, it is divided into two groups according to its categories, with reference values Wij of 0 and 1 respectively.

Optionally, the method for determining the score Pointsij of the i-th characteristic variable includes:

selecting a group of characteristic variables as a basic risk reference value WiREF;

calculating, with the regression coefficient βi, the distance D between each group of the characteristic variable and the base risk reference value WiREF as $D = (W_{ij} - W_{i\mathrm{REF}}) \times \beta_i$;

determining the constant $B = x \times \beta_i$, where x denotes the increment of the characteristic variable that corresponds to one point;

and calculating the score of the i-th characteristic variable as $\mathrm{Points}_{ij} = D / B = (W_{ij} - W_{i\mathrm{REF}}) \times \beta_i / B$.

Optionally, the construction method further includes stratifying the hypertension risk according to the probability corresponding to each score, into high, medium and low risk.

According to a second aspect of the present invention, there is provided a system for constructing a hypertension prediction model, including:

the system comprises a characteristic variable screening module, a characteristic variable parameter calculating module and a model constructing module;

the characteristic variable screening module is used for screening and determining the characteristic variables of the hypertension prediction model based on a statistical method;

the characteristic variable parameter calculation module is used for determining a regression coefficient and a corresponding score of each characteristic variable by taking the characteristic variable as a factor;

and the model construction module is used for constructing a multi-factor Logistic regression model for hypertension prediction according to the regression coefficient and the corresponding score of each characteristic variable, and the multi-factor Logistic regression model is a hypertension risk prediction probability value function corresponding to each characteristic variable.
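As an illustration of how the three modules might divide the work, the Python sketch below models each module as a small class. The class and method names, the rounding rule, and the single `base_log_odds` parameter (standing for β0 + Σ βi·WiREF) are assumptions made for this sketch. They are not interfaces defined by the invention.

```python
import math
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class CharacteristicVariableScreeningModule:
    """Keeps the candidate variables whose univariate P value is below a threshold."""
    threshold: float = 0.05

    def screen(self, p_values: Dict[str, float]) -> List[str]:
        return [name for name, p in p_values.items() if p < self.threshold]

@dataclass
class ParameterCalculationModule:
    """Converts a regression coefficient into the score of one group."""
    B: float  # regression units worth one point

    def score(self, beta: float, w: float, w_ref: float) -> int:
        # Points = (W_ij - W_iREF) * beta_i / B, rounded half away from zero
        x = (w - w_ref) * beta / self.B
        return int(math.copysign(math.floor(abs(x) + 0.5), x))

@dataclass
class ModelConstructionModule:
    """Maps a total score S to a predicted probability of hypertension."""
    base_log_odds: float  # beta_0 + sum_i beta_i * W_iREF (hypothetical here)
    B: float

    def probability(self, S: float) -> float:
        return 1.0 / (1.0 + math.exp(-(self.base_log_odds + self.B * S)))

# Wiring the modules together with illustrative values
screening = CharacteristicVariableScreeningModule()
params = ParameterCalculationModule(B=0.2875)
model = ModelConstructionModule(base_log_odds=-3.5, B=0.2875)
```

Keeping the modules separate mirrors the division described above. Screening depends only on P values, scoring only on coefficients and reference values, and the final model only on the total score.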

According to a third aspect of the present invention, there is provided an electronic device comprising a memory and a processor. When executing a computer program stored in the memory, the processor implements the steps of the method for constructing the hypertension prediction model.

According to a fourth aspect of the present invention, there is provided a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the steps of the method for constructing a hypertension prediction model.

The invention provides a method, system, electronic device and storage medium for constructing a hypertension prediction model. The main factors influencing hypertension are analyzed, the characteristic variables are screened statistically, and their values are grouped and assigned. A Logistic regression model then establishes the quantitative relationship between multiple risk factors and the future occurrence of hypertension. An individual's probability of developing hypertension can therefore be predicted from the levels of those risk factors, which provides guidance for early clinical screening of hypertension. The hypertension risk is further stratified by probability, for example into high, medium and low risk, and personalized, specialized health management plans are provided for each stratum.

Drawings

FIG. 1 is a flow chart of a method for constructing a hypertension prediction model according to the present invention;

fig. 2 is a structural diagram of a system for constructing a hypertension prediction model according to an embodiment of the present invention;

fig. 3 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the present invention;

fig. 4 is a schematic diagram of an embodiment of a computer-readable storage medium provided in the present invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.

To address the shortcomings described in the background art, an embodiment of the invention provides a method for constructing a hypertension prediction model. Fig. 1 is a flowchart of this method. As shown in Fig. 1, the method includes:

Screening and determining the characteristic variables of the hypertension prediction model based on a statistical method.

And determining the regression coefficient and the corresponding score of each characteristic variable by taking the characteristic variable as a factor.

Constructing a multi-factor Logistic regression model for hypertension prediction according to the regression coefficient and the corresponding score of each characteristic variable. This model is a function that gives the predicted hypertension risk probability in terms of the characteristic variables. It is used to predict the probability that a middle-aged or elderly person will develop hypertension within the next 3 years.

The invention provides a method for constructing a hypertension prediction model, which analyzes main influence factors of hypertension, adopts a Logistic regression model to determine a quantitative relation between various risk factors and future hypertension occurrence, and predicts the future hypertension occurrence probability of an individual according to the levels of various risk factors.

Example 1

Embodiment 1 of the present invention is an embodiment of constructing a hypertension prediction model. As shown in Fig. 1, it includes:

Preferably, the process of screening and determining the characteristic variables of the hypertension prediction model comprises three steps. First, the variables that influence hypertension are determined. Second, a correlation analysis with binary Logistic regression gives the P value of each variable. Third, the variables whose P value is below a set threshold are selected as characteristic variables.

In a specific implementation, the factors (variables) that may influence hypertension are determined according to the Chinese Hypertension Health Management Regulations (2019) and recent hypertension population statistics. They comprise 15 variables: age, sex, smoking, exercise, family history of hypertension, obesity, diabetes, long-term mental stress, smoking, hyperlipidemia, high salt intake, systolic blood pressure, diastolic blood pressure, excessive drinking and air pollution.

The purpose of screening is to eliminate the variables with poor predictive power from these 15 candidates and to retain the strongly associated variables as the basis for the subsequent prediction model.

Hypertension risk-factor data were extracted from the health information in the CDC BRFSS (Behavioral Risk Factor Surveillance System) database, and the variables were screened statistically. A correlation analysis with binary Logistic regression gave a P value for each single variable. A P value below 0.05 was taken as statistically significant, and factors with little influence on hypertension were eliminated.

The final characteristic variables are age, gender, smoking, exercise, family history of hypertension, obesity (BMI), diabetes, systolic blood pressure and diastolic blood pressure. All 9 selected characteristic variables met the screening condition (P < 0.05) in the univariate analysis. Obesity is expressed as BMI, i.e. weight (kg) divided by the square of height (m²).
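The univariate screening step can be sketched in dependency-free Python. Each candidate variable is fitted on its own in a binary Logistic regression by Newton-Raphson, and its coefficient is tested with a two-sided Wald test at the 0.05 level stated above. This is a minimal sketch under several assumptions:

- The Wald test is one common choice; a likelihood-ratio test would also fit the description.
- The variable names and data below are hypothetical toy values, not BRFSS records.
- In practice, a statistics package would normally be used.

```python
import math

def univariate_logistic_p(x, y, iters=25):
    """Fit logit P(y=1) = a + b*x by Newton-Raphson; return (b, two-sided Wald P value of b)."""
    mean_x = sum(x) / len(x)
    xc = [xi - mean_x for xi in x]          # centring stabilises Newton; b and its SE are unchanged
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(xc, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            ga += yi - p                    # gradient of the log-likelihood w.r.t. a
            gb += (yi - p) * xi             # gradient w.r.t. b
            haa += w                        # entries of the information matrix I
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        a += (hbb * ga - hab * gb) / det    # Newton step: (a, b) += I^-1 * gradient
        b += (haa * gb - hab * ga) / det
    se_b = math.sqrt(haa / det)             # Var(b) = [I^-1]_bb
    z = b / se_b
    return b, math.erfc(abs(z) / math.sqrt(2.0))   # two-sided standard-normal P value

def screen(candidates, y, alpha=0.05):
    """Keep the candidate variables whose univariate P value is below alpha."""
    kept = {}
    for name, x in candidates.items():
        _, p = univariate_logistic_p(x, y)
        if p < alpha:
            kept[name] = p
    return kept

# Hypothetical toy data (not BRFSS): y = 1 means the person developed hypertension
age   = [45, 48, 50, 52, 55, 58, 60, 62, 65, 68, 70, 72, 75, 78, 80, 82] * 3
noise = [1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0] * 3
y     = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1] * 3
kept = screen({"age": age, "noise": noise}, y)
```

The `noise` variable has the same proportion of cases whether it is 0 or 1, so its fitted coefficient is 0 and it is dropped. `age` passes the P < 0.05 condition.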

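The hypertension risk prediction probability value function is:

$$y = \frac{1}{1 + \exp\left[-\left(\beta_0 + \sum_{i=1}^{N} \beta_i W_{ij}\right)\right]}$$

where y is the predicted probability of hypertension, β0 is the intercept, and βi and Wij are the regression coefficient and the reference value of the i-th of the N characteristic variables.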

A multi-factor Logistic regression model that includes the main risk factors under consideration is constructed. For each factor, the model estimates the regression coefficient β, the OR (odds ratio, OR = e^β) and the 95% CI (confidence interval) of the OR. The OR is interpreted as follows:

- An OR of 1 (β = 0) indicates that the factor has no effect on the occurrence of the disease.
- An OR greater than 1 indicates a risk factor.
- An OR less than 1 indicates a protective factor.

A positive regression coefficient β means that the log odds of the dependent variable, ln(p/(1−p)), increase as the independent variable increases. The probability p that the dependent variable takes the value 1 (hypertension) therefore also increases, i.e. the independent variable is positively associated with the outcome. Conversely, a negative β indicates that the independent variable has a negative effect on the dependent variable, i.e. a negative association.
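The link between β, the OR and its confidence interval can be illustrated with a short Python sketch. It assumes a Wald-type interval exp(β ± 1.96·SE). It also adds a convention that goes beyond the text: a factor counts as a risk or protective factor only when its whole 95% CI lies above or below 1. The age coefficient 0.0575 comes from this document; the standard error 0.01 is hypothetical.

```python
import math

Z_95 = 1.959964  # 97.5th percentile of the standard normal (two-sided 95% interval)

def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) with OR = exp(beta) and Wald CI exp(beta +/- z*se)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def classify(lower, upper):
    """Risk factor if the whole CI lies above 1, protective if it lies below 1."""
    if lower > 1.0:
        return "risk factor"
    if upper < 1.0:
        return "protective factor"
    return "no significant effect"

# Age coefficient from the text; the standard error 0.01 is hypothetical.
or_age, lo_age, hi_age = odds_ratio_ci(0.0575, 0.01)
```

Because the exponential function is monotonic, a positive β always gives an OR above 1 and a negative β an OR below 1. This matches the sign rule described above.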

For example, in an embodiment of the present invention, a total score S of 5 points corresponds to a risk probability of 5.93%.

In a possible embodiment, the method of determining the reference value Wij comprises:

the values of the individual characteristic variables are grouped.

When a characteristic variable is numerical, groups covering successive ranges are set according to its numerical range, and the middle value of each group is selected as its reference value Wij.

Specifically, the risk factor is grouped according to clinical significance or conventional practice, and an appropriate value is selected as the reference value Wij of each group. This is usually the middle value of the group.

For example, in this embodiment the study population is aged 45-84 years and is divided into 10-year age groups (45-54, 55-64, 65-74 and 75-84 years). The middle value of each group is its reference value Wij; for example, the reference value Wij of the 45-54 group is (45+54)/2 = 49.5.

Systolic blood pressure ranges from 70 to 139 mmHg. Values below 110 mmHg form one group, and values from 110 mmHg upward are divided into 5-mmHg groups, giving 7 groups in total. The middle value of each group is selected as its reference value Wij. For example, the reference value Wij of the group of 120-.

Diastolic blood pressure ranges from 50 to 89 mmHg. Values below 70 mmHg form one group, and values from 70 mmHg upward are divided into 10-mmHg groups, giving 3 groups in total. The middle value of each group is selected as its reference value Wij. For example, the reference value Wij of the 70-80 mmHg group is (70+80)/2 = 75.

BMI ranges from 15 to 50 and is divided into four groups (< 25, 25-29, 30-39 and ≥ 40). The middle value of each group is its reference value Wij. For example, the reference value Wij of the 25-29 group is (25+29)/2 = 27.

When the characteristic variable is categorical, such as gender, one category is set as the reference. Male is assigned the reference value Wij = 0, and female is then assigned 1. The other categorical variables are coded in the same way:

- smoking: non-smoking is 0 and smoking is 1;
- exercise: exercise is 0 and no exercise is 1;
- family history: no family history of hypertension is 0 and a family history of hypertension is 1;
- diabetes: no diabetes is 0 and diabetes is 1.
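The grouping rules above can be sketched in Python as follows. Numeric groups are represented by their (lower, upper) bounds, and the reference value is their midpoint. Categorical variables are mapped to the 0/1 codes listed above. The function names, the dictionary keys, and the inclusive-bounds convention are illustrative assumptions. The sketch uses only the closed age groups, because the text does not say how reference values are chosen for open-ended groups such as "< 110 mmHg" or "BMI ≥ 40".

```python
from typing import Dict, List, Tuple

def reference_values(groups: List[Tuple[float, float]]) -> List[float]:
    """Reference value W_ij of each numeric group = midpoint of its bounds."""
    return [(lo + hi) / 2 for lo, hi in groups]

def group_reference(value: float, groups: List[Tuple[float, float]]) -> float:
    """Return W_ij for the group containing `value` (bounds inclusive, integer-valued data)."""
    for lo, hi in groups:
        if lo <= value <= hi:
            return (lo + hi) / 2
    raise ValueError(f"{value} is outside all groups")

# 10-year age groups for the 45-84 study population
AGE_GROUPS = [(45, 54), (55, 64), (65, 74), (75, 84)]

# Binary coding of categorical variables: reference category = 0, other category = 1
BINARY_CODES: Dict[str, Dict[str, int]] = {
    "gender": {"male": 0, "female": 1},
    "smoking": {"no": 0, "yes": 1},
    "exercise": {"yes": 0, "no": 1},
    "family_history": {"no": 0, "yes": 1},
    "diabetes": {"no": 0, "yes": 1},
}
```

For example, `reference_values(AGE_GROUPS)` gives the midpoints 49.5, 59.5, 69.5 and 79.5, and a 75-year-old falls in the 75-84 group with Wij = 79.5.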

The method for determining the corresponding score Pointsij of the ith characteristic variable comprises the following steps:

For each characteristic variable, one group is selected as the base risk reference value WiREF.

For each risk factor, an appropriate group is selected as its risk reference value WiREF, and this group is scored 0 when the scores are derived from the multi-factor Logistic regression model. For a risk factor, groups above WiREF receive positive scores, and a higher score indicates a higher risk. Groups below WiREF receive negative scores.

For example, the following groups may be selected, and their reference values Wij used as the base risk reference values WiREF of the respective risk factors: age 45-54 years, male, non-smoking, exercise, BMI < 25, no family history of hypertension, no diabetes, systolic blood pressure < 110 mmHg and diastolic blood pressure < 70 mmHg.

Next, using the regression coefficient βi, the distance D between each group of a characteristic variable and its base risk reference value WiREF is calculated as $D = (W_{ij} - W_{i\mathrm{REF}}) \times \beta_i$.

For example, in this embodiment the base risk reference value WiREF for age is 49.5 (the midpoint of the 45-54 group), and the regression coefficient βi of age in the multi-factor Logistic regression model is 0.0575. The 55-64 age group has the reference value Wij = 59.5, so its distance from the base risk reference value is (59.5 − 49.5) × 0.0575 = 0.575. The distance D from each group to the base risk reference value is calculated for the other risk factors with the same formula.

The constant B = x × βi is then determined, where x is the increment of the characteristic variable that corresponds to one point.

The constant B is the change in the regression term (log odds) that corresponds to one point in the scoring tool. For example, in this embodiment one point is assigned per 5 years of age, so B = 5 × βi = 5 × 0.0575 = 0.2875.

Finally, the score of each group is obtained as Pointsij = D/B, rounded to the nearest integer.
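The scoring arithmetic can be sketched in a few lines of Python. Following the example above, B is fixed once from the age coefficient (5 years = 1 point) and reused as the common unit. The base reference is the 45-54 midpoint, 49.5. The helper names are hypothetical, and "round half away from zero" is an assumed reading of "rounded to the nearest integer". Python's built-in `round` uses round-half-to-even (`round(2.5) == 2`), so a small helper is used instead.

```python
import math

def round_half_up(x: float) -> int:
    """Round to the nearest integer, halves away from zero (2.5 -> 3, -2.5 -> -3)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def points(w_ij: float, w_ref: float, beta_i: float, B: float) -> int:
    """Points_ij = D / B with D = (W_ij - W_iREF) * beta_i, rounded to an integer."""
    D = (w_ij - w_ref) * beta_i
    return round_half_up(D / B)

BETA_AGE = 0.0575          # age coefficient quoted in the text
B = 5 * BETA_AGE           # one point per 5 years of age, B = 0.2875
AGE_REF = 49.5             # midpoint of the 45-54 base risk group

# Score of each 10-year age group (midpoints 49.5, 59.5, 69.5, 79.5)
age_points = {w: points(w, AGE_REF, BETA_AGE, B) for w in (49.5, 59.5, 69.5, 79.5)}
```

Each 10-year step adds 10 × 0.0575 / 0.2875 = 2 points, so the four age groups score 0, 2, 4 and 6.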

The total score is the sum of the scores of all risk factors. In theory, when every risk factor takes its lowest-scoring group, the minimum total score is 0. The maximum is obtained in the same way from each factor's highest-scoring group, so the total score ranges from 0 to 30 points.

A lookup table from each total score to the corresponding predicted risk probability is then calculated.
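Building the score-to-probability table amounts to evaluating the logistic function once per possible total score. In the Python sketch below, `base_log_odds` stands for β0 + Σ βi·WiREF, i.e. the log odds of someone whose every risk factor sits in its base risk group. Its value here (−3.5) is hypothetical, because the model's intercept and full coefficient set are not given in this document. B = 0.2875 and the 0-30 range come from the text.

```python
import math

def score_table(base_log_odds: float, B: float, max_score: int = 30) -> dict:
    """Map each total score S = 0..max_score to p = 1 / (1 + exp(-(base_log_odds + B*S)))."""
    return {S: 1.0 / (1.0 + math.exp(-(base_log_odds + B * S)))
            for S in range(max_score + 1)}

table = score_table(base_log_odds=-3.5, B=0.2875)   # base_log_odds is hypothetical
```

Because B > 0, the predicted probability rises strictly with the total score. The table can therefore be read directly, or cut into risk strata by probability thresholds.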

Preferably, the construction method further comprises stratifying the hypertension risk according to the probability corresponding to each score, into high, medium and low risk.

For example, the strata may be high risk (> 35%), medium risk (10%-35%) and low risk (< 10%). Personalized, specialized health management plans are provided for each stratum. These cover diet, exercise, physical examinations, points of daily attention, preventive measures, indications for seeking medical care, and hypertension health education.
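The stratification rule can be written as a small Python function using the example cut-offs above. The text does not say which stratum the exact boundary values 10% and 35% fall into. This sketch assumes both belong to the medium stratum, which is a labeled assumption.

```python
def risk_stratum(p: float) -> str:
    """Three-level stratification with the example cut-offs:
    high (> 35%), medium (10%-35%, boundaries assumed inclusive), low (< 10%)."""
    if p > 0.35:
        return "high"
    if p >= 0.10:
        return "medium"
    return "low"
```

Applied to the probabilities quoted in this document, a total score of 5 (5.93%) is low risk and a total score of 14 (31.17%) is medium risk.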

To verify the accuracy of the scoring tool, the present invention provides an example comparing the prediction of the scoring tool with that of the original Logistic regression model.

Consider a 75-year-old male with the following profile:

- systolic blood pressure of 129 mmHg and diastolic blood pressure of 85 mmHg;
- does not smoke and exercises;
- has no family history of hypertension;
- has diabetes;
- BMI of 21.5.

His risk of developing hypertension within the next 3 years is to be predicted.

First, the scoring tool gives his risk factors scores of 0, 4, 6, 3, 0, 1 and 0 respectively, for a total score of 14. The corresponding risk probability from the lookup table is 31.17%.

Then, calculating directly with the multi-factor Logistic regression model gives y = 30.23%. The scoring tool therefore differs from the Logistic regression model by only about 1 percentage point (31.17% vs. 30.23%). This is sufficient for disease risk prediction and assessment, and the tool is also intuitive and convenient to use.

Example 2

Embodiment 2 of the present invention is an embodiment of a system for constructing a hypertension prediction model. Fig. 2 is a structural diagram of this system. As shown in Fig. 2, the system comprises a characteristic variable screening module, a characteristic variable parameter calculation module and a model construction module.

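The characteristic variable screening module is used for screening and determining the characteristic variables of the hypertension prediction model based on a statistical method.

The characteristic variable parameter calculation module is used for determining the regression coefficient and the corresponding score of each characteristic variable by taking the characteristic variables as factors.

The model construction module is used for constructing a multi-factor Logistic regression model for hypertension prediction according to the regression coefficient and the corresponding score of each characteristic variable.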

It can be understood that the system for constructing a hypertension prediction model provided by the present invention corresponds to the method for constructing a hypertension prediction model provided in the foregoing embodiments, and the relevant technical features of the system for constructing a hypertension prediction model may refer to the relevant technical features of the method for constructing a hypertension prediction model, and are not described herein again.

Referring to Fig. 3, Fig. 3 is a schematic diagram of an embodiment of an electronic device according to an embodiment of the invention. As shown in Fig. 3, an embodiment of the present invention provides an electronic device comprising a memory 1310, a processor 1320, and a computer program 1311 stored in the memory 1310 and executable on the processor 1320. When executing the computer program 1311, the processor 1320 implements the following steps:

screening and determining characteristic variables of a hypertension prediction model based on a statistical method; determining regression coefficients and corresponding scores of the characteristic variables by taking the characteristic variables as factors; and constructing a multi-factor Logistic regression model for hypertension prediction according to the regression coefficient and the corresponding score of each characteristic variable, wherein the multi-factor Logistic regression model is a hypertension risk prediction probability value function corresponding to each characteristic variable.

Referring to Fig. 4, Fig. 4 is a schematic diagram of an embodiment of a computer-readable storage medium according to the present invention. As shown in Fig. 4, this embodiment provides a computer-readable storage medium 1400 on which a computer program 1411 is stored. When executed by a processor, the computer program 1411 implements the steps of the method for constructing a hypertension prediction model described above.

The hypertension risk assessment tool is a practical tool for assessing the risk of hypertension in middle-aged and elderly people and for giving them health guidance. It is mainly intended for people over 45 years old. The tool establishes a prediction rule from factor variables such as age, sex, smoking, exercise, family history of hypertension, BMI, diabetes, SBP (systolic blood pressure) and DBP (diastolic blood pressure). It assesses the examinee's risk of developing hypertension within the next 3 years and gives corresponding prompts and suggestions according to the risk stratum and the level of each individual risk factor. It thus provides guidance for early clinical screening of hypertension. The hypertension risk is stratified by probability, for example into high, medium and low risk, and personalized, specialized health management plans are provided for each stratum.

It should be noted that, in the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to relevant descriptions of other embodiments for parts that are not described in detail in a certain embodiment.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for constructing a hypertension prediction model is characterized by comprising the following steps:

2. The construction method according to claim 1, wherein the process of screening and determining the characteristic variables of the hypertension prediction model comprises: determining variables influencing hypertension, performing a correlation analysis with binary Logistic regression to obtain the P value of each variable, and selecting the variables whose P value is below a set threshold as the characteristic variables.

3. The construction method according to claim 1 or 2, wherein the feature variables include: age, gender, smoking, exercise, family history of hypertension, BMI, diabetes, systolic and diastolic blood pressure.

4. The construction method according to claim 1, wherein the hypertension risk prediction probability value function is:

5. The construction method according to claim 4, wherein the method of determining the reference value Wij comprises:

grouping the values of the characteristic variables;

6. The construction method according to claim 1, wherein the method of determining the score Pointsij of the i-th characteristic variable comprises:

7. The construction method according to claim 1, further comprising: stratifying the hypertension risk according to the probability corresponding to each score into high, medium and low risk.

8. A system for constructing a hypertension prediction model, comprising a characteristic variable screening module, a characteristic variable parameter calculation module and a model construction module;

9. An electronic device, comprising a memory and a processor, wherein the processor is configured to implement the steps of the method for constructing a hypertension prediction model according to any one of claims 1-7 when executing a computer program stored in the memory.

10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, carries out the steps of the method of constructing a hypertension prediction model according to any one of claims 1 to 7.