CN112017785B - Disease risk prediction system, method, device, equipment and medium

Info

Publication number: CN112017785B
Application number: CN202011200812.2A
Authority: CN
Inventors: 陈天歌
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-11-02
Filing date: 2020-11-02
Publication date: 2021-02-05
Anticipated expiration: 2040-11-02
Also published as: WO2021180244A1; CN112017785A

Abstract

The embodiments of the present application disclose a disease risk prediction system, method, apparatus, device, and storage medium, applied to the field of medical technology. The disease risk prediction system includes a risk prediction device and a storage device. The storage device is used for storing diagnosis and treatment data of users. The risk prediction device is used for performing the following steps: acquiring diagnosis and treatment data of a plurality of users from the storage device; determining a plurality of first risk factors corresponding to a target disease according to the diagnosis and treatment data; screening a plurality of second risk factors from the plurality of first risk factors according to an objective function comprising the two-norm of the plurality of first risk factors, and determining coefficients of the screened second risk factors so as to determine a risk prediction model, each coefficient being an integer determined from a set of integers; and acquiring target diagnosis and treatment data of a target user and calling the risk prediction model to determine a risk prediction result of the target user for the target disease. The embodiments of the present application help improve the prediction of disease risk.

Description

Disease risk prediction system, method, device, equipment and medium

Technical Field

The present application relates to the field of artificial intelligence technologies, and in particular, to a system, a method, an apparatus, a device, and a medium for predicting disease risk.

Background

In the field of medical technology, predicting a user's risk of a given disease is of great significance. For example, accurate risk prediction helps in making a diagnosis and treatment plan for a patient and in improving the patient's prognosis. How to predict disease risk and improve the prediction effect has therefore become an urgent problem to be solved.

Disclosure of Invention

The embodiment of the application provides a disease risk prediction system, a disease risk prediction method, a disease risk prediction device and a disease risk prediction medium, which are beneficial to improving the prediction effect of disease risk.

In a first aspect, an embodiment of the present application provides a disease risk prediction system, including: a risk prediction device and a storage device; the storage device is used for storing diagnosis and treatment data of a user;

the risk prediction device is configured to perform the following steps:

acquiring, from the storage device, diagnosis and treatment data of a plurality of users corresponding to a target disease;

determining a plurality of first risk factors corresponding to the target disease according to the diagnosis and treatment data;

screening a plurality of second risk factors from the plurality of first risk factors according to an objective function comprising the two-norm of the plurality of first risk factors, determining the coefficients of the screened second risk factors, and determining a risk prediction model based on the coefficients of the second risk factors; wherein the coefficients of the second risk factors are integers determined from a set of integers;

obtaining target diagnosis and treatment data of a target user, calling the risk prediction model to determine target risk factors corresponding to the target diagnosis and treatment data and coefficients of the target risk factors, and determining a risk prediction result of the target user for the target disease based on the target risk factors and the coefficients of the target risk factors.

Optionally, the obtaining target diagnosis and treatment data of a target user, calling the risk prediction model to determine target risk factors corresponding to the target diagnosis and treatment data and coefficients of the target risk factors, and determining a risk prediction result of the target user for the target disease based on the target risk factors and the coefficients of the target risk factors includes:

acquiring target diagnosis and treatment data of a target user, and converting the target diagnosis and treatment data into binary characteristics;

inputting the binary characteristics corresponding to the target diagnosis and treatment data into the risk prediction model to obtain target risk factors corresponding to the binary characteristics and coefficients of the target risk factors, and determining a risk prediction result based on the target risk factors and the coefficients of the target risk factors;

wherein the risk prediction result comprises a prediction score of the target user for the target disease, and the prediction score is the sum of coefficients of the target risk factors.
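
The scoring step described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the factor names and integer coefficients in `MODEL_COEFFICIENTS` are hypothetical, and the function `predict_score` is an assumed name. It only shows that, once the diagnosis and treatment data is in binary form, the prediction score is the sum of the coefficients of the risk factors that are present.

```python
from typing import Dict, List, Tuple

# Hypothetical integer coefficients of the screened (second) risk factors.
# The names and values are illustrative assumptions, not taken from the source.
MODEL_COEFFICIENTS: Dict[str, int] = {
    "age_ge_75": 3,
    "systolic_bp_lt_100": 2,
    "killip_class_ge_3": 4,
}


def predict_score(binary_features: Dict[str, int]) -> Tuple[List[str], int]:
    """Return the target risk factors that are present and the sum of their coefficients."""
    target_factors = [
        name for name in MODEL_COEFFICIENTS if binary_features.get(name, 0) == 1
    ]
    score = sum(MODEL_COEFFICIENTS[name] for name in target_factors)
    return target_factors, score


factors, score = predict_score(
    {"age_ge_75": 1, "systolic_bp_lt_100": 0, "killip_class_ge_3": 1}
)
print(factors, score)  # ['age_ge_75', 'killip_class_ge_3'] 7
```

Because the coefficients are integers, the resulting score can be added up by hand, which is presumably part of the appeal of the integer constraint.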

Optionally, the risk prediction device is further configured to obtain outcome data corresponding to the diagnosis and treatment data, where the outcome data is used to indicate a health state of a user;

the diagnosis and treatment data comprises a plurality of risk factor data; the determining a plurality of first risk factors of a target disease according to the diagnosis and treatment data comprises:

acquiring a plurality of risk factor data included in the diagnosis and treatment data;

and converting the risk factor data into binary characteristics according to the relationship between the risk factor data and outcome data corresponding to the diagnosis and treatment data so as to obtain a plurality of first risk factors.
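
One possible way to convert a continuous risk factor into a binary feature according to its relationship with the outcome data is sketched below. The source does not specify the conversion rule, so the rule used here (pick the cutoff that maximizes the gap in outcome rate between the two groups) is an assumption, as are the names `best_cutoff` and `binarize` and the sample values.

```python
from typing import List, Sequence


def best_cutoff(values: Sequence[float], outcomes: Sequence[int]) -> float:
    """Pick the cutoff whose 'value >= cutoff' group differs most in outcome rate.

    Assumed rule for illustration only; the source does not name a specific one.
    """
    best = None
    best_gap = -1.0
    for cutoff in sorted(set(values)):
        high = [o for v, o in zip(values, outcomes) if v >= cutoff]
        low = [o for v, o in zip(values, outcomes) if v < cutoff]
        if not high or not low:
            continue  # a cutoff must split the users into two non-empty groups
        gap = abs(sum(high) / len(high) - sum(low) / len(low))
        if gap > best_gap:
            best, best_gap = cutoff, gap
    if best is None:
        raise ValueError("need at least two distinct values")
    return best


def binarize(values: Sequence[float], cutoff: float) -> List[int]:
    """Map each value to 1 if it reaches the cutoff, else 0."""
    return [1 if v >= cutoff else 0 for v in values]


ages = [52, 60, 67, 71, 78, 83]   # hypothetical risk factor data
died = [0, 0, 0, 1, 1, 1]         # hypothetical outcome data (1 = death)
cutoff = best_cutoff(ages, died)
age_feature = binarize(ages, cutoff)
print(cutoff, age_feature)  # 71 [0, 0, 0, 1, 1, 1]
```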

Optionally, the risk prediction device is further configured to obtain outcome data corresponding to the diagnosis and treatment data;

the screening a plurality of second risk factors from the plurality of first risk factors according to an objective function comprising the two-norm of the plurality of first risk factors, and determining coefficients of the screened second risk factors to determine a risk prediction model based on the coefficients of the second risk factors, includes:

determining, according to the plurality of first risk factors corresponding to the diagnosis and treatment data and the outcome data, a plurality of screened second risk factors under the objective function and coefficients of the screened second risk factors, so as to train and obtain the risk prediction model; wherein the objective function is determined according to a logistic loss function and a two-norm, and the two-norm is used for controlling the number of the screened second risk factors.
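
The training step above can be illustrated with a toy search. The source states only that the objective function is determined from a logistic loss and a two-norm and that the coefficients come from an integer set; the exact form used below (mean logistic loss plus `penalty` times the two-norm of the coefficients), the exhaustive enumeration, and all names and data are assumptions made for illustration. Exhaustive enumeration stands in for a real integer optimization algorithm and is only feasible for a handful of features.

```python
import itertools
import math
from typing import List, Sequence, Tuple

INTEGER_SET = (-2, -1, 0, 1, 2)  # assumed integer set for coefficients and intercept


def objective(coefs: Sequence[int], intercept: int,
              X: Sequence[Sequence[int]], y: Sequence[int], penalty: float) -> float:
    """Mean logistic loss plus penalty times the two-norm of the coefficients."""
    loss = 0.0
    for row, label in zip(X, y):
        margin = intercept + sum(c * x for c, x in zip(coefs, row))
        signed = margin if label == 1 else -margin
        loss += math.log1p(math.exp(-signed))
    two_norm = math.sqrt(sum(c * c for c in coefs))
    return loss / len(X) + penalty * two_norm


def fit_integer_model(X: Sequence[Sequence[int]], y: Sequence[int],
                      penalty: float) -> Tuple[Tuple[int, ...], int, List[int]]:
    """Enumerate every integer coefficient vector and keep the best objective value."""
    best = None
    for intercept in INTEGER_SET:
        for coefs in itertools.product(INTEGER_SET, repeat=len(X[0])):
            value = objective(coefs, intercept, X, y, penalty)
            if best is None or value < best[0]:
                best = (value, coefs, intercept)
    _, coefs, intercept = best
    # Factors whose coefficient is 0 are treated as screened out (assumption).
    selected = [i for i, c in enumerate(coefs) if c != 0]
    return coefs, intercept, selected


# Hypothetical binary first risk factors (columns) and outcome data.
X = [[1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1],
     [1, 1, 1], [0, 0, 0], [1, 0, 0], [0, 1, 1]]
y = [1, 1, 0, 0, 1, 0, 1, 0]
coefs, intercept, selected = fit_integer_model(X, y, penalty=0.05)
print(coefs, intercept, selected)
```

Raising `penalty` makes nonzero coefficients more expensive, which in this sketch tends to reduce the number of retained factors.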

Optionally, the risk prediction device is further configured to receive a disease risk prediction request sent by a terminal, where the disease risk prediction request carries an identifier of the target user;

the risk prediction device is specifically configured to obtain the target diagnosis and treatment data according to the identifier of the target user;

the risk prediction device is further configured to determine a target score threshold according to the type of the target disease, and to send early warning information to the terminal when the prediction score included in the risk prediction result is greater than the target score threshold;

the prediction score is the sum of coefficients of the target risk factors, the early warning information comprises risk items corresponding to the target risk factors, the prediction score and a treatment scheme, and the treatment scheme is a treatment scheme corresponding to a user group to which the target user belongs.

Optionally, the storage device is a blockchain node;

the risk prediction device is further configured to send a diagnosis and treatment data acquisition request to the storage device, where the diagnosis and treatment data acquisition request carries an identifier of the target user;

the storage device is further configured to receive the diagnosis and treatment data acquisition request and verify the identity of the risk prediction device; and, if the verification is passed, to query and acquire the target diagnosis and treatment data of the target user according to the identifier of the target user, and send the target diagnosis and treatment data to the risk prediction device;

the risk prediction device is specifically configured to receive the target diagnosis and treatment data sent by the storage device.

In a second aspect, an embodiment of the present application provides a disease risk prediction method, including:

acquiring diagnosis and treatment data corresponding to target diseases of a plurality of users;

In a third aspect, an embodiment of the present application provides a disease risk prediction apparatus, including:

the acquisition module is used for acquiring diagnosis and treatment data corresponding to target diseases of a plurality of users;

the determining module is used for determining a plurality of first risk factors corresponding to the target disease according to the diagnosis and treatment data;

the processing module is used for screening a plurality of second risk factors from the plurality of first risk factors according to an objective function comprising the two-norm of the plurality of first risk factors, determining coefficients of the screened second risk factors, and determining a risk prediction model based on the coefficients of the second risk factors; wherein the coefficients of the second risk factors are integers determined from a set of integers;

the acquisition module is also used for acquiring target diagnosis and treatment data of a target user;

the processing module is further configured to invoke the risk prediction model to determine target risk factors corresponding to the target diagnosis and treatment data and coefficients of the target risk factors, and determine a risk prediction result of the target user for the target disease based on the target risk factors and the coefficients of the target risk factors.

In a fourth aspect, embodiments of the present application provide a risk prediction device, which may include a processor and a memory, where the processor and the memory are connected to each other. Wherein the memory is configured to store a computer program supporting the terminal device to perform the above method or steps, the computer program comprising program instructions, and the processor is configured to call the program instructions to perform some or all of the steps performed by the risk prediction device of the first aspect.

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, where the computer program includes program instructions, and the program instructions, when executed by a processor, cause the processor to perform some or all of the steps performed by the risk prediction device of the first aspect. Optionally, the computer-readable storage medium may be non-volatile or volatile.

In the embodiments of the present application, a plurality of risk factors corresponding to the target disease can be determined from the acquired diagnosis and treatment data of a plurality of users; a plurality of risk factors are screened from them according to an objective function comprising the two-norm of the risk factors, and coefficients of the screened risk factors are determined, each coefficient being an integer determined from an integer set, so that a risk prediction model is obtained through training. The target diagnosis and treatment data of the target user can then be acquired, and the risk prediction model is called to determine the target risk factors corresponding to the target diagnosis and treatment data and the coefficients of the target risk factors, so that a risk prediction result of the target user for the target disease is obtained. By setting an integer constraint and an objective function based on the two-norm, the embodiments of the present application can apply an integer optimization algorithm, control the number of risk factors, and optimize the prediction result, which helps improve the prediction of disease risk.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present application, and those skilled in the art can obtain other drawings based on these drawings without creative effort.

Fig. 1 is a schematic structural diagram of a disease risk prediction system provided in an embodiment of the present application;

Fig. 2 is a schematic flow chart of a disease risk prediction method provided in an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a disease risk prediction apparatus provided in an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a risk prediction device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The technical solution of the present application can be applied to a disease risk prediction system, and in particular to a risk prediction device (risk prediction apparatus) that predicts disease risk. Optionally, the risk prediction device may be a terminal, a server, a data platform, or another device. The terminal may include a mobile phone, a tablet computer, a computer, etc., which is not limited in this application. It is understood that in other embodiments the terminal may also be called by other names, such as terminal device, intelligent terminal, user device, or user terminal, which are not listed exhaustively here.

In the field of medical technology, predicting a user's risk of a given disease is of great significance. For example, accurate risk prediction helps in making a diagnosis and treatment plan for a patient and in improving the patient's prognosis. However, the inventor realized that current disease risk prediction relies mainly on direct scoring by experts or on scoring by certain algorithms, and both approaches have limitations. Expert scoring is highly subjective, so its prediction results are unreliable. Algorithmic scoring directly rounds the model parameters, so the final result is likely to lose optimal performance and predictive power, the prediction effect is poor, and the requirement of automatically developing a disease risk score cannot be met. The present application can determine a plurality of risk factors corresponding to the target disease according to the diagnosis and treatment data of a plurality of users, screen a plurality of risk factors from them, and determine a coefficient for each screened risk factor, each coefficient being an integer determined from an integer set. By acquiring the target diagnosis and treatment data of a target user, the target risk factors corresponding to the target diagnosis and treatment data and the coefficient of each target risk factor can then be determined based on the determined coefficients, and the risk prediction result of the target user for the target disease can be determined based on the target risk factors and their coefficients. Setting an integer constraint thus meets the prediction requirement of risk prediction, and the number of risk factors can be controlled, so that reliable prediction of disease risk is achieved and the disease risk prediction effect is improved. Optionally, the present application can combine a model algorithm with an integer optimization algorithm: the integer constraint meets the prediction requirement of risk prediction, and the model is solved by optimizing an objective function based on the two-norm, which controls the number of risk factors, so as to achieve reliable prediction of disease risk and improve the disease risk prediction effect.

The technical solution of the present application can be applied to the technical fields of artificial intelligence, smart cities, blockchain, and/or big data. For example, it can be implemented through a data platform or other device, and the related data can be stored in blockchain nodes or in a database, which is not limited in this application.

The embodiment of the application provides a disease risk prediction system, a disease risk prediction method, a disease risk prediction device, a disease risk prediction medium and the like, so that the disease risk prediction effect can be improved. The details are described below.

Fig. 1 is a schematic structural diagram of a disease risk prediction system according to an embodiment of the present application. As shown in Fig. 1, the disease risk prediction system may include a risk prediction device (risk prediction apparatus) 101 and a storage device (storage apparatus) 102, wherein:

a storage device 102, operable to store user medical data;

a risk prediction device 101 operable to perform the steps of:

acquiring diagnosis and treatment data of a plurality of users from the storage device 102, such as diagnosis and treatment data corresponding to target diseases;

screening a plurality of second risk factors from the plurality of first risk factors according to an objective function comprising the two-norm of the plurality of first risk factors, determining coefficients of the screened second risk factors, and determining a risk prediction model based on the coefficients of the second risk factors; wherein the coefficients of the second risk factors are integers determined from a set of integers;

Optionally, the storage device 102 may also be used to store other data related to the present application, such as risk factors and coefficients of risk factors, etc.

It is to be understood that the storage device and the risk prediction device may be independent devices, that is, may be deployed independently, or the storage device and the risk prediction device may also be deployed in the same device, which is not limited in this application, and fig. 1 only illustrates a scenario of independent deployment. For example, in some embodiments, the storage device and the risk prediction device may be deployed in a server, or alternatively, the storage device may be deployed in the risk prediction device.

Optionally, the diagnosis and treatment data may include vital sign data, examination data, and the like. Further optionally, the user may be a patient suffering from the target disease, who may be referred to as a target patient. In some embodiments, the diagnosis and treatment data corresponding to different diseases may differ: the data to be collected may be determined according to the target disease (for example, according to its type), and the diagnosis and treatment data corresponding to the target disease is then obtained. Alternatively, in some embodiments, the diagnosis and treatment data corresponding to different diseases may be the same: for example, all diagnosis and treatment data of the user, such as a target patient, or all such data within a preset time range (e.g., within the last year), may be collected as the diagnosis and treatment data corresponding to the target disease. Further optionally, the data may be extracted from a monitoring system, in which case the storage device is a storage device in the monitoring system, or the data may be extracted from the monitoring system and then stored in the storage device, which is not limited in this application.

Optionally, the diagnosis and treatment data may be obtained by processing acquired raw medical data, where the processing includes sampling, filling missing values, and so on. For example, raw medical data may be acquired for a patient, including historical baseline data for the patient, which may include multiple visit records, each of which may include various diagnoses, tests, examinations, medications, surgical procedures, and so forth. Further, the historical baseline data may be preprocessed. For example, the vital sign data may be obtained by sampling the acquired original vital sign data, which may be continuous data, in a preset time unit (for example, in units of 1 h); for another example, multiple imputation may be used to fill in missing values in the examination data. The preprocessed diagnosis and treatment data is thereby obtained. Further optionally, the diagnosis and treatment data may be text data, or may be vectors, such as binary features, or two-dimensional feature vectors, and so on.
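
The two preprocessing operations mentioned above can be sketched in plain Python. This is a simplified illustration under stated assumptions: timestamps are hours since admission, `sample_hourly` keeps the first reading in each 1 h bucket, and `fill_missing` uses a single mean fill as a stand-in for the multiple imputation named in the text, which is a different and more involved technique. All names and values are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple


def sample_hourly(readings: List[Tuple[float, float]]) -> Dict[int, float]:
    """Keep the earliest reading in each 1 h bucket.

    Each reading is (hours_since_admission, value); timestamps are assumed
    to be non-negative.
    """
    sampled: Dict[int, float] = {}
    for timestamp, value in sorted(readings):
        hour = int(timestamp)  # index of the 1 h bucket
        sampled.setdefault(hour, value)
    return sampled


def fill_missing(values: List[Optional[float]]) -> List[float]:
    """Replace None with the mean of the observed values.

    Simplified stand-in for multiple imputation, used only to keep the sketch short.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


heart_rate = [(0.1, 82.0), (0.6, 85.0), (1.2, 90.0), (2.9, 88.0)]
hourly = sample_hourly(heart_rate)
lab_values = fill_missing([1.0, None, 2.0])
print(hourly)      # {0: 82.0, 1: 90.0, 2: 88.0}
print(lab_values)  # [1.0, 1.5, 2.0]
```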

In some embodiments, outcome data corresponding to the diagnosis and treatment data may also be obtained, and the outcome data may be used to indicate a health status of the user. The outcome data may also be referred to as an outcome, a clinical outcome, or by another name, which is not limited in this application. For example, the outcome data may be discharge diagnosis data corresponding to each visit record of the patient, such as death, exacerbation, complication, confirmed target disease, and the like. Optionally, the outcome data may be processed in a manner similar to the diagnosis and treatment data, which is not described again here. Model training can then be carried out according to the diagnosis and treatment data and the outcome data of the patient to obtain a risk prediction model. Further optionally, the outcome data may be text data, or may be a vector, such as a binary feature, also referred to as a two-dimensional feature vector, or the like.

For example, taking the user as a myocardial infarction patient and the target disease as myocardial infarction as an example, the diagnosis and treatment data may include age, systolic blood pressure, cardiac function grading Killip, and the like. Accordingly, its outcome data may be death or other outcomes.

Optionally, the risk prediction result may include a prediction score of the target user for the target disease, and the prediction score may be a sum of coefficients (weights) of the target risk factors. The target risk factor corresponding to the target diagnosis and treatment data may be part or all of the screened second risk factors.

In some optional embodiments, the prediction score may also be obtained by processing the coefficient of the target risk factor, for example, when the target disease is an infectious disease, the prediction score may be obtained by weighting the coefficient of the target risk factor according to the incidence rate of the target disease in the area where the target user is located, and the like, which is not limited in the present application. Therefore, the accuracy and reliability of the determined disease risk prediction score can be improved.

For example, the higher the incidence of the target disease in the region where the target user is located, the larger the weighting coefficient may be set; conversely, the lower the incidence of the target disease in the region where the target user is located, the smaller the weighting coefficient may be set.

For another example, when the incidence of the target disease in the area where the target user is located is higher than the average incidence of each area, a preset first weighting coefficient is used for weighting, and when the incidence of the target disease in the area where the target user is located is lower than the average incidence of each area, a preset second weighting coefficient is used for weighting, where the first weighting coefficient is greater than the second weighting coefficient.

As another example, when the target disease is an infectious disease, the risk prediction device may obtain a target incidence of the target disease in the target area where the target user is located, and compare it with the average incidence of the target disease. If the target incidence is higher than the average incidence and the difference between the two exceeds a threshold, the coefficients of one or more target risk factors may be weighted (e.g., multiplied by a factor greater than 1), or the sum of the coefficients of the target risk factors may be weighted, or a risk score may be added to the original prediction score, to obtain the prediction score. This helps further improve the reliability of disease risk prediction.
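
The incidence-based weighting in the second example above (a first weighting coefficient when the local incidence exceeds the average, a smaller second one otherwise) might look like the sketch below. The function name, the default weights 1.5 and 1.0, and the handling of the equal case (treated like the lower case) are assumptions; the source specifies only that the first weighting coefficient is greater than the second.

```python
def weighted_score(base_score: int,
                   local_incidence: float,
                   average_incidence: float,
                   first_weight: float = 1.5,
                   second_weight: float = 1.0) -> float:
    """Weight the sum of target risk factor coefficients by regional incidence.

    first_weight applies when the local incidence is above the average;
    second_weight applies otherwise (equal incidence is an assumed edge case).
    """
    if first_weight <= second_weight:
        raise ValueError("first_weight must be greater than second_weight")
    weight = first_weight if local_incidence > average_incidence else second_weight
    return base_score * weight


high_area = weighted_score(10, local_incidence=0.05, average_incidence=0.02)
low_area = weighted_score(10, local_incidence=0.01, average_incidence=0.02)
print(high_area, low_area)  # 15.0 10.0
```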

In some embodiments, the risk factor may be a binary feature. Optionally, when the risk prediction device 101 obtains target diagnosis and treatment data of a target user, invokes the risk prediction model to determine target risk factors corresponding to the target diagnosis and treatment data and coefficients of the target risk factors, and determines a risk prediction result of the target user for the target disease based on the target risk factors and the coefficients of the target risk factors, the risk prediction device 101 may be specifically configured to: acquire target diagnosis and treatment data of a target user, and convert the target diagnosis and treatment data into binary features; and input the binary features corresponding to the target diagnosis and treatment data into the risk prediction model to obtain target risk factors corresponding to the binary features and coefficients of the target risk factors, and determine a risk prediction result based on the target risk factors and the coefficients of the target risk factors.

In some embodiments, the clinical data may include a plurality of risk factor data. Further, when determining multiple risk factors of a target disease according to the diagnosis and treatment data, the risk prediction device 101 may convert the risk factor data into a binary feature by obtaining multiple risk factor data included in the diagnosis and treatment data and according to a relationship between the risk factor data and outcome data corresponding to the diagnosis and treatment data, so as to obtain multiple risk factors. Wherein the risk factor data may be a variable that affects the clinical outcome of the target disease.

In some embodiments, the risk prediction device 101 is further configured to obtain outcome data corresponding to the diagnosis and treatment data. Further, when the risk prediction device 101 screens a plurality of second risk factors from the plurality of first risk factors according to an objective function including the two-norm of the plurality of first risk factors, and determines coefficients of the screened second risk factors so as to determine a risk prediction model based on the coefficients of the second risk factors, it may specifically be configured to: determine, according to the plurality of first risk factors corresponding to the diagnosis and treatment data and the outcome data, a plurality of screened second risk factors under the objective function and coefficients of the screened second risk factors, so as to train and obtain the risk prediction model. The two-norm can be used for controlling the number of screened risk factors, namely the number of second risk factors. Optionally, the objective function may be determined based on a logistic loss function and a two-norm.

In some embodiments, the risk prediction device 101 may be further configured to receive a disease risk prediction request sent by the terminal, where the disease risk prediction request carries the identifier of the target user;

the risk prediction device 101 may be specifically configured to obtain the target diagnosis and treatment data according to the identifier of the target user;

the risk prediction device 101 may further be configured to determine a target score threshold according to the type of the target disease, and send early warning information to the terminal when a prediction score included in the risk prediction result is greater than the target score threshold.

The terminal sending the disease risk prediction request may be any terminal, such as a terminal of a doctor, a patient, or another user, and the present application is not limited thereto. In some embodiments, the terminal may be a specific terminal, such as a legitimate terminal that has passed verification; that is, the target diagnosis and treatment data may be obtained only after the disease risk prediction request is received and the terminal is successfully verified. For example, the risk prediction device may receive the disease risk prediction request and then verify the identity of the terminal; if the verification is passed, acquisition of the target diagnosis and treatment data is triggered. Optionally, the verification may be performed in various ways. For example, the disease risk prediction request may also carry an identity of the terminal, and the risk prediction device may check whether the identity exists in a terminal whitelist; if so, the verification is passed; otherwise, the verification fails. For another example, the disease risk prediction request may be encrypted by using a preset public key, and if the request is decrypted successfully based on the private key, the verification is determined to be passed; otherwise, the verification fails. Other ways are not listed here. Optionally, the disease risk prediction request may also carry an identification of the target disease and/or a type of the target disease.
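
The whitelist variant of terminal verification described above can be sketched as follows. The request fields, the whitelist contents, and the in-memory `records` store are hypothetical; the sketch only shows the order of operations, namely that target diagnosis and treatment data is looked up only after the terminal's identity passes the check.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class RiskPredictionRequest:
    """Hypothetical shape of a disease risk prediction request."""
    terminal_id: str
    target_user_id: str
    disease_type: Optional[str] = None


def verify_terminal(request: RiskPredictionRequest, whitelist: Set[str]) -> bool:
    """Pass verification only if the terminal identity is on the whitelist."""
    return request.terminal_id in whitelist


def handle_request(request: RiskPredictionRequest,
                   whitelist: Set[str],
                   records: Dict[str, dict]) -> Optional[dict]:
    """Return the target user's data, or None if verification fails."""
    if not verify_terminal(request, whitelist):
        return None
    return records.get(request.target_user_id)


whitelist = {"terminal-001", "terminal-002"}           # illustrative values
records = {"user-42": {"age_ge_75": 1, "killip_class_ge_3": 0}}

accepted = handle_request(
    RiskPredictionRequest("terminal-001", "user-42"), whitelist, records)
rejected = handle_request(
    RiskPredictionRequest("terminal-999", "user-42"), whitelist, records)
print(accepted, rejected)  # {'age_ge_75': 1, 'killip_class_ge_3': 0} None
```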

Optionally, the early warning information includes a risk item corresponding to the target risk factor, the prediction score, and a treatment plan. Further optionally, the treatment plan may be a treatment plan corresponding to the user group to which the target user belongs.

Further optionally, the scoring thresholds corresponding to different disease types (or diseases) may be different, and may be determined according to risk levels corresponding to the disease types (or diseases). For example, the higher the risk rating corresponding to a disease type (or disease), the lower the score threshold corresponding to that disease type (or disease) may be; conversely, the lower the risk rating for a disease type (or disease), the higher the score threshold for that disease type (or disease) may be. Thereby helping to improve the flexibility of the information early warning operation.
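
A minimal sketch of the threshold selection and early warning described above follows. The disease names other than myocardial infarction, the risk levels, the threshold values, and the function `early_warning` are assumptions; the only property taken from the text is that a higher risk level maps to a lower score threshold and that a warning carrying the risk items, the prediction score, and a treatment plan is produced when the score exceeds the threshold.

```python
from typing import Dict, List, Optional

# Assumed mapping: a higher risk level gets a lower score threshold.
RISK_LEVEL_THRESHOLDS: Dict[str, int] = {"high": 5, "medium": 8, "low": 12}

# Hypothetical assignment of disease types to risk levels.
DISEASE_RISK_LEVEL: Dict[str, str] = {
    "myocardial_infarction": "high",
    "seasonal_influenza": "low",
}


def early_warning(disease_type: str, score: int, risk_items: List[str],
                  treatment_plan: str) -> Optional[dict]:
    """Return early warning information if the score exceeds the threshold, else None."""
    threshold = RISK_LEVEL_THRESHOLDS[DISEASE_RISK_LEVEL[disease_type]]
    if score <= threshold:
        return None
    return {
        "risk_items": risk_items,
        "prediction_score": score,
        "treatment_plan": treatment_plan,
    }


warning = early_warning("myocardial_infarction", 7,
                        ["age_ge_75", "killip_class_ge_3"],
                        "plan for the user's group")
no_warning = early_warning("seasonal_influenza", 7, ["age_ge_75"],
                           "plan for the user's group")
print(warning)
print(no_warning)  # None
```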

Optionally, the storage device 102 may be a blockchain node, and the diagnosis and treatment data may be obtained from a blockchain. That is, the diagnosis and treatment data of each patient may be stored in the blockchain in advance. Obtaining the user's diagnosis and treatment data from blockchain nodes can improve the reliability of the obtained data, and thereby the reliability of the disease risk determined based on it.

For example, in some embodiments, the risk prediction device 101 may be further configured to send a diagnosis and treatment data acquisition request to the storage device 102, where the diagnosis and treatment data acquisition request carries the identifier of the target user;

the storage device 102 may be further configured to receive the diagnosis and treatment data acquisition request and verify the identity of the risk prediction device; if the verification passes, to query and obtain the target diagnosis and treatment data of the target user according to the identifier of the target user, and to send the target diagnosis and treatment data to the risk prediction device;

the risk prediction device 101 may be specifically configured to receive the target diagnosis and treatment data sent by the storage device, so as to obtain the target diagnosis and treatment data.

In this embodiment of the application, the risk prediction device 101 may determine a plurality of first risk factors corresponding to a target disease according to the diagnosis and treatment data of a plurality of users, such as a plurality of target patients, acquired from the storage device 102; screen a plurality of second risk factors from the plurality of first risk factors according to an objective function including the two-norm of the plurality of first risk factors; and determine coefficients of the screened second risk factors from an integer set, so as to train and obtain a risk prediction model. The risk prediction device 101 may then call the risk prediction model to determine the target risk factors corresponding to the acquired target diagnosis and treatment data (which may, for example, be acquired from the storage device 102) and the coefficients of the target risk factors, so as to obtain a risk prediction result of the target user for the target disease. By applying an integer optimization algorithm with integer constraints and a two-norm-based objective function, the embodiment of the application controls the number of risk factors and optimizes the prediction result, which helps to improve the prediction of disease risk.

Referring to fig. 2, fig. 2 is a schematic flow chart of a disease risk prediction method provided in an embodiment of the present application. The method may be performed by the risk prediction device described above. As shown in fig. 2, the disease risk prediction method may include the following steps:

201. Acquiring diagnosis and treatment data corresponding to a target disease for a plurality of users.

The user may be a patient with the target disease. The diagnosis and treatment data (samples) may include physical sign data, examination and test data, and the like, which are not described in detail herein.

For example, taking the user as a myocardial infarction patient and the target disease as myocardial infarction as an example, the diagnosis and treatment data may include age, systolic blood pressure, cardiac function grading Killip, and the like.

Optionally, the diagnosis and treatment data may be acquired from the blockchain; that is, the diagnosis and treatment data of each patient may be stored in the blockchain in advance, which is not described herein again. Acquiring the diagnosis and treatment data of the user from the blockchain can improve the reliability of the disease risk predicted from that data.

202. Determining a plurality of first risk factors corresponding to the target disease according to the diagnosis and treatment data.

Optionally, the risk factor may be a feature vector, such as a binary feature.

In some embodiments, the risk prediction device may be further configured to obtain outcome data corresponding to the diagnosis and treatment data.

After the diagnosis and treatment data of the target disease are obtained, the risk factor of the target disease, namely the first risk factor, can be determined. The risk factor may be a vector of partial data of the user diagnosis and treatment data, or a vector of data obtained by processing the diagnosis and treatment data, or the diagnosis and treatment data may be directly composed of a plurality of risk factors, and the like, which is not limited in the present application.

In some embodiments, when determining a plurality of risk factors corresponding to a target disease according to the diagnosis and treatment data, the risk factor data contained in the acquired diagnosis and treatment data may be converted into binary features. For example, feature engineering may be applied to the risk factor data to convert it into binary features suitable for use in integer optimization algorithms.

In some embodiments, the diagnosis and treatment data may include a plurality of risk factor data, and each risk factor referred to herein may be a binary feature. Optionally, when determining a plurality of risk factors corresponding to the target disease, the plurality of risk factor data included in the diagnosis and treatment data may be obtained, and the risk factor data may then be converted into binary features according to the relationship between the risk factor data and the outcome data corresponding to the diagnosis and treatment data, so as to obtain the plurality of risk factors. That is, when determining the risk factors according to the diagnosis and treatment data, the data may be converted into binary features according to the relationship between the risk factor data (i.e., variables that affect the clinical outcome) and the outcome.

Optionally, when the diagnosis and treatment data is converted into binary features according to the relationship between the risk factor data and the outcome, the conversion may be performed according to the parameter value of the risk factor data and a critical value associated with the outcome data, where the critical value may be used to indicate risk information of the outcome. For example, risk factor data below the critical value may share one binary feature value and risk factor data above the critical value may share another, which is not limited in the present application. Optionally, there may be one or more critical values; if there is more than one, risk factor data falling in the same interval between critical values may share the same binary feature.

For example, if the relationship between the risk factor data x (e.g., the cardiac function grade Killip) and the outcome y (e.g., whether myocardial infarction occurs) is stratified by a critical value c (e.g., a Killip grade higher than II significantly increases the risk of myocardial infarction), the risk factor data x may be segmented at the critical value c, converting the original continuous variable into a binary coded variable.
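The threshold-based conversion can be illustrated with a short sketch. The numeric coding of the Killip grade (1 to 4 for grades I to IV) and the age cut points used below are assumptions for the example; the application does not fix particular critical values. Values equal to a critical value are placed in the lower interval here, which is also an arbitrary choice.

```python
from bisect import bisect_left


def binarize(value: float, critical_value: float) -> int:
    """Return 1 if the value lies above the critical value, else 0."""
    return 1 if value > critical_value else 0


def interval_features(value: float, critical_values) -> list:
    """With several critical values, emit one binary feature per interval.

    Exactly one entry is 1: the interval that contains the value.
    """
    cuts = sorted(critical_values)
    index = bisect_left(cuts, value)  # values equal to a cut fall in the lower interval
    return [1 if i == index else 0 for i in range(len(cuts) + 1)]


# Killip grade III (coded 3) against the critical value II (coded 2).
killip_high = binarize(3, 2)
# Hypothetical age cut points of 50 and 70 give three intervals.
age_bins = interval_features(65, [50, 70])
```

Here `killip_high` is a single binary feature, while `age_bins` holds one binary feature for each of the three age intervals.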

203. Screening a plurality of second risk factors from the plurality of first risk factors according to an objective function comprising the two-norm of the plurality of first risk factors, determining coefficients of the screened second risk factors, and determining a risk prediction model based on the coefficients of the second risk factors; wherein the coefficients of the second risk factors are integers determined from a set of integers.

The screened multiple risk factors, namely the second risk factor, are partial risk factors in the multiple first risk factors corresponding to the target disease, and the partial risk factors can be used for representing key variables of the outcome of the target disease.

That is, the risk prediction model may be trained on the binary features: by optimizing an objective function based on the two-norm, risk factors may be screened out from the plurality of risk factors and the coefficients of those risk factors may be determined.

In some embodiments, the risk prediction model may be trained as follows: according to the plurality of first risk factors corresponding to the diagnosis and treatment data and the outcome data, the plurality of screened second risk factors under the objective function and the coefficients of the screened second risk factors are determined, so as to train and obtain the risk prediction model. Optionally, the objective function may be determined according to a logistic loss function and a two-norm, where the two-norm is used to control the number of screened risk factors.

The risk prediction model is obtained by optimizing an objective function subject to integer constraints. For example, the objective function may be as follows:

$$\min_{\lambda} \; l(\lambda) + C \lVert \lambda \rVert_2 \quad \text{s.t.} \quad \lambda \in L$$

where $\lambda$ may represent the coefficient vector of the risk factors, i.e., a coefficient such as a score corresponding to each second risk factor; $l(\lambda)$ may be a logistic loss function; $\lVert \lambda \rVert_2$ may represent the two-norm corresponding to the risk factor set; and $L$ may be a set of integers. For example, the loss function may be expressed as follows:

$$l(\lambda) = \frac{1}{n} \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i \lambda^{\top} x_i\right)\right)$$

where $n$ may represent the number of samples in the data, $y_i$ is the clinical outcome (outcome data) corresponding to sample $i$, and $x_i$ is the feature vector of sample $i$, such as the first risk factors.

The objective function of the present application adds a two-norm term over the risk factors ($C \lVert \lambda \rVert_2$) to the conventional logistic loss function. The aim is that, while the optimal solution is obtained, the number of risk factors selected by the model is controlled by adjusting the parameter $C$, so that the optimal subset of risk factors is screened automatically. Integer constraints are further defined so that the parameter $\lambda$ takes values from the integer set $L$; the purpose is to make the solution of the model meet the practical requirements of disease risk scoring.

Therefore, based on a data set containing the feature vectors and outcomes of the n samples, the optimal combination of risk factors and the coefficient $\lambda$ corresponding to each risk factor can be solved by minimizing the objective function subject to the constraints, which completes the model training. The coefficient $\lambda$ may express a risk score for each risk factor, e.g., a positive value representing an increased risk of occurrence of the clinical outcome and a negative value representing a decreased risk.
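To make the training step concrete, the following sketch minimizes an objective of the form described above (mean logistic loss plus C times the two-norm of the coefficient vector) with every coefficient restricted to a small integer set. Several points are assumptions made for illustration: outcomes are coded as -1/+1, the toy data are invented, and the exhaustive search stands in for whatever integer optimization solver an actual implementation would use; it is only practical for a handful of features.

```python
import itertools
import math


def logistic_loss(coeffs, X, y):
    """Mean logistic loss; outcomes y_i are assumed to be coded as -1 or +1."""
    total = 0.0
    for x_i, y_i in zip(X, y):
        margin = y_i * sum(c * f for c, f in zip(coeffs, x_i))
        total += math.log1p(math.exp(-margin))
    return total / len(X)


def objective(coeffs, X, y, C):
    """Logistic loss plus C times the two-norm of the coefficient vector."""
    two_norm = math.sqrt(sum(c * c for c in coeffs))
    return logistic_loss(coeffs, X, y) + C * two_norm


def fit_integer_scores(X, y, integer_set, C):
    """Exhaustive search over all integer coefficient vectors (toy sizes only)."""
    n_features = len(X[0])
    candidates = itertools.product(integer_set, repeat=n_features)
    best = min(candidates, key=lambda coeffs: objective(coeffs, X, y, C))
    return list(best)


# Invented toy data: three binary features, four samples.
X = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
y = [1, 1, -1, -1]
scores = fit_integer_scores(X, y, integer_set=range(-2, 3), C=0.01)
selected = [j for j, s in enumerate(scores) if s != 0]  # screened risk factors
```

In this toy data set the third feature is never present, so it cannot reduce the loss and the penalty term drives its coefficient to zero; features with a zero coefficient are the ones screened out.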

Therefore, the diagnosis and treatment data of the user can be acquired, and the disease risk score of the user can be judged and obtained based on the risk factors and the coefficients corresponding to the diagnosis and treatment data.

In other optional embodiments, the second risk factors may be screened from the plurality of first risk factors, and their coefficients determined, in other manners. For example, the risk factors and coefficients may be determined according to the probability (e.g., the percentage of all samples in which a certain risk factor appears) or the number (e.g., the number of samples in which a certain risk factor appears) of the risk factors involved in the diagnosis and treatment data of a plurality of users, such as a plurality of target patients. The N risk factors with the highest probability or number may be used as the screened risk factors, i.e., the second risk factors, where N is an integer greater than 2. The larger the probability or number of a risk factor, the larger its coefficient; for example, probability intervals or number intervals may be set, with all risk factors falling in the same interval receiving the same coefficient. The coefficient is an integer.
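A minimal sketch of this count-based alternative follows. The factor names, the sample data, and the interval boundaries are all hypothetical; the only properties taken from the description are that the N most frequent factors are kept and that a higher count interval maps to a larger integer coefficient.

```python
from collections import Counter


def screen_by_frequency(samples, top_n, interval_scores):
    """Keep the top_n most frequent risk factors and assign integer coefficients.

    samples: list of sets, each holding the risk factors present in one sample.
    interval_scores: list of (min_count, coefficient) pairs; a factor receives
    the largest coefficient whose min_count it reaches.
    """
    counts = Counter(factor for sample in samples for factor in sample)
    result = {}
    for factor, count in counts.most_common(top_n):
        result[factor] = max(
            coeff for min_count, coeff in interval_scores if count >= min_count
        )
    return result


# Invented samples with four candidate risk factors.
samples = [
    {"age>=65", "killip>II", "sbp<100"},
    {"age>=65", "killip>II"},
    {"age>=65", "sbp<100"},
    {"age>=65", "killip>II", "smoker"},
]
model = screen_by_frequency(samples, top_n=3, interval_scores=[(1, 1), (3, 2), (4, 3)])
```

The least frequent factor is dropped, and the remaining three receive coefficients that grow with how often they appear.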

204. Acquiring target diagnosis and treatment data of a target user.

The target diagnosis and treatment data may include physical sign data, examination and test data, and the like, and may cover various diagnosis, examination, medication, and surgical items.

Optionally, the target diagnosis and treatment data may be obtained by processing the acquired raw diagnosis and treatment data. For example, raw clinical data of the target user may be obtained, which may include a plurality of visit records of the target user, each visit record may include data of various diagnoses, tests, examinations, drugs, surgical procedures, and the like. Further, the original diagnosis and treatment data can be preprocessed, so that the preprocessed diagnosis and treatment data is obtained, which is not described herein any more.

Optionally, the risk prediction device may be triggered based on a request of a user when implementing disease risk prediction for the user, such as acquiring the target diagnosis and treatment data, or may be actively triggered for a specific user, or may be triggered in other manners, which is not limited in this application.

For example, in some embodiments, the risk prediction device may further receive a disease risk prediction request sent by the terminal, where the disease risk prediction request carries the identifier of the target user. And then the target diagnosis and treatment data can be obtained according to the identification of the target user.

In some embodiments, the risk prediction device may further send a diagnosis and treatment data acquisition request to the storage device, where the diagnosis and treatment data acquisition request carries the identifier of the target user. The storage device can receive the diagnosis and treatment data acquisition request, and can verify the identity of the risk prediction device after receiving the diagnosis and treatment data acquisition request; and if the verification is passed, inquiring and acquiring target diagnosis and treatment data of the target user according to the identification of the target user, and sending the diagnosis and treatment data to the risk prediction equipment. Therefore, the risk prediction equipment can receive the target diagnosis and treatment data sent by the storage equipment so as to obtain the target diagnosis and treatment data. Optionally, the storage device may be a blockchain node, or may be a server or other storage device.

Optionally, the verification may be performed in one or more ways. For example, the diagnosis and treatment data acquisition request may also carry an identity of the risk prediction device, and the storage device may check whether the identity of the risk prediction device exists in a preset whitelist; if it does, the verification passes; otherwise, the verification fails. For another example, the diagnosis and treatment data acquisition request may be encrypted using a preset public key; if the storage device successfully decrypts the request with the corresponding private key, the verification passes; otherwise, the verification fails. The storage device may also perform the verification in other ways, which are not listed here.

205. Calling the risk prediction model to determine target risk factors corresponding to the target diagnosis and treatment data and coefficients of the target risk factors, and determining a risk prediction result of the target user for the target disease based on the target risk factors and the coefficients of the target risk factors.

Optionally, the risk prediction result may include a prediction score of the target user for the target disease, where the prediction score may be a sum of coefficients of the target risk factors, or may be obtained by processing the coefficients of the target risk factors, and details are not repeated here.

In some embodiments, when determining the risk prediction result of the target disease, the risk prediction device may convert the obtained target diagnosis and treatment data into a binary feature, input the binary feature corresponding to the target diagnosis and treatment data into the risk prediction model, obtain target risk factors corresponding to the binary feature and coefficients of the target risk factors, and determine the risk prediction result based on the target risk factors and the coefficients of the target risk factors.

That is to say, when a disease risk prediction result of a target user is obtained, the target diagnosis and treatment data may be converted into a binary feature, and a trained risk prediction model is invoked to process the binary feature, so as to obtain a risk score of the target user for the target disease. For example, the coefficients of the risk factors determined by the risk prediction model are summed to obtain the final risk score, that is, the scores corresponding to the real values of the risk factors finally selected by the algorithm are summed to obtain the final risk score of the target user.

In the present application, the original data can be converted into binary features that can be input directly into an integer optimization algorithm. The integer optimization algorithm is then applied: setting integer constraints meets the score-assignment requirements of risk scoring, and optimizing a two-norm-based objective function solves the model while controlling the number of risk factors, thereby achieving reliable prediction of disease risk.

For example, taking the prediction of the risk of in-hospital death of myocardial infarction patients as an example, suppose the risk factors screened by the risk prediction model are the 4 risk factors of cardiac arrest, age, Killip, and systolic blood pressure, with coefficients (scores) of 2, 1, 1, and 1 respectively; a patient presenting all four risk factors then has a risk score of 5. That is, if a patient has a history of cardiac arrest, two points are added to the total risk score, and so on. The risk score of the user can thus be determined quickly.
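The scoring step itself reduces to summing the coefficients of the risk factors that are present. In the sketch below the coefficient table is an illustrative assumption (cardiac arrest scoring 2 and the four scores totalling 5, in line with the example above), not the output of a fitted model, and the feature names are hypothetical.

```python
# Illustrative integer coefficients (scores); assumed values, not a fitted model.
COEFFICIENTS = {"cardiac_arrest": 2, "age": 1, "killip": 1, "systolic_bp": 1}


def risk_score(binary_features: dict) -> int:
    """Sum the coefficients of the risk factors whose binary feature equals 1."""
    return sum(
        coeff
        for name, coeff in COEFFICIENTS.items()
        if binary_features.get(name, 0) == 1
    )


all_present = risk_score({"cardiac_arrest": 1, "age": 1, "killip": 1, "systolic_bp": 1})
arrest_only = risk_score({"cardiac_arrest": 1, "age": 0})
```

Features missing from the input are treated as absent, so a partial record still yields a score.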

In some embodiments, the risk prediction device may further determine a target score threshold according to the type of the target disease, and send warning information to the terminal when a prediction score included in the risk prediction result is greater than the target score threshold. The prediction score is the sum of the coefficients of the target risk factors, and the early warning information may include information such as risk items, the prediction score, and treatment schemes corresponding to the target risk factors.

Optionally, the treatment plan may be a treatment plan corresponding to the user group to which the target user belongs. Further optionally, the user group to which the target user belongs may be the group with the greatest net benefit under that treatment plan. For example, users may be grouped based on the net benefit of each treatment plan, yielding the user group with the greatest net benefit under each treatment plan. Therefore, when recommending a treatment plan to the user, the recommendation may be based on net benefit; for example, the treatment plan with the maximum net benefit for the user group to which the target user belongs is recommended to the target user. This provides the user with a cost-effective treatment recommendation consistent with health economics: the most cost-effective treatment is selected for the patient on the premise of providing effective treatment, which helps to reduce the patient's financial burden and the medical insurance burden.
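The net-benefit-based recommendation can be sketched as a lookup over a table of net benefits per treatment plan and user group. How net benefit is computed and how users are grouped is not specified here, so the table below is entirely hypothetical.

```python
# Hypothetical net benefit of each treatment plan for each user group.
NET_BENEFIT = {
    "plan_a": {"group_1": 0.30, "group_2": 0.10},
    "plan_b": {"group_1": 0.20, "group_2": 0.25},
}


def best_group_for_plan(plan: str) -> str:
    """User group with the greatest net benefit under the given plan."""
    return max(NET_BENEFIT[plan], key=NET_BENEFIT[plan].get)


def recommend_plan(user_group: str) -> str:
    """Treatment plan with the greatest net benefit for the given user group."""
    return max(
        NET_BENEFIT,
        key=lambda plan: NET_BENEFIT[plan].get(user_group, float("-inf")),
    )
```

For a target user in `group_2`, `recommend_plan` returns the plan whose net benefit is highest for that group.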

In the embodiment of the application, the risk prediction device may determine a plurality of first risk factors corresponding to a target disease from the obtained diagnosis and treatment data of a plurality of users, screen a plurality of second risk factors from the plurality of first risk factors according to an objective function including the two-norm of the plurality of first risk factors, and determine coefficients of the screened second risk factors from an integer set, so as to train and obtain a risk prediction model. The risk prediction device may then call the risk prediction model to determine the target risk factors corresponding to the obtained target diagnosis and treatment data and the coefficients of the target risk factors, so as to obtain a risk prediction result of the target user for the target disease. By applying an integer optimization algorithm, the scheme integrates feature selection, model parameter learning, and risk factor score assignment into a single process, avoiding the subjectivity of traditional methods. It can also be combined with automated data preprocessing, such as data imputation and feature engineering, to automatically convert the original data into a form that can be input directly into the integer optimization algorithm. The process is convenient and highly automated, and its results meet the clinical requirements of disease risk scoring, so it can be used by clinicians without algorithm or development experience.

It is to be understood that the above embodiments of the method are all illustrations of the disease risk prediction method or system of the present application, and the description of each embodiment has a respective emphasis, and reference may be made to the related descriptions of other embodiments for those parts that are not described in detail in a certain embodiment.

The embodiment of the application also provides a disease risk prediction device. The apparatus may include means for performing the method of fig. 2 as previously described. Please refer to fig. 3, which is a schematic structural diagram of a disease risk prediction apparatus according to an embodiment of the present application. The disease risk prediction apparatus described in this embodiment may be configured in a risk prediction device, as shown in fig. 3, the disease risk prediction apparatus 300 of this embodiment may include:

an obtaining module 301, configured to obtain diagnosis and treatment data corresponding to target diseases of multiple users;

a determining module 302, configured to determine a plurality of first risk factors corresponding to a target disease according to the diagnosis and treatment data;

a processing module 303, configured to screen a plurality of second risk factors from the plurality of first risk factors according to an objective function including the two-norm of the plurality of first risk factors, and determine coefficients of the screened second risk factors, so as to determine a risk prediction model based on the coefficients of the second risk factors; wherein the coefficients of the second risk factor are integers determined from a set of integers;

the obtaining module 301 is further configured to obtain target diagnosis and treatment data of a target user;

the processing module 303 is further configured to invoke the risk prediction model to determine target risk factors and coefficients of the target risk factors corresponding to the target diagnosis and treatment data, and determine a risk prediction result of the target user for the target disease based on the target risk factors and the coefficients of the target risk factors.

In some embodiments, when the processing module 303 invokes the risk prediction model to determine target risk factors and coefficients of the target risk factors corresponding to the target diagnosis and treatment data, and determines a risk prediction result of the target user for the target disease based on the target risk factors and the coefficients of the target risk factors, the following steps may be specifically performed:

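converting the target diagnosis and treatment data into binary features;

inputting the binary features corresponding to the target diagnosis and treatment data into the risk prediction model to obtain the target risk factors corresponding to the binary features and the coefficients of the target risk factors, and determining the risk prediction result based on the target risk factors and the coefficients of the target risk factors.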

In some embodiments, the obtaining module 301 may be further configured to obtain outcome data corresponding to the diagnosis and treatment data, where the outcome data is used to indicate a health status of a user;

the diagnosis and treatment data comprises a plurality of risk factor data; when determining the multiple risk factors of the target disease according to the diagnosis and treatment data, the determining module 302 may specifically perform the following steps:

and converting the risk factor data into binary characteristics according to the relationship between the risk factor data and outcome data corresponding to the diagnosis and treatment data so as to obtain a plurality of risk factors.

In some embodiments, the obtaining module 301 is further configured to obtain outcome data corresponding to the diagnosis and treatment data;

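the processing module 303 may specifically perform the following step when screening a plurality of second risk factors from the plurality of first risk factors according to an objective function including the two-norm of the plurality of first risk factors, and determining coefficients of the screened second risk factors, so as to determine a risk prediction model based on the coefficients of the second risk factors:

determining, according to the plurality of first risk factors corresponding to the diagnosis and treatment data and the outcome data, the plurality of screened second risk factors under the objective function and the coefficients of the screened second risk factors, so as to train and obtain the risk prediction model.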

In some embodiments, the obtaining module 301 is further configured to receive a disease risk prediction request sent by a terminal, where the disease risk prediction request carries an identifier of the target user;

the obtaining module 301 is further configured to obtain the target diagnosis and treatment data according to the identifier of the target user;

the determining module 302 is further configured to determine a target score threshold according to the type of the target disease, and send early warning information to the terminal when a prediction score included in the risk prediction result is greater than the target score threshold;

In some embodiments, the storage device is a blockchain node;

the obtaining module 301 is further configured to send a diagnosis and treatment data obtaining request to the storage device, where the diagnosis and treatment data obtaining request carries an identifier of the target user, so that the storage device verifies an identity of the risk prediction device, and if the verification passes, the storage device queries and obtains target diagnosis and treatment data of the target user according to the identifier of the target user, and sends the diagnosis and treatment data to the risk prediction device;

the obtaining module 301 is specifically configured to receive the target diagnosis and treatment data sent by the storage device.

It can be understood that each functional module of the disease risk prediction apparatus of this embodiment can be specifically implemented according to the method in the above method embodiment fig. 2, and the specific implementation process thereof can refer to the related description of the above method embodiment fig. 2, which is not described herein again.

Referring to fig. 4, fig. 4 is a schematic structural diagram of a risk prediction device according to an embodiment of the present application. As shown in fig. 4, the risk prediction device may include a processor 401 and a memory 402. Optionally, the risk prediction device may further comprise a communication interface 403. The processor 401, the memory 402, and the communication interface 403 may be connected by a bus or by other means; fig. 4 shows connection by a bus as an example. The communication interface 403 may be controlled by the processor to send and receive messages, the memory 402 may be used to store a computer program comprising program instructions, and the processor 401 may be used to execute the program instructions stored in the memory 402. The processor 401 is configured to call the program instructions to perform the following steps:

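acquiring diagnosis and treatment data corresponding to a target disease for a plurality of users;

determining a plurality of first risk factors corresponding to the target disease according to the diagnosis and treatment data;

screening a plurality of second risk factors from the plurality of first risk factors according to an objective function including the two-norm of the plurality of first risk factors, and determining coefficients of the screened second risk factors, so as to determine a risk prediction model based on the coefficients of the second risk factors; wherein the coefficients of the second risk factors are integers determined from a set of integers;

acquiring target diagnosis and treatment data of a target user;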

and calling the risk prediction model to determine target risk factors corresponding to the target diagnosis and treatment data and coefficients of the target risk factors, and determining a risk prediction result of the target user for the target disease based on the target risk factors and the coefficients of the target risk factors.

In some embodiments, when the processor 401 invokes the risk prediction model to determine target risk factors and coefficients of the target risk factors corresponding to the target diagnosis and treatment data, and determines a risk prediction result of the target user for the target disease based on the target risk factors and the coefficients of the target risk factors, the following steps may be specifically performed:

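converting the target diagnosis and treatment data into binary features;

inputting the binary features corresponding to the target diagnosis and treatment data into the risk prediction model to obtain the target risk factors corresponding to the binary features and the coefficients of the target risk factors, and determining the risk prediction result based on the target risk factors and the coefficients of the target risk factors.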

In some embodiments, the processor 401 may further perform:

acquiring outcome data corresponding to the diagnosis and treatment data, wherein the outcome data is used for indicating the health state of a user;

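the diagnosis and treatment data comprises a plurality of risk factor data; when determining the plurality of first risk factors of the target disease according to the diagnosis and treatment data, the processor 401 may specifically perform the following step:

converting the risk factor data into binary features according to the relationship between the risk factor data and the outcome data corresponding to the diagnosis and treatment data, so as to obtain the plurality of first risk factors.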

In some embodiments, the processor 401 may further perform the steps of:

acquiring outcome data corresponding to the diagnosis and treatment data;

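when the processor 401 screens a plurality of second risk factors from the plurality of first risk factors according to an objective function including the two-norm of the plurality of first risk factors, and determines coefficients of the screened second risk factors, so as to determine a risk prediction model based on the coefficients of the second risk factors, the following step may be specifically performed:

determining, according to the plurality of first risk factors corresponding to the diagnosis and treatment data and the outcome data, the plurality of screened second risk factors under the objective function and the coefficients of the screened second risk factors, so as to train and obtain the risk prediction model.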

In some embodiments, the processor 401 may further perform the steps of:

receiving a disease risk prediction request sent by a terminal through a communication interface 403, where the disease risk prediction request carries an identifier of the target user;

acquiring the target diagnosis and treatment data according to the identification of the target user;

determining a target score threshold according to the type of the target disease, and sending early warning information to the terminal when the prediction score included in the risk prediction result is greater than the target score threshold;

In some embodiments, the storage device is a blockchain node; the processor 401 may further perform the following steps:

sending a diagnosis and treatment data acquisition request to the storage device through a communication interface 403, where the diagnosis and treatment data acquisition request carries an identifier of the target user, so that the storage device verifies the identity of the risk prediction device, and if the verification passes, the storage device queries and acquires target diagnosis and treatment data of the target user according to the identifier of the target user, and sends the diagnosis and treatment data to the risk prediction device;

the target diagnosis and treatment data sent by the storage device is received through the communication interface 403.

It should be understood that, in the embodiment of the present Application, the Processor 401 may be a Central Processing Unit (CPU), and the Processor 401 may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.

The memory 402 may include both read-only memory and random access memory, and provides instructions and data to the processor 401. A portion of the memory 402 may also include non-volatile random access memory. For example, the memory 402 may also store clinical data of the user.

The communication interface 403 may include an input device, such as a control panel, a microphone, a receiver, etc., and/or an output device, such as a display screen, a transmitter, etc., to name but a few.

In a specific implementation, the processor 401, the memory 402, and the communication interface 403 described in this embodiment of the present application may execute the implementation described in the method embodiment described in fig. 2 provided in this embodiment of the present application, and may also execute the implementation of the disease risk prediction apparatus described in this embodiment of the present application, which is not described herein again.

Embodiments of the present application also provide a computer-readable storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, cause the processor to perform some or all of the steps of the above disease risk prediction method embodiments, such as some or all of the steps performed by the risk prediction device.

Embodiments of the present application also provide a computer program product including computer program code which, when run on a computer, causes the computer to execute the steps of the above disease risk prediction method embodiments.

In some embodiments, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.

Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
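The linking of data blocks by a cryptographic method can be illustrated with a minimal hash-chain sketch. It is an assumption-laden toy, not the platform used in the embodiments: SHA-256 and the field names `index`, `transactions`, `prev_hash`, and `hash` are choices made here for illustration, and consensus and networking are left out entirely.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block


def block_hash(index, transactions, prev_hash):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "transactions": transactions, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, transactions):
    """Create the next block from a batch of transactions and append it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "transactions": transactions,
        "prev_hash": prev_hash,
        "hash": block_hash(index, transactions, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Check that every block matches its own hash and links to its predecessor."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(
            block["index"], block["transactions"], block["prev_hash"]
        )
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Because each block's hash covers the previous block's hash, altering the contents of an earlier block makes `is_valid` fail, which is the anti-counterfeiting property the paragraph refers to.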

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A disease risk prediction system, comprising: a risk prediction device and a storage device; the storage device is used for storing diagnosis and treatment data of a user;

the risk prediction device is configured to perform the following steps:

determining, according to the first risk factors and the outcome data, a plurality of second risk factors and coefficients of the second risk factors under a minimized objective function, so as to determine a risk prediction model; wherein the objective function is determined according to a logistic loss function and a two-norm, and minimizing the objective function is expressed as:

[formula not reproduced in the source text]

wherein the symbols of the formula respectively represent: a coefficient of each of the second risk factors; the logistic loss function; and the two-norm, wherein the two-norm is used for controlling the number of the screened second risk factors, and the coefficients of the second risk factors are integers determined from an integer set;

2. The system of claim 1,

the risk prediction device is further configured to acquire outcome data corresponding to the diagnosis and treatment data, where the outcome data is used for indicating the health state of the user;

the diagnosis and treatment data comprises a plurality of risk factor data; the determining a plurality of first risk factors corresponding to a target disease according to the diagnosis and treatment data includes:

3. The system according to claim 1, wherein the obtaining target diagnosis and treatment data of a target user, invoking the risk prediction model to determine target risk factors corresponding to the target diagnosis and treatment data and coefficients of the target risk factors, and determining a risk prediction result of the target user for the target disease based on the target risk factors and the coefficients of the target risk factors comprises:

4. The system of any one of claims 1-3, wherein the risk prediction result comprises a prediction score for the target user for the target disease;

the risk prediction device is further configured to receive a disease risk prediction request sent by a terminal, where the disease risk prediction request carries an identifier of the target user;

the risk prediction device is further configured to determine a target score threshold according to the type of the target disease, and to send early warning information to the terminal when the prediction score is greater than the target score threshold;

5. The system of any one of claims 1-3, wherein the storage device is a blockchain node;

the storage device is further configured to receive the diagnosis and treatment data acquisition request and verify the identity of the risk prediction device; and, if the verification passes, to query and acquire the target diagnosis and treatment data of the target user according to the identifier of the target user, and send the target diagnosis and treatment data to the risk prediction device;

6. A method of predicting disease risk, comprising:

[formula not reproduced in the source text]

wherein the symbols of the formula respectively represent: a coefficient of each of the second risk factors; and the logistic loss function;

7. A disease risk prediction device, comprising:

the acquisition module is further configured to acquire outcome data corresponding to the diagnosis and treatment data, where the outcome data is used for indicating the health state of a user;

the processing module is configured to determine, according to the plurality of first risk factors and the outcome data, a plurality of second risk factors and coefficients of the second risk factors under a minimized objective function, so as to determine a risk prediction model; wherein the objective function is determined according to a logistic loss function and a two-norm, and minimizing the objective function is expressed as:

[formula not reproduced in the source text]

wherein the symbols of the formula respectively represent: a coefficient of each of the second risk factors; and the logistic loss function;

8. A risk prediction device comprising a processor and a memory, said processor and said memory being interconnected, wherein said memory is adapted to store a computer program comprising program instructions, said processor being configured to invoke said program instructions to perform the steps performed by the risk prediction device in the system according to any of claims 1-5.

9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to carry out the steps performed by the risk prediction device in the system according to any one of claims 1-5.