CN116313086A

CN116313086A - Sub-health prediction model construction method, device, equipment and storage medium

Info

Publication number: CN116313086A
Application number: CN202310220390.2A
Authority: CN
Inventors: 党晓兵; 杨志敏; 邹佩芸; 杨小波; 黄鹂; 原嘉民; 陈贤帅; 杜如虚
Original assignee: Guangdong Jianchi Biotechnology Co ltd
Current assignee: Guangdong Jianchi Biotechnology Co ltd
Priority date: 2023-03-08
Filing date: 2023-03-08
Publication date: 2023-06-23

Abstract

The invention discloses a sub-health prediction model construction method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring initial sub-health sample data; preprocessing and labeling the initial sub-health sample data to obtain training data; according to a random forest algorithm, training and optimizing a prediction model by utilizing the training data to obtain an initial random forest model; performing feature selection on the initial random forest model, and selecting an optimal feature variable combination; and carrying out random forest modeling and optimization again on the optimal characteristic variable combination to obtain a sub-health prediction model. The invention ensures the accuracy and rationality of training data, avoids objective sample data from being influenced by subjective judgment, improves the accuracy of sub-health prediction by the model through objective judgment of the model, and reduces the complexity of sub-health prediction and assessment.

Description

Sub-health prediction model construction method, device, equipment and storage medium

Technical Field

The invention relates to the technical field of medical treatment, in particular to a sub-health prediction model construction method, a device, equipment and a storage medium.

Background

Sub-health refers to a state in which the human body is between health and disease. With increasing social pressure, sub-health has become a serious problem in people's lives. However, sub-health and disease are judged by different methods: a person in a sub-healthy state does not meet the standard of health and shows reduced vitality, reduced function and reduced adaptability over a certain period of time, yet does not meet the clinical or sub-clinical diagnostic criteria of modern medicine for the relevant diseases.

In the prior art, sub-health can be judged with existing medical diagnostic methods, such as taking a medical history, evaluating the neuropsychiatric condition and overall function, and imaging and laboratory examinations. Following the comprehensive sub-health evaluation procedure, a patient is judged to be sub-healthy if, after diseases that current medicine can explain have been excluded, symptoms that cannot be explained persist for 3 months or more. The main disadvantages of the prior art are: (1) the sub-health judgment standard mainly follows the standards used for judging diseases and combines several means such as medical imaging and laboratory examination, so the evaluation is complex; (2) there is no unified method for predicting sub-health, and the result is strongly influenced by the subjective judgment of doctors.

Disclosure of Invention

The invention provides a sub-health prediction model construction method, device, equipment and storage medium, which are used for solving the technical problems of low accuracy of sub-health prediction and heavy dependence on subjective judgment in the prior art.

In order to solve the above technical problems, an embodiment of the present invention provides a method for constructing a sub-health prediction model, including:

acquiring initial sub-health sample data;

preprocessing and labeling the initial sub-health sample data to obtain training data;

according to a random forest algorithm, training and optimizing a prediction model by utilizing the training data to obtain an initial random forest model;

performing feature selection on the initial random forest model, and selecting an optimal feature variable combination;

and carrying out random forest modeling and optimization again on the optimal characteristic variable combination to obtain a sub-health prediction model.

It can be appreciated that, compared with the prior art, the method preprocesses and labels the acquired initial sub-health sample data to obtain training data, which ensures the accuracy and rationality of the training data. An initial random forest model is obtained by training and optimizing a prediction model with the training data, and an objective optimal feature variable combination is obtained by performing feature selection on the initial random forest model, so that the objective sample data are not affected by subjective judgment. Random forest modeling and optimization are then performed again on the optimal feature variable combination to accurately obtain the sub-health prediction model, which avoids the complex evaluation of the prior art that relies on several means such as medical imaging and laboratory examination.

As a preferred scheme, the preprocessing and labeling operations are performed on the initial sub-health sample data to obtain training data, specifically:

removing irrelevant data in the initial sub-health sample data, and performing numerical conversion on text data in the initial sub-health sample data after the removing operation;

filling the feature missing value and combining the feature variables of the initial sub-health sample data after the numerical conversion to obtain initial training data;

and performing secondary elimination on the initial training data, and labeling the initial training data subjected to secondary elimination, so as to obtain training data.

It can be understood that irrelevant data are removed from the initial sub-health sample data, the text data remaining after removal are converted into numerical values, and feature missing values are then filled and feature variables merged to obtain the initial training data. This prevents a large amount of irrelevant data from degrading the accuracy and lengthening the training time of subsequent model training and from wasting computing resources. Secondary elimination and labeling are then performed on the initial training data, so that the training data are obtained accurately and the model can be trained accurately.

As a preferred scheme, the labeling is performed on the initial training data after the secondary rejection, so as to obtain training data, which specifically includes:

classifying the initial training data after the secondary rejection into two types of sub-health and non-sub-health, and performing sub-health and non-sub-health labeling operation on the initial training data after the secondary rejection according to the classification result, thereby obtaining the training data.

It can be understood that the initial training data after the secondary rejection are classified into the two classes of sub-health and non-sub-health and labeled accordingly, which makes it convenient to divide the data into a training set and a test set in subsequent model training and improves the efficiency and accuracy of model training.

As a preferred scheme, the training data is utilized to perform prediction model training and optimization according to a random forest algorithm to obtain an initial random forest model, which specifically comprises the following steps:

splitting the training data into a training set and a testing set according to a preset proportion, and constructing a plurality of decision trees by using the training set according to a random forest algorithm;

verifying a plurality of decision trees according to the test set, thereby completing construction and training of a random forest model;

and searching the maximum tree body and the maximum characteristic quantity of each branch of the random forest model through grid search and five-fold cross-validation, and performing model optimization to obtain an initial random forest model.

It can be understood that the training data are split into a training set and a test set, a plurality of decision trees are constructed from the training set according to the random forest algorithm, and the decision trees are verified with the test set, so that the random forest model is constructed and trained; verification with the test set ensures the accuracy of the random forest model. The maximum tree body and the maximum number of features of each branch of the random forest model are then searched through grid search and five-fold cross-validation to obtain the optimal parameters, which yields a random forest model that meets the standard and ensures the accuracy of the subsequent selection of the optimal feature variables.

As a preferred scheme, the feature selection is performed on the initial random forest model, and an optimal feature variable combination is selected, specifically:

and selecting a new feature set as an optimal feature variable combination through feature importance sequencing.

It can be understood that a new feature set is selected by ranking feature importance, giving the optimal feature variable combination. The features that most strongly influence the random forest model can thus be selected accurately, which avoids the loss of efficiency and accuracy in model construction caused by excessive and complicated feature parameters.

As a preferred scheme, the optimal characteristic variable combination is subjected to random forest modeling and optimization again to obtain a sub-health prediction model, which is specifically as follows:

establishing and training a final random forest model according to the optimal characteristic variable combination;

and optimizing and cross-verifying the trained final random forest model until the final random forest model reaches a preset standard, and obtaining a sub-health prediction model.

It can be understood that the final random forest model is constructed and trained through the optimal characteristic variable combination, so that the final random forest model has stronger applicability compared with the previous random forest model, and the sub-health prediction model meeting the preset standard can be accurately obtained through optimizing and cross-verifying the trained final random forest model, and the model for accurately predicting the sub-health can be obtained without complex equipment or evaluation modes.

Preferably, the initial sub-health sample data comprises: personal basic information, health status information, lifestyle information, stress response information, and family history information.

Correspondingly, the invention also provides a device for constructing the sub-health prediction model, which comprises: a data acquisition module, a data processing module, an initial modeling optimization module, a feature selection module and a final modeling optimization module;

the data acquisition module is used for acquiring initial sub-health sample data;

the data processing module is used for preprocessing and labeling the initial sub-health sample data to obtain training data;

the initial modeling optimization module is used for carrying out prediction model training and optimization by utilizing the training data according to a random forest algorithm to obtain an initial random forest model;

the feature selection module is used for carrying out feature selection on the initial random forest model and selecting an optimal feature variable combination;

and the final modeling optimization module is used for carrying out random forest modeling and optimization on the optimal characteristic variable combination again to obtain a sub-health prediction model.

and searching the maximum tree body and the maximum characteristic quantity of each branch of the random forest model through grid search and five-fold cross verification, and performing model optimization to obtain an initial random forest model.

As a preferred scheme, the optimal characteristic variable combination is subjected to random forest modeling and optimization again to obtain a sub-health prediction model, which is specifically as follows:

establishing and training a final random forest model according to the optimal characteristic variable combination;

Correspondingly, the invention also provides a terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, wherein the processor realizes the sub-health prediction model construction method when executing the computer program.

Accordingly, the present invention also provides a computer-readable storage medium including a stored computer program; wherein the computer program, when run, controls a device in which the computer readable storage medium resides to perform the sub-health prediction model construction method as described above.

Drawings

Fig. 1: a flowchart of the steps of the sub-health prediction model construction method provided by an embodiment of the invention;

Fig. 2: a detailed flowchart of the sub-health prediction model construction method provided by an embodiment of the invention;

Fig. 3: a structural schematic diagram of the sub-health prediction model construction device provided by an embodiment of the invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

Referring to fig. 1 and 2, a sub-health prediction model construction method provided by an embodiment of the invention includes the following steps S101-S105:

S101: initial sub-health sample data is obtained.

As a preferred aspect of this embodiment, the initial sub-health sample data includes: personal basic information, health status information, lifestyle information, stress response information, and family history information.

It should be noted that the initial sub-health sample data may be obtained from a sub-health database, and the influencing features of the sub-health sample data may be classified into personal basic information, health status information, lifestyle information, stress response information, family history information, and the like. The personal basic information includes gender, age, height, weight, BMI (Body Mass Index), occupation, education level, job title, marital status, economic status, and the like. The health status information includes sub-health, non-sub-health, and the presence of other diseases. The lifestyle information includes life and work conditions, physical exercise, smoking, drinking habits, dietary taste, living and working environment, childbearing status, and the like. The stress response information includes fatigue stress, economic stress, personal stress, disease stress, stress from changes in the external environment, medical event stress, no stress event, and the like. The family history information includes diseases of direct relatives, allergy history, and the like.

S102: preprocessing and labeling operations are performed on the initial sub-health sample data to obtain training data.

removing irrelevant data in the initial sub-health sample data, and performing numerical conversion on text data in the initial sub-health sample data after the removing operation; filling the feature missing value and combining the feature variables of the initial sub-health sample data after the numerical conversion to obtain initial training data; and performing secondary elimination on the initial training data, and labeling the initial training data subjected to secondary elimination, so as to obtain training data.

It should be noted that removing the irrelevant data from the initial sub-health sample data mainly means removing samples whose health status information indicates other diseases. Other diseases may produce features similar to those of the sub-health state, and keeping such samples would reduce the accuracy of the subsequently constructed model.

Further, because the personal basic information, health status information, lifestyle information, stress response information and family history information in the initial sub-health sample data are essentially all text data, numerical values are used to represent the state of each feature item in order to facilitate model training, and qualitative features are encoded to reduce prediction errors caused by data redundancy. For example, the work and rest routine has three levels, namely basically regular, often irregular, and day-night reversed, which are represented by 0, 1 and 2 respectively. Continuous variable features are segmented; for example, the values of features such as age, height, weight and BMI are divided into segments.
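The encoding and segmentation described above can be sketched as follows. This is only an illustrative sketch: the level labels used as dictionary keys and the age cut points are hypothetical, since the embodiment gives the codes 0, 1 and 2 but does not state the segment boundaries.

```python
from bisect import bisect_right
from typing import Sequence

# Codes 0/1/2 follow the embodiment; the key strings are hypothetical labels.
ROUTINE_CODES = {"regular": 0, "irregular": 1, "day_night_reversed": 2}

# Hypothetical age cut points; the source does not give the actual segments.
AGE_EDGES = [30, 45, 60]


def encode_routine(level: str) -> int:
    """Map a work-and-rest routine level to its numeric code."""
    return ROUTINE_CODES[level.strip().lower()]


def segment(value: float, edges: Sequence[float]) -> int:
    """Return the index of the segment that `value` falls into.

    With edges [30, 45, 60]: below 30 -> 0, 30 to 44 -> 1,
    45 to 59 -> 2, 60 and above -> 3.
    """
    return bisect_right(edges, value)


record = {"routine": "Day_night_reversed", "age": 47}
encoded = {
    "routine": encode_routine(record["routine"]),
    "age_segment": segment(record["age"], AGE_EDGES),
}
print(encoded)  # {'routine': 2, 'age_segment': 2}
```

The same `segment` helper could be reused for height, weight and BMI by supplying a different list of cut points for each feature.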

It should be noted that after the text-to-numerical conversion, some feature variables may contain missing values, so the feature missing values in the converted initial sub-health sample data need to be filled. In this embodiment, feature variables with a missing rate greater than 15% are deleted; for features with a missing rate less than 15%, the missing data are supplemented by a neighboring value filling method to ensure the integrity of the data.
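A minimal sketch of this missing-value rule is given below. It assumes that "neighboring value filling" means copying the nearest preceding value in the same column (or the nearest following value when the gap is at the start); the embodiment does not define the method further, so this interpretation, the column names and the toy values are all assumptions.

```python
MISSING_RATE_THRESHOLD = 0.15  # the 15% threshold stated in the embodiment


def missing_rate(column):
    """Fraction of entries in a column that are None."""
    return sum(v is None for v in column) / len(column)


def fill_from_neighbors(column):
    """Fill each None with the nearest preceding value, else the nearest following one."""
    filled = list(column)
    last = None
    for i, value in enumerate(filled):  # forward pass
        if value is None:
            filled[i] = last
        else:
            last = value
    nxt = None
    for i in range(len(filled) - 1, -1, -1):  # backward pass for leading gaps
        if filled[i] is None:
            filled[i] = nxt
        else:
            nxt = filled[i]
    return filled


def handle_missing(table):
    """Delete features above the threshold and fill the gaps in the others.

    A rate of exactly 15% is not covered by the text; here it is filled (assumption).
    """
    cleaned = {}
    for name, column in table.items():
        if missing_rate(column) > MISSING_RATE_THRESHOLD:
            continue  # feature variable deleted
        cleaned[name] = fill_from_neighbors(column)
    return cleaned


# Toy table with 10 rows: "smoking" has 1/10 missing, "income" has 3/10 missing.
table = {
    "smoking": [0, 1, None, 1, 0, 0, 1, 0, 0, 1],
    "income": [2, None, None, 1, 3, None, 2, 2, 1, 3],
}
print(handle_missing(table))  # {'smoking': [0, 1, 1, 1, 0, 0, 1, 0, 0, 1]}
```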

In this embodiment, the features are classified, and data whose feature categories are obviously repeated are marked and merged. For example, four feature variables describing the life and work condition are obviously repeated: "life work", "life work is generally usual", "life work is too easy" and "life work is busy". The feature variable "life work" summarizes the other three, so only "life work" is retained and the other three feature variables are removed. The feature variables are thereby merged, which improves the accuracy of the training data.

In this embodiment, the secondary elimination of the initial training data illustratively consists of searching for multicollinear feature variables other than the label item (health status information), because excessively high correlation between feature variables may affect the accuracy of the prediction result; preferably, multicollinearity analysis is used. Feature variables that are strongly correlated with one another are selected, correlation analysis is performed between the selected variables and the label feature, feature variables with lower correlation with the label feature are deleted, and those with higher correlation with the label feature are retained.
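The secondary elimination can be illustrated with a small sketch. It assumes that correlation is measured with the Pearson coefficient and that a pair counts as highly correlated above an absolute value of 0.8; neither the measure nor the threshold is specified in the embodiment, and the feature names and values are hypothetical toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def secondary_elimination(features, label, threshold=0.8):
    """For each highly correlated feature pair, drop the feature that is
    less correlated with the label and keep the other."""
    dropped = set()
    for a, b in combinations(features, 2):
        if a in dropped or b in dropped:
            continue
        if abs(pearson(features[a], features[b])) > threshold:
            corr_a = abs(pearson(features[a], label))
            corr_b = abs(pearson(features[b], label))
            dropped.add(a if corr_a < corr_b else b)
    return {name: col for name, col in features.items() if name not in dropped}


label = [0, 0, 0, 1, 1, 1]  # 0: non-sub-health, 1: sub-health
features = {
    "weight_segment": [1, 2, 3, 4, 5, 6],
    "bmi_segment": [1, 2, 3, 4, 5, 7],   # nearly collinear with weight_segment
    "exercise": [1, 0, 1, 0, 1, 0],
}
kept = secondary_elimination(features, label)
print(sorted(kept))  # ['exercise', 'weight_segment']
```

In this toy data the two collinear features have a mutual correlation of about 0.99, and the one with the weaker correlation to the label (`bmi_segment`) is the one removed.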

As a preferred scheme of this embodiment, the labeling is performed on the initial training data after the second culling, so as to obtain training data, which specifically includes:

In this embodiment, the data are illustratively labeled: they are classified into sub-health and non-sub-health, which are labeled '1' and '0' respectively in the data.

S103: according to a random forest algorithm, prediction model training and optimization are performed with the training data to obtain an initial random forest model.

As a preferred solution of this embodiment, the training data is used to perform predictive model training and optimization according to a random forest algorithm to obtain an initial random forest model, which specifically includes:

splitting the training data into a training set and a testing set according to a preset proportion, and constructing a plurality of decision trees by using the training set according to a random forest algorithm; verifying a plurality of decision trees according to the test set, thereby completing construction and training of a random forest model; and searching the maximum tree body and the maximum characteristic quantity of each branch of the random forest model through grid search and five-fold cross verification, and performing model optimization to obtain an initial random forest model.

In this embodiment, the training data are analyzed with a random forest algorithm. Illustratively, there are 5230 cases of sub-health data and 2002 cases of non-sub-health data; the health state (0: non-sub-health, 1: sub-health) is set as the label y, the remaining feature items are set as the feature variables x, and the data are divided in a 7:3 ratio into a training set (xtrain, ytrain) and a test set (xtest, ytest). The data are split and assembled into a plurality of decision trees. First, samples are drawn with replacement from the original data set to form a plurality of sub-datasets. Second, a sub-decision tree is constructed from each sub-dataset, and each sub-decision tree outputs a result. Finally, when new data need to be classified by the random forest, the results of the sub-decision trees are put to a vote, and the voting result is the prediction of the random forest. For example, if more than 50% of the decision trees classify a sample as non-sub-health and fewer than 50% classify it as sub-health, the random forest classifies it as non-sub-health; otherwise, it is classified as sub-health. Further, the maximum tree body and the maximum number of features of each branch of the random forest model are searched through grid search and five-fold cross-validation, and the model is optimized to obtain the optimal parameters and thus an optimized initial random forest model.
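The 7:3 split, the sampling with replacement and the voting rule of this paragraph can be sketched in plain Python as follows. The sketch leaves out the construction of the decision trees themselves; the function names, the random seeds and the toy row identifiers are hypothetical.

```python
import random


def split_7_3(rows, seed=0):
    """Shuffle the rows and split them 7:3 into a training and a test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement: one sub-dataset per decision tree."""
    return [rng.choice(rows) for _ in rows]


def forest_vote(tree_predictions):
    """Apply the voting rule of the embodiment.

    Returns 0 (non-sub-health) only if more than 50% of the trees predict 0;
    otherwise returns 1 (sub-health).
    """
    non_sub_health_votes = tree_predictions.count(0)
    return 0 if non_sub_health_votes > len(tree_predictions) / 2 else 1


rows = list(range(100))  # stand-ins for 100 labeled samples
train, test = split_7_3(rows)
sub_dataset = bootstrap_sample(train, random.Random(1))
print(len(train), len(test), len(sub_dataset))  # 70 30 70
print(forest_vote([0, 0, 1, 0, 1]))  # 0
```

Note that under the rule as stated, a tie between the two classes is resolved as sub-health, because non-sub-health requires a strict majority.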

It can be understood that the training data are split into a training set and a test set, a plurality of decision trees are constructed from the training set according to the random forest algorithm, and the decision trees are verified with the test set, so that the random forest model is constructed and trained; verification with the test set ensures the accuracy of the random forest model. The maximum tree body and the maximum number of features of each branch of the random forest model are then searched through grid search and five-fold cross-validation to obtain the optimal parameters, which yields a random forest model that meets the standard and ensures the accuracy of the feature variable selection.

S104: feature selection is performed on the random forest model, and the optimal feature variable combination is selected.

As a preferred scheme, the feature selection is performed on the random forest model, and an optimal feature variable combination is selected, specifically:

In the present embodiment, the optimal feature variable combination consists of the 12 most important features, namely: industry and occupation, age, job title, number of births, physical exercise, no disease in direct relatives, height segment, no stress event, weight segment, highest education level segment, gender, and BMI segment.
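Selecting the first 12 features by importance ranking could look like the sketch below. It assumes scikit-learn as the random forest implementation and uses its impurity-based `feature_importances_` attribute as the importance measure; the patent names neither a library nor a specific importance measure, and the synthetic data and feature names merely stand in for the encoded questionnaire data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (not the patent's dataset): 300 samples, 20 features.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=6, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

# Stand-in for the optimized initial random forest model of step S103.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

TOP_K = 12  # the embodiment keeps the first 12 features
order = np.argsort(forest.feature_importances_)[::-1]  # most important first
selected = [feature_names[i] for i in order[:TOP_K]]
X_selected = X[:, order[:TOP_K]]

print(selected)
print(X_selected.shape)  # (300, 12)
```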

S105: random forest modeling and optimization are performed again on the optimal feature variable combination to obtain a sub-health prediction model.

establishing and training a final random forest model according to the optimal characteristic variable combination; and optimizing and cross-verifying the trained final random forest model until the final random forest model reaches a preset standard, and obtaining a sub-health prediction model.

It should be noted that, in this embodiment, according to the selected optimal feature variable combination, that is, 12 features, random forest modeling and optimization are performed, a final random forest model is built and trained, and the trained final random forest model is optimized and cross-validated by re-adopting the optimization model method in step S103, so as to obtain a final sub-health prediction model.
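The optimization reused from step S103 (grid search with five-fold cross-validation) might be sketched as follows for the final model. Several assumptions apply: scikit-learn is used as the implementation, the "maximum tree body" is taken to correspond to `max_depth` and the maximum number of features of each branch to `max_features`, and the candidate values in the grid, the number of trees and the synthetic data are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for data restricted to the 12 selected features.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, random_state=0
)

# 7:3 split into training and test sets, as in step S103.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Hypothetical candidate values; the patent does not list the searched grid.
param_grid = {"max_depth": [3, 5, None], "max_features": [2, 4, "sqrt"]}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid,
    cv=5,                # five-fold cross-validation
    scoring="accuracy",
)
search.fit(X_train, y_train)

final_model = search.best_estimator_  # refit on the whole training set
test_accuracy = final_model.score(X_test, y_test)
print(search.best_params_)
print(test_accuracy)
```

Comparing `test_accuracy` with a preset standard, and widening the grid when it falls short, would be one way to realize the "until the final random forest model reaches a preset standard" condition; the patent does not state the standard itself.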

Further, deployment of the final sub-health predictive model can provide sub-health analytical assessment for users of the system. Meanwhile, after the final sub-health prediction model is obtained, the accuracy of the sub-health prediction model can be further verified by inputting sample data for sub-health evaluation test.

The implementation of the above embodiment has the following effects:

Compared with the prior art, the embodiment of the invention preprocesses and labels the acquired initial sub-health sample data to obtain training data, which ensures the accuracy and rationality of the training data. The constructed random forest model is trained with the training data, then optimized and tuned, and feature selection is performed to obtain an objective optimal feature variable combination, so that the objective sample data are not affected by subjective judgment. Random forest modeling and optimization are performed again on the optimal feature variable combination to accurately obtain the sub-health prediction model, which avoids the complex evaluation of the prior art that relies on several means such as medical imaging and laboratory examination.

Example 2

Referring to fig. 3, the present invention provides a sub-health prediction model construction device, which includes: a data acquisition module 201, a data processing module 202, an initial modeling optimization module 203, a feature selection module 204, and a final modeling optimization module 205.

The data acquisition module 201 is configured to acquire initial sub-health sample data.

The data processing module 202 is configured to perform preprocessing and labeling operations on the initial sub-health sample data to obtain training data.

The initial modeling optimization module 203 is configured to perform prediction model training and optimization by using the training data according to a random forest algorithm, so as to obtain an initial random forest model.

The feature selection module 204 is configured to perform feature selection on the initial random forest model, and select an optimal feature variable combination.

The final modeling optimization module 205 is configured to perform random forest modeling and optimization again on the optimal feature variable combination to obtain a sub-health prediction model.

It will be clear to those skilled in the art that, for convenience and brevity of description, reference may be made to the corresponding process in the foregoing method embodiment for the specific working process of the apparatus described above, which is not described herein again.

The implementation of the embodiment of the invention has the following effects:

Example 3

Correspondingly, the invention also provides a terminal device, comprising: a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the sub-health prediction model construction method of any one of the embodiments above when the computer program is executed.

The terminal device of this embodiment includes: a processor, a memory, a computer program stored in the memory and executable on the processor, and computer instructions. The processor, when executing the computer program, implements the steps of the first embodiment described above, such as steps S101 to S105 shown in fig. 1. Alternatively, the processor, when executing the computer program, performs the functions of the modules/units of the apparatus embodiments described above, such as the data processing module 202.

Illustratively, the computer program may be divided into one or more modules/units, which are stored in the memory and executed by the processor to accomplish the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing the specified functions, which instruction segments are used for describing the execution of the computer program in the terminal device. For example, the data processing module 202 is configured to perform preprocessing and labeling operations on the initial sub-health sample data to obtain training data.

The terminal equipment can be computing equipment such as a desktop computer, a notebook computer, a palm computer, a cloud server and the like. The terminal device may include, but is not limited to, a processor, a memory. It will be appreciated by those skilled in the art that the schematic diagram is merely an example of a terminal device and does not constitute a limitation of the terminal device, and may include more or less components than illustrated, or may combine some components, or different components, e.g., the terminal device may further include an input-output device, a network access device, a bus, etc.

The processor may be a central processing unit (Central Processing Unit, CPU), another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general purpose processor may be a microprocessor or any conventional processor or the like. The processor is the control center of the terminal device and connects the various parts of the entire terminal device using various interfaces and lines.

The memory may be used to store the computer program and/or the module, and the processor may implement various functions of the terminal device by running or executing the computer program and/or the module stored in the memory and invoking data stored in the memory. The memory may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the mobile terminal, etc. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory, such as a hard disk, memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, flash card, at least one disk storage device, flash memory device, or other volatile solid-state storage device.

The modules/units integrated in the terminal device may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment by instructing related hardware through a computer program, where the computer program may be stored in a computer readable storage medium, and when the computer program is executed by a processor, it may implement the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file or some intermediate form, etc. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.

Example IV

Correspondingly, the invention further provides a computer readable storage medium comprising a stored computer program, wherein the computer program, when run, controls the device in which the computer readable storage medium is located to execute the sub-health prediction model construction method according to any one of the above embodiments.
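As a non-limiting illustration, a stored computer program of this kind might follow the sketch below. It only mirrors the overall flow of the method (initial forest, feature selection, re-modeling on the selected features). Everything concrete in it is an assumption made for illustration and is not taken from the embodiments: the function names (`fit_forest`, `build_model`, `toy_data`), the use of a toy forest of bootstrapped single-split decision stumps in place of a full random forest, the choice of summed Gini gain as the importance score, and the top-k selection rule. Preprocessing, labeling and hyperparameter optimization are omitted.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def majority(labels):
    """Most frequent label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Find the single best threshold split over the candidate features."""
    best = {"feature": None, "gain": -1.0, "default": majority(y)}
    parent = gini(y)
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            gain = parent - weighted
            if gain > best["gain"]:
                best = {"feature": f, "threshold": t, "gain": gain,
                        "left": majority(left), "right": majority(right)}
    return best


def fit_forest(X, y, n_trees=25, max_features=2, seed=0):
    """Toy forest: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n_features = len(X[0])
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        feats = rng.sample(range(n_features), min(max_features, n_features))
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return stumps


def predict(stumps, row):
    """Majority vote over all stumps."""
    votes = []
    for s in stumps:
        if s["feature"] is None:  # no valid split was found for this stump
            votes.append(s["default"])
        else:
            side = "left" if row[s["feature"]] <= s["threshold"] else "right"
            votes.append(s[side])
    return majority(votes)


def importances(stumps, n_features):
    """Assumed importance score: total Gini gain credited to each feature."""
    totals = [0.0] * n_features
    for s in stumps:
        if s["feature"] is not None:
            totals[s["feature"]] += s["gain"]
    return totals


def build_model(X, y, top_k=1):
    """Initial forest -> keep the top_k features -> refit on those features only."""
    initial = fit_forest(X, y)
    scores = importances(initial, len(X[0]))
    selected = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)[:top_k]
    X_sel = [[row[f] for f in selected] for row in X]
    final = fit_forest(X_sel, y, max_features=len(selected))
    return selected, final


def toy_data(n=60, seed=1):
    """Synthetic data: column 0 determines the label, columns 1 and 2 are noise."""
    rng = random.Random(seed)
    X = [[i / n, rng.random(), rng.random()] for i in range(n)]
    y = [int(i >= n // 2) for i in range(n)]
    return X, y
```

Calling `build_model(*toy_data())` returns the indices of the retained feature columns together with the refitted forest; a new sample is then classified with `predict(final, [row[f] for f in selected])`. In practice a library implementation of random forests would normally replace the toy stumps used here.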

The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention, and are not to be construed as limiting the scope of the invention. It should be noted that any modifications, equivalent substitutions, improvements, etc. made by those skilled in the art without departing from the spirit and principles of the present invention are intended to be included in the scope of the present invention.

Claims

1. A sub-health prediction model construction method, characterized by comprising the following steps:

acquiring initial sub-health sample data;

2. The method for constructing a sub-health prediction model according to claim 1, wherein the preprocessing and labeling operations are performed on the initial sub-health sample data to obtain training data, specifically:

3. The method for constructing a sub-health prediction model according to claim 2, wherein the labeling of the initial training data after the secondary culling is performed to obtain training data, specifically:

4. The method for constructing a sub-health prediction model according to claim 1, wherein the training data is used for performing prediction model training and optimization according to a random forest algorithm to obtain an initial random forest model, specifically:

5. The method for constructing a sub-health prediction model according to claim 1, wherein the feature selection is performed on the initial random forest model, and an optimal feature variable combination is selected, specifically:

6. The method for constructing a sub-health prediction model according to claim 1, wherein the random forest modeling and optimization are performed again on the optimal feature variable combination to obtain the sub-health prediction model, specifically:

7. The method for constructing a sub-health prediction model according to any one of claims 1 to 6, wherein the initial sub-health sample data comprises: personal basic information, health status information, lifestyle information, stress response information, and family history information.

8. A sub-health prediction model construction apparatus, comprising: a data acquisition module, a data processing module, an initial modeling optimization module, a feature selection module and a final modeling optimization module;

the data processing module is used for preprocessing and labeling the initial sub-health sample data to obtain training data;

the initial modeling optimization module is used for carrying out prediction model training and optimization by utilizing the training data according to a random forest algorithm to obtain an initial random forest model;

9. A terminal device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the sub-health prediction model construction method according to any one of claims 1-7 when the computer program is executed.

10. A computer readable storage medium, wherein the computer readable storage medium comprises a stored computer program; wherein the computer program, when run, controls a device in which the computer readable storage medium is located to perform the sub-health prediction model construction method according to any one of claims 1-7.