WO2018188533A1

WO2018188533A1 - Health model construction method, terminal and storage medium for health assessment

Info

Publication number: WO2018188533A1
Application number: PCT/CN2018/082173
Authority: WO
Inventors: 李菲菲 (Li Feifei); 徐亮 (Xu Liang); 肖京 (Xiao Jing)
Original assignee: 平安科技（深圳）有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date: 2017-04-10
Filing date: 2018-04-08
Publication date: 2018-10-18
Also published as: CN107818824A

Abstract

Disclosed in the present application are a health model construction method, a terminal and a storage medium for health assessment. The method comprises: obtaining user information; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model on the basis of data in the training set and a preset algorithm; and optimizing the health model on the basis of the data in the test set to evaluate the health of the user according to the optimized health model and to obtain a health assessment result of the user.

Description

Health model construction method, terminal and storage medium for health assessment

This application claims priority to Chinese Patent Application No. 201710229172.X, filed with the Chinese Patent Office on April 10, 2017 and entitled "A Health Model Construction Method and Terminal for Health Assessment", the entire content of which is incorporated herein by reference.

Technical field

The present application relates to the field of data processing technologies, and in particular, to a health model construction method, a terminal, and a storage medium for health assessment.

Background

Users often realize that their health is poor only after symptoms of a disease appear, and only then go to a hospital for an examination or consultation. As a result, examination or treatment comes late: the best treatment window may be missed, or treatment costs may increase. Being able to predict a user's health is therefore very important to the user. Existing methods estimate a user's health status with a data model, but such a health model is built on relatively complete user health data. When a prediction is made, the completeness of the data provided by the user therefore affects the accuracy of the estimate, and when the user's information is less complete than the sample data used to build the model, it is difficult to predict the user's health accurately.

Summary of the invention

The embodiment of the present application provides a health model construction method, a terminal, and a storage medium for health assessment, which can accurately predict the health status of the user.

In a first aspect, an embodiment of the present application provides a health model construction method for health assessment, the method comprising:

Obtaining user information, the user information including a plurality of feature information related to health, and a plurality of feature information not related to health;

Pre-processing the user information to obtain a sample data set;

Dividing the sample data set into a training set and a test set;

Constructing a health model according to the data in the training set and a preset algorithm;

Optimizing the health model according to the data of the test set, so that the current user's health can be evaluated with the optimized health model to obtain the current user's health assessment result.

In a second aspect, the embodiment of the present application provides a terminal, where the terminal includes:

an obtaining unit, configured to acquire user information, where the user information includes a plurality of feature information related to health and a plurality of feature information not related to health;

a preprocessing unit, configured to preprocess the user information to obtain a sample data set;

a dividing unit, configured to divide the sample data set into a training set and a test set;

a building unit, configured to construct a health model according to the data in the training set and a preset algorithm;

and an optimization unit, configured to optimize the health model according to the data of the test set, so that the current user's health can be evaluated with the optimized health model to obtain the current user's health assessment result.

In a third aspect, an embodiment of the present application further provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer program: acquiring user information, where the user information includes a plurality of feature information related to health and a plurality of feature information not related to health; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data of the test set, so that the current user's health can be evaluated with the optimized health model to obtain the current user's health assessment result.

In a fourth aspect, an embodiment of the present application further provides a storage medium storing a computer program, where the computer program includes program instructions that, when executed by a processor, cause the processor to perform the following steps: acquiring user information, the user information including a plurality of feature information related to health and a plurality of feature information not related to health; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data of the test set, so that the current user's health can be evaluated with the optimized health model to obtain the current user's health assessment result.

The health model construction method for health assessment provided by the embodiments of the present application can improve the accuracy of the user health assessment.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a schematic flowchart of a health model construction method for health assessment according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a sub-flow of a health model construction method for health assessment according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a sub-flow of a health model construction method for health assessment according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a sub-flow of a health model construction method for health assessment according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a sub-flow of a health model construction method for health assessment according to an embodiment of the present application;

FIG. 6 is a schematic block diagram of a terminal according to an embodiment of the present application;

FIG. 7 is a schematic block diagram of a preprocessing unit according to an embodiment of the present application;

FIG. 8 is a schematic block diagram of a preprocessing unit according to another embodiment of the present application;

FIG. 9 is a schematic block diagram of a terminal according to another embodiment of the present application;

FIG. 10 is a schematic block diagram of a terminal according to another embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.

FIG. 1 is a schematic flowchart diagram of a health model construction method for health assessment according to an embodiment of the present application. The method includes S101 to S106.

S101. Acquire user information, where the user information includes a plurality of feature information related to health and a plurality of feature information not related to health. Health-related features include, for example, height, weight, physical examination data, user health records, and medical payment information; features not related to health include, for example, hobbies, lifestyle habits, consumption, and social features. Because both health-related and non-health-related feature information is acquired, the user information covers features of different dimensions and expresses the user's health status comprehensively.

S102. Preprocess the user information to obtain a sample data set. Multiple pieces of user information are acquired, and they are preprocessed together to obtain the sample data set.

Specifically, as shown in FIG. 2, S102 includes S201-S205. S201. Select the user information whose integrity is higher than a preset value: the integrity of each piece of user information is quantified, and the user information whose integrity exceeds the preset value is selected. S202. Calculate, according to the filtered user information and a preset health score rule, a health score for each piece of filtered user information. The preset health score rule may be a health score rule given by experts or an existing health score rule in the industry. S203. Construct a sample from the filtered user information and the health score corresponding to each piece of user information. S204. Identify, according to a statistical discriminant method, whether the sample contains at least one piece of user information with missing or abnormal feature information; a height of more than 3 m, for example, is abnormal feature information. The statistical discriminant method is used to find values containing gross errors in large data sets. Specifically, the value of each object of a variable is compared with first preset data (such as the mean); if the absolute value of the difference is greater than second preset data (such as three times the standard deviation), the value of that object is considered to contain a gross error. S205. If there is at least one piece of user information with missing or abnormal feature information, fill a missing feature value in the user information, or replace an abnormal feature value in the user information, according to an interpolation method, to form a sample data set. Filling or replacing the missing or abnormal feature information corrects the data and improves its integrity and accuracy.
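Steps S201-S204 can be sketched as follows. This is a minimal Python illustration, not the applicant's implementation; it assumes that a record is a dict with `None` for missing values, that integrity is the fraction of expected fields present, and that the preset health score rule is supplied as a function (`score_rule`, a hypothetical name). The discriminant check uses the mean as the first preset data and k = 3 standard deviations as the second, as in the example above.

```python
import statistics
from typing import Callable, Optional

# Assumed encoding: feature name -> value, with None marking a missing value.
Record = dict[str, Optional[float]]


def integrity(record: Record, fields: list[str]) -> float:
    """Assumed integrity measure: fraction of the expected fields that are present."""
    present = sum(1 for f in fields if record.get(f) is not None)
    return present / len(fields)


def build_samples(
    records: list[Record],
    fields: list[str],
    preset_value: float,
    score_rule: Callable[[Record], float],
) -> list[tuple[Record, float]]:
    """S201-S203: keep records whose integrity exceeds preset_value, pair each with its score."""
    kept = [r for r in records if integrity(r, fields) > preset_value]
    return [(r, score_rule(r)) for r in kept]


def gross_error_indices(values: list[Optional[float]], k: float = 3.0) -> list[int]:
    """S204: indices of observed values farther than k standard deviations from the mean."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    std = statistics.pstdev(observed)
    return [i for i, v in enumerate(values) if v is not None and abs(v - mean) > k * std]


print(gross_error_indices([1.6, 1.7, 1.8] * 10 + [3.5, None]))  # [30]
```

The three-standard-deviation check needs a reasonable number of records: with the population standard deviation, Samuelson's inequality bounds every deviation by sqrt(n - 1) standard deviations, so no value can exceed 3 standard deviations when there are 10 or fewer observations.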
Specifically, the interpolation method may be a mean interpolation method or a multiple interpolation method (also known as mean imputation and multiple imputation). The mean interpolation method fills or replaces the value to be interpolated with the mean of the values of all other objects of the variable when the variable is numeric, or with the most frequent value of the variable when the variable is non-numeric. The multiple interpolation method constructs m (m > 1) substitute values for each missing value or outlier, thereby generating m complete data sets for the variable; each data set is processed with the same data analysis method to obtain m results, and these results are combined according to a certain principle to obtain an estimate of the interpolated value.
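The two interpolation options can be illustrated as below. This is a sketch under assumptions: `positions` holds the indices of missing or abnormal entries, and for multiple interpolation the m substitute values are drawn at random from the observed values (a simple hot-deck style draw standing in for a proper imputation model). The m analysis results are pooled by averaging, which is the point-estimate part of Rubin's combining rules.

```python
import random
import statistics
from typing import Callable


def mean_interpolate(column: list, positions: set[int]) -> list:
    """Mean interpolation: fill/replace entries at `positions` with the mean of the other
    observed values (numeric variable) or with their mode (non-numeric variable)."""
    others = [v for i, v in enumerate(column) if i not in positions and v is not None]
    numeric = all(isinstance(v, (int, float)) for v in others)
    fill = statistics.mean(others) if numeric else statistics.mode(others)
    return [fill if i in positions else v for i, v in enumerate(column)]


def multiple_interpolate(
    column: list,
    positions: set[int],
    analyze: Callable[[list], float],
    m: int = 5,
    seed: int = 0,
) -> tuple[float, list[float]]:
    """Multiple interpolation sketch: build m completed copies of the variable, run the
    same analysis on each copy, and pool the m results by averaging."""
    if m <= 1:
        raise ValueError("multiple interpolation needs m > 1")
    rng = random.Random(seed)
    observed = [v for i, v in enumerate(column) if i not in positions and v is not None]
    results = []
    for _ in range(m):
        # Assumed substitute-value generator: a random draw from the observed values.
        completed = [rng.choice(observed) if i in positions else v for i, v in enumerate(column)]
        results.append(analyze(completed))
    return statistics.mean(results), results


print(mean_interpolate([2.0, None, 4.0, 100.0], {1, 3}))  # [2.0, 3.0, 4.0, 3.0]
```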

Specifically, as shown in FIG. 3, in other embodiments, S102 includes S301-S303. S301. Identify, according to a statistical discriminant method, whether the user information contains at least one piece of user information with missing or abnormal feature information. S302. If there is at least one piece of user information with missing or abnormal feature information, fill a missing feature value in the user information, or replace an abnormal feature value in the user information, according to an interpolation method. Filling or replacing the missing or abnormal feature information corrects the data and improves its integrity and accuracy. Specifically, the interpolation method may be a mean interpolation method or a multiple interpolation method. S303. Perform dimensionality reduction on the user's feature information according to principal component analysis (PCA) to form a sample data set. After PCA dimensionality reduction, some of the features with low correlation can be removed and the features with high correlation retained. PCA uses a linear transformation to convert the original data into a representation whose dimensions are linearly independent; it can be used to extract the main feature components of data and is often used to reduce the dimensionality of high-dimensional data.
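Step S303 can be sketched in plain Python as below. It is an illustration only: the principal components are found by power iteration with deflation on the sample covariance matrix rather than by a library eigensolver, and `k` (the number of retained components) is a parameter the application does not specify. PCA keeps the orthogonal directions along which the data varies most; a production system would more likely call a library routine such as scikit-learn's `sklearn.decomposition.PCA`.

```python
import math
import random


def pca_reduce(rows: list[list[float]], k: int, iters: int = 200, seed: int = 0):
    """Project mean-centered rows onto their top-k principal components.

    Returns (projected_rows, components). Components are found by power
    iteration with deflation on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)] for a in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in any direction
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along v.
        eigval = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next component.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)] for a in range(d)]

    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components] for c in centered]
    return projected, components
```

For data lying on the line y = 2x, for example, the single retained component is (1, 2)/sqrt(5) and each row collapses to one coordinate along that line.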

S103. Divide the sample data set into a training set and a test set. Preferably, a preset proportion of the sample data is randomly sampled from the sample data set to form the training set, and the remaining sample data forms the test set. Preferably, the preset proportion is 70%; that is, 70% of the sample data is randomly sampled from the sample data set to form the training set, and the remaining 30% is used as the test set.
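The 70%/30% random division of S103 can be sketched as follows; the function name and the optional `seed` parameter (for reproducible splits) are illustrative assumptions.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def split_train_test(
    samples: Sequence[T], train_ratio: float = 0.7, seed: Optional[int] = None
) -> tuple[list[T], list[T]]:
    """Randomly sample train_ratio of the samples as the training set; the rest form the test set."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be between 0 and 1")
    rng = random.Random(seed)
    shuffled = list(samples)  # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train, test = split_train_test(list(range(10)), seed=42)
print(len(train), len(test))  # 7 3
```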

S104. Construct a health model according to the data in the training set and a preset algorithm. Preferably, the preset algorithm is a regression algorithm. Preferably, if the user information is preprocessed by the method shown in FIG. 2, logistic regression (LR) is combined with a gradient boosting decision tree (GBDT) to establish a combined regression model, and the Gaussian (normal) distribution function is selected. GBDT is a nonlinear model: each iteration builds a decision tree along the gradient direction that reduces the residual. The iterations generate a number of decision trees, and the path a sample follows through each decision tree is used as an input feature of the LR. Preferably, if the user information is preprocessed by the method shown in FIG. 3, the GBDT model is used with the Bernoulli distribution function selected.
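The GBDT + LR combination of S104 can be sketched in dependency-free Python as below. This is a simplified illustration under several assumptions that the application does not state: the target is a hypothetical binary label (for example 1 = at risk), the trees are depth-1 stumps, squared loss stands in for the Gaussian option, and the leaf each tree routes a sample to is one-hot encoded as the LR input. For the GBDT-only Bernoulli option, the residuals would instead be `y - sigmoid(F)`, the negative gradient of the log-loss.

```python
import math


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree (a single split) to the residuals by least squares."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: no split possible
        mean = sum(residuals) / len(residuals)
        return 0, math.inf, mean, mean
    return best[1:]  # (feature index, threshold, left value, right value)


def fit_gbdt(X, y, n_trees=10, learning_rate=0.3):
    """Gradient boosting with squared (Gaussian) loss: each stump fits the current residuals."""
    pred = [sum(y) / len(y)] * len(y)
    trees = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of squared loss
        j, t, left_value, right_value = fit_stump(X, residuals)
        trees.append((j, t))
        pred = [p + learning_rate * (left_value if row[j] <= t else right_value)
                for p, row in zip(pred, X)]
    return trees


def leaf_features(trees, row):
    """One-hot encode the leaf each tree routes the row to (the tree path used as LR input)."""
    feats = []
    for j, t in trees:
        feats += [1.0, 0.0] if row[j] <= t else [0.0, 1.0]
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_lr(F, y, epochs=500, step=0.1):
    """Logistic regression on the leaf features, trained by batch gradient descent."""
    w, b = [0.0] * len(F[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for f, yi in zip(F, y):
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - yi
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        w = [wi - step * g / len(F) for wi, g in zip(w, grad_w)]
        b -= step * grad_b / len(F)
    return w, b


def predict_proba(trees, w, b, row):
    """Probability of the positive (hypothetical at-risk) class for one record."""
    f = leaf_features(trees, row)
    return sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
```

In practice a library implementation (for example scikit-learn's `GradientBoostingClassifier.apply`, which returns the leaf index of each sample in each tree) would replace the hand-written trees.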

S105. Optimize the health model according to the data of the test set, so that the current user's health can be evaluated with the optimized health model to obtain the current user's health assessment result. The data of the test set is used to adjust the parameters of the constructed health model to obtain the optimized health model; for example, the parameters are adjusted until the error (standard deviation or root mean square error) of the health scores that the health model calculates for the user information in the test set gradually approaches zero. The parameters of the health model include the number of decision trees, the depth of the trees, and the like. For the current user, the current user's information is input, and the optimized health model evaluates the user's health to obtain the current user's health assessment result.
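The parameter adjustment of S105 can be sketched as a grid search that keeps the parameter combination with the lowest test-set RMSE. For the health model the grid would hold the number of trees, the tree depth and so on; to keep the example self-contained, the demo tunes a hypothetical one-parameter model (`train_shrunk_slope`) instead. All names and signatures here are illustrative assumptions.

```python
import math
from itertools import product
from typing import Any, Callable

Model = Callable[[float], float]


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean square error between model outputs and reference health scores."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def tune_on_test_set(
    train_fn: Callable[..., Model],
    grid: dict[str, list[Any]],
    train_data: list,
    X_test: list,
    y_test: list[float],
) -> tuple[float, dict[str, Any], Model]:
    """Train one model per parameter combination; keep the one with the lowest test RMSE."""
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        model = train_fn(train_data, **params)
        score = rmse([model(x) for x in X_test], y_test)
        if best is None or score < best[0]:
            best = (score, params, model)
    return best


def train_shrunk_slope(data: list[tuple[float, float]], lam: float) -> Model:
    """Hypothetical toy model y = slope * x, with an L2 penalty `lam` on the slope."""
    slope = sum(x * y for x, y in data) / (sum(x * x for x, _ in data) + lam)
    return lambda x: slope * x


train = [(x, 2.0 * x) for x in range(1, 6)]
score, params, model = tune_on_test_set(
    train_shrunk_slope, {"lam": [0.0, 10.0, 100.0]}, train, [6, 7], [12.0, 14.0]
)
print(params, score)  # {'lam': 0.0} 0.0
```

Because the parameters are chosen on the test set, that set no longer gives an unbiased error estimate; a separate validation split is a common alternative.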

As shown in FIG. 4, if the interpolation method is the mean interpolation method, the step in S205 and S302 of filling a missing feature value in the user information according to the interpolation method includes S401-S403. S401. Acquire multiple pieces of user information whose similarity to the user information with the missing feature information exceeds a specific value. S402. Calculate the average of the values of the missing feature in these pieces of user information. S403. Fill the missing feature value in the user information with the average. This filling method further improves the integrity of the user information.

As shown in FIG. 5, if the interpolation method is the mean interpolation method, the step in S205 and S302 of replacing an abnormal feature value in the user information according to the interpolation method includes S501-S503. S501. Acquire multiple pieces of user information whose similarity to the user information with the abnormal feature information exceeds a specific value. S502. Calculate the average of the values of the abnormal feature in these pieces of user information. S503. Replace the abnormal feature value in the user information with the average. This replacement method further improves the accuracy of the user information.
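Steps S401-S403 and S501-S503 share one mechanism, sketched below: find records similar to the one being corrected, average their values of the affected feature, and write that average into the record. The similarity measure (1 / (1 + Euclidean distance) over the other shared numeric features) and the threshold are hypothetical choices; the application only requires that the similarity exceed a specific value.

```python
import math
from typing import Optional

Record = dict[str, Optional[float]]


def similarity(a: Record, b: Record, exclude: str) -> float:
    """Hypothetical similarity: 1 / (1 + Euclidean distance) over the features both
    records have, ignoring the feature being filled or replaced."""
    shared = [k for k in a if k != exclude and a[k] is not None and b.get(k) is not None]
    if not shared:
        return 0.0
    distance = math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared))
    return 1.0 / (1.0 + distance)


def impute_from_similar(target: Record, others: list[Record], feature: str, threshold: float) -> Record:
    """S401-S403 / S501-S503: average `feature` over records whose similarity to `target`
    exceeds `threshold`, then write the average into a copy of `target`."""
    similar = [
        o for o in others
        if o.get(feature) is not None and similarity(target, o, feature) > threshold
    ]
    result = dict(target)
    if similar:  # leave the record unchanged if no record is similar enough
        result[feature] = sum(o[feature] for o in similar) / len(similar)
    return result
```

The same call fills a missing value (`None`) or replaces an abnormal one (for example a 3.5 m height), because the feature being corrected is excluded from the similarity computation.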

In the foregoing method embodiment, the acquired user information includes not only health-related feature information but also feature information not related to health, that is, feature information of multiple different dimensions. The user information is preprocessed to obtain a highly complete sample data set, which is divided into a training set and a test set; a health model is constructed according to the data in the training set and a preset algorithm, and is then optimized according to the data of the test set. Because the health model and the optimized health model are built from user information of multiple dimensions, a highly complete sample data set, and a preset algorithm, health assessment with the optimized health model is more accurate.

FIG. 6 is a schematic block diagram of a terminal according to an embodiment of the present application. The terminal 60 includes an obtaining unit 601, a pre-processing unit 602, a dividing unit 603, a building unit 604, and an optimizing unit 605.

The obtaining unit 601 is configured to acquire user information, where the user information includes a plurality of feature information related to health and a plurality of feature information not related to health. Health-related features include, for example, height, weight, physical examination data, user health records, and medical payment information; features not related to health include, for example, hobbies, lifestyle habits, consumption, and social features. Because both health-related and non-health-related feature information is acquired, the user information covers features of different dimensions and expresses the user's health status comprehensively.

The pre-processing unit 602 is configured to preprocess the user information to obtain a sample data set. Multiple pieces of user information are acquired, and they are preprocessed together to obtain the sample data set.

Specifically, as shown in FIG. 7, the pre-processing unit 602 includes a screening unit 701, a calculation unit 702, a sample construction unit 703, a first identification unit 704, and a first interpolation unit 705. The screening unit 701 is configured to select the user information whose integrity is higher than a preset value: the integrity of each piece of user information is quantified, and the user information whose integrity exceeds the preset value is selected. The calculation unit 702 is configured to calculate, according to the filtered user information and a preset health score rule, a health score for each piece of filtered user information. The preset health score rule may be a health score rule given by experts or an existing health score rule in the industry. The sample construction unit 703 is configured to construct a sample from the filtered user information and the health score corresponding to each piece of user information. The first identification unit 704 is configured to identify, according to a statistical discriminant method, whether the sample contains at least one piece of user information with missing or abnormal feature information; a height of more than 3 m, for example, is abnormal feature information. The statistical discriminant method is used to find values containing gross errors in large data sets. Specifically, the value of each object of a variable is compared with first preset data (such as the mean); if the absolute value of the difference is greater than second preset data (such as three times the standard deviation), the value of that object is considered to contain a gross error.
The first interpolation unit 705 is configured to: if there is at least one piece of user information with missing or abnormal feature information, fill a missing feature value in the user information, or replace an abnormal feature value in the user information, according to an interpolation method, to form a sample data set. Filling or replacing the missing or abnormal feature information corrects the data and improves its integrity and accuracy. Specifically, the interpolation method may be a mean interpolation method or a multiple interpolation method. The mean interpolation method fills or replaces the value to be interpolated with the mean of the values of all other objects of the variable when the variable is numeric, or with the most frequent value of the variable when the variable is non-numeric. The multiple interpolation method constructs m (m > 1) substitute values for each missing value or outlier, thereby generating m complete data sets for the variable; each data set is processed with the same data analysis method to obtain m results, and these results are combined according to a certain principle to obtain an estimate of the interpolated value.

Specifically, as shown in FIG. 8, in other embodiments, the pre-processing unit 602 includes a second identification unit 801, a second interpolation unit 802, and a dimension reduction unit 803. The second identification unit 801 is configured to identify, according to a statistical discriminant method, whether the user information contains at least one piece of user information with missing or abnormal feature information. The second interpolation unit 802 is configured to: if there is at least one piece of user information with missing or abnormal feature information, fill a missing feature value in the user information, or replace an abnormal feature value in the user information, according to an interpolation method. Filling or replacing the missing or abnormal feature information corrects the data and improves its integrity and accuracy. Specifically, the interpolation method may be a mean interpolation method or a multiple interpolation method. The dimension reduction unit 803 is configured to perform dimensionality reduction on the user's feature information according to principal component analysis (PCA) to form a sample data set. After PCA dimensionality reduction, some of the features with low correlation can be removed and the features with high correlation retained. PCA uses a linear transformation to convert the original data into a representation whose dimensions are linearly independent; it can be used to extract the main feature components of data and is often used to reduce the dimensionality of high-dimensional data.

The dividing unit 603 is configured to divide the sample data set into a training set and a test set. Preferably, a preset proportion of the sample data is randomly sampled from the sample data set to form the training set, and the remaining sample data forms the test set. Preferably, the preset proportion is 70%; that is, 70% of the sample data is randomly sampled from the sample data set to form the training set, and the remaining 30% is used as the test set.

The building unit 604 is configured to construct a health model according to the data in the training set and a preset algorithm. Preferably, the preset algorithm is a regression algorithm. Preferably, if the user information is preprocessed by the pre-processing unit shown in FIG. 7, logistic regression (LR) is combined with a gradient boosting decision tree (GBDT) to establish a combined regression model, and the Gaussian (normal) distribution function is selected. GBDT is a nonlinear model: each iteration builds a decision tree along the gradient direction that reduces the residual. The iterations generate a number of decision trees, and the path a sample follows through each decision tree is used as an input feature of the LR. Preferably, if the user information is preprocessed by the pre-processing unit shown in FIG. 8, the GBDT model is used with the Bernoulli distribution function selected.

The optimization unit 605 is configured to optimize the health model according to the data of the test set, so that the current user's health can be evaluated with the optimized health model to obtain the current user's health assessment result. The data of the test set is used to adjust the parameters of the constructed health model to obtain the optimized health model; for example, the parameters are adjusted until the error (standard deviation or root mean square error) of the health scores that the health model calculates for the user information in the test set gradually approaches zero. The parameters of the health model include the number of decision trees, the depth of the trees, and the like. For the current user, the current user's information is input, and the optimized health model evaluates the user's health to obtain the current user's health assessment result.

As shown in FIG. 9, if the interpolation method is the mean interpolation method, the first interpolation unit 705 and the second interpolation unit 802 each include a first obtaining unit 901, a first calculating unit 902, a filling unit 903, a second obtaining unit 904, a second calculating unit 905, and a replacing unit 906. The first obtaining unit 901 is configured to acquire multiple pieces of user information whose similarity to the user information with missing feature information exceeds a specific value. The first calculating unit 902 is configured to calculate the average of the values of the missing feature in these pieces of user information. The filling unit 903 is configured to fill the missing feature value in the user information with the average. The second obtaining unit 904 is configured to acquire multiple pieces of user information whose similarity to the user information with abnormal feature information exceeds a specific value. The second calculating unit 905 is configured to calculate the average of the values of the abnormal feature in these pieces of user information. The replacing unit 906 is configured to replace the abnormal feature value in the user information with the average. These units further improve the integrity and accuracy of the user information.

FIG. 10 is a schematic block diagram of a terminal according to another embodiment of the present application. The terminal 100 includes an input device 101, an output device 102, a memory 103, and a processor 104, which are connected by a bus 105, where:

The input device 101 is configured to provide input user information. In a specific implementation, the input device 101 of the embodiment of the present application may include a keyboard, a mouse, a photoelectric input device, a sound input device, a touch input device, and the like.

The output device 102 is configured to output a health assessment result of the user and the like. In a specific implementation, the output device 102 of the embodiment of the present application may include a display, a display screen, a touch screen, a sound output device, and the like.

The memory 103 is configured to store computer programs with various functions which, when executed, cause the processor 104 to perform the health model construction method for health assessment. In a specific implementation, the memory 103 of the embodiment of the present application may be a system memory, such as a non-volatile memory (for example, a ROM or a flash memory). In a specific implementation, the memory 103 of the embodiment of the present application may also be an external memory outside the system, such as a magnetic disk, an optical disk, or a magnetic tape.

The processor 104 is configured to invoke the computer program stored in the memory 103 to implement the following: acquiring user information, where the user information includes a plurality of feature information related to health and a plurality of feature information not related to health; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data of the test set, so that the current user's health can be evaluated with the optimized health model to obtain the current user's health assessment result.

When preprocessing the user information to obtain a sample data set, the processor 104 specifically: selects the user information whose integrity is higher than a preset value; calculates, according to the filtered user information and a preset health score rule, a health score for each piece of filtered user information; constructs a sample from the filtered user information and the health score corresponding to each piece of user information; identifies, according to a statistical discriminant method, whether the sample contains at least one piece of user information with missing or abnormal feature information; and, if there is at least one such piece of user information, fills a missing feature value or replaces an abnormal feature value in the user information according to an interpolation method to form a sample data set.

Alternatively, when preprocessing the user information to obtain a sample data set, the processor 104 specifically: identifies, according to a statistical discriminant method, whether the user information contains at least one piece of user information with missing or abnormal feature information; if so, fills a missing feature value or replaces an abnormal feature value in the user information according to an interpolation method; and performs dimensionality reduction on the user's feature information according to principal component analysis (PCA) to form a sample data set.

When the processor 104 fills in missing feature information in the user information according to the interpolation method, it specifically implements the following: acquiring a plurality of pieces of user information whose similarity to the user information with the missing feature information exceeds a specific value; calculating the average of the data corresponding to the missing feature information across those pieces of user information; and filling in the average as the value of the missing feature information in the user information.
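The similarity-based filling can be sketched as follows. The application does not define the similarity measure, so this sketch assumes similarity = 1 / (1 + Euclidean distance) computed over the features both records actually have, with `threshold` playing the role of the "specific value". The names `similarity` and `fill_missing` are hypothetical.

```python
import math
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]  # None marks a missing value


def similarity(a: Record, b: Record) -> float:
    """Assumed measure: 1 / (1 + Euclidean distance) over features present in both."""
    shared = [k for k in a if a[k] is not None and b.get(k) is not None]
    if not shared:
        return 0.0
    dist = math.sqrt(sum((a[k] - b[k]) ** 2 for k in shared))
    return 1.0 / (1.0 + dist)


def fill_missing(records: List[Record], threshold: float) -> List[Record]:
    """Fill each None with the mean of that feature over records whose
    similarity to the incomplete record exceeds `threshold`."""
    filled = [dict(r) for r in records]  # leave the input untouched
    for i, rec in enumerate(records):
        for name, value in rec.items():
            if value is not None:
                continue
            donors = [
                other[name]
                for j, other in enumerate(records)
                if j != i and other.get(name) is not None
                and similarity(rec, other) > threshold
            ]
            if donors:  # leave the gap if no record is similar enough
                filled[i][name] = sum(donors) / len(donors)
    return filled
```

Donor values are always read from the original records, so the result does not depend on the order in which gaps are filled; if no record is similar enough, the gap stays `None`.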

When the processor 104 replaces abnormal feature information in the user information according to the interpolation method, it specifically implements the following: acquiring a plurality of pieces of user information whose similarity to the user information with the abnormal feature information exceeds a specific value; calculating the average of the data corresponding to the abnormal type of feature information across those pieces of user information; and using the average as the value of the abnormal feature information in the user information.
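Replacing an abnormal value works the same way, except that the flagged value must not influence the similarity calculation. This sketch assumes the abnormal entries are supplied as a set of (record index, feature name) pairs, for example from a 3σ check, and again assumes similarity = 1 / (1 + Euclidean distance); `replace_abnormal` is a hypothetical name.

```python
import math
from typing import Dict, List, Set, Tuple

Record = Dict[str, float]


def replace_abnormal(records: List[Record], abnormal: Set[Tuple[int, str]],
                     threshold: float) -> List[Record]:
    """Replace each flagged (row, feature) value with the mean of that feature over
    similar records. Flagged values are ignored when measuring similarity."""

    def usable(j: int, k: str) -> bool:
        return (j, k) not in abnormal

    def similarity(i: int, j: int) -> float:
        shared = [k for k in records[i]
                  if k in records[j] and usable(i, k) and usable(j, k)]
        if not shared:
            return 0.0
        dist = math.sqrt(sum((records[i][k] - records[j][k]) ** 2 for k in shared))
        return 1.0 / (1.0 + dist)

    fixed = [dict(r) for r in records]  # leave the input untouched
    for i, name in sorted(abnormal):
        donors = [
            records[j][name]
            for j in range(len(records))
            if j != i and name in records[j] and usable(j, name)
            and similarity(i, j) > threshold
        ]
        if donors:  # keep the original value if no record is similar enough
            fixed[i][name] = sum(donors) / len(donors)
    return fixed
```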

When the processor 104 divides the sample data set into a training set and a test set, it specifically implements the following: randomly sampling a preset proportion of the sample data from the sample data set to form the training set, with the remaining sample data forming the test set.
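The random split can be sketched with the standard library. The 0.8 default ratio and the fixed seed are assumptions for illustration (the application only says "a preset proportion"), and `split_train_test` is a hypothetical name. Sampling is without replacement, so every sample lands in exactly one of the two sets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_test(samples: Sequence[T], train_ratio: float = 0.8,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Randomly draw `train_ratio` of the samples for training; the rest form the test set."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be in (0, 1)")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_train = round(len(samples) * train_ratio)
    train = [samples[i] for i in indices[:n_train]]
    test = [samples[i] for i in indices[n_train:]]
    return train, test
```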

In another embodiment of the present application, a storage medium is provided. The storage medium may be a computer readable storage medium. The storage medium stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform the health model construction method for health assessment of the present application. The storage medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a flash memory, a magnetic disk, or an optical disk.

A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the terminal and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed terminal and method may be implemented in other manners. For example, the device embodiments described above are merely illustrative: the division into units is merely a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or other forms of connection.

The functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer readable storage medium. Based on such an understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.

The foregoing is only a specific embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any equivalent modification or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims

A health model construction method for health assessment, comprising:

Obtaining user information, the user information including a plurality of pieces of feature information related to health and a plurality of pieces of feature information unrelated to health;

Pre-processing the user information to obtain a sample data set;

Dividing the sample data set into a training set and a test set;

Constructing a health model according to the data in the training set and a preset algorithm;

Optimizing the health model according to the data in the test set, so as to evaluate the current user's health according to the optimized health model and obtain the current user's health assessment result.
The method of claim 1, wherein preprocessing the user information to obtain a sample data set comprises:

Filtering the user information to retain entries whose completeness is higher than a preset value;

Calculating, according to the filtered user information and a preset health score rule, the health score corresponding to each piece of filtered user information;

Constructing a sample according to the filtered user information and the health score corresponding to each piece of user information;

Identifying, according to a statistical discriminant method, whether the sample contains at least one piece of user information with missing or abnormal feature information;

If at least one such piece of user information exists, filling in the missing feature information in that user information according to an interpolation method, or replacing the abnormal feature information in that user information according to an interpolation method, to form a sample data set.
The method of claim 1, wherein preprocessing the user information to obtain a sample data set comprises:

Identifying, according to a statistical discriminant method, whether the user information contains at least one piece of user information with missing or abnormal feature information;

If at least one such piece of user information exists, filling in the missing feature information in that user information according to an interpolation method, or replacing the abnormal feature information in that user information according to an interpolation method;

Performing dimensionality reduction on the users' feature information by principal component analysis (PCA) to form a sample data set.
The method of claim 2, wherein the filling in of the missing feature information in the user information according to the interpolation method comprises: acquiring a plurality of pieces of user information whose similarity to the user information with the missing feature information exceeds a specific value; calculating the average of the data corresponding to the missing feature information across the plurality of pieces of user information; and filling in the average as the value of the missing feature information in the user information;

The replacing of the abnormal feature information in the user information according to the interpolation method comprises:

Acquiring a plurality of pieces of user information whose similarity to the user information with the abnormal feature information exceeds a specific value;

Calculating the average of the data corresponding to the abnormal type of feature information across the plurality of pieces of user information;

Using the average as the value of the abnormal feature information in the user information.
The method of claim 3, wherein the filling in of the missing feature information in the user information according to the interpolation method comprises: acquiring a plurality of pieces of user information whose similarity to the user information with the missing feature information exceeds a specific value; calculating the average of the data corresponding to the missing feature information across the plurality of pieces of user information; and filling in the average as the value of the missing feature information in the user information;

The replacing of the abnormal feature information in the user information according to the interpolation method comprises:

Acquiring a plurality of pieces of user information whose similarity to the user information with the abnormal feature information exceeds a specific value;

Calculating the average of the data corresponding to the abnormal type of feature information across the plurality of pieces of user information;

Using the average as the value of the abnormal feature information in the user information.
The method of claim 1, wherein dividing the sample data set into a training set and a test set comprises: randomly sampling a predetermined proportion of the sample data from the sample data set to form the training set, the remaining sample data forming the test set.
A terminal comprising:

An obtaining unit, configured to acquire user information, where the user information includes a plurality of pieces of feature information related to health and a plurality of pieces of feature information unrelated to health;

a preprocessing unit, configured to preprocess the user information to obtain a sample data set;

a dividing unit, configured to divide the sample data set into a training set and a test set;

a building unit, configured to construct a health model according to the data in the training set and a preset algorithm;

and an optimization unit, configured to optimize the health model according to the data in the test set, so as to evaluate the current user's health according to the optimized health model and obtain the current user's health assessment result.
The terminal according to claim 7, wherein the pre-processing unit comprises a screening unit, a calculation unit, a sample construction unit, a first identification unit, and a first interpolation unit;

The screening unit is configured to filter the user information to retain entries whose completeness is higher than a preset value;

The calculation unit is configured to calculate, according to the filtered user information and a preset health score rule, a health score corresponding to each piece of filtered user information;

The sample construction unit is configured to construct a sample according to the filtered user information and the health score corresponding to each piece of user information;

The first identification unit is configured to identify, according to a statistical discriminant method, whether the sample contains at least one piece of user information with missing or abnormal feature information;

The first interpolation unit is configured to: if at least one such piece of user information exists, fill in the missing feature information in that user information according to an interpolation method, or replace the abnormal feature information in that user information according to an interpolation method, to form a sample data set.
The terminal according to claim 7, wherein the pre-processing unit comprises a second identification unit, a second interpolation unit, and a dimension reduction unit;

The second identification unit is configured to identify, according to a statistical discriminant method, whether the user information contains at least one piece of user information with missing or abnormal feature information;

The second interpolation unit is configured to: if at least one such piece of user information exists, fill in the missing feature information in that user information according to an interpolation method, or replace the abnormal feature information in that user information according to an interpolation method;

The dimension reduction unit is configured to perform dimensionality reduction on the users' feature information according to PCA to form a sample data set.
The terminal according to claim 8, wherein the first interpolation unit and the second interpolation unit comprise a first acquisition unit, a first calculation unit, a filling unit, a second acquisition unit, a second calculation unit, and a replacement unit, wherein:

The first acquisition unit is configured to acquire a plurality of pieces of user information whose similarity to the user information with the missing feature information exceeds a specific value;

The first calculation unit is configured to calculate the average of the data corresponding to the missing feature information across the plurality of pieces of user information;

The filling unit is configured to fill in the average as the value of the missing feature information in the user information;

The second acquisition unit is configured to acquire a plurality of pieces of user information whose similarity to the user information with the abnormal feature information exceeds a specific value;

The second calculation unit is configured to calculate the average of the data corresponding to the abnormal type of feature information across the plurality of pieces of user information;

and the replacement unit is configured to use the average as the value of the abnormal feature information in the user information.
The terminal according to claim 9, wherein the first interpolation unit and the second interpolation unit comprise a first acquisition unit, a first calculation unit, a filling unit, a second acquisition unit, a second calculation unit, and a replacement unit, wherein:

The first acquisition unit is configured to acquire a plurality of pieces of user information whose similarity to the user information with the missing feature information exceeds a specific value;

The first calculation unit is configured to calculate the average of the data corresponding to the missing feature information across the plurality of pieces of user information;

The filling unit is configured to fill in the average as the value of the missing feature information in the user information;

The second acquisition unit is configured to acquire a plurality of pieces of user information whose similarity to the user information with the abnormal feature information exceeds a specific value;

The second calculation unit is configured to calculate the average of the data corresponding to the abnormal type of feature information across the plurality of pieces of user information;

and the replacement unit is configured to use the average as the value of the abnormal feature information in the user information.
The terminal according to claim 7, wherein the dividing unit is configured to divide the sample data set into a training set and a test set by randomly sampling a preset proportion of the sample data from the sample data set to form the training set, the remaining sample data forming the test set.
A terminal comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the following: obtaining user information, the user information including a plurality of pieces of feature information related to health and a plurality of pieces of feature information unrelated to health; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data in the test set, so as to evaluate the current user's health according to the optimized health model and obtain the current user's health assessment result.
The terminal according to claim 13, wherein, when preprocessing the user information to obtain a sample data set, the processor specifically implements the following: filtering the user information to retain entries whose completeness is higher than a preset value; calculating, according to the filtered user information and a preset health score rule, the health score corresponding to each piece of filtered user information; constructing a sample according to the filtered user information and the health score corresponding to each piece of user information; identifying, according to a statistical discriminant method, whether the sample contains at least one piece of user information with missing or abnormal feature information; and, if it does, filling in the missing feature information in that user information according to an interpolation method, or replacing the abnormal feature information in that user information according to an interpolation method, to form a sample data set.
The terminal according to claim 13, wherein, when preprocessing the user information to obtain a sample data set, the processor specifically implements the following: identifying, according to a statistical discriminant method, whether the user information contains at least one piece of user information with missing or abnormal feature information; if it does, filling in the missing feature information in that user information according to an interpolation method, or replacing the abnormal feature information in that user information according to an interpolation method; and performing dimensionality reduction on the users' feature information by principal component analysis (PCA) to form a sample data set.
The terminal according to claim 13, wherein, when dividing the sample data set into a training set and a test set, the processor specifically implements the following: randomly sampling a predetermined proportion of the sample data from the sample data set to form the training set, the remaining sample data forming the test set.
A storage medium, wherein the storage medium stores a computer program, the computer program including program instructions that, when executed by a processor, cause the processor to perform: acquiring user information, the user information including a plurality of pieces of feature information related to health and a plurality of pieces of feature information unrelated to health; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data in the test set, so as to evaluate the current user's health according to the optimized health model and obtain the current user's health assessment result.
The storage medium of claim 17, wherein the program instructions, when executed by the processor to preprocess the user information to obtain a sample data set, cause the processor to perform: filtering the user information to retain entries whose completeness is higher than a preset value; calculating, according to the filtered user information and a preset health score rule, the health score corresponding to each piece of filtered user information; constructing a sample according to the filtered user information and the health score corresponding to each piece of user information; identifying, according to a statistical discriminant method, whether the sample contains at least one piece of user information with missing or abnormal feature information; and, if it does, filling in the missing feature information in that user information according to an interpolation method, or replacing the abnormal feature information in that user information according to an interpolation method, to form a sample data set.
The storage medium of claim 17, wherein the program instructions, when executed by the processor to preprocess the user information to obtain a sample data set, cause the processor to perform: identifying, according to a statistical discriminant method, whether the user information contains at least one piece of user information with missing or abnormal feature information; if it does, filling in the missing feature information in that user information according to an interpolation method, or replacing the abnormal feature information in that user information according to an interpolation method; and performing dimensionality reduction on the users' feature information by principal component analysis (PCA) to form a sample data set.
The storage medium of claim 17, wherein the program instructions, when executed by the processor to divide the sample data set into a training set and a test set, cause the processor to perform: randomly sampling a preset proportion of the sample data from the sample data set to form the training set, the remaining sample data forming the test set.