WO2018188533A1 - Health model construction method, terminal and storage medium for health assessment - Google Patents


Info

Publication number
WO2018188533A1
Authority
WO
WIPO (PCT)
Prior art keywords
user information
information
health
unit
missing
Prior art date
Application number
PCT/CN2018/082173
Other languages
French (fr)
Chinese (zh)
Inventor
李菲菲
徐亮
肖京
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date: 2017-04-10 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2018188533A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders

Definitions

  • the present application relates to the field of data processing technologies, and in particular, to a health model construction method, a terminal, and a storage medium for health assessment.
  • the embodiment of the present application provides a health model construction method, a terminal, and a storage medium for health assessment, which can accurately predict the health status of the user.
  • an embodiment of the present application provides a health model construction method for health assessment, the method comprising:
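  • acquiring user information, the user information including a plurality of health-related features and a plurality of non-health-related features; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and
  • optimizing the health model according to the data in the test set, so that the current user's health is evaluated with the optimized health model to obtain the current user's health assessment result.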
  • an embodiment of the present application further provides a terminal, the terminal including:
  • an obtaining unit, configured to acquire user information, the user information including a plurality of health-related features and a plurality of non-health-related features;
  • a preprocessing unit, configured to preprocess the user information to obtain a sample data set;
  • a dividing unit, configured to divide the sample data set into a training set and a test set;
  • a building unit, configured to construct a health model according to the data in the training set and a preset algorithm; and
  • an optimization unit, configured to optimize the health model according to the data in the test set, so that the current user's health is evaluated with the optimized health model to obtain the current user's health assessment result.
  • an embodiment of the present application further provides a terminal, including a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor implements: acquiring user information, the user information including a plurality of health-related features and a plurality of non-health-related features; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data in the test set, so that the current user's health is evaluated with the optimized health model to obtain the current user's health assessment result.
  • an embodiment of the present application further provides a storage medium storing a computer program. The computer program includes program instructions that, when executed by a processor, cause the processor to perform: acquiring user information, the user information including a plurality of health-related features and a plurality of non-health-related features; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data in the test set, so that the current user's health is evaluated with the optimized health model to obtain the current user's health assessment result.
  • the health model construction method for health assessment provided by the embodiments of the present application can improve the accuracy of the user health assessment.
  • FIG. 1 is a schematic flowchart of a health model construction method for health assessment according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a sub-flow of the health model construction method for health assessment according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a sub-flow of the health model construction method for health assessment according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a sub-flow of the health model construction method for health assessment according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a sub-flow of the health model construction method for health assessment according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a terminal according to an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a preprocessing unit according to an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of a preprocessing unit according to another embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a terminal according to another embodiment of the present application.
  • FIG. 10 is a schematic block diagram of a terminal according to another embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a health model construction method for health assessment according to an embodiment of the present application. The method includes S101 to S106.
  • S101. Acquire user information, the user information including a plurality of health-related features and a plurality of non-health-related features: for example, health-related features such as height, weight, physical examination data, user health files, and medical payment information, and non-health-related features such as hobbies, lifestyle habits, consumption, and social characteristics.
  • not only health-related features but also non-health-related features are acquired; that is, features of different dimensions are included, so that the user's health status is expressed comprehensively.
  • S102. Preprocess the user information to obtain a sample data set. Multiple pieces of user information are acquired and preprocessed to obtain the sample data set.
  • S102 includes S201-S205.
  • S201. Screen the user information whose completeness is higher than a preset value. The completeness of each piece of user information is quantified, and the pieces whose completeness is higher than the preset value are selected.
  • S202. Calculate, according to the screened user information and a preset health score rule, the health score corresponding to each piece of screened user information.
  • the preset health score rule may be a health score rule provided by experts, or an existing health score rule in the industry.
  • S203. Construct a sample according to the screened user information and the health score corresponding to each piece of user information.
  • S204. Identify, according to a statistical discriminant method, whether the sample contains at least one piece of user information with missing or abnormal features.
  • the statistical discriminant method is used to find values that contain gross errors in large data sets. Specifically, the value of each object of a variable is compared with first preset data (such as the mean). If the absolute value of the difference is greater than second preset data (such as three times the standard deviation), the value of that object is regarded as a gross error.
  • S205. If there is at least one piece of user information with missing or abnormal features, fill the missing feature or replace the abnormal feature in the user information according to an interpolation method, to form a sample data set.
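The gross-error rule in S204 can be sketched for a single numeric variable. A value is flagged when its distance from the mean exceeds three standard deviations. This sketch makes three assumptions that the text does not specify. The mean and standard deviation are computed over the observed values only. The population standard deviation (`statistics.pstdev`) is used. The function name `screen_variable` is hypothetical.

```python
import statistics
from typing import List, Optional, Tuple


def screen_variable(values: List[Optional[float]], k: float = 3.0) -> Tuple[List[int], List[int]]:
    """Return (missing_indices, gross_error_indices) for one numeric variable.

    A value is flagged as a gross error when |value - mean| > k * standard deviation,
    with the mean and population standard deviation taken over the observed values.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)         # plays the role of the "first preset data"
    limit = k * statistics.pstdev(observed)  # plays the role of the "second preset data"
    abnormal = [i for i, v in enumerate(values) if v is not None and abs(v - mean) > limit]
    return missing, abnormal


heights_m = [1.60, 1.65, 1.70, 1.75, 1.80] * 4 + [3.5, None]
missing, abnormal = screen_variable(heights_m)
print(missing, abnormal)  # [21] [20]
```

In small samples, a single extreme value inflates the standard deviation and can hide itself. For that reason, the example uses twenty ordinary heights alongside the implausible 3.5 m value.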
  • the interpolation (imputation) method may be a mean imputation method or a multiple imputation method.
  • mean imputation fills or replaces a value with the mean of the values of all other objects of the variable (when the variable is numerical), or with the most frequent value of the variable (when the variable is non-numerical).
  • multiple imputation constructs m (m > 1) substitute values for each missing value or outlier, thereby generating m complete data sets for the variable. Each data set is then processed with the same data analysis method to obtain m processing results, and these results are combined according to a given principle to obtain an estimate of the imputed value.
  • S102 includes S301-S303.
  • S301. Identify, according to a statistical discriminant method, whether the user information contains at least one piece of user information with missing or abnormal features.
  • S302. If there is at least one piece of user information with missing or abnormal features, fill the missing feature or replace the abnormal feature in the user information according to an interpolation method. Filling or replacing missing or abnormal features corrects the data and improves its completeness and accuracy.
  • the interpolation method may be a mean imputation method or a multiple imputation method.
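  • S303. Perform dimensionality reduction on the user's features according to principal component analysis (PCA) to form a sample data set.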
  • S103. Divide the sample data set into a training set and a test set.
  • the sample data of the preset proportion is randomly sampled from the sample data set to form a training set, and the remaining sample data form a test set.
  • the preset ratio is 70%, that is, 70% of the sample data is randomly sampled from the sample data set to form a training set, and the remaining 30% is used as a test set.
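The random 70/30 split can be sketched in a few lines. The helper name `split_samples` and the optional `seed` parameter are illustrative assumptions; the seed only makes the shuffle reproducible.

```python
import random
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Iterable[T], train_ratio: float = 0.7,
                  seed: Optional[int] = None) -> Tuple[List[T], List[T]]:
    """Randomly draw `train_ratio` of the samples for training; the rest form the test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train, test = split_samples(range(100), train_ratio=0.7, seed=42)
print(len(train), len(test))  # 70 30
```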
  • S104. Construct a health model according to the data in the training set and a preset algorithm.
  • the preset algorithm is a regression algorithm.
  • a combined regression model is established by combining logistic regression (LR) with a gradient boosting decision tree (GBDT), and the Gaussian (normal) distribution function is selected.
  • GBDT is a nonlinear model: each iteration builds a new decision tree in the direction that decreases the gradient of the residual. However many decision trees the iterations generate, the path of each sample through the decision trees is used as the input feature of the LR model.
  • the GBDT model uses the Bernoulli distribution function.
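The GBDT + LR combination described above can be sketched as follows: the leaf each tree routes a sample to is one-hot encoded and fed to logistic regression. This is the widely used "GBDT leaves as LR features" pattern, written in plain Python so the example is self-contained. It is a simplified illustration, not the patent's implementation, and rests on several assumptions:

- The health score is binarised into a hypothetical 0/1 "at risk" label.
- The trees are depth-1 stumps rather than deeper trees.
- Leaf values are the mean negative gradient of the Bernoulli log-loss, with no Newton step.
- The LR model is trained by plain batch gradient descent.

The Gaussian (normal) setting mentioned in the text is not modelled here.

```python
import math
from typing import List, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X: List[List[float]], r: List[float]) -> Stump:
    """Least-squares depth-1 tree fitted to the residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # no valid split: a single leaf holding the mean residual
        m = sum(r) / len(r)
        return (0, math.inf, m, m)
    return best[1:]


def leaf(stump: Stump, row: List[float]) -> int:
    j, t, _, _ = stump
    return 0 if row[j] <= t else 1


def fit_gbdt(X, y, n_trees=10, learning_rate=0.5) -> List[Stump]:
    """Boosting under the Bernoulli (log-loss) model: each stump fits the negative
    gradient y - p, where p is the current predicted probability."""
    F = [0.0] * len(X)
    trees = []
    for _ in range(n_trees):
        residual = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        stump = fit_stump(X, residual)
        trees.append(stump)
        for i, row in enumerate(X):
            F[i] += learning_rate * stump[2 + leaf(stump, row)]
    return trees


def leaf_features(trees: List[Stump], row: List[float]) -> List[float]:
    """One-hot encode which leaf each tree sends the row to (the tree 'path')."""
    feats: List[float] = []
    for stump in trees:
        feats += [1.0, 0.0] if leaf(stump, row) == 0 else [0.0, 1.0]
    return feats


def fit_lr(Z, y, epochs=500, step=0.1):
    """Logistic regression on the leaf features, by batch gradient descent."""
    w, b, n = [0.0] * len(Z[0]), 0.0, len(Z)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for z, yi in zip(Z, y):
            err = sigmoid(sum(wk * zk for wk, zk in zip(w, z)) + b) - yi
            gw = [g + err * zk for g, zk in zip(gw, z)]
            gb += err
        w = [wk - step * g / n for wk, g in zip(w, gw)]
        b -= step * gb / n
    return w, b


def predict_proba(trees, w, b, row) -> float:
    z = leaf_features(trees, row)
    return sigmoid(sum(wk * zk for wk, zk in zip(w, z)) + b)


# Hypothetical toy data: [BMI, daily steps]; label 1 = "at risk" (binarised health score).
X = [[18.0, 9000], [21.0, 11000], [23.0, 8000], [24.0, 10000],
     [31.0, 2000], [33.0, 3000], [35.0, 1500], [29.5, 2500]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
trees = fit_gbdt(X, y)
w, b = fit_lr([leaf_features(trees, row) for row in X], y)
print(round(predict_proba(trees, w, b, [20.0, 10000]), 3))
```

In practice, a library implementation would normally replace these hand-written stumps. For example, scikit-learn's gradient boosting estimators expose the leaf indices through `apply`.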
  • S105. Optimize the health model according to the data in the test set, so that the current user's health is evaluated with the optimized health model to obtain the current user's health assessment result.
  • the data in the test set is used to adjust the parameters of the constructed health model to obtain an optimized health model. For example, the parameters are adjusted so that, when the health model is applied to the user information in the test set, the error of the calculated health scores, measured as a standard deviation or root mean square error, gradually approaches zero.
  • the parameters of the health model include the number of decision trees, the depth of the trees, and the like. For the current user, the user's information is input, and the optimized health model is used to evaluate the user's health and obtain the current user's health assessment result.
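The parameter adjustment in S105 amounts to trying parameter combinations and keeping the one with the lowest root mean square error on the test set. In the patent's setting, the grid would cover the number of trees and the tree depth. The sketch below assumes a generic `fit(train, **params)` callable that returns a predictor. So that the example runs without a full GBDT, `fit_knn` stands in as a hypothetical placeholder model with a single parameter `k`.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[float, float]  # (input, reference health score)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def tune(fit: Callable[..., Callable[[float], float]],
         grid: Dict[str, List], train: List[Pair], test: List[Pair]):
    """Try every parameter combination in `grid`; keep the one whose model has the
    lowest RMSE on the test pairs."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        model = fit(train, **params)
        score = rmse([y for _, y in test], [model(x) for x, _ in test])
        if best is None or score < best[1]:
            best = (params, score)
    return best


def fit_knn(train: List[Pair], k: int) -> Callable[[float], float]:
    """HYPOTHETICAL stand-in model (1-D k-nearest-neighbour mean) so the sketch runs;
    in the patent's setting `fit` would build the GBDT + LR model instead."""
    def predict(x: float) -> float:
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


train = [(float(x), 2.0 * x) for x in range(0, 20, 2)]
test = [(float(x), 2.0 * x) for x in (1, 5, 11, 15)]
best_params, best_rmse = tune(fit_knn, {"k": [1, 2, 4]}, train, test)
print(best_params, best_rmse)  # {'k': 2} 0.0
```

Choosing parameters on a test set and then reporting the error on that same set gives an optimistic estimate. The usual remedy is a separate validation split or cross-validation.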
  • the step, mentioned in S205 and S302, of filling a missing feature in the user information according to the interpolation method includes S401-S403.
  • S401. Acquire multiple pieces of user information whose similarity to the user information with the missing feature exceeds a specific value.
  • S402. Calculate the average value, over these pieces of user information, of the data corresponding to the missing feature.
  • S403. Fill the missing feature in the user information with this average value. This filling method further improves the completeness of the user information.
  • the step of replacing an abnormal feature in the user information according to the interpolation method includes S501-S503.
  • S501. Acquire multiple pieces of user information whose similarity to the user information with the abnormal feature exceeds a specific value.
  • S502. Calculate the average value, over these pieces of user information, of the data corresponding to the abnormal feature.
  • S503. Replace the abnormal feature in the user information with this average value. This replacement method further improves the accuracy of the user information.
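S401-S403 (fill a missing value) and S501-S503 (replace an abnormal value) share the same core operation. Find users who are similar enough, average their value for the affected feature, and write that average in. The patent does not define the similarity measure or the threshold. This sketch therefore assumes a hypothetical similarity of 1 / (1 + Euclidean distance) over the other numeric features, and a threshold of 0.3. The function names are also illustrative. On raw scales, large-range features dominate the distance, so standardising the features first would usually be sensible.

```python
import math
from typing import Any, Dict, List

UserInfo = Dict[str, Any]


def similarity(a: UserInfo, b: UserInfo, features: List[str]) -> float:
    """HYPOTHETICAL similarity measure: 1 / (1 + Euclidean distance) over the
    numeric features both users actually have."""
    shared = [f for f in features if a.get(f) is not None and b.get(f) is not None]
    if not shared:
        return 0.0
    return 1.0 / (1.0 + math.sqrt(sum((a[f] - b[f]) ** 2 for f in shared)))


def neighbour_mean_fix(user: UserInfo, others: List[UserInfo], feature: str,
                       features: List[str], min_similarity: float) -> UserInfo:
    """Average `feature` over users whose similarity to `user` exceeds
    `min_similarity`, and write that average into a copy of `user`. Similarity
    ignores `feature` itself, since its value is missing or untrustworthy."""
    compare_on = [f for f in features if f != feature]
    donors = [o[feature] for o in others
              if o.get(feature) is not None
              and similarity(user, o, compare_on) > min_similarity]
    fixed = dict(user)
    if donors:  # with no sufficiently similar users the value is left unchanged
        fixed[feature] = sum(donors) / len(donors)
    return fixed


FEATURES = ["age", "bmi", "systolic_bp"]  # hypothetical feature names
others = [
    {"age": 41, "bmi": 24.0, "systolic_bp": 120},
    {"age": 39, "bmi": 25.0, "systolic_bp": 124},
    {"age": 70, "bmi": 30.0, "systolic_bp": 150},
]
missing = {"age": 40, "bmi": 24.0, "systolic_bp": None}  # S401-S403: fill
abnormal = {"age": 40, "bmi": 24.0, "systolic_bp": 400}  # S501-S503: replace
print(neighbour_mean_fix(missing, others, "systolic_bp", FEATURES, 0.3))
print(neighbour_mean_fix(abnormal, others, "systolic_bp", FEATURES, 0.3))
```

With these numbers, only the first two users pass the 0.3 threshold, so both calls set `systolic_bp` to 122.0.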
  • the user information obtained in the foregoing method embodiment includes not only health-related features but also non-health-related features, that is, features of multiple different dimensions. The user information is preprocessed to obtain a highly complete sample data set. The sample data set is divided into a training set and a test set, a health model is constructed according to the data in the training set and a preset algorithm, and the health model is optimized according to the data in the test set.
  • because the health model and the optimized health model are built from user information of multiple dimensions, a highly complete sample data set, and a preset algorithm, health assessment using the optimized health model is more accurate.
  • FIG. 6 is a schematic block diagram of a terminal according to an embodiment of the present application.
  • the terminal 60 includes an obtaining unit 601, a pre-processing unit 602, a dividing unit 603, a building unit 604, and an optimizing unit 605.
  • the obtaining unit 601 is configured to acquire user information, the user information including a plurality of health-related features and a plurality of non-health-related features: for example, health-related features such as height, weight, physical examination data, user health records, and medical payment information, and non-health-related features such as hobbies, lifestyle habits, consumption, and social characteristics.
  • not only health-related features but also non-health-related features are acquired; that is, features of different dimensions are included, so that the user's health status is expressed comprehensively.
  • the pre-processing unit 602 is configured to preprocess the user information to obtain a sample data set. Multiple pieces of user information are acquired and preprocessed to obtain the sample data set.
  • the pre-processing unit 602 includes a screening unit 701, a calculation unit 702, a sample construction unit 703, a first identification unit 704, and a first interpolation unit 705.
  • the screening unit 701 is configured to screen the user information whose completeness is higher than a preset value.
  • the completeness of each piece of user information is quantified, and the pieces whose completeness is higher than the preset value are selected.
  • the calculation unit 702 is configured to calculate, according to the screened user information and a preset health score rule, the health score corresponding to each piece of screened user information.
  • the preset health score rule may be a health score rule provided by experts, or an existing health score rule in the industry.
  • the sample construction unit 703 is configured to construct a sample according to the screened user information and the health score corresponding to each piece of user information.
  • the first identification unit 704 is configured to identify, according to the statistical discriminant method, whether the sample contains at least one piece of user information with missing or abnormal features. For example, a height of more than 3 m is an abnormal feature.
  • the statistical discriminant method is used to find values that contain gross errors in large data sets. Specifically, the value of each object of a variable is compared with first preset data (such as the mean). If the absolute value of the difference is greater than second preset data (such as the standard deviation), the value of that object is regarded as a gross error.
  • the first interpolation unit 705 is configured to: if there is at least one piece of user information with missing or abnormal features, fill the missing feature or replace the abnormal feature in the user information according to the interpolation method, to form a sample data set. Filling or replacing missing or abnormal features corrects the data and improves its completeness and accuracy.
  • the interpolation (imputation) method may be a mean imputation method or a multiple imputation method.
  • mean imputation fills or replaces a value with the mean of the values of all other objects of the variable (when the variable is numerical), or with the most frequent value of the variable (when the variable is non-numerical).
  • multiple imputation constructs m (m > 1) substitute values for each missing value or outlier, thereby generating m complete data sets for the variable. Each data set is then processed with the same data analysis method to obtain m processing results, and these results are combined according to a given principle to obtain an estimate of the imputed value.
  • the pre-processing unit 602 includes a second identification unit 801, a second interpolation unit 802, and a dimension reduction unit 803.
  • the second identification unit 801 is configured to identify, according to the statistical discriminant method, whether the user information contains at least one piece of user information with missing or abnormal features.
  • the second interpolation unit 802 is configured to: if there is at least one piece of user information with missing or abnormal features, fill the missing feature or replace the abnormal feature in the user information according to the interpolation method.
  • the interpolation method may be a mean imputation method or a multiple imputation method.
  • the dimension reduction unit 803 is configured to reduce the dimensionality of the user's features according to principal component analysis (PCA) to form a sample data set. After PCA dimensionality reduction, the weakly correlated feature components can be removed, keeping the strongly correlated ones.
  • PCA uses a linear transformation to convert the original data into a representation whose dimensions are linearly uncorrelated. It can be used to extract the main feature components of the data, and it is often used to reduce the dimensionality of high-dimensional data.
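PCA dimensionality reduction can be sketched in plain Python to keep the example self-contained. The data are centred, the covariance matrix is formed, the leading eigenvectors are found by power iteration with deflation, and the centred rows are projected onto the top k of them. PCA keeps linear combinations of the original features rather than dropping individual features. "Removing" the weak part therefore corresponds to discarding the trailing components. The function names are illustrative. A real system would more likely call a linear-algebra library's eigendecomposition or SVD.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def covariance(Xc: Matrix) -> Matrix:
    """Sample covariance matrix of already-centred rows."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]


def top_components(C: Matrix, k: int, iters: int = 200, seed: int = 0) -> Matrix:
    """Leading k eigenvectors of the symmetric matrix C (power iteration + deflation)."""
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]
    comps = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Remove the found direction so the next pass finds the next component.
        C = [[C[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def pca_reduce(X: Matrix, k: int) -> Matrix:
    """Project the centred rows of X onto the top-k principal components."""
    means = [sum(col) / len(X) for col in zip(*X)]
    Xc = [[v - m for v, m in zip(row, means)] for row in X]
    comps = top_components(covariance(Xc), k)
    return [[sum(a * b for a, b in zip(row, c)) for c in comps] for row in Xc]


# Two perfectly correlated features (second = 2 x first) collapse onto one component.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
Z = pca_reduce(X, k=1)
print([round(z[0], 3) for z in Z])  # [-3.354, -1.118, 1.118, 3.354]
```

Because the two features are perfectly correlated, the single retained component preserves all of the variance of the centred data.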
  • the dividing unit 603 is configured to divide the sample data set into a training set and a test set.
  • the sample data of the preset proportion is randomly sampled from the sample data set to form a training set, and the remaining sample data form a test set.
  • the preset ratio is 70%, that is, 70% of the sample data is randomly sampled from the sample data set to form a training set, and the remaining 30% is used as a test set.
  • the building unit 604 is configured to construct a health model according to the data in the training set and a preset algorithm.
  • the preset algorithm is a regression algorithm.
  • a combined regression model is established by combining logistic regression (LR) with a gradient boosting decision tree (GBDT), and the Gaussian (normal) distribution function is selected.
  • GBDT is a nonlinear model: each iteration builds a new decision tree in the direction that decreases the gradient of the residual. However many decision trees the iterations generate, the path of each sample through the decision trees is used as the input feature of the LR model.
  • the GBDT model uses the Bernoulli distribution function.
  • the optimization unit 605 is configured to optimize the health model according to the data in the test set, so that the current user's health is evaluated with the optimized health model to obtain the current user's health assessment result.
  • the data in the test set is used to adjust the parameters of the constructed health model to obtain an optimized health model. For example, the parameters are adjusted so that, when the health model is applied to the user information in the test set, the error of the calculated health scores, measured as a standard deviation or root mean square error, gradually approaches zero.
  • the parameters of the health model include the number of decision trees, the depth of the trees, and the like. For the current user, the user's information is input, and the optimized health model is used to evaluate the user's health and obtain the current user's health assessment result.
  • the first interpolation unit 705 and the second interpolation unit 802 each include a first obtaining unit 901, a first calculating unit 902, and a filling unit 903.
  • the first obtaining unit 901 is configured to acquire multiple pieces of user information whose similarity to the user information with missing information exceeds a specific value.
  • the first calculating unit 902 is configured to calculate the average value, over these pieces of user information, of the data corresponding to the missing feature.
  • the filling unit 903 is configured to fill the missing feature in the user information with the average value.
  • the second obtaining unit 904 is configured to acquire multiple pieces of user information whose similarity to the user information with abnormal information exceeds a specific value.
  • the second calculating unit 905 is configured to calculate the average value, over these pieces of user information, of the data corresponding to the abnormal feature.
  • the replacing unit 906 is configured to replace the abnormal feature in the user information with the average value.
  • these units further improve the completeness and accuracy of the user information.
  • FIG. 10 is a schematic block diagram of a terminal according to another embodiment of the present application.
  • the terminal 100 includes an input device 101, an output device 102, a memory 103, and a processor 104.
  • the input device 101, the output device 102, the memory 103, and the processor 104 are connected by a bus 105, where:
  • the input device 101 is configured to provide input user information.
  • the input device 101 of the embodiment of the present application may include a keyboard, a mouse, a photoelectric input device, a sound input device, a touch input device, and the like.
  • the output device 102 is configured to output a health assessment result of the user and the like.
  • the output device 102 of the embodiment of the present application may include a display, a display screen, a touch screen, a sound output device, and the like.
  • the memory 103 is configured to store computer programs with various functions which, when executed, cause the processor 104 to perform the health model construction method for health assessment.
  • the memory 103 of the embodiment of the present application may be a system memory, such as a non-volatile memory (for example, a ROM or a flash memory).
  • the memory 103 of the embodiment of the present application may also be an external memory outside the system, such as a magnetic disk, an optical disk, or a magnetic tape.
  • the processor 104 is configured to invoke the computer program stored in the memory 103 to implement: acquiring user information, the user information including a plurality of health-related features and a plurality of non-health-related features; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data in the test set, so that the current user's health is evaluated with the optimized health model to obtain the current user's health assessment result.
  • the specific implementation is: screening the user information whose completeness is higher than a preset value; calculating, according to the screened user information and a preset health score rule, the health score corresponding to each piece of screened user information; constructing a sample according to the screened user information and the health score corresponding to each piece of user information; identifying, according to a statistical discriminant method, whether the sample contains at least one piece of user information with missing or abnormal features; and, if so, filling the missing feature or replacing the abnormal feature in the user information according to an interpolation method, to form a sample data set.
  • the specific implementation is: identifying, according to the statistical discriminant method, whether the user information contains at least one piece of user information with missing or abnormal features; if so, filling the missing feature or replacing the abnormal feature in the user information according to the interpolation method; and performing dimensionality reduction on the user's features according to principal component analysis (PCA) to form a sample data set.
  • the specific implementation is: acquiring multiple pieces of user information whose similarity to the user information with missing information exceeds a specific value; calculating the average value, over these pieces of user information, of the data corresponding to the missing feature; and filling the missing feature in the user information with the average value.
  • the specific implementation is: acquiring multiple pieces of user information whose similarity to the user information with abnormal information exceeds a specific value; calculating the average value, over these pieces of user information, of the data corresponding to the abnormal feature; and replacing the abnormal feature in the user information with the average value.
  • the specific implementation is: randomly sampling a preset proportion of sample data from the sample data set to form a training set, and the remaining sample data forms a test set.
  • in another embodiment, a storage medium is provided, which may be a computer-readable storage medium.
  • the storage medium stores a computer program, wherein the computer program includes program instructions.
  • the program instructions when executed by the processor, cause the processor to perform the health model construction method for health assessment in the present application.
  • the storage medium may be a medium that can store program codes, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a flash memory, a magnetic disk, or an optical disk.
  • the disclosed terminal and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be other division manners: for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, or an electrical, mechanical or other form of connection.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the integrated unit, if implemented in the form of a software functional unit and sold or used as a standalone product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application.

Landscapes

  • Medical Informatics (AREA)
  • Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Health & Medical Sciences (AREA)
  • Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Medical Treatment And Welfare Office Work (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Disclosed in the present application are a health model construction method, a terminal and a storage medium for health assessment. The method comprises: obtaining user information; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model on the basis of data in the training set and a preset algorithm; and optimizing the health model on the basis of the data in the test set to evaluate the health of the user according to the optimized health model and to obtain a health assessment result of the user.

Description

Health model construction method, terminal and storage medium for health assessment
This application claims priority to Chinese Patent Application No. 201710229172.X, filed with the China Patent Office on April 10, 2017 and entitled "A Health Model Construction Method and Terminal for Health Assessment", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of data processing technologies, and in particular to a health model construction method, a terminal and a storage medium for health assessment.
Background
Users often learn that their health is poor only after symptoms of disease appear, and only then go to a hospital for an examination or consultation. By that point the examination or treatment comes too late: the best time for treatment has been missed, or treatment costs have increased. Predicting a user's health status is therefore very important to the user. Existing approaches already estimate a user's health status with a data model, but such a health model is built from relatively complete user health data. At prediction time, the completeness of the data the user provides affects the accuracy of the estimate, and when the user's information is less complete than the sample data used to build the model, it is difficult to estimate the user's health status accurately.
Summary
Embodiments of the present application provide a health model construction method, a terminal and a storage medium for health assessment that can accurately estimate a user's health status.
In a first aspect, an embodiment of the present application provides a health model construction method for health assessment, the method comprising:
obtaining user information, the user information comprising a plurality of pieces of health-related feature information and a plurality of pieces of health-unrelated feature information;
preprocessing the user information to obtain a sample data set;
dividing the sample data set into a training set and a test set;
constructing a health model according to the data in the training set and a preset algorithm; and
optimizing the health model according to the data in the test set, so that the health of a current user is evaluated with the optimized health model to obtain a health assessment result for the current user.
In a second aspect, an embodiment of the present application provides a terminal, the terminal comprising:
an obtaining unit, configured to obtain user information, the user information comprising a plurality of pieces of health-related feature information and a plurality of pieces of health-unrelated feature information;
a preprocessing unit, configured to preprocess the user information to obtain a sample data set;
a dividing unit, configured to divide the sample data set into a training set and a test set;
a building unit, configured to construct a health model according to the data in the training set and a preset algorithm; and
an optimizing unit, configured to optimize the health model according to the data in the test set, so that the health of a current user is evaluated with the optimized health model to obtain a health assessment result for the current user.
In a third aspect, an embodiment of the present application further provides a terminal comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements: obtaining user information, the user information comprising a plurality of pieces of health-related feature information and a plurality of pieces of health-unrelated feature information; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data in the test set, so that the health of a current user is evaluated with the optimized health model to obtain a health assessment result for the current user.
In a fourth aspect, an embodiment of the present application further provides a storage medium storing a computer program, the computer program comprising program instructions that, when executed by a processor, cause the processor to perform: obtaining user information, the user information comprising a plurality of pieces of health-related feature information and a plurality of pieces of health-unrelated feature information; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data in the test set, so that the health of a current user is evaluated with the optimized health model to obtain a health assessment result for the current user.
The health model construction method for health assessment provided by the embodiments of the present application can improve the accuracy of user health assessment.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings in the following description show some embodiments of the present application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a health model construction method for health assessment according to an embodiment of the present application;
FIG. 2 is a schematic sub-flowchart of a health model construction method for health assessment according to an embodiment of the present application;
FIG. 3 is a schematic sub-flowchart of a health model construction method for health assessment according to an embodiment of the present application;
FIG. 4 is a schematic sub-flowchart of a health model construction method for health assessment according to an embodiment of the present application;
FIG. 5 is a schematic sub-flowchart of a health model construction method for health assessment according to an embodiment of the present application;
FIG. 6 is a schematic block diagram of a terminal according to an embodiment of the present application;
FIG. 7 is a schematic block diagram of a preprocessing unit according to an embodiment of the present application;
FIG. 8 is a schematic block diagram of a preprocessing unit according to another embodiment of the present application;
FIG. 9 is a schematic block diagram of a terminal according to another embodiment of the present application;
FIG. 10 is a schematic block diagram of a terminal according to another embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present application.
FIG. 1 is a schematic flowchart of a health model construction method for health assessment according to an embodiment of the present application. The method includes S101 to S106.
S101: Obtain user information, where the user information includes a plurality of pieces of health-related feature information and a plurality of pieces of health-unrelated feature information. Health-related features include, for example, height, weight, physical examination data, the user's health records and medical payment information; health-unrelated features include, for example, hobbies, lifestyle habits, consumption and social activity. Because both health-related and health-unrelated feature information are obtained, the user information covers features of different dimensions and expresses the user's health status from multiple angles.
S102: Preprocess the user information to obtain a sample data set. There are multiple pieces of user information, and the obtained user information is preprocessed to obtain the sample data set.
Specifically, as shown in FIG. 2, S102 includes S201 to S205. S201: Select the user information whose completeness is higher than a preset value; the completeness of each piece of user information is quantified, and only user information whose completeness exceeds the preset value is kept. S202: Calculate a health score for each piece of selected user information according to the selected user information and a preset health scoring rule. The preset health scoring rule may be a rule given by experts or an existing default rule in the industry. S203: Construct samples from the selected user information and the health score of each piece of user information. S204: Identify, by a statistical discrimination method, whether any user information in the samples has at least one missing or abnormal piece of feature information, such as an abnormal height of more than 3 m. The statistical discrimination method is used to find values containing gross errors in a large amount of data. Specifically, the value of each object of a variable is compared with first preset data (such as the mean); if the absolute value of the difference is greater than second preset data (such as three times the standard deviation), the value of that object is regarded as a gross error. S205: If user information with at least one missing or abnormal piece of feature information exists, fill the missing feature information, or replace the abnormal feature information, by an imputation method to form the sample data set. Filling or replacing missing or abnormal feature information corrects the data and improves its completeness and accuracy. Specifically, the imputation method may be mean imputation or multiple imputation. Mean imputation fills or replaces the value with the mean of the values of all other objects of the variable (for a numeric variable) or with the most frequent value of the variable (for a non-numeric variable). Multiple imputation constructs m (m > 1) substitute values for each missing or abnormal value, producing m complete data sets for the variable; each data set is then processed with exactly the same data analysis method to obtain m results, which are combined according to some rule to obtain an estimate of the imputed value.
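The completeness filter of S201, the gross-error test of S204 and the mean/mode imputation of S205 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation: records are assumed to be dicts with `None` marking a missing field, completeness is assumed to be the fraction of fields present, the threshold is the "three times the standard deviation" example given above (computed with the population standard deviation), and all function names are hypothetical.

```python
import statistics

def completeness(record):
    """Hypothetical S201 metric: fraction of fields in a user record that are present."""
    return sum(v is not None for v in record.values()) / len(record)

def flag_outliers(values, k=3.0):
    """S204 sketch: indices whose |x - mean| exceeds k population standard deviations.

    None marks a missing value and is ignored when computing the mean and deviation.
    """
    present = [v for v in values if v is not None]
    mu = statistics.mean(present)
    sigma = statistics.pstdev(present)
    return {i for i, v in enumerate(values) if v is not None and abs(v - mu) > k * sigma}

def mean_impute(values, bad=frozenset(), numeric=True):
    """S205 sketch: fill missing entries and replace flagged ones with the mean
    (numeric variable) or the most frequent value (non-numeric variable) of the
    remaining valid entries."""
    valid = [v for i, v in enumerate(values) if v is not None and i not in bad]
    fill = statistics.mean(valid) if numeric else statistics.mode(valid)
    return [fill if v is None or i in bad else v for i, v in enumerate(values)]

# Heights in metres: twenty plausible values, one 3.50 m entry and one gap.
heights = [1.60, 1.65, 1.70, 1.75, 1.80] * 4 + [3.50, None]
bad = flag_outliers(heights)         # {20}: 3.50 m lies beyond mean + 3 sigma
cleaned = mean_impute(heights, bad)  # entries 20 and 21 become the mean, about 1.70
```

With fewer than about a dozen values, a single extreme point inflates the standard deviation so much that it cannot exceed three sigma, so this test only makes sense on reasonably large samples. Multiple imputation is not shown here.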
Specifically, as shown in FIG. 3, in other embodiments S102 includes S301 to S303. S301: Identify, by the statistical discrimination method, whether any user information has at least one missing or abnormal piece of feature information. S302: If such user information exists, fill the missing feature information, or replace the abnormal feature information, by an imputation method. Filling or replacing missing or abnormal feature information corrects the data and improves its completeness and accuracy. Specifically, the imputation method may be mean imputation or multiple imputation. S303: Reduce the dimensionality of the users' feature information by principal component analysis (PCA) to form the sample data set. After dimensionality reduction with PCA, some less relevant feature components can be discarded so that the more relevant ones remain. PCA uses a linear transformation to map the original data to a representation whose dimensions are mutually uncorrelated; it can be used to extract the main feature components of the data and is commonly used to reduce the dimensionality of high-dimensional data.
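Step S303 can be illustrated with a small, dependency-free PCA. This sketch assumes the features are already numeric and complete (after S302); it finds the top-k eigenvectors of the sample covariance matrix by power iteration with deflation, which is adequate for a handful of features. In practice a library routine based on a singular value decomposition would normally be used. The function name `pca` and its parameters are hypothetical, and the source does not state how many components are kept.

```python
import math
import random

def pca(rows, k, iters=500, seed=0):
    """Project rows onto their top-k principal components.

    Returns (scores, components, eigenvalues). Uses power iteration with
    deflation on the sample covariance matrix; rows must hold at least two samples.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    X = [[r[j] - means[j] for j in range(d)] for r in rows]  # centre every feature
    C = [[sum(x[a] * x[b] for x in X) / (n - 1) for b in range(d)] for a in range(d)]
    rng = random.Random(seed)
    components, eigvals = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]  # arbitrary non-zero start vector
        s = math.sqrt(sum(t * t for t in v))
        v = [t / s for t in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0:  # remaining covariance is zero: nothing left to extract
                break
            v = [t / norm for t in w]
        lam = sum(v[a] * sum(C[a][b] * v[b] for b in range(d)) for a in range(d))
        components.append(v)
        eigvals.append(lam)
        # Deflate so the next pass converges to the next-largest eigenvector.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    scores = [[sum(x[j] * c[j] for j in range(d)) for c in components] for x in X]
    return scores, components, eigvals

rows = [[x, 2 * x] for x in range(1, 6)]  # two perfectly correlated features
scores, comps, eig = pca(rows, k=1)
# comps[0] is approximately [0.447, 0.894], the unit vector along (1, 2); eig[0] is about 12.5
```

When features have different units (for example height in metres and weight in kilograms), standardising each feature before PCA is usual, otherwise the feature with the largest numeric range dominates the first component.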
S103: Divide the sample data set into a training set and a test set. Preferably, a preset proportion of the sample data is randomly sampled from the sample data set to form the training set, and the remaining sample data forms the test set. Preferably, the preset proportion is 70%; that is, 70% of the sample data is randomly sampled from the sample data set to form the training set, and the remaining 30% is used as the test set.
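The 70/30 random split of S103 amounts to shuffling the samples and cutting the list. The helper below is a hypothetical sketch using only the standard library; the optional seed is an addition for reproducibility and is not mentioned in the source.

```python
import random

def split_samples(samples, train_ratio=0.7, seed=None):
    """Randomly assign train_ratio of the samples to the training set and the rest to the test set."""
    rng = random.Random(seed)  # dedicated generator so a seed gives a reproducible split
    shuffled = list(samples)   # copy so the caller's sequence is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_samples(range(100), seed=42)
print(len(train), len(test))  # 70 30
```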
S104: Construct a health model according to the data in the training set and a preset algorithm. Preferably, the preset algorithm is a regression algorithm. Preferably, when the user information is preprocessed by the method shown in FIG. 2, a combined regression model is built from logistic regression (LR) and a gradient boosting decision tree (GBDT), with a Gaussian (normal) distribution selected. GBDT is a nonlinear model: each iteration builds a new decision tree along the gradient direction that reduces the residual, so as many decision trees are generated as there are iterations, and the paths of the decision trees are used as the input features of LR. Preferably, when the user information is preprocessed by the method shown in FIG. 3, a GBDT model is used with a Bernoulli distribution selected.
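A compact illustration of the GBDT part of S104, written without third-party libraries, may make the iteration described above concrete. Assumptions: squared-error loss (which corresponds to the Gaussian distribution option), depth-1 trees (stumps) instead of the deeper trees a real model would use, and a fixed learning rate; all names are hypothetical. `leaf_features` shows how each tree's leaf assignment can be one-hot encoded to serve as the input features of the downstream LR stage, which is not implemented here. With a Bernoulli (log-loss) distribution and 0/1 labels, the residual in the loop would instead be `y - sigmoid(F)`.

```python
def fit_stump(X, residuals):
    """Depth-1 regression tree minimising squared error: (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    if best is None:  # every feature is constant: no split possible, add nothing
        return 0, float("inf"), 0.0, 0.0
    return best[1:]

def fit_gbdt(X, y, n_trees=50, learning_rate=0.1):
    """Gradient boosting under squared loss: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient of 0.5*(y-F)^2
        j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        pred = [p + learning_rate * (lm if row[j] <= t else rm) for p, row in zip(pred, X)]
    return base, learning_rate, stumps

def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if row[j] <= t else rm for j, t, lm, rm in stumps)

def leaf_features(model, row):
    """One-hot leaf indicators, two per stump, usable as inputs to a downstream LR stage."""
    feats = []
    for j, t, _, _ in model[2]:
        feats.extend([1, 0] if row[j] <= t else [0, 1])
    return feats

X = [[x] for x in range(10)]
y = [0.0] * 5 + [10.0] * 5
model = fit_gbdt(X, y)  # predictions approach 0 on the left group and 10 on the right
```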
S105: Optimize the health model according to the data in the test set, so that the current user's health is evaluated with the optimized health model to obtain the current user's health assessment result. The test-set data is used to adjust the parameters of the constructed health model to obtain the optimized health model. For example, the parameters are adjusted so that the standard error or root-mean-square error between the health scores that the model predicts for the user information in the test set and the calculated health scores gradually approaches zero. The parameters of the health model include the number of decision trees, the depth of the trees, and so on. For a current user, once the current user's information is entered, the optimized health model can perform a health assessment of that user to obtain the current user's health assessment result.
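The parameter adjustment in S105 can be pictured as a grid search that keeps the parameter combination with the lowest test-set RMSE. The sketch below is hypothetical: `fit` and `predict` are caller-supplied callables, and in the setting described above `grid` would hold candidate values such as the number of trees and the tree depth. Only RMSE is shown; the standard error mentioned in the source is not implemented.

```python
import math
from itertools import product

def rmse(predicted, actual):
    """Root-mean-square error between predicted and reference health scores."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def tune(fit, predict, train, test, grid):
    """Fit one model per parameter combination and keep the lowest test-set RMSE.

    fit(X, y, **params) -> model and predict(model, x) -> score are supplied by
    the caller; grid maps parameter names to lists of candidate values.
    """
    X_train, y_train = train
    X_test, y_test = test
    names = sorted(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        model = fit(X_train, y_train, **params)
        score = rmse([predict(model, x) for x in X_test], y_test)
        if best is None or score < best[0]:
            best = (score, params)
    return best

# Toy usage: a "model" that always predicts the constant c.
def fit_constant(X, y, c):
    return c

def predict_constant(model, x):
    return model

best = tune(fit_constant, predict_constant, ([], []), ([[0], [1]], [4.0, 6.0]), {"c": [0, 5, 10]})
print(best)  # (1.0, {'c': 5})
```

Because the test set is used here to choose parameters, its RMSE is no longer an unbiased estimate of performance on new users; a separate validation set is a common alternative.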
As shown in FIG. 4, when the imputation method is mean imputation, the step mentioned in S205 and S302 of filling the missing feature information in the user information by the imputation method includes S401 to S403. S401: Obtain several pieces of user information whose similarity to the user information with missing information exceeds a specific value. S402: Calculate the mean of the data corresponding to the missing feature across those pieces of user information. S403: Fill the missing feature in the user information with that mean. This filling method further improves the completeness of the user information.
As shown in FIG. 5, when the imputation method is mean imputation, the step mentioned in S205 and S302 of replacing the abnormal feature information in the user information by the imputation method includes S501 to S503. S501: Obtain several pieces of user information whose similarity to the user information with abnormal information exceeds a specific value. S502: Calculate the mean of the data corresponding to the abnormal feature across those pieces of user information. S503: Replace the abnormal value in the user information with that mean. This replacement method further improves the accuracy of the user information.
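S401 to S403 and S501 to S503 differ only in whether the target value is missing or abnormal, so one helper can sketch both. The source only says the similarity must exceed "a specific value"; here the similarity measure (1 / (1 + Euclidean distance) over the other features the target user has), the threshold, and the feature names are all assumptions, and in practice the features would be standardised first.

```python
import math

def similarity(a, b, keys):
    """Assumed similarity: 1 / (1 + Euclidean distance) over the given features."""
    return 1.0 / (1.0 + math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys)))

def impute_from_similar(target, others, feature, threshold=0.5):
    """Set target[feature] to the mean of that feature over users whose similarity
    to target exceeds threshold (S401-S403 for a missing value, S501-S503 for an
    abnormal one). Returns a new dict; the value is left as-is if no user qualifies."""
    keys = [k for k, v in target.items() if k != feature and v is not None]
    donors = [
        o[feature]
        for o in others
        if o.get(feature) is not None
        and all(o.get(k) is not None for k in keys)
        and similarity(target, o, keys) > threshold
    ]
    result = dict(target)
    if donors:
        result[feature] = sum(donors) / len(donors)
    return result

users = [
    {"age": 41, "bmi": 24.2, "sbp": 130},
    {"age": 40, "bmi": 23.8, "sbp": 120},
    {"age": 70, "bmi": 30.0, "sbp": 160},
]
target = {"age": 40, "bmi": 24.0, "sbp": None}  # systolic blood pressure missing
filled = impute_from_similar(target, users, "sbp", threshold=0.4)
print(filled["sbp"])  # 125.0: mean of the two similar users
```

An abnormal value is handled the same way: because the target feature is excluded from the similarity calculation, passing a record with an implausible reading returns the same replacement.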
The user information obtained in the above method embodiment includes not only health-related feature information but also health-unrelated feature information, that is, feature information of multiple dimensions. The user information is preprocessed to obtain a highly complete sample data set, which is divided into a training set and a test set; a health model is constructed from the data in the training set and a preset algorithm and then optimized with the data in the test set. Building and optimizing the health model from multi-dimensional user information, a highly complete sample data set and a preset algorithm makes health assessment with the optimized health model more accurate.
FIG. 6 is a schematic block diagram of a terminal according to an embodiment of the present application. The terminal 60 includes an obtaining unit 601, a preprocessing unit 602, a dividing unit 603, a building unit 604 and an optimizing unit 605.
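Since the division into units is described as a logical one, the terminal can be pictured in software as a pipeline with one pluggable callable per unit. The class and field names below are hypothetical and only illustrate how data flows from unit 601 to unit 605.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Records = List[dict]

@dataclass
class HealthModelTerminal:
    """Hypothetical software view of terminal 60: one callable per functional unit."""
    obtain: Callable[[], Records]                          # obtaining unit 601
    preprocess: Callable[[Records], Records]               # preprocessing unit 602
    divide: Callable[[Records], Tuple[Records, Records]]   # dividing unit 603
    build: Callable[[Records], Any]                        # building unit 604
    optimize: Callable[[Any, Records], Any]                # optimizing unit 605

    def run(self) -> Any:
        """Pass data through the units in order and return the optimized model."""
        samples = self.preprocess(self.obtain())
        train, test = self.divide(samples)
        model = self.build(train)
        return self.optimize(model, test)

# Trivial stand-ins for each unit, just to show the data flow.
terminal = HealthModelTerminal(
    obtain=lambda: [{"id": i} for i in range(10)],
    preprocess=lambda recs: recs,
    divide=lambda recs: (recs[:7], recs[7:]),
    build=lambda train: len(train),
    optimize=lambda model, test: (model, len(test)),
)
print(terminal.run())  # (7, 3)
```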
The obtaining unit 601 is configured to obtain user information, where the user information includes a plurality of pieces of health-related feature information and a plurality of pieces of health-unrelated feature information. Health-related features include, for example, height, weight, physical examination data, the user's health records and medical payment information; health-unrelated features include, for example, hobbies, lifestyle habits, consumption and social activity. Because both health-related and health-unrelated feature information are obtained, the user information covers features of different dimensions and expresses the user's health status from multiple angles.
The preprocessing unit 602 is configured to preprocess the user information to obtain a sample data set. There are multiple pieces of user information, and the obtained user information is preprocessed to obtain the sample data set.
Specifically, as shown in FIG. 7, the preprocessing unit 602 includes a screening unit 701, a calculation unit 702, a sample construction unit 703, a first identification unit 704 and a first imputation unit 705. The screening unit 701 is configured to select the user information whose completeness is higher than a preset value; the completeness of each piece of user information is quantified, and only user information whose completeness exceeds the preset value is kept. The calculation unit 702 is configured to calculate a health score for each piece of selected user information according to the selected user information and a preset health scoring rule, which may be a rule given by experts or an existing default rule in the industry. The sample construction unit 703 is configured to construct samples from the selected user information and the health score of each piece of user information. The first identification unit 704 is configured to identify, by a statistical discrimination method, whether any user information in the samples has at least one missing or abnormal piece of feature information, such as an abnormal height of more than 3 m. The statistical discrimination method is used to find values containing gross errors in a large amount of data: the value of each object of a variable is compared with first preset data (such as the expected value), and if the absolute value of the difference is greater than second preset data (such as the standard deviation), the value of that object is regarded as a gross error. The first imputation unit 705 is configured to, if user information with at least one missing or abnormal piece of feature information exists, fill the missing feature information or replace the abnormal feature information by an imputation method to form the sample data set. Filling or replacing missing or abnormal feature information corrects the data and improves its completeness and accuracy. Specifically, the imputation method may be mean imputation or multiple imputation. Mean imputation fills or replaces the value with the mean of the values of all other objects of the variable (for a numeric variable) or with the most frequent value of the variable (for a non-numeric variable). Multiple imputation constructs m (m > 1) substitute values for each missing or abnormal value, producing m complete data sets for the variable; each data set is then processed with exactly the same data analysis method to obtain m results, which are combined according to some rule to obtain an estimate of the imputed value.
Specifically, as shown in FIG. 8, in other embodiments the preprocessing unit 602 includes a second identification unit 801, a second imputation unit 802 and a dimensionality reduction unit 803. The second identification unit 801 is configured to identify, by the statistical discrimination method, whether any user information has at least one missing or abnormal piece of feature information. The second imputation unit 802 is configured to, if such user information exists, fill the missing feature information or replace the abnormal feature information by an imputation method. Filling or replacing missing or abnormal feature information corrects the data and improves its completeness and accuracy. Specifically, the imputation method may be mean imputation or multiple imputation. The dimensionality reduction unit 803 is configured to reduce the dimensionality of the users' feature information by principal component analysis (PCA) to form the sample data set. After dimensionality reduction with PCA, some less relevant feature components can be discarded so that the more relevant ones remain. PCA uses a linear transformation to map the original data to a representation whose dimensions are mutually uncorrelated; it can be used to extract the main feature components of the data and is commonly used to reduce the dimensionality of high-dimensional data.
划分单元603,用于将该样本数据集划分为训练集和测试集。优选地,从该样本数据集中随机抽样预设比例的样本数据形成训练集,其余的样本数据形成测试集。优选地,预设比例为70%,即从样本数据集中随机抽样70%的样本数据形成训练集,其余的30%作为测试集。The dividing unit 603 is configured to divide the sample data set into a training set and a test set. Preferably, the sample data of the preset proportion is randomly sampled from the sample data set to form a training set, and the remaining sample data form a test set. Preferably, the preset ratio is 70%, that is, 70% of the sample data is randomly sampled from the sample data set to form a training set, and the remaining 30% is used as a test set.
构建单元604,用于根据该训练集中的数据和预设的算法构建健康模型。优选地,预设的算法为回归算法。优选地,若使用图7所示的预处理单元对用户信息进行处理时,使用逻辑回归算法(Logistic Regression,LR)和梯度提升树(Gradient Boosting Decision Tree,GBDT)组合建立组合回归模型,选择高斯正态分布函数。GBDT是一种非线性的模型,该模型每次迭代都在减少残差的梯度方向新建立一颗决策树,迭代多少次就会生成多少颗决策树,决策树的路径作为LR输入特征使用。优选地,若使用图8所示的预处理单元对用户信息进行预处理时,使用GBDT模型,选择伯努利分布函数。The building unit 604 is configured to construct a health model according to the data in the training set and a preset algorithm. Preferably, the preset algorithm is a regression algorithm. Preferably, if the user information is processed by using the preprocessing unit shown in FIG. 7, a logistic regression (LR) and a Gradient Boosting Decision Tree (GBDT) combination are used to establish a combined regression model, and Gauss is selected. Normal distribution function. GBDT is a nonlinear model. Each iteration creates a decision tree in the direction of decreasing the residual gradient. How many decision trees are generated by iteration, and the path of the decision tree is used as the LR input feature. Preferably, if the user information is preprocessed using the preprocessing unit shown in FIG. 8, the Bernoulli distribution function is selected using the GBDT model.
The optimization unit 605 is configured to optimize the health model according to the data of the test set, so that the current user's health can be evaluated with the optimized health model to obtain the current user's health assessment result. The test-set data are used to adjust the parameters of the constructed health model to obtain the optimized health model; for example, the parameters are adjusted so that the standard error or root-mean-square error between the health scores the model produces for the user information in the test set and the calculated health scores gradually approaches zero. The parameters of the health model include the number of decision trees, the depth of the trees, and the like. For the current user, once the current user's user information is input, the optimized health model can be used to assess that user's health and obtain the current user's health assessment result.
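The parameter adjustment described above amounts to a search over model parameters scored by test-set error. The sketch below assumes an exhaustive grid search with root-mean-square error as the criterion. `tune`, `train_fn`, and the grid keys are hypothetical; the text names the number of trees and the tree depth as tunable parameters but gives no search procedure.

```python
import itertools
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between calculated and model-predicted health scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def tune(train_fn, grid, test_x, test_y):
    """Try each parameter combination in `grid` and keep the one whose model
    has the lowest RMSE on the test set.

    `train_fn(**params)` must return a callable mapping a feature vector to a score.
    """
    names = sorted(grid)
    best_params, best_err = None, math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        model = train_fn(**params)
        err = rmse(test_y, [model(x) for x in test_x])
        if err < best_err:
            best_params, best_err = params, err
    return best_params, best_err
```

In the health-model setting, a grid such as `{"n_trees": [50, 100], "max_depth": [3, 5]}` would be passed together with a `train_fn` that builds the model for one parameter combination (both names are illustrative).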
As shown in FIG. 9, when the interpolation method used is mean imputation, the first interpolation unit 705 and the second interpolation unit 802 each include a first obtaining unit 901, a first calculating unit 902, a filling unit 903, a second obtaining unit 904, a second calculating unit 905, and a replacing unit 906. The first obtaining unit 901 is configured to acquire several pieces of user information whose similarity to the user information with missing information exceeds a specified value. The first calculating unit 902 is configured to calculate the average of the data corresponding to the missing feature across those pieces of user information. The filling unit 903 is configured to fill the missing feature in the user information with that average. The second obtaining unit 904 is configured to acquire several pieces of user information whose similarity to the user information with abnormal information exceeds a specified value. The second calculating unit 905 is configured to calculate the average of the data corresponding to the abnormal feature across those pieces of user information. The replacing unit 906 is configured to replace the value of the abnormal feature in the user information with that average. These units further improve the completeness and accuracy of the user information.
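The mean imputation performed by units 901 to 906 can be sketched as follows. The text defines neither the similarity measure nor the threshold, so cosine similarity over the remaining features and a threshold of 0.9 are assumptions, and missing values are represented as `None`. Because the target feature is excluded from the similarity computation, the same hypothetical function covers both filling a missing value and replacing an abnormal one.

```python
import math


def similarity(a, b, skip):
    """Cosine similarity over features present in both records, ignoring index
    `skip` (the feature being imputed). The choice of cosine is an assumption."""
    pairs = [(x, y) for i, (x, y) in enumerate(zip(a, b))
             if i != skip and x is not None and y is not None]
    dot = sum(x * y for x, y in pairs)
    na = math.sqrt(sum(x * x for x, _ in pairs))
    nb = math.sqrt(sum(y * y for _, y in pairs))
    return dot / (na * nb) if na and nb else 0.0


def impute_mean(records, row, col, threshold=0.9):
    """Return a copy of records[row] whose feature `col` (missing or abnormal)
    is set to the mean of that feature over sufficiently similar records."""
    target = records[row]
    donors = [r[col] for i, r in enumerate(records)
              if i != row and r[col] is not None
              and similarity(target, r, col) > threshold]
    if not donors:
        return None  # no similar record found; leave the value unresolved
    filled = list(target)
    filled[col] = sum(donors) / len(donors)
    return filled
```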
FIG. 10 is a schematic block diagram of a terminal according to another embodiment of the present application. The terminal 100 includes an input device 101, an output device 102, a memory 103, and a processor 104, which are connected by a bus 105, where:
The input device 101 is configured to input user information. In a specific implementation, the input device 101 in this embodiment may include a keyboard, a mouse, a photoelectric input device, a sound input device, a touch input device, and the like.
The output device 102 is configured to output the user's health assessment result and the like. In a specific implementation, the output device 102 in this embodiment may include a monitor, a display screen, a touch screen, a sound output device, and the like.
The memory 103 is configured to store computer programs with various functions which, when executed, cause the processor 104 to perform the health model construction method for health assessment. In a specific implementation, the memory 103 in this embodiment may be system memory, for example non-volatile memory such as ROM or flash memory. The memory 103 may also be external memory outside the system, such as a magnetic disk, an optical disc, or a magnetic tape.
The processor 104 is configured to invoke the computer program stored in the memory 103 to: acquire user information, the user information including a plurality of health-related features and a plurality of health-unrelated features; preprocess the user information to obtain a sample data set; divide the sample data set into a training set and a test set; construct a health model according to the data in the training set and a preset algorithm; and optimize the health model according to the data of the test set, so as to evaluate the current user's health with the optimized health model and obtain the current user's health assessment result.
When preprocessing the user information to obtain a sample data set, the processor 104 specifically: selects user information whose completeness is higher than a preset value; calculates a health score for each piece of selected user information according to the selected user information and a preset health-scoring rule; constructs samples from the selected user information and the health score corresponding to each piece of user information; identifies, by a statistical discriminant method, whether the samples contain user information with at least one missing or abnormal feature; and, if so, fills in the missing feature or replaces the abnormal feature in that user information by an interpolation method, to form the sample data set.
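The screening and identification steps can be sketched as below, assuming missing values are stored as `None`. The statistical discriminant method is not named in the text; the sketch assumes the common 3-sigma criterion, which flags values more than three sample standard deviations from the column mean. The preset health-scoring rule is likewise unspecified and is omitted. All function names are hypothetical.

```python
import statistics


def completeness(record):
    """Fraction of a record's features that are present (not None)."""
    return sum(v is not None for v in record) / len(record)


def filter_complete(records, min_completeness):
    """Keep records whose completeness is strictly higher than the preset value."""
    return [r for r in records if completeness(r) > min_completeness]


def flag_missing_or_abnormal(records, col, k=3.0):
    """Indices of records whose feature `col` is missing, or deviates from the
    column mean by more than k sample standard deviations (3-sigma rule)."""
    values = [r[col] for r in records if r[col] is not None]
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return [i for i, r in enumerate(records)
            if r[col] is None or (sd > 0 and abs(r[col] - mean) > k * sd)]
```

With very few records the 3-sigma rule cannot flag anything: with n values, no point can lie more than (n - 1)/√n sample standard deviations from the mean, which stays below 3 until n reaches 11.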
When preprocessing the user information to obtain a sample data set, the processor 104 may instead specifically: identify, by a statistical discriminant method, whether the user information contains user information with at least one missing or abnormal feature; if so, fill in the missing feature or replace the abnormal feature in that user information by an interpolation method; and reduce the dimensionality of the users' features by principal component analysis (PCA) to form the sample data set.
When filling in a missing feature of the user information by the interpolation method, the processor 104 specifically: acquires several pieces of user information whose similarity to the user information with the missing feature exceeds a specified value; calculates the average of the data corresponding to the missing feature across those pieces of user information; and fills the missing feature in the user information with that average.
When replacing an abnormal feature of the user information by the interpolation method, the processor 104 specifically: acquires several pieces of user information whose similarity to the user information with the abnormal feature exceeds a specified value; calculates the average of the data corresponding to the abnormal feature across those pieces of user information; and replaces the value of the abnormal feature in the user information with that average.
When dividing the sample data set into a training set and a test set, the processor 104 specifically randomly samples a preset proportion of the sample data from the sample data set to form the training set, with the remaining sample data forming the test set.
Another embodiment of the present application provides a storage medium, which may be a computer-readable storage medium. The storage medium stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the health model construction method for health assessment of the present application. The storage medium may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), flash memory, a magnetic disk, or an optical disc.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the terminal and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here. Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of both. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed terminal and method may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If implemented in the form of a software functional unit and sold or used as an independent product, the integrated unit may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application.
The foregoing describes only specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any equivalent modification or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (20)

  1. A health model construction method for health assessment, comprising:
    acquiring user information, the user information comprising a plurality of health-related features and a plurality of health-unrelated features;
    preprocessing the user information to obtain a sample data set;
    dividing the sample data set into a training set and a test set;
    constructing a health model according to the data in the training set and a preset algorithm; and
    optimizing the health model according to the data of the test set, so as to evaluate the current user's health with the optimized health model and obtain the current user's health assessment result.
  2. The method of claim 1, wherein preprocessing the user information to obtain a sample data set comprises:
    selecting user information whose completeness is higher than a preset value;
    calculating a health score for each piece of selected user information according to the selected user information and a preset health-scoring rule;
    constructing samples according to the selected user information and the health score corresponding to each piece of user information;
    identifying, by a statistical discriminant method, whether the samples contain user information with at least one missing or abnormal feature; and
    if user information with at least one missing or abnormal feature exists, filling in the missing feature or replacing the abnormal feature in that user information by an interpolation method, to form the sample data set.
  3. The method of claim 1, wherein preprocessing the user information to obtain a sample data set comprises:
    identifying, by a statistical discriminant method, whether the user information contains user information with at least one missing or abnormal feature;
    if user information with at least one missing or abnormal feature exists, filling in the missing feature or replacing the abnormal feature in that user information by an interpolation method; and
    reducing the dimensionality of the users' features by principal component analysis (PCA) to form the sample data set.
  4. The method of claim 2, wherein filling in the missing feature in the user information by the interpolation method comprises: acquiring several pieces of user information whose similarity to the user information with the missing feature exceeds a specified value; calculating the average of the data corresponding to the missing feature across the several pieces of user information; and filling the missing feature in the user information with the average;
    and replacing the abnormal feature in the user information by the interpolation method comprises:
    acquiring several pieces of user information whose similarity to the user information with the abnormal feature exceeds a specified value;
    calculating the average of the data corresponding to the abnormal feature across the several pieces of user information; and
    replacing the value of the abnormal feature in the user information with the average.
  5. The method of claim 3, wherein filling in the missing feature in the user information by the interpolation method comprises: acquiring several pieces of user information whose similarity to the user information with the missing feature exceeds a specified value; calculating the average of the data corresponding to the missing feature across the several pieces of user information; and filling the missing feature in the user information with the average;
    and replacing the abnormal feature in the user information by the interpolation method comprises:
    acquiring several pieces of user information whose similarity to the user information with the abnormal feature exceeds a specified value;
    calculating the average of the data corresponding to the abnormal feature across the several pieces of user information; and
    replacing the value of the abnormal feature in the user information with the average.
  6. The method of claim 1, wherein dividing the sample data set into a training set and a test set comprises: randomly sampling a preset proportion of the sample data from the sample data set to form the training set, the remaining sample data forming the test set.
  7. A terminal, comprising:
    an obtaining unit, configured to acquire user information, the user information comprising a plurality of health-related features and a plurality of health-unrelated features;
    a preprocessing unit, configured to preprocess the user information to obtain a sample data set;
    a dividing unit, configured to divide the sample data set into a training set and a test set;
    a building unit, configured to construct a health model according to the data in the training set and a preset algorithm; and
    an optimization unit, configured to optimize the health model according to the data of the test set, so as to evaluate the current user's health with the optimized health model and obtain the current user's health assessment result.
  8. The terminal of claim 7, wherein the preprocessing unit comprises a screening unit, a calculating unit, a sample construction unit, a first identifying unit, and a first interpolation unit;
    the screening unit is configured to select user information whose completeness is higher than a preset value;
    the calculating unit is configured to calculate a health score for each piece of selected user information according to the selected user information and a preset health-scoring rule;
    the sample construction unit is configured to construct samples according to the selected user information and the health score corresponding to each piece of user information;
    the first identifying unit is configured to identify, by a statistical discriminant method, whether the samples contain user information with at least one missing or abnormal feature; and
    the first interpolation unit is configured to, if user information with at least one missing or abnormal feature exists, fill in the missing feature or replace the abnormal feature in that user information by an interpolation method, to form the sample data set.
  9. The terminal of claim 7, wherein the preprocessing unit comprises a second identifying unit, a second interpolation unit, and a dimensionality reduction unit;
    the second identifying unit is configured to identify, by a statistical discriminant method, whether the user information contains user information with at least one missing or abnormal feature;
    the second interpolation unit is configured to, if user information with at least one missing or abnormal feature exists, fill in the missing feature or replace the abnormal feature in that user information by an interpolation method; and
    the dimensionality reduction unit is configured to reduce the dimensionality of the users' features by PCA to form the sample data set.
  10. The terminal of claim 8, wherein the first interpolation unit and the second interpolation unit comprise a first obtaining unit, a first calculating unit, a filling unit, a second obtaining unit, a second calculating unit, and a replacing unit, wherein:
    the first obtaining unit is configured to acquire several pieces of user information whose similarity to the user information with the missing feature exceeds a specified value;
    the first calculating unit is configured to calculate the average of the data corresponding to the missing feature across the several pieces of user information;
    the filling unit is configured to fill the missing feature in the user information with the average;
    the second obtaining unit is configured to acquire several pieces of user information whose similarity to the user information with the abnormal feature exceeds a specified value;
    the second calculating unit is configured to calculate the average of the data corresponding to the abnormal feature across the several pieces of user information; and
    the replacing unit is configured to replace the value of the abnormal feature in the user information with the average.
  11. The terminal of claim 9, wherein the first interpolation unit and the second interpolation unit comprise a first obtaining unit, a first calculating unit, a filling unit, a second obtaining unit, a second calculating unit, and a replacing unit, wherein:
    the first obtaining unit is configured to acquire several pieces of user information whose similarity to the user information with the missing feature exceeds a specified value;
    the first calculating unit is configured to calculate the average of the data corresponding to the missing feature across the several pieces of user information;
    the filling unit is configured to fill the missing feature in the user information with the average;
    the second obtaining unit is configured to acquire several pieces of user information whose similarity to the user information with the abnormal feature exceeds a specified value;
    the second calculating unit is configured to calculate the average of the data corresponding to the abnormal feature across the several pieces of user information; and
    the replacing unit is configured to replace the value of the abnormal feature in the user information with the average.
  12. The terminal of claim 7, wherein the dividing unit is configured to divide the sample data set into a training set and a test set by randomly sampling a preset proportion of the sample data from the sample data set to form the training set, the remaining sample data forming the test set.
  13. A terminal, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements: acquiring user information, the user information comprising a plurality of health-related features and a plurality of health-unrelated features; preprocessing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data of the test set, so as to evaluate the current user's health with the optimized health model and obtain the current user's health assessment result.
  14. The terminal of claim 13, wherein, when preprocessing the user information to obtain a sample data set, the processor specifically implements: selecting user information whose completeness is higher than a preset value; calculating a health score for each piece of selected user information according to the selected user information and a preset health-scoring rule; constructing samples according to the selected user information and the health score corresponding to each piece of user information; identifying, by a statistical discriminant method, whether the samples contain user information with at least one missing or abnormal feature; and, if user information with at least one missing or abnormal feature exists, filling in the missing feature or replacing the abnormal feature in that user information by an interpolation method, to form the sample data set.
  15. The terminal of claim 13, wherein, when preprocessing the user information to obtain a sample data set, the processor specifically implements: identifying, by a statistical discriminant method, whether the user information contains user information with at least one missing or abnormal feature; if user information with at least one missing or abnormal feature exists, filling in the missing feature or replacing the abnormal feature in that user information by an interpolation method; and reducing the dimensionality of the users' features by principal component analysis (PCA) to form the sample data set.
  16. The terminal according to claim 13, wherein, when dividing the sample data set into a training set and a test set, the processor specifically performs: randomly sampling a preset proportion of the sample data from the sample data set to form the training set, the remaining sample data forming the test set.
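The random split of claims 16 and 20 amounts to shuffling the sample indices and cutting at the preset proportion. In the sketch below, `split_dataset`, the default ratio of 0.8 and the optional seed are assumptions; the claims only require that a preset proportion be sampled at random for training and the remainder used for testing.

```python
import random


def split_dataset(samples, train_ratio=0.8, seed=None):
    """Randomly assign train_ratio of the samples to training, the rest to testing."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    rng = random.Random(seed)  # seeded generator makes the split reproducible
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    # round() avoids float artefacts such as 0.57 * 100 == 56.99999999999999.
    cut = round(len(samples) * train_ratio)
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test
```

Passing a fixed seed makes experiments repeatable; leaving it as `None` draws a fresh split each time.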
  17. A storage medium, wherein the storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform: acquiring user information, the user information comprising a plurality of pieces of health-related feature information and a plurality of pieces of non-health-related feature information; pre-processing the user information to obtain a sample data set; dividing the sample data set into a training set and a test set; constructing a health model according to the data in the training set and a preset algorithm; and optimizing the health model according to the data in the test set, so that the health of a current user is assessed with the optimized health model to obtain a health assessment result for the current user.
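Claim 17 leaves both the "preset algorithm" and the optimization procedure unspecified. As a hypothetical illustration only, the sketch below assumes a single numeric feature that predicts the health score, uses closed-form one-dimensional ridge regression as the preset algorithm, and interprets "optimizing the health model according to the data in the test set" as choosing the regularization strength with the lowest test mean squared error. All function names and the candidate `lambdas` are assumptions.

```python
def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression with an unpenalized intercept."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx  # (slope, intercept)


def mse(model, xs, ys):
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def build_and_optimize(train, test, lambdas=(0.0, 0.1, 1.0, 10.0)):
    """Fit one candidate per lambda on the training set; keep the lowest test error."""
    tx, ty = zip(*train)
    vx, vy = zip(*test)
    best = None
    for lam in lambdas:
        model = fit_ridge_1d(tx, ty, lam)
        err = mse(model, vx, vy)
        if best is None or err < best[0]:
            best = (err, lam, model)
    return best  # (test error, chosen lambda, (slope, intercept))


def assess(model, x):
    """Predicted health score for the current user's feature value x."""
    slope, intercept = model
    return slope * x + intercept
```

Selecting a hyperparameter on the test set follows the wording of the claim; in common practice a separate validation split is used so that the reported test error stays unbiased.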
  18. The storage medium according to claim 17, wherein the program instructions, when executed by the processor to pre-process the user information to obtain the sample data set, cause the processor to perform: screening out user information whose completeness is higher than a preset value; calculating, according to the screened user information and a preset health scoring rule, a health score corresponding to each piece of screened user information; constructing samples from the screened user information and the health score corresponding to each piece of user information; identifying, by a statistical discriminant method, whether the samples contain user information in which at least one piece of feature information is missing or abnormal; and, if such user information exists, filling in the missing feature information of that user information by an imputation method, or replacing the abnormal feature information of that user information by an imputation method, so as to form the sample data set.
  19. The storage medium according to claim 17, wherein the program instructions, when executed by the processor to pre-process the user information to obtain the sample data set, cause the processor to perform: identifying, by a statistical discriminant method, whether the user information contains user information in which at least one piece of feature information is missing or abnormal; if such user information exists, filling in the missing feature information of that user information by an imputation method, or replacing the abnormal feature information of that user information by an imputation method; and reducing the dimensionality of the users' feature information by Principal Component Analysis (PCA) to form the sample data set.
  20. The storage medium according to claim 17, wherein the program instructions, when executed by the processor to divide the sample data set into a training set and a test set, cause the processor to perform: randomly sampling a preset proportion of the sample data from the sample data set to form the training set, the remaining sample data forming the test set.
PCT/CN2018/082173 2017-04-10 2018-04-08 Health model construction method, terminal and storage medium for health assessment WO2018188533A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710229172.X 2017-04-10
CN201710229172.XA CN107818824A (en) 2017-04-10 2017-04-10 Health model construction method and terminal for health assessment

Publications (1)

Publication Number Publication Date
WO2018188533A1 (en) 2018-10-18

Family ID: 61601407

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/082173 WO2018188533A1 (en) 2017-04-10 2018-04-08 Health model construction method, terminal and storage medium for health assessment

Country Status (2)

Country Link
CN (1) CN107818824A (en)
WO (1) WO2018188533A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117409976A (en) * 2023-12-15 2024-01-16 深圳市微克科技有限公司 User health monitoring method, system and medium based on intelligent wearable equipment

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818824A (en) * 2017-04-10 2018-03-20 平安科技(深圳)有限公司 Health model construction method and terminal for health assessment
CN109009020A (en) * 2018-07-26 2018-12-18 湖南城市学院 A kind of health monitoring systems and health monitor method
CN109147949A (en) * 2018-08-16 2019-01-04 辽宁大学 A method for detecting teachers' sub-health state based on post-class processing
CN109712711A (en) * 2018-12-12 2019-05-03 平安科技(深圳)有限公司 Health evaluating method, apparatus, electronic equipment and medium based on machine learning
CN110491510A (en) * 2019-07-22 2019-11-22 缤刻普达(北京)科技有限责任公司 Body measuring device, method and health control comments generating means
CN112185556A (en) * 2020-09-15 2021-01-05 珠海格力电器股份有限公司 Method and device for determining health state, storage medium and electronic device
CN113069108A (en) * 2021-03-19 2021-07-06 北京京东拓先科技有限公司 User state monitoring method and device, electronic equipment and storage medium
CN113724875A (en) * 2021-09-10 2021-11-30 北京思泰瑞健康科技有限公司 Method, device and equipment for predicting cancer recurrence rate

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105603101A (en) * 2016-03-03 2016-05-25 博奥颐和健康科学技术(北京)有限公司 Application of system for detecting expression quantity of eight miRNAs in preparation of product for diagnosing or assisting in diagnosing hepatocellular carcinoma
CN105868532A (en) * 2016-03-22 2016-08-17 曾金生 Method and system for intelligently evaluating heart ageing degree
CN106339593A (en) * 2016-08-31 2017-01-18 青岛睿帮信息技术有限公司 Kawasaki disease classification and prediction method based on medical data modeling
CN107818824A (en) * 2017-04-10 2018-03-20 平安科技(深圳)有限公司 Health model construction method and terminal for health assessment

Also Published As

Publication number Publication date
CN107818824A (en) 2018-03-20

Similar Documents

Publication Publication Date Title
WO2018188533A1 (en) Health model construction method, terminal and storage medium for health assessment
CN112365987B (en) Diagnostic data abnormality detection method, diagnostic data abnormality detection device, computer device, and storage medium
Muthukrishnan et al. LASSO: A feature selection technique in predictive modeling for machine learning
AU2012245343B2 (en) Predictive modeling
Hwang et al. Estimating lifetime medical costs from censored claims data
Chen Semiparametric regression in size-biased sampling
AU2023226784A1 (en) Interactive model performance monitoring
US11663358B2 (en) Perturbation-based techniques for anonymizing datasets
US20170004401A1 (en) Artificial intuition
JP6484657B2 (en) Information processing apparatus, information processing method, and program
Zhang et al. Patient safety in hospitals–a bayesian analysis of unobservable hospital and specialty level risk factors
García‐Hernandez et al. MMRM vs joint modeling of longitudinal responses and time to study drug discontinuation in clinical trials using a “de jure” estimand
WO2019105800A1 (en) Apparatus for patient data availability analysis
Riani et al. Information criteria for outlier detection avoiding arbitrary significance levels
US8751286B2 (en) Loss distribution calculation system, loss distribution calculation method and loss distribution calculation-use program
Elerian Simulation estimation of continuous-time models with applications to finance
US10621155B2 (en) Method and apparatus for data integration
Mertens et al. Augmented and doubly robust g-estimation of causal effects under a structural nested failure time model
JP2017084249A (en) Data classifying system, method and program, and recording medium therefor
Martinussen et al. Alternatives to the Cox model
JP2017199281A (en) Estimation device, estimation method, and estimation program
Garces et al. Affine Mortality Models with Jumps: Parameter Estimation and Forecasting
Othman et al. Applying Time Series Analysis (Box-Jenkins) to Predict the Number of Patients with Cancer at Benghazi Medical Center
Wu et al. Joint modeling in presence of informative censoring on the retrospective time scale with application to palliative care research
US20200211700A1 (en) Performance opportunity analysis system and method

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 18783954; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

32PN Ep: public notification in the EP Bulletin as address of the addressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 23.01.2020)

122 Ep: PCT application non-entry in European phase
Ref document number: 18783954; Country of ref document: EP; Kind code of ref document: A1