WO2017173012A1 - Methods, systems, and devices for evaluating a health condition of an internet user - Google Patents

Methods, systems, and devices for evaluating a health condition of an internet user

Info

Publication number
WO2017173012A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
users
health
sample users
data
Prior art date
Application number
PCT/US2017/024886
Other languages
French (fr)
Inventor
Yu Xu
Yinzi REN
Yan Sun
Bangyu XIANG
Yaguang Liu
Jianwei YANG
Original Assignee
Alibaba Group Holding Limited
Priority date
Filing date
Publication date
Priority claimed from CN201610201241.1A (CN107291739A)
Application filed by Alibaba Group Holding Limited
Priority to EP17776613.6A (EP3411850A4)
Publication of WO2017173012A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Definitions

  • the disclosure relates to the field of communications, and in particular to methods, systems and devices for evaluating a health condition of an Internet user.
  • Current techniques evaluate a health condition of a user based on medical test data.
  • in general, current techniques receive medical test data sets (e.g., blood pressure, blood sugar, body mass index, bone mineral density, cardiovascular, arteriosclerosis, blood oxygen, and other medical test data) and screen the medical test data sets.
  • the current techniques then apply various measurement methods (e.g., the equal ratio and/or interval value methods) to calculate a single index score for each of the medical test data sets collected.
  • the cost of updating a user's health condition based on obtained medical test data is high. Since the collection cost of the medical test data is relatively high, a health condition obtained based on the medical test data is likely not updated periodically since each update implicates a collection cost required to obtain updated medical test data.
  • the present disclosure describes methods, systems and devices for evaluating a health condition of an Internet user.
  • the method comprises acquiring Internet activity data associated with a plurality of users, the plurality of users including a first user; selecting a set of sample users from the plurality of users based on a plurality of specified Internet activities identified in Internet activity data associated with the first user; extracting characteristic data for the first user and the set of sample users from the Internet activity data; utilizing the characteristic data as at least one parameter of a health index calculation model; and calculating a health index for the first user based on the health index calculation model.
  • an apparatus comprises one or more processors and a non-transitory memory storing computer-executable instructions therein that, when executed by the one or more processors, cause the apparatus to perform the operations of acquiring Internet activity data associated with a plurality of users, the plurality of users including a first user; selecting a set of sample users from the plurality of users based on a plurality of specified Internet activities identified in Internet activity data associated with the first user; extracting characteristic data for the first user and the set of sample users from the Internet activity data; utilizing the characteristic data as at least one parameter of a health index calculation model; and calculating a health index for the first user based on the health index calculation model.
  • characteristic data comprises any one of body mass index ("BMI"); a degree of an addiction to gaming; a degree of preference for junk foods; age; sex; whether the user stays up late frequently; the frequency of purchasing medical products over a given time period (e.g., the last two weeks); or whether the user performs manual labor.
  • BMI body mass index
  • the systems, devices, and methods disclosed herein evaluate the health condition of the user based on Internet activity data, which establishes a new mode for evaluating the health condition of a user versus current techniques.
  • the systems, devices, and methods described herein provide low cost, high feasibility and fast updates.
  • the disclosure also describes a device for evaluating a health condition of an Internet user, comprising the system for evaluating the health condition of the Internet user according to any of the claims mentioned below. Based on the Internet activity data of user, the health condition of the user can be evaluated by the device for evaluating the health condition of an Internet user provided by the embodiment of the disclosure, comprising a system for evaluating the health condition of the Internet user, which establishes a new mode for evaluating the health condition, with low cost, high feasibility and fast updates.
  • Figure 1 is a flow diagram illustrating a method for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
  • Figure 2 is a flow diagram illustrating a method for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
  • Figure 3 is a block diagram illustrating a system for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
  • Figure 4 is a block diagram illustrating a system for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
  • Figure 5 is a block diagram illustrating a device for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
  • terms, such as “a,” “an,” or “the,” again, may be understood to convey a singular usage or to convey a plural usage, depending at least in part upon context.
  • the term “based on” may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context.
  • Figure 1 is a flow diagram illustrating a method for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
  • step S101 the method acquires Internet activity data during a predefined period of history for a user to be tested among a plurality of users.
  • characteristic data comprises data such as e-commerce activity data, web browsing activity data, body mass index (BMI), a degree of an addiction to gaming, a degree of preference for junk foods, age, sex, an indication of whether the user stays up late frequently, the frequency of purchasing medical products over a given time period (e.g., the last two weeks), and whether the user performs manual labor.
  • the set period of history may be the past two weeks, the past month, or the past year, etc.
  • the set period of history may differ for different types of Internet activity data. For example, when the acquired Internet activity data is e-commerce activity data, the set period of history may be the past month, whereas when the acquired Internet activity data is whether a user stays up late frequently, the set period of history may be the past two weeks.
  • Internet activity data may be automatically recorded by a network server and may be acquired from the network server (e.g., via an API).
  • Internet activity data is not private data (e.g., personally identifiable information or health data).
  • the Internet activity data does not need to be explicitly provided by the user and can be acquired easily and with low cost. Therefore, the feasibility of evaluating the health condition of the user based on Internet activity data is very high.
  • step S102 the method evaluates the health condition of the user to be tested based on the obtained Internet activity data.
  • Internet activity data can reflect the health condition of the user.
  • people's daily lives are often inseparable from their activities involving the Internet. Users engage in Internet activity nearly everywhere; therefore, the disclosure provides a method to evaluate the health condition of the user based on this Internet activity data. This approach has revolutionary significance compared to conventional ways of evaluating health conditions based on medical test data.
  • the cost of updates to Internet activity data is minimal. Thus, it is both fast and cost effective to update the health condition of the user based on constantly updating Internet activity data.
  • the health condition of the user can be evaluated by a method for evaluating the health condition of the Internet user based on the Internet activity data, which establishes a new mode for evaluating the health condition.
  • the method for evaluating the health condition of the Internet user in the illustrated embodiments provides low cost, high feasibility and fast updates.
  • Figure 2 is a flow diagram illustrating a method for evaluating a health condition of an Internet user according to some embodiments of the disclosure.
  • step S201 the method acquires Internet activity data during a set period of history for a plurality of users, including a user to be tested.
  • step S202 the method selects a set of sample users from the plurality of users according to one or more specified Internet activities.
  • selecting sample users from the plurality of users according to specified Internet activity data in the Internet activity data may include selecting a positive sample user from the plurality of users according to a first specified Internet activity data in the Internet activity data, wherein the positive sample user does not include the user to be tested; and selecting a negative sample user from the plurality of users according to a second specified Internet activity data in the Internet activity data, wherein the negative sample user does not include the user to be tested.
  • selecting a set of sample users from the plurality of users according to specified Internet activity data in the Internet activity data can also further include eliminating overlapping sample users from the positive sample users and the negative sample users respectively, wherein the overlapping sample user refers to a sample user who is both a positive sample user and a negative sample user and balancing the ratio of the number of the positive sample user to the negative sample user so that the ratio of the numbers can be within a set threshold.
  • the first specified Internet activity data may be purchasing activity data under a sports category within a preset first period of history
  • the second specified Internet activity data may be the activity data of searching and browsing a medical registration website in a preset second period of history.
  • the positive sample user refers to a healthy user
  • the negative sample user refers to an unhealthy user.
  • the method extracts characteristic data of the user to be tested and the sample users from the Internet activity data.
  • the characteristic data can comprise any one or more of the body mass index (BMI), a degree of an addiction to gaming, a degree of preference for junk foods, age, or sex, whether the user stays up late frequently, the frequency of purchasing medical products over a given time period (e.g., the last two weeks), and whether the user performs manual labor.
  • step S204 the method uses the characteristic data as parameters of a preset health index calculation model, and then calculates the health index of the user to be tested.
  • steps S202, S203, and S204 may be implemented as part, or the entirety of, step S102 discussed in connection with Figure 1.
  • calculating the parameter of a preset health index calculation model based on the characteristic data, and then obtaining the health index of the user to be tested may comprise: training the health index calculation model by applying the characteristic data of the sample users to obtain a parameter value in the health index calculation model; predicting the health probability of the user to be tested by using the characteristic data of the user to be tested as the input of the health index calculation model with the parameter value as the parameter; and carrying out normalization processing for the health probability of the user to be tested, in order to obtain the health index of the user to be tested.
  • the comparison between the characteristic data of the user to be tested and the corresponding characteristic data of the sample users is capable of objectively reflecting the health condition of the user to be tested, thus the reliability of the health condition evaluation result is higher.
  • the method for evaluating a health condition of an Internet user may comprise the following steps.
  • the method may receive Internet activity data during a set period of history for a user to be tested among a plurality of users
  • the method may select positive sample users according to the Internet activity data. For example, it may be assumed that people who are fond of sports are in good health. Based on such an assumption, the method selects the set of positive samples according to the user's purchasing activity data under a sports category within the past month.
  • the method may conduct an initial cleaning (i.e., exclusion of unusual records) of the user's purchasing activity data under a sports category within the past month. Because online shopping data may include fake orders, the method may exclude obviously unusual data.
  • the method may further set thresholds for the orders of the user under certain subcategories within the last year, month, two weeks, etc., and may then exclude users whose orders within those periods exceed the set thresholds.
  • the method may add up the total purchasing frequency $X$ within the last month for each user in the initially cleaned data and calculate the average purchasing frequency $\mu$ and variance $\sigma_x^2$ across the users. The method may then standardize the purchasing frequency using the z-score method to obtain the standardized value $(X - \mu)/\sigma_x$.
  • after standardization, $X > 3$ may indicate a small-probability event, which can be deemed an unusual value; thus the positive sample users can be selected from the users satisfying $X \le 3$.
  • the method may be required to select the users with relatively higher purchasing frequency; thus the users satisfying $2 \le X \le 3$ may be marked as the positive sample users.
  • step c the method may select negative sample users according to the Internet activity data.
  • selecting negative sample users may comprise summing each user's searching and browsing frequencies on a medical registration website within the last month and selecting the users whose total frequency is greater than a set threshold as negative sample users.
  • step d the method may exclude the overlapping sample users from the positive and negative sample users.
  • the positive and negative sample users may be overlapping, and the overlapping sample users may be excluded from the positive and negative sample users.
  • the overlapping sample user refers to a sample user who is both a positive sample user and a negative sample user.
  • the method may adjust and control the ratio between the positive and negative sample users. In one embodiment, the adjustment and control step is aimed to prevent a numerical imbalance between the positive and negative sample users.
  • step f the method extracts characteristic data of the user or users to be tested and the positive and negative users from the Internet activity data.
  • the characteristic data comprises body mass index (BMI), a degree of an addiction to gaming, a degree of preference for junk foods, age, or sex, whether the user stays up late frequently, the frequency of purchasing medical products over a given time period (e.g., the last two weeks), and whether the user performs manual labor.
  • unusual values may be cleaned. For example, if the height is 0, the method may set the BMI as a null value. Alternatively, if a BMI value is less than 12 or greater than 40, the BMI may be deemed as unusual data and set as a null value.
  • a user being addicted to gaming or fond of junk food may be an ambiguous concept, that is, a non-binary concept.
  • the method may calculate the degree of the user's addiction to gaming or preference for junk food based on the purchasing activity under a "gaming" category over the last month and the purchasing activity under a "junk food" category over the past two weeks.
  • the calculated value is in an interval, and the degrees of addiction to gaming and the degree of preference for junk food of the user can be calculated through the following steps:
  • the points in the interval $[Q3 + 1.5\,IQR, +\infty)$ may be deemed as unusual points, and the degree for a purchasing frequency greater than $Q3 + 1.5\,IQR$ is deemed to be higher.
  • a threshold of $Q = Q3 + 2.5\,IQR$ shall be selected. If the purchasing frequency is much higher than the threshold $Q$, the data will be deemed as unreliable, and the corresponding degree value may be predicted to be lower. In addition, the corresponding degree of the purchasing frequency close to the threshold may be predicted to be higher.
  • the degree of being addicted to games or being fond of junk food may be calculated by the following formula (2): $e^{-\left|(x - Q)/Q\right| \cdot a}$ (Formula (2))
  • a is an adjustable parameter.
  • the method may determine whether a user stays up late frequently based on the user's time preference for Internet surfing from PC and mobile devices. A user whose most usual browsing period is between midnight and 5:00 AM may be identified as staying up late frequently.
  • the method may first conduct an initial cleaning for the data with the same method used for the positive sample user selection above. The method may then add up the total frequency of the user under such category over the last two weeks, then set a threshold. If the total frequency of the user is greater than the threshold, the value shall be set as a null value.
  • step g the method calculates the health index according to the preset health index calculation model.
  • the method may select a random forest algorithm as a classification model, and according to the sample and characteristics input to the health index calculation model, the health index calculation model firstly predicts whether the user is healthy, and then outputs the health probability (prb) of the user.
  • the health condition of the user is evaluated by the method for evaluating the health condition of the Internet user provided by the embodiment of the disclosure based on the Internet activity data, which establishes a new mode for evaluating the health condition, with low cost, high feasibility and fast updates. Moreover, in one embodiment, the method for evaluating a health condition of an Internet user is capable of objectively reflecting the health condition of the user to be tested, thus the reliability of the health condition evaluation result is higher.
  • Figure 3 is a block diagram illustrating a system for evaluating a health condition of an Internet user according to some embodiments of the disclosure.
  • system 300 includes an acquisition apparatus 310 and an evaluation apparatus 320.
  • the acquisition apparatus 310 acquires Internet activity data during a set period of history for a user to be tested among a plurality of users.
  • the evaluation apparatus 320 evaluates the health condition of the user to be tested based on the Internet activity data acquired by the acquisition apparatus 310.
  • Internet activity data may comprise e-commerce activity data and/or web browsing activity data, for example, body mass index (BMI), a degree of an addiction to gaming, a degree of preference for junk foods, age, sex, whether the user stays up late frequently, the frequency of purchasing medical products over a given time period (e.g., the last two weeks), and whether the user performs manual labor.
  • the set period of history may be the past two weeks, the past month, or the past year, etc.
  • the set period of history may differ for different types of Internet activity data. For example, when the acquired Internet activity data are e-commerce activity data, the set period of history can be the past month, whereas when the acquired Internet activity data is whether a user stays up late frequently, the set period of history may be the past two weeks.
  • Internet activity data may be automatically recorded by a network server and may be acquired from the network server (e.g., via an API).
  • Internet activity data is not private data (e.g., personally identifiable information or health data)
  • the Internet activity data does not need to be explicitly provided by the user and can be acquired easily and with low cost. Therefore, the feasibility of evaluating the health condition of the user based on Internet activity data is very high.
  • Internet activity data can reflect the health condition of the user. Specifically, in the current Internet era, people's daily lives are oftentimes inseparable from their activities involving the Internet. Internet activity is carried out nearly everywhere, therefore the disclosure provides a method to evaluate the health condition of the user based on Internet activity data.
  • the health condition of the user can be evaluated by a system for evaluating the health condition of an Internet user based on the Internet activity data, which establishes a new mode for evaluating the health condition.
  • the system for evaluating a health condition of an Internet user in the illustrated embodiments of the disclosure provides low cost, high feasibility and fast updates.
  • Figure 4 is a block diagram illustrating a system for evaluating a health condition of an Internet user according to some embodiments of the disclosure.
  • system 400 includes an acquisition apparatus 410 and an evaluation apparatus 420.
  • the acquisition apparatus 410 acquires Internet activity data during a set period of history for a user to be tested among a plurality of users.
  • the evaluation apparatus 420 evaluates the health condition of the user to be tested based on the Internet activity data acquired by the acquisition apparatus 410.
  • evaluation apparatus 420 includes a selection module 421, an extraction module 422 and a calculation module 423.
  • the selection module 421 selects sample users from the plurality of users according to specified Internet activity data in the Internet activity data.
  • the extraction module 422 extracts characteristic data of the user to be tested from the Internet activity data and characteristic data of the sample users selected by the selection module 421.
  • the calculation module 423 calculates the health index of the user to be tested by using the characteristic data extracted by the extraction module 422 as parameters of a preset health index calculation model.
  • the selection module 421 includes a first selection unit and a second selection unit.
  • the first selection unit selects a positive sample user from the plurality of users according to a first specified Internet activity data in the Internet activity data, and the positive sample user does not include the user to be tested.
  • the second selection unit selects a negative sample user from the plurality of users according to the second specified Internet activity data in the Internet activity data, and the negative sample user does not include the user to be tested.
  • the selection module 421 can further include an elimination unit and a balancing unit.
  • the elimination unit eliminates overlapping sample users from the positive sample users and the negative sample users respectively, and the overlapping sample user refers to a sample user who is both a positive sample user and a negative sample user.
  • the balancing unit balances the ratio of the number of the positive sample user to the negative sample user so that the ratio of the numbers can be within a set range.
  • the first specified Internet activity data may be purchasing activity data under a sports category within a preset first period of history
  • the second specified Internet activity data may be the activity data of searching and browsing a medical registration website in a preset second period of history.
  • the calculation module 423 includes a training unit, a prediction unit and a normalization unit.
  • the training unit trains the health index calculation model by applying the characteristic data of the sample users to obtain a parameter value in the health index calculation model.
  • the prediction unit predicts the health probability of the user to be tested by using the characteristic data of the user to be tested as the input of the health index calculation model based on the parameter value obtained by the training unit as the parameter.
  • the normalization unit normalizes the health probability (predicted by the prediction unit) of the user to be tested, in order to obtain the health index of the user to be tested.
  • the characteristic data can comprise any one or more of the body mass index (BMI), a degree of an addiction to gaming, a degree of preference for junk foods, age, or sex, whether the user stays up late frequently, the frequency of purchasing medical products over a given time period (e.g., the last two weeks), and whether the user performs manual labor.
  • the health condition of the user is evaluated by the system for evaluating the health condition of the Internet user provided by the embodiment of the disclosure based on the Internet activity data, which establishes a new mode for evaluating the health condition, with low cost, high feasibility and fast updates. Moreover, in one embodiment, the system for evaluating a health condition of an Internet user is capable of objectively reflecting the health condition of the user to be tested, thus the reliability of the health condition evaluation result is higher.
  • Figure 5 is a block diagram illustrating a device for evaluating a health condition of an Internet user according to some embodiments of the disclosure.
  • the device 500 comprises a system for evaluating the health condition of the Internet user.
  • the system for evaluating the health condition of the Internet user can be any system for evaluating the health condition of the Internet user in the above embodiments of the disclosure.
  • the system for evaluating the health condition of the Internet user is used for acquiring Internet activity data during a set period of history for a user to be tested among a plurality of users, and evaluating the health condition of the user to be tested based on the acquired Internet activity data.
  • the device for evaluating a health condition of an Internet user can be a computer, server, etc.
  • the health condition of the user can be evaluated by the device for evaluating the health condition of an Internet user provided by the embodiment of the disclosure, comprising a system for evaluating the health condition of the Internet user, which establishes a new mode for evaluating the health condition, with low cost, high feasibility and fast updates.
  • the device for evaluating a health condition of an Internet user is capable of objectively reflecting the health condition of the user to be tested, thus the reliability of the health condition evaluation result is higher.
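
By way of a non-limiting illustration, the sample-selection steps described above (z-score standardization of purchasing frequency, threshold-based selection of negative sample users, exclusion of overlapping sample users, and balancing of the ratio between the two groups) might be sketched in Python as follows. All function names, the `max_ratio` parameter, and the downsampling strategy are assumptions made for this sketch; the disclosure does not specify them.

```python
import random
from statistics import mean, pstdev


def z_scores(freq):
    """Standardize per-user frequencies as (X - mean) / standard deviation."""
    values = list(freq.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return {user: 0.0 for user in freq}
    return {user: (x - mu) / sigma for user, x in freq.items()}


def select_positive(sports_freq, low=2.0, high=3.0):
    """Users whose standardized sports purchasing frequency lies in [low, high]."""
    return {u for u, z in z_scores(sports_freq).items() if low <= z <= high}


def select_negative(medical_freq, threshold):
    """Users whose medical-site search/browse total exceeds the threshold."""
    return {u for u, f in medical_freq.items() if f > threshold}


def exclude_and_balance(pos, neg, tested, max_ratio=2.0, seed=0):
    """Drop overlapping and tested users, then cap the class-size ratio.

    Downsampling the larger group is an assumed strategy, not one the
    disclosure prescribes.
    """
    overlap = pos & neg
    pos = pos - overlap - tested
    neg = neg - overlap - tested
    rng = random.Random(seed)
    if neg and len(pos) > max_ratio * len(neg):
        pos = set(rng.sample(sorted(pos), int(max_ratio * len(neg))))
    elif pos and len(neg) > max_ratio * len(pos):
        neg = set(rng.sample(sorted(neg), int(max_ratio * len(pos))))
    return pos, neg
```

In this sketch the user to be tested is removed from both groups, consistent with the statement that neither the positive nor the negative sample users include the user to be tested.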

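The characteristic-data cleaning described above (nulling unusual BMI values and scoring a degree of addiction or preference relative to a quartile-based threshold) might likewise be sketched as follows. This is a hedged illustration only: the function names are hypothetical, the BMI is assumed to be computed from weight in kilograms and height in meters, and the exponential form used in `degree` is an assumed reading of Formula (2), which is not fully legible in the source text.

```python
import math
from statistics import quantiles


def clean_bmi(weight_kg, height_m):
    """Return the BMI, or None when the height is 0 or the BMI is outside [12, 40]."""
    if not height_m:
        return None
    bmi = weight_kg / height_m ** 2
    return bmi if 12 <= bmi <= 40 else None


def iqr_threshold(freqs, k=2.5):
    """Threshold Q = Q3 + k * IQR over the observed purchasing frequencies."""
    q1, _, q3 = quantiles(freqs, n=4)
    return q3 + k * (q3 - q1)


def degree(x, q, a=1.0):
    """Assumed form of Formula (2): exp(-a * |x - Q| / Q).

    The value peaks at 1.0 when x equals the threshold Q and decays as x
    moves away from it; `a` is the adjustable parameter.
    """
    if q <= 0:
        raise ValueError("threshold Q must be positive")
    return math.exp(-a * abs(x - q) / q)
```

Under this assumed form, a purchasing frequency close to the threshold yields a degree near 1, while a frequency far above it yields a lower degree, which matches the behavior described for unreliable data.
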
Abstract

Disclosed herein are methods, systems and devices for evaluating a health condition of an Internet user. In one embodiment, the method comprises acquiring Internet activity data associated with a plurality of users, the plurality of users including a first user; selecting a set of sample users from the plurality of users based on a plurality of specified Internet activities identified in Internet activity data associated with the first user; extracting characteristic data for the first user and the set of sample users from the Internet activity data; utilizing the characteristic data as at least one parameter of a health index calculation model; and calculating a health index for the first user based on the health index calculation model.

Description

Methods, Systems, and Devices for Evaluating a Health Condition of an Internet User
Cross-Reference to Related Applications
[0001] This application claims the benefit of priority of Chinese Application No. 201610201241.1, titled "Method, System and Apparatus for Evaluation of the Health of Internet Users," filed on March 31, 2016 and U.S. Application No. 15/473,016, titled "Methods, Systems, and Devices for Evaluating a Health Condition of an Internet User," filed on March 29, 2017, both of which are hereby incorporated by reference in their entirety.
Background
Technical Field
[0002] The disclosure relates to the field of communications, and in particular to methods, systems and devices for evaluating a health condition of an Internet user.
Description of Related Art
[0003] Currently, some Internet applications assume the role of a platform for facilitating communications between service providers and service requesters. Specifically, a service provider and a service requester each register on the platform and the service provider provides relevant services to the service requester. In any given scenario, the service provider must be healthy. Therefore, the recent health condition of the service provider is required as a reference index when facilitating connections between a service provider and a service requester.
[0004] Current techniques evaluate a health condition of a user based on medical test data. In general, current techniques receive medical test data sets (e.g., blood pressure, blood sugar, body mass index, bone mineral density, cardiovascular, arteriosclerosis, blood oxygen, and other medical test data) and screen the medical test data sets. The current techniques then apply various measurement methods (e.g., the equal ratio and/or interval value methods) to calculate a single index score for each of the medical test data sets collected. Finally, current techniques calculate a comprehensive health index based on the weighted average of the single index scores.
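As an illustration of the weighted-average step used by current techniques, the comprehensive health index may be written as shown here. The symbols $H$ (comprehensive health index), $s_i$ (single index score of the $i$-th medical test data set), $w_i$ (its weight), and $n$ (the number of data sets) are notation assumed for this illustration and do not appear in the disclosure.

```latex
H = \frac{\sum_{i=1}^{n} w_i \, s_i}{\sum_{i=1}^{n} w_i}
```

The weights $w_i$ are free parameters of this calculation, so different choices of weights yield different values of $H$ for the same test data.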
[0005] The current techniques suffer from numerous disadvantages discussed below.
[0006] First, medical test data of a user is difficult to obtain. Although medical test data of a user can reflect the health condition of the user, the user is often not willing to provide this data as such data is highly private. Thus, the feasibility of current techniques for testing the health condition of the user based on the user's medical test data is extremely low.
[0007] Second, the cost of updating a user's health condition based on obtained medical test data is high. Since the collection cost of the medical test data is relatively high, a health condition obtained based on the medical test data is likely not updated periodically since each update implicates a collection cost required to obtain updated medical test data.
[0008] Third, the credibility of a health condition obtained based on the medical test data is low. When current techniques weight the single index scores during the calculation of the comprehensive health score, the selection of the weight is highly subjective. This results in the reduction of credibility of the health condition obtained based on the medical test data as the comprehensive health score is subject to the subjective determinations made when weighting the single index scores.
Brief Summary
[0009] To remedy the above-described deficiencies, the present disclosure describes methods, systems and devices for evaluating a health condition of an Internet user.
[0010] In one embodiment, the method comprises acquiring Internet activity data associated with a plurality of users, the plurality of users including a first user; selecting a set of sample users from the plurality of users based on a plurality of specified Internet activities identified in Internet activity data associated with the first user; extracting characteristic data for the first user and the set of sample users from the Internet activity data; utilizing the characteristic data as at least one parameter of a health index calculation model; and calculating a health index for the first user based on the health index calculation model.
[0011] In one embodiment, an apparatus comprises one or more processors and a non-transitory memory storing computer-executable instructions therein that, when executed by the processor, cause the apparatus to perform the operations of acquiring Internet activity data associated with a plurality of users, the plurality of users including a first user; selecting a set of sample users from the plurality of users based on a plurality of specified Internet activities identified in Internet activity data associated with the first user; extracting characteristic data for the first user and the set of sample users from the Internet activity data; utilizing the characteristic data as at least one parameter of a health index calculation model; and calculating a health index for the first user based on the health index calculation model.
[0012] In some embodiments, characteristic data comprises any one of body mass index ("BMI"); a degree of an addiction to gaming; a degree of preference for junk foods; age; sex; whether the user stays up late frequently; the frequency of purchasing medical products over a given time period (e.g., the last two weeks); or whether the user performs manual labor.
[0013] The systems, devices, and methods disclosed herein evaluate the health condition of the user based on Internet activity data, which establishes a new mode for evaluating the health condition of a user versus current techniques. In addition, the systems, devices, and methods described herein provide low cost, high feasibility and fast updates.
[0014] In order to achieve the aforementioned purposes, the disclosure also describes a device for evaluating a health condition of an Internet user, comprising the system for evaluating the health condition of the Internet user according to any of the claims mentioned below. Because the device comprises such a system, it can evaluate the health condition of a user based on the Internet activity data of the user, which establishes a new mode for evaluating the health condition, with low cost, high feasibility and fast updates.
Brief Description of the Drawings
[0015] The described drawings herein are used to provide a further understanding of the disclosure and constitute a portion of the application. Exemplary embodiments and descriptions thereof of the disclosure are intended to explain the disclosure rather than improperly limit the disclosure.
[0016] Figure 1 is a flow diagram illustrating a method for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
[0017] Figure 2 is a flow diagram illustrating a method for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
[0018] Figure 3 is a block diagram illustrating a system for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
[0019] Figure 4 is a block diagram illustrating a system for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
[0020] Figure 5 is a block diagram illustrating a device for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
Detailed Description
[0021] Subject matter will now be described more fully hereinafter with reference to the accompanying drawings, which form a part hereof, and which show, by way of illustration, specific example embodiments. Subject matter may, however, be embodied in a variety of different forms and, therefore, covered or claimed subject matter is intended to be construed as not being limited to any example embodiments set forth herein; example embodiments are provided merely to be illustrative. Likewise, a reasonably broad scope for claimed or covered subject matter is intended. Among other things, for example, subject matter may be embodied as methods, devices, components, or systems. Accordingly, embodiments may, for example, take the form of hardware, software, firmware or any combination thereof (other than software per se). The following detailed description is, therefore, not intended to be taken in a limiting sense.
[0022] Throughout the specification and claims, terms may have nuanced meanings suggested or implied in context beyond an explicitly stated meaning. Likewise, the phrase "in one embodiment" as used herein does not necessarily refer to the same embodiment and the phrase "in another embodiment" as used herein does not necessarily refer to a different embodiment. It is intended, for example, that claimed subject matter include combinations of example embodiments in whole or in part.
[0023] In general, terminology may be understood at least in part from usage in context. For example, terms, such as "and", "or", or "and/or," as used herein may include a variety of meanings that may depend at least in part upon the context in which such terms are used. Typically, "or" if used to associate a list, such as A, B or C, is intended to mean A, B, and C, here used in the inclusive sense, as well as A, B or C, here used in the exclusive sense. In addition, the term "one or more" as used herein, depending at least in part upon context, may be used to describe any feature, structure, or characteristic in a singular sense or may be used to describe combinations of features, structures or characteristics in a plural sense. Similarly, terms, such as "a," "an," or "the," again, may be understood to convey a singular usage or to convey a plural usage, depending at least in part upon context. In addition, the term "based on" may be understood as not necessarily intended to convey an exclusive set of factors and may, instead, allow for existence of additional factors not necessarily expressly described, again, depending at least in part on context.
[0024] The detailed description provided herein is not intended as an extensive or detailed discussion of known concepts, and as such, details that are known generally to those of ordinary skill in the relevant art may have been omitted or may be handled in summary fashion. Certain embodiments of the disclosure will now be discussed with reference to the aforementioned figures, wherein like reference numerals refer to like components.
[0025] Figure 1 is a flow diagram illustrating a method for evaluating a health condition of an Internet user, according to some embodiments of the disclosure.
[0026] In step S101, the method acquires Internet activity data during a predefined period of history for a user to be tested among a plurality of users.
[0027] From the Internet activity data, the method extracts characteristic data (discussed in more detail herein). In one embodiment, characteristic data comprises data such as e-commerce activity data, web browsing activity data, body mass index (BMI), a degree of an addiction to gaming, a degree of preference for junk foods, age, or sex, an indication of whether the user stays up late frequently, the frequency of purchasing medical products over a given time period (e.g., the last two weeks), and whether the user performs manual labor.
[0028] The set period of history may be the past two weeks, the past month, or the past year, etc. The set period of history may differ for different types of Internet activity data. For example, when the acquired Internet activity data is e-commerce activity data, the set period of history may be the past month, whereas when the acquired Internet activity data is whether a user stays up late frequently, the set period of history may be the past two weeks.
[0029] Internet activity data may be automatically recorded by a network server and may be acquired from the network server (e.g., via an API). As Internet activity data is not private data (e.g., personally identifiable information or health data), the Internet activity data does not need to be explicitly provided by the user and can be acquired easily and with low cost. Therefore, the feasibility of evaluating the health condition of the user based on Internet activity data is very high.
[0030] In step S102, the method evaluates the health condition of the user to be tested based on the obtained Internet activity data.
[0031] To some extent, Internet activity data can reflect the health condition of the user. Specifically, in the current Internet era, people's daily lives are oftentimes inseparable from their activities involving the Internet. Users engage in Internet activity nearly everywhere; therefore, the disclosure provides a method to evaluate the health condition of the user based on this Internet activity data. This has revolutionary significance as compared to conventional ways of evaluating health conditions based on medical test data. Moreover, not only is Internet activity data frequently updated, but the cost of updates to Internet activity data is also minimal. Thus, it is both fast and cost effective to update the health condition of the user based on constantly updating Internet activity data.
[0032] According to the embodiment illustrated in Figure 1 and herein, the health condition of the user can be evaluated by a method for evaluating the health condition of the Internet user based on the Internet activity data, which establishes a new mode for evaluating the health condition. In addition, the method for evaluating the health condition of the Internet user in the illustrated embodiments provides low cost, high feasibility and fast updates.
[0033] Figure 2 is a flow diagram illustrating a method for evaluating a health condition of an Internet user according to some embodiments of the disclosure.
[0034] In step S201, the method acquires Internet activity data during a set period of history for a plurality of users, including a user to be tested.
[0035] In step S202, the method selects a set of sample users from the plurality of users according to one or more specified Internet activities.
[0036] In one embodiment, selecting sample users from the plurality of users according to specified Internet activity data in the Internet activity data may include selecting a positive sample user from the plurality of users according to a first specified Internet activity data in the Internet activity data, wherein the positive sample user does not include the user to be tested; and selecting a negative sample user from the plurality of users according to a second specified Internet activity data in the Internet activity data, wherein the negative sample user does not include the user to be tested.
[0037] On this basis, in other embodiments of the disclosure, selecting a set of sample users from the plurality of users according to specified Internet activity data in the Internet activity data can further include: eliminating overlapping sample users from the positive sample users and the negative sample users respectively, wherein an overlapping sample user refers to a sample user who is both a positive sample user and a negative sample user; and balancing the ratio of the number of positive sample users to the number of negative sample users so that the ratio is within a set threshold.
[0038] For example, the first specified Internet activity data may be purchasing activity data under a sports category within a preset first period of history, while the second specified Internet activity data may be the activity data of searching and browsing a medical registration website in a preset second period of history.
[0039] In one embodiment, the positive sample user refers to a healthy user, and the negative sample user refers to an unhealthy user.
[0040] In step S203, the method extracts characteristic data of the user to be tested and the sample users from the Internet activity data.
[0041] The characteristic data can comprise any one or more of the body mass index (BMI), a degree of an addiction to gaming, a degree of preference for junk foods, age, or sex, whether the user stays up late frequently, the frequency of purchasing medical products over a given time period (e.g., the last two weeks), and whether the user performs manual labor.
[0042] In step S204, the method uses the characteristic data as parameters of a preset health index calculation model, and then calculates the health index of the user to be tested.
[0043] In one embodiment, steps S202, S203, and S204 may be implemented as part, or the entirety of, step S102 discussed in connection with Figure 1.
[0044] In one embodiment, calculating the parameter of a preset health index calculation model based on the characteristic data, and then obtaining the health index of the user to be tested may comprise: training the health index calculation model by applying the characteristic data of the sample users to obtain a parameter value in the health index calculation model; predicting the health probability of the user to be tested by using the characteristic data of the user to be tested as the input of the health index calculation model with the parameter value as the parameter; and carrying out normalization processing for the health probability of the user to be tested, in order to obtain the health index of the user to be tested.
[0045] The comparison between the characteristic data of the user to be tested and the corresponding characteristic data of the sample users is capable of objectively reflecting the health condition of the user to be tested, thus the reliability of the health condition evaluation result is higher.
[0046] The following content further explains the method for evaluating the health condition of the Internet user in one embodiment by a specific application example.
[0047] In the embodiment, the method for evaluating a health condition of an Internet user may comprise the following steps.
[0048] In step a, the method may receive Internet activity data during a set period of history for a user to be tested among a plurality of users.
[0049] In step b, the method may select positive sample users according to the Internet activity data.
[0050] For example, it may be assumed that people who are fond of sports are in good health. Based on such an assumption, the method selects the set of positive samples according to the user's purchasing activity data under a sports category within the past month.
[0051] First, the method may conduct an initial cleaning (i.e., excluding) of the user's purchasing activity data under a sports category within the past one month. Considering that the online shopping data may include fake orders, the method may exclude obviously unusual data. The method may further set thresholds for the orders of the user under certain subcategories within the last one year, one month, two weeks, etc. and may then exclude users whose orders within the last one year, one month, two weeks, etc. exceed the set thresholds.
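The initial cleaning described above can be illustrated with a minimal Python sketch. The function name `exclude_suspicious_users`, the window labels, and the threshold values are hypothetical and are not specified by the disclosure; the sketch only assumes that each user has an order count per time window and that a user is dropped when any count exceeds its threshold.

```python
def exclude_suspicious_users(order_counts, thresholds):
    """Drop users whose order count exceeds the threshold in any time window.

    order_counts: {user_id: {window_name: number_of_orders}}  (assumed layout)
    thresholds:   {window_name: maximum_plausible_orders}     (assumed values)
    """
    kept = {}
    for user, counts in order_counts.items():
        # A missing window is treated as zero orders in that window.
        exceeds = any(counts.get(window, 0) > limit
                      for window, limit in thresholds.items())
        if not exceeds:
            kept[user] = counts
    return kept


orders = {
    "u1": {"year": 10, "month": 2, "two_weeks": 1},
    "u2": {"year": 500, "month": 3, "two_weeks": 1},  # likely fake orders
}
limits = {"year": 100, "month": 20, "two_weeks": 10}
cleaned = exclude_suspicious_users(orders, limits)
```

In this sketch, `u2` is excluded because its yearly order count exceeds the assumed yearly threshold.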
[0052] Afterwards, the method may add up the total purchasing frequency X within the last one month for each user with the initially cleaned data and calculate the average purchasing frequency μ and variance σ² of the users. Later, the method may standardize the purchasing frequency by utilizing the z-score method to obtain the standardized purchasing frequency X':
$$X' = \frac{X - \mu}{\sigma} \qquad \text{Formula (1)}$$
[0053] In Formula (1), X' > 3 may indicate small probability events, which can be deemed as unusual values; thus the positive sample users can be selected from the users satisfying X' < 3. In addition, the method may be required to select the users with relatively higher purchasing frequency; thus the users satisfying 2 < X' < 3 may be marked as the positive sample users.
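As a hedged illustration of the z-score selection described above, the following Python sketch standardizes per-user purchasing frequencies and keeps the users whose standardized value lies strictly between 2 and 3. The function name `select_positive_samples` and the use of the population standard deviation are assumptions made for this example.

```python
from statistics import mean, pstdev


def select_positive_samples(purchase_counts, low=2.0, high=3.0):
    """Return user ids whose standardized purchase frequency is in (low, high).

    purchase_counts: {user_id: total purchases in the period} (assumed layout)
    """
    values = list(purchase_counts.values())
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return []  # every user has the same frequency, so no one stands out
    selected = []
    for user, count in purchase_counts.items():
        z = (count - mu) / sigma
        if low < z < high:
            selected.append(user)
    return selected


counts = {"u1": 1, "u2": 1, "u3": 1, "u4": 1, "u5": 1, "u6": 1, "u7": 5}
positive = select_positive_samples(counts)
```

Values above the upper bound are left out on the same reasoning as the text: they are treated as unusual data rather than as evidence of a strong interest in sports.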
[0054] In step c, the method may select negative sample users according to the Internet activity data.
[0055] In one embodiment, selecting negative sample users may comprise summing, for each user, the searching and browsing frequencies in the user's medical registration website searching and browsing data within the last one month, and selecting the users whose total frequency is greater than the set threshold as negative sample users.
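A minimal Python sketch of this negative sample selection follows. The function name `select_negative_samples`, the two input dictionaries, and the threshold value are hypothetical; the disclosure does not specify how the searching and browsing counts are stored.

```python
def select_negative_samples(search_counts, browse_counts, threshold):
    """Return users whose combined search and browse frequency exceeds threshold.

    search_counts, browse_counts: {user_id: frequency in the period} (assumed)
    """
    users = set(search_counts) | set(browse_counts)
    return sorted(
        user for user in users
        if search_counts.get(user, 0) + browse_counts.get(user, 0) > threshold
    )


searches = {"u1": 4, "u2": 0, "u3": 1}
browses = {"u1": 3, "u3": 1, "u4": 9}
negative = select_negative_samples(searches, browses, threshold=5)
```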
[0056] In step d, the method may exclude the overlapping sample users from the positive and negative sample users.
[0057] The positive and negative sample users may be overlapping, and the overlapping sample users may be excluded from the positive and negative sample users. The overlapping sample user refers to a sample user who is both a positive sample user and a negative sample user.
[0058] In step e, the method may adjust and control the ratio between the positive and negative sample users. In one embodiment, the adjustment and control step is aimed to prevent a numerical imbalance between the positive and negative sample users.
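Steps d and e can be sketched together in Python. The disclosure does not state how the ratio is adjusted, so random downsampling of the larger group is an assumption of this example, as are the function name `build_sample_sets` and the `max_ratio` parameter.

```python
import random


def build_sample_sets(positive, negative, max_ratio=2.0, seed=0):
    """Remove overlapping users, then downsample the larger group.

    max_ratio is the largest allowed size ratio between the two groups
    (an assumed way of expressing the set threshold).
    """
    pos, neg = set(positive), set(negative)
    overlap = pos & neg
    pos_list = sorted(pos - overlap)
    neg_list = sorted(neg - overlap)

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    if neg_list and len(pos_list) > max_ratio * len(neg_list):
        pos_list = sorted(rng.sample(pos_list, int(max_ratio * len(neg_list))))
    elif pos_list and len(neg_list) > max_ratio * len(pos_list):
        neg_list = sorted(rng.sample(neg_list, int(max_ratio * len(pos_list))))
    return pos_list, neg_list


pos_users, neg_users = build_sample_sets(
    ["a", "b", "c", "d", "e", "f", "g"], ["g", "h"]
)
```

Here user `g` appears in both groups and is removed from each, after which the positive group is reduced so that it is at most twice the size of the negative group.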
[0059] In step f, the method extracts characteristic data of the user or users to be tested and the positive and negative users from the Internet activity data.
[0060] In one embodiment, the characteristic data comprises body mass index (BMI), a degree of an addiction to gaming, a degree of preference for junk foods, age, or sex, whether the user stays up late frequently, the frequency of purchasing medical products over a given time period (e.g., the last two weeks), and whether the user performs manual labor.
[0061] BMI may be used to measure the weight and health condition of the human body. It is a value of body mass divided by the square of the body height, that is, BMI = mass/height², wherein the unit of mass is kilograms and the unit of height is meters. In one embodiment, when calculating BMI, unusual values may be cleaned. For example, if the height is 0, the method may set the BMI as a null value. Alternatively, if a BMI value is less than 12 or greater than 40, the BMI may be deemed as unusual data and set as a null value.
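The BMI calculation and cleaning rules above can be sketched as follows. The function name `compute_bmi` and the use of `None` to represent the null value are assumptions of this example; the bounds 12 and 40 are taken from the text.

```python
def compute_bmi(mass_kg, height_m, low=12.0, high=40.0):
    """Return BMI in kg/m^2, or None when the inputs or the result are unusual."""
    if mass_kg is None or not height_m or height_m <= 0:
        return None  # a missing or zero height gives a null BMI
    bmi = mass_kg / height_m ** 2
    if bmi < low or bmi > high:
        return None  # values outside [low, high] are treated as unusual data
    return bmi


typical = compute_bmi(70, 1.75)
```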
[0062] As a second example, a user being addicted to gaming or fond of junk food may be an ambiguous concept, that is, a non-binary concept. In this example, the method may calculate the degree of the user's addiction to gaming or preference for junk food based on the purchasing activity under a "gaming" category over the last month and the purchasing activity under a "junk food" category over the past two weeks. The calculated value is in an interval, and the degree of addiction to gaming and the degree of preference for junk food of the user can be calculated through the following steps:
(1) Respectively set thresholds for the orders of the user under certain subcategory within the last one year, one month, or two weeks, etc., and then exclude the users whose orders within the last one year, one month, two weeks, etc. exceed the set thresholds;
(2) Add up the total purchasing frequency of the user according to the initially cleaned data, and calculate the first quartile Q1 and the third quartile Q3, and then compute the interquartile range (IQR = Q3 - Q1);
(3) In one embodiment, the points in the interval of [Q3 + 1.5IQR, +∞) may be deemed as unusual points, and a purchasing frequency greater than Q3 + 1.5IQR is deemed to correspond to a higher degree. However, considering that the result may be affected by unreliable data such as fake orders, a threshold of Q = Q3 + 2.5IQR shall be selected. If the purchasing frequency is much higher than the threshold Q, the data will be deemed as unreliable, and the corresponding degree value may be predicted to be lower. In addition, the corresponding degree of a purchasing frequency close to the threshold may be predicted to be higher. Thus, continuing the previous example, the degree of being addicted to games or being fond of junk food (or other non-binary characteristic data) may be calculated by the following formula (2):
$$e^{-\left|\frac{x - Q}{Q}\right|^{a}} \qquad \text{Formula (2)}$$
[0063] Wherein, x is the purchasing frequency of the user and a is an adjustable parameter.
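The quartile-based threshold and the degree calculation can be sketched in Python as follows. This is an illustration under stated assumptions: the degree is read as e raised to the negative a-th power of |(x - Q)/Q|, which matches the description that frequencies close to Q score higher and frequencies far above Q score lower; the quartiles are computed with the default method of `statistics.quantiles`; and the function names are hypothetical.

```python
import math
from statistics import quantiles


def threshold_q(frequencies, k=2.5):
    """Return Q = Q3 + k * IQR for a list of purchasing frequencies."""
    q1, _, q3 = quantiles(frequencies, n=4)  # needs at least two data points
    return q3 + k * (q3 - q1)


def degree(x, q, a=2.0):
    """Degree in (0, 1]; equals 1 when the frequency x equals the threshold q."""
    if q == 0:
        raise ValueError("threshold Q must be non-zero")
    return math.exp(-abs((x - q) / q) ** a)


q = threshold_q([1, 2, 3, 4, 5, 6, 7])
scores = [degree(x, q) for x in (4, 16, 160)]
```

With these inputs the middle frequency coincides with Q and receives the highest degree, while the very large frequency, which may reflect fake orders, receives a degree close to zero.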
[0064] As a third example, the method may determine whether a user stays up late frequently based on the user's time preference for Internet surfing from PC and mobile devices. A user whose most usual browsing period is between midnight and 5:00 AM may be identified as staying up late frequently.
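One possible reading of this rule is sketched below in Python. It assumes that browsing activity is available as a list of visit hours (0 to 23) and that the "most usual browsing period" is the single most frequent hour; both are assumptions of this example, as is the function name `stays_up_late`.

```python
from collections import Counter


def stays_up_late(visit_hours, start=0, end=5):
    """Return True when the most frequent visit hour falls in [start, end).

    visit_hours: iterable of integer hours (0-23), one per recorded visit.
    Returns None when there is no browsing data for the user.
    """
    hours = list(visit_hours)
    if not hours:
        return None
    most_common_hour, _ = Counter(hours).most_common(1)[0]
    return start <= most_common_hour < end


night_owl = stays_up_late([1, 1, 2, 14, 1, 23])
```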
[0065] As a fourth example, with respect to the frequency of purchasing medical products over the last two weeks, based on the purchasing data under the medicine category over the last two weeks, the method may first conduct an initial cleaning for the data with the same method used for the positive sample user selection above. The method may then add up the total frequency of the user under such category over the last two weeks, then set a threshold. If the total frequency of the user is greater than the threshold, the value shall be set as a null value.
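A short Python sketch of this feature follows. The function name `medical_purchase_frequency`, the list-of-counts input, and the threshold value are hypothetical, and `None` stands in for the null value.

```python
def medical_purchase_frequency(purchase_counts, threshold):
    """Sum purchases in the medicine category over the period.

    purchase_counts: per-day (or per-order) counts for the last two weeks,
    assumed to be already cleaned. Returns None when the total exceeds the
    threshold, since such a total is treated as unreliable.
    """
    total = sum(purchase_counts)
    return None if total > threshold else total


frequency = medical_purchase_frequency([1, 0, 2], threshold=10)
```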
[0066] As a fifth example, with respect to whether a user performs manual labor, according to the work that the user is engaged in (student, white collar, merchant, civil servant, manufacturing worker, medical staff, media, construction worker, shop assistant, waiter/waitress), users who work as manufacturing workers and construction workers are marked as performing manual labor.
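The occupation rule above amounts to a simple lookup, sketched here in Python. The occupation strings mirror the list in the text; the function name `performs_manual_labor` and the case-insensitive matching are assumptions of this example.

```python
# Occupations that the text marks as manual labor.
MANUAL_LABOR_OCCUPATIONS = {"manufacturing worker", "construction worker"}


def performs_manual_labor(occupation):
    """Return True or False for a known occupation, or None when it is missing."""
    if occupation is None:
        return None
    return occupation.strip().lower() in MANUAL_LABOR_OCCUPATIONS


flags = [performs_manual_labor(job)
         for job in ("Construction Worker", "student", None)]
```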
[0067] In step g, the method calculates the health index according to the preset health index calculation model.
[0068] In many embodiments, there is often a significant amount of empty data in the characteristic data. Thus, in some embodiments the method may select a random forest algorithm as a classification model. According to the samples and characteristics input to the health index calculation model, the health index calculation model first predicts whether the user is healthy, and then outputs the health probability (prb) of the user. The method may normalize the output probability value prb: denoting the maximum of the probability value prb over all users (positive and negative sample users and users to be tested) as max_prb and the minimum as min_prb, the method may calculate the health index according to the following formula (3):
$$\text{health\_index} = \frac{prb - min\_prb}{max\_prb - min\_prb} \times 100 \qquad \text{Formula (3)}$$
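The normalization step can be illustrated with a minimal Python sketch. It assumes the usual min-max form, in which the minimum probability is subtracted before dividing by the range, and it takes the predicted probabilities as given rather than training a classifier. The function name `health_indices` and the handling of the case where all probabilities are equal are assumptions of this example.

```python
def health_indices(probabilities):
    """Map each user's health probability to a 0-100 health index.

    probabilities: {user_id: prb} for all users (sample users and users
    to be tested), as output by the classification model.
    """
    if not probabilities:
        return {}
    max_prb = max(probabilities.values())
    min_prb = min(probabilities.values())
    span = max_prb - min_prb
    if span == 0:
        # All probabilities are equal, so the index is undefined here.
        return {user: None for user in probabilities}
    return {user: (prb - min_prb) / span * 100
            for user, prb in probabilities.items()}


index = health_indices({"a": 0.2, "b": 0.6, "c": 1.0})
```

Under this form the user with the lowest probability receives an index of 0 and the user with the highest probability receives 100.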
[0069] The health condition of the user is evaluated by the method for evaluating the health condition of the Internet user provided by the embodiment of the disclosure based on the Internet activity data, which establishes a new mode for evaluating the health condition, with low cost, high feasibility and fast updates. Moreover, in one embodiment, the method for evaluating a health condition of an Internet user is capable of objectively reflecting the health condition of the user to be tested, thus the reliability of the health condition evaluation result is higher.
[0070] Figure 3 is a block diagram illustrating a system for evaluating a health condition of an Internet user according to some embodiments of the disclosure.
[0071] As illustrated in Figure 3, system 300 includes an acquisition apparatus 310 and an evaluation apparatus 320. In one embodiment, the acquisition apparatus 310 acquires Internet activity data during a set period of history for a user to be tested among a plurality of users. In one embodiment, the evaluation apparatus 320 evaluates the health condition of the user to be tested based on the Internet activity data acquired by the acquisition apparatus 310.
[0072] Internet activity data may comprise e-commerce activity data and/or web browsing activity data, for example, body mass index (BMI), a degree of an addiction to gaming, a degree of preference for junk foods, age, or sex, whether the user stays up late frequently, the frequency of purchasing medical products over a given time period (e.g., the last two weeks), and whether the user performs manual labor.
[0073] The set period of history may be the past two weeks, the past month, or the past year, etc. The set period of history may differ for different types of Internet activity data. For example, when the acquired Internet activity data are e-commerce activity data, the set period of history can be the past month, whereas when the acquired Internet activity data is whether a user stays up late frequently, the set period of history may be the past two weeks.
[0074] Internet activity data may be automatically recorded by a network server and may be acquired from the network server (e.g., via an API). As Internet activity data is not private data (e.g., personally identifiable information or health data), the Internet activity data does not need to be explicitly provided by the user and can be acquired easily and with low cost. Therefore, the feasibility of evaluating the health condition of the user based on Internet activity data is very high.
[0075] To some extent, Internet activity data can reflect the health condition of the user. Specifically, in the current Internet era, people's daily lives are oftentimes inseparable from their activities involving the Internet. Internet activity is carried out nearly everywhere; therefore, the disclosure provides a method to evaluate the health condition of the user based on Internet activity data. This has revolutionary significance as compared to conventional ways of detecting health conditions based on medical test data. Moreover, not only is Internet activity data frequently updated, but the cost of updates to Internet activity data is also minimal. Thus, it is both fast and cost effective to update the health condition of the user based on constantly updating Internet activity data.
[0076] According to the embodiments illustrated herein, the health condition of the user can be evaluated by a system for evaluating the health condition of an Internet user based on the Internet activity data, which establishes a new mode for evaluating the health condition. In addition, the system for evaluating a health condition of an Internet user in the illustrated embodiments of the disclosure provides low cost, high feasibility and fast updates.
[0077] Figure 4 is a block diagram illustrating a system for evaluating a health condition of an Internet user according to some embodiments of the disclosure.
[0078] As illustrated in Figure 4, system 400 includes an acquisition apparatus 410 and an evaluation apparatus 420. In one embodiment, the acquisition apparatus 410 acquires Internet activity data during a set period of history for a user to be tested among a plurality of users. In one embodiment, the evaluation apparatus 420 evaluates the health condition of the user to be tested based on the Internet activity data acquired by the acquisition apparatus 410.
[0079] In the illustrated embodiment, evaluation apparatus 420 includes a selection module 421, an extraction module 422 and a calculation module 423. In one embodiment, the selection module 421 selects sample users from the plurality of users according to specified Internet activity data in the Internet activity data. In one embodiment, the extraction module 422 extracts characteristic data of the user to be tested from the Internet activity data and characteristic data of the sample users selected by the selection module 421. In one embodiment, the calculation module 423 calculates the health index of the user to be tested by using the characteristic data extracted by the extraction module 422 as parameters of a preset health index calculation model.
[0080] In some embodiments, the selection module 421 includes a first selection unit and a second selection unit. In one embodiment, the first selection unit selects a positive sample user from the plurality of users according to a first specified Internet activity data in the Internet activity data, and the positive sample user does not include the user to be tested. In one embodiment, the second selection unit selects a negative sample user from the plurality of users according to the second specified Internet activity data in the Internet activity data, and the negative sample user does not include the user to be tested.
[0081] On this basis, in other embodiments, the selection module 421 can further include an elimination unit and a balancing unit. In one embodiment, the elimination unit eliminates overlapping sample users from the positive sample users and the negative sample users respectively, and the overlapping sample user refers to a sample user who is both a positive sample user and a negative sample user. In one embodiment, the balancing unit balances the ratio of the number of the positive sample user to the negative sample user so that the ratio of the numbers can be within a set range.
[0082] For example, the first specified Internet activity data may be purchasing activity data under a sports category within a preset first period of history, while the second specified Internet activity data may be the activity data of searching and browsing a medical registration website in a preset second period of history.
[0083] In some embodiments, the calculation module 423 includes a training unit, a prediction unit and a normalization unit. In one embodiment, the training unit trains the health index calculation model by applying the characteristic data of the sample users to obtain a parameter value in the health index calculation model. In one embodiment, the prediction unit predicts the health probability of the user to be tested by using the characteristic data of the user to be tested as the input of the health index calculation model based on the parameter value obtained by the training unit as the parameter. In one embodiment, the normalization unit normalizes the health probability (predicted by the prediction unit) of the user to be tested, in order to obtain the health index of the user to be tested.
[0084] The characteristic data can comprise any one or more of the body mass index (BMI), a degree of an addiction to gaming, a degree of preference for junk foods, age, or sex, whether the user stays up late frequently, the frequency of purchasing medical products over a given time period (e.g., the last two weeks), and whether the user performs manual labor.
[0085] The system for evaluating the health condition of the Internet user provided by the embodiment of the disclosure evaluates the health condition of the user based on the Internet activity data, which establishes a new mode for evaluating the health condition with low cost, high feasibility and fast updates. Moreover, in one embodiment, the system is capable of objectively reflecting the health condition of the user to be tested, so the health condition evaluation result is more reliable.
[0086] Figure 5 is a block diagram illustrating a device for evaluating a health condition of an Internet user according to some embodiments of the disclosure.
[0087] As illustrated in Figure 5, the device 500 comprises a system for evaluating the health condition of the Internet user. The system for evaluating the health condition of the Internet user can be any system for evaluating the health condition of the Internet user in the above embodiments of the disclosure.
[0088] The system for evaluating the health condition of the Internet user is used for acquiring Internet activity data during a set period of history for a user to be tested among a plurality of users, and evaluating the health condition of the user to be tested based on the acquired Internet activity data.
[0089] The device for evaluating a health condition of an Internet user can be a computer, server, etc.
[0090] The device for evaluating the health condition of an Internet user provided by the embodiment of the disclosure comprises a system for evaluating the health condition of the Internet user and evaluates the health condition of the user based on the Internet activity data of the user, which establishes a new mode for evaluating the health condition with low cost, high feasibility and fast updates. Moreover, in one embodiment, the device is capable of objectively reflecting the health condition of the user to be tested, so the health condition evaluation result is more reliable.
[0091] The above are only embodiments of the disclosure and are not intended to limit the scope of the disclosure. Any alterations, equivalent replacements and improvements made without departing from the spirit and principle of the disclosure shall fall within the protection scope of the disclosure.

Claims

What is claimed is:
1. A method comprising: acquiring Internet activity data associated with a plurality of users, the plurality of users including a first user; selecting a set of sample users from the plurality of users based on a plurality of specified Internet activities identified in Internet activity data associated with the first user; extracting, from the Internet activity data, characteristic data for the first user and the set of sample users; utilizing the characteristic data as at least one parameter of a health index calculation model; and calculating a health index for the first user based on the health index calculation model.
2. The method of claim 1 wherein characteristic data comprises one of e-commerce data, web browsing data, body mass index data, a degree of an addiction to gaming, a degree of preference for junk foods, age, or sex, an indication of whether a user stays up late frequently, the frequency of purchasing medical products over a given time period, and whether a user performs manual labor.
3. The method of claim 1 wherein acquiring Internet activity data associated with a plurality of users comprises acquiring Internet activity data captured during a predefined period, the predefined period selected based on the type of the Internet activity data.
4. The method of claim 1, wherein selecting a set of sample users comprises: selecting a set of positive sample users based on a first specified Internet activity; and selecting a set of negative sample users based on a second specified Internet activity.
5. The method of claim 4, wherein selecting a set of sample users further comprises: identifying a set of overlapping sample users appearing in both the set of positive sample users and the set of negative sample users; eliminating the overlapping sample users from the set of positive sample users and the set of negative sample users; and balancing the ratio of the number of the positive sample users to the negative sample users according to a set ratio threshold.
6. The method of claim 4, wherein the first specified Internet activity comprises purchasing activity associated with a sports category within a preset first period of history and the second specified Internet activity comprises searching and browsing a medical registration website in a preset second period of history.
7. The method of claim 1, wherein calculating a health index for the first user based on the health index calculation model comprises: training the health index calculation model using the characteristic data of the sample users to obtain a parameter of the health index calculation model; predicting a health probability of the first user using the characteristic data of the first user as an input to the health index calculation model; and normalizing the health probability of the first user to obtain the health index of the first user.
8. The method of claim 7, wherein the health index calculation model comprises a random forest.
9. The method of claim 7, wherein normalizing the health probability of the first user comprises: calculating a maximum health probability and a minimum health probability for a set of users including the first user; and normalizing the health probability of the first user based on the maximum health probability and minimum health probability.
10. The method of claim 1, wherein extracting characteristic data of a user comprises: calculating a total purchasing frequency of a user with respect to a category of goods; calculating a threshold based on a first quartile, a third quartile, and an interquartile range of total purchasing frequency of the user; and determining a degree of preference for the category of goods based on the threshold.
11. An apparatus comprising: one or more processors; and a non-transitory memory storing computer-executable instructions therein that, when executed by the processors, cause the apparatus to perform the operations of: acquiring Internet activity data associated with a plurality of users, the plurality of users including a first user; selecting a set of sample users from the plurality of users based on a plurality of specified Internet activities identified in Internet activity data associated with the first user; extracting, from the Internet activity data, characteristic data for the first user and the set of sample users; utilizing the characteristic data as at least one parameter of a health index calculation model; and calculating a health index for the first user based on the health index calculation model.
12. The apparatus of claim 11 wherein characteristic data comprises one of e-commerce data, web browsing data, body mass index data, a degree of an addiction to gaming, a degree of preference for junk foods, age, or sex, an indication of whether a user stays up late frequently, the frequency of purchasing medical products over a given time period, and whether a user performs manual labor.
13. The apparatus of claim 11 wherein acquiring Internet activity data associated with a plurality of users comprises acquiring Internet activity data captured during a predefined period, the predefined period selected based on the type of the Internet activity data.
14. The apparatus of claim 11, wherein selecting a set of sample users comprises: selecting a set of positive sample users based on a first specified Internet activity; and selecting a set of negative sample users based on a second specified Internet activity.
15. The apparatus of claim 14, wherein selecting a set of sample users further comprises: identifying a set of overlapping sample users appearing in both the set of positive sample users and the set of negative sample users; eliminating the overlapping sample users from the set of positive sample users and the set of negative sample users; and balancing the ratio of the number of the positive sample users to the negative sample users according to a set ratio threshold.
16. The apparatus of claim 14, wherein the first specified Internet activity comprises purchasing activity associated with a sports category within a preset first period of history and the second specified Internet activity comprises searching and browsing a medical registration website in a preset second period of history.
17. The apparatus of claim 11, wherein calculating a health index for the first user based on the health index calculation model comprises: training the health index calculation model using the characteristic data of the sample users to obtain a parameter of the health index calculation model; predicting a health probability of the first user using the characteristic data of the first user as an input to the health index calculation model; and normalizing the health probability of the first user to obtain the health index of the first user.
18. The apparatus of claim 17, wherein the health index calculation model comprises a random forest.
19. The apparatus of claim 17, wherein normalizing the health probability of the first user comprises: calculating a maximum health probability and a minimum health probability for a set of users including the first user; and normalizing the health probability of the first user based on the maximum health probability and minimum health probability.
20. The apparatus of claim 11, wherein extracting characteristic data of a user comprises: calculating a total purchasing frequency of a user with respect to a category of goods; calculating a threshold based on a first quartile, a third quartile, and an interquartile range of total purchasing frequency of the user; and determining a degree of preference for the category of goods based on the threshold.
PCT/US2017/024886 2016-03-29 2017-03-30 Methods, systems, and devices for evaluating a health condition of an internet user WO2017173012A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP17776613.6A EP3411850A4 (en) 2016-03-31 2017-03-30 Methods, systems, and devices for evaluating a health condition of an internet user

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201615473016A 2016-03-29 2016-03-29
US15/473,016 2016-03-29
CN201610201241.1A CN107291739A (en) 2016-03-31 2016-03-31 Evaluation method, system and the equipment of network user's health status
CN201610201241.1 2016-03-31

Publications (1)

Publication Number Publication Date
WO2017173012A1 (en) 2017-10-05

Family

ID=59966472

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2017/024886 WO2017173012A1 (en) 2016-03-29 2017-03-30 Methods, systems, and devices for evaluating a health condition of an internet user

Country Status (1)

Country Link
WO (1) WO2017173012A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030105658A1 (en) * 1999-12-15 2003-06-05 Keith D Grzelak Customer profiling apparatus for conducting customer behavior pattern analysis, and method for comparing customer behavior patterns
US6839680B1 (en) * 1999-09-30 2005-01-04 Fujitsu Limited Internet profiling
US20120192214A1 (en) * 2009-12-22 2012-07-26 Resonate Networks Method and apparatus for delivering targeted content to television viewers
US8275635B2 (en) * 2007-02-16 2012-09-25 Bodymedia, Inc. Integration of lifeotypes with devices and systems
US20130211858A1 (en) * 2010-09-29 2013-08-15 Dacadoo Ag Automated health data acquisition, processing and communication system
US20130282733A1 (en) * 2012-04-24 2013-10-24 Blue Kai, Inc. Profile noise anonymity for mobile users
US20140074510A1 (en) * 2012-09-07 2014-03-13 Jennifer Clement McClung Personalized Health Score Generator
US8930204B1 (en) * 2006-08-16 2015-01-06 Resource Consortium Limited Determining lifestyle recommendations using aggregated personal information

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3411850A4 *

Legal Events

Date Code Title Description
WWE WIPO information: entry into national phase. Ref document number: 2017776613; Country of ref document: EP
ENP Entry into the national phase. Ref document number: 2017776613; Country of ref document: EP; Effective date: 20180903
NENP Non-entry into the national phase. Ref country code: DE
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 17776613; Country of ref document: EP; Kind code of ref document: A1