CN113362120B - Group determination method and device, electronic equipment and computer readable storage medium - Google Patents

Publication number
CN113362120B
Authority
CN
China
Prior art keywords
population
probability
predetermined
group
populations
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110916032.6A
Other languages
Chinese (zh)
Other versions
CN113362120A (en
Inventor
段冰力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110916032.6A
Publication of CN113362120A
Application granted
Publication of CN113362120B
Current legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Accounting & Taxation (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Game Theory and Decision Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to a group determination method and device, electronic equipment and a computer readable storage medium. The group determination method comprises the following steps: acquiring characteristics of a user; inputting the characteristics into a probability model to obtain a predicted probability that the user belongs to a first group; and determining whether the user belongs to the first group according to the predicted probability, wherein the probability model is obtained by a training method performed by clustering a predetermined group.

Description

Group determination method and device, electronic equipment and computer readable storage medium
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a group determination method and apparatus, an electronic device, and a computer-readable storage medium.
Background
Business parties often use A/B testing to estimate the revenue of marketing campaigns, to decide whether a new application User Interface (UI) or a new product function is suitable for launch, and so on. To carry out an A/B test, a batch of users is selected in advance and divided into an experimental group and a control group; the new scheme is launched for the experimental group but is not provided to the control group. The evaluation indexes of the experimental group and the control group are then compared. If the evaluation index of the experimental group is higher than that of the control group, the new scheme is considered to be effective for all users.
However, such a test method observes only the overall effect and cannot identify for which specific groups the scheme is actually effective and for which it is actually ineffective. In addition, such a test method cannot identify whether the result is affected by Simpson's paradox.
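As a non-limiting illustration of the second problem, the following sketch uses hypothetical conversion counts (the segment names and all numbers are made up for illustration and do not come from the disclosure). The experimental group has the higher conversion rate in every segment, yet the lower conversion rate overall, because the two groups contain the segments in different proportions.

```python
# Hypothetical (converted users, total users) per user segment.
experimental = {"segment_x": (81, 87), "segment_y": (192, 263)}
control = {"segment_x": (234, 270), "segment_y": (55, 80)}


def rate(converted, total):
    """Conversion rate of one segment."""
    return converted / total


def overall_rate(groups):
    """Conversion rate after pooling all segments of one group."""
    converted = sum(c for c, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return converted / total


# Does the experimental group win inside each individual segment?
per_segment_wins = {
    seg: rate(*experimental[seg]) > rate(*control[seg]) for seg in experimental
}
# Does the experimental group win when only the pooled totals are compared?
overall_win = overall_rate(experimental) > overall_rate(control)

print(per_segment_wins)  # experimental wins in both segments
print(overall_win)       # but loses on the pooled totals
```

A test that reports only the pooled comparison would conclude that the scheme is ineffective, although it helps every segment in this made-up data.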
Disclosure of Invention
The present disclosure provides a group determination method and apparatus to at least solve the above-mentioned problems in the related art. The technical scheme of the disclosure is as follows:
According to a first aspect of the embodiments of the present disclosure, a group determination method is provided, comprising: acquiring characteristics of a user; inputting the characteristics into a probability model to obtain a predicted probability that the user belongs to a first group; and determining whether the user belongs to the first group according to the predicted probability, wherein the probability model is obtained by a training method performed by clustering a predetermined group.
According to a first aspect of an embodiment of the present disclosure, the training method includes: clustering a predetermined population based on characteristics of the predetermined population to obtain a plurality of clustering populations; determining a first population of the plurality of clustering populations; determining the proportion of the first population in the plurality of clustering populations as a preset probability; inputting the characteristics of the predetermined population into the probability model to obtain an estimated probability; and adjusting parameters of the probability model according to the estimated probability and the preset probability, so as to train the probability model.
According to a first aspect of an embodiment of the present disclosure, the training method further comprises: the predetermined population is selected from a target population.
According to a first aspect of embodiments of the present disclosure, the predetermined population comprises a second population and a third population.
According to a first aspect of an embodiment of the present disclosure, the training method further comprises: selecting the second population and the third population from the target population according to a predetermined flow ratio of the second population to the third population, so as to form the predetermined population.
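As a non-limiting sketch of this selection step, the following code randomly draws two disjoint populations from a target population according to a flow ratio. The function name, the `sample_size` parameter, the default 1:1 ratio, and the fixed random seed are illustrative assumptions and are not specified by the disclosure.

```python
import random


def select_predetermined_population(target_users, sample_size, flow_ratio=(1, 1), seed=0):
    """Randomly pick `sample_size` users and split them by `flow_ratio`.

    Returns (second_population, third_population), i.e. the experimental
    and control users. All names here are hypothetical.
    """
    second_share, third_share = flow_ratio
    n_second = sample_size * second_share // (second_share + third_share)
    rng = random.Random(seed)
    # Sampling without replacement keeps the two populations disjoint.
    chosen = rng.sample(target_users, sample_size)
    return chosen[:n_second], chosen[n_second:]


second, third = select_predetermined_population(list(range(1000)), 100, flow_ratio=(3, 1))
print(len(second), len(third))
```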
According to a first aspect of embodiments of the present disclosure, the determining a first population of the plurality of clustering populations comprises: determining, as the first population, a clustering population in which the difference between the evaluation index of the second population and the evaluation index of the third population is greater than or equal to a predetermined threshold.
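As a non-limiting sketch of this determination, the following code compares, within each clustering population, the evaluation index of the second (experimental) users with that of the third (control) users. Taking the evaluation index to be the mean of a per-user metric, as well as the record layout and the names used, are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean


def find_effective_clusters(records, threshold):
    """Return the ids of clusters whose index difference reaches `threshold`.

    `records` is an iterable of (cluster_id, group, metric) tuples, where
    `group` is "second" (experimental) or "third" (control). Hypothetical layout.
    """
    by_cluster = defaultdict(lambda: {"second": [], "third": []})
    for cluster_id, group, metric in records:
        by_cluster[cluster_id][group].append(metric)

    effective = []
    for cluster_id, groups in sorted(by_cluster.items()):
        if not groups["second"] or not groups["third"]:
            continue  # no comparison is possible without both groups
        difference = mean(groups["second"]) - mean(groups["third"])
        if difference >= threshold:
            effective.append(cluster_id)
    return effective


records = [
    ("K1", "second", 0.30), ("K1", "second", 0.34),
    ("K1", "third", 0.20), ("K1", "third", 0.22),
    ("K2", "second", 0.10), ("K2", "second", 0.12),
    ("K2", "third", 0.11), ("K2", "third", 0.11),
]
print(find_effective_clusters(records, threshold=0.05))
```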
According to a first aspect of embodiments of the present disclosure, the evaluation index comprises at least one of: browsing duration, resource click rate, next-day retention rate and user conversion rate.
According to a first aspect of the embodiments of the present disclosure, the adjusting the parameter of the probability model according to the estimated probability and the preset probability includes: determining a loss function of the probability model according to the estimated probability and the preset probability; and adjusting parameters of the probability model according to the loss function.
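As a non-limiting sketch of this adjustment, the following code uses a logistic regression model with a cross-entropy loss and a single gradient-descent step. The disclosure does not name a particular loss function or optimizer, so both choices, together with the learning rate and all identifiers, are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(estimated, preset, eps=1e-12):
    """Cross-entropy between the estimated probability and the preset probability."""
    estimated = min(max(estimated, eps), 1.0 - eps)
    return -(preset * math.log(estimated) + (1.0 - preset) * math.log(1.0 - estimated))


def train_step(weights, bias, features, preset, learning_rate=0.1):
    """One gradient-descent update; returns new parameters and the loss before the update."""
    estimated = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
    loss = cross_entropy(estimated, preset)
    # For a sigmoid output with cross-entropy loss, dLoss/dLogit = estimated - preset.
    error = estimated - preset
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias, loss


weights, bias = [0.0, 0.0], 0.0
losses = []
for _ in range(200):
    weights, bias, loss = train_step(weights, bias, [1.0, 2.0], preset=0.8)
    losses.append(loss)
```

In this sketch the preset probability acts as a soft training target, so the loss is minimized when the estimated probability equals the preset probability.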
According to a first aspect of embodiments of the present disclosure, the determining whether the user belongs to a first group according to the predicted probability includes: comparing the predicted probability to a predetermined probability threshold; determining that the user belongs to a first group when the predicted probability is greater than or equal to a predetermined probability threshold.
According to a second aspect of embodiments of the present disclosure, there is provided a population determining apparatus including: an acquisition unit configured to acquire a feature of a user; a prediction unit configured to input the features into a probability model, resulting in a predicted probability that the user belongs to a first group; a determining unit configured to determine whether the user belongs to a first group according to the predicted probability, wherein the probability model is obtained by training the probability model through a training device, and the training device is configured to train the probability model by clustering a predetermined group.
According to a second aspect of embodiments of the present disclosure, the training apparatus comprises: a clustering unit configured to cluster a predetermined population based on the characteristics of the predetermined population to obtain a plurality of clustering populations; a population determination unit configured to determine a first population of the plurality of clustering populations; a preset probability determination unit configured to determine a proportion of the first population in the plurality of clustering populations as a preset probability; a probability estimation unit configured to input the characteristics of the predetermined population into the probability model to obtain an estimated probability; and a training unit configured to adjust parameters of the probability model according to the estimated probability and the preset probability to train the probability model.
According to a second aspect of embodiments of the present disclosure, the training apparatus further comprises: a selection unit configured to select the predetermined population from a target population.
According to a second aspect of embodiments of the present disclosure, the predetermined population includes a second population and a third population.
According to a second aspect of embodiments of the present disclosure, the training apparatus further comprises: a selection unit configured to select the second population and the third population from the target population according to a predetermined flow ratio of the second population to the third population, so as to form the predetermined population.
According to a second aspect of embodiments of the present disclosure, the population determination unit is configured to: determine, as the first population, a clustering population in which the difference between the evaluation index of the second population and the evaluation index of the third population is greater than or equal to a predetermined threshold.
According to a second aspect of embodiments of the present disclosure, the evaluation index includes at least one of: browsing duration, resource click rate, next-day retention rate and user conversion rate.
According to a second aspect of embodiments of the present disclosure, the training unit is configured to: determining a loss function of the probability model according to the estimated probability and the preset probability; and adjusting parameters of the probability model according to the loss function.
According to a second aspect of embodiments of the present disclosure, the determining unit is configured to: comparing the predicted probability to a predetermined probability threshold; determining that the user belongs to a first group when the predicted probability is greater than or equal to a predetermined probability threshold.
According to a third aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions, wherein the processor is configured to execute the instructions to implement a population determination method according to the present disclosure.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform a population determination method according to the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, instructions in which are executable by a processor of a computer device to perform a population determination method according to the present disclosure.
The technical scheme provided by the embodiments of the present disclosure brings at least one of the following beneficial effects. With the probability model according to the exemplary embodiments of the present disclosure, the probability that a user belongs to a first group may be accurately predicted based on the characteristics of the user in a target group. In addition, the probability predicted by the probability model helps to determine whether the user belongs to the first group (for example, an effective group), so that the actual scale of the first group can be obtained, whether the test result of the test scheme on the target group is affected by Simpson's paradox can be identified, and the influence of the test scheme on the target group can be analyzed comprehensively. Therefore, the group determination method and apparatus of the exemplary embodiments of the present disclosure not only help to identify the overall test effect of the test scheme on the target group, but also help to analyze the test effect comprehensively and in depth, for example, to analyze for which type of group the scheme is effective, for which type it is ineffective, and how large the effective group is, which helps to avoid Simpson's paradox. For example, the effective group and the ineffective group can be identified from the target group, the size of the effective group and/or the ineffective group relative to the target group can be obtained, and the predetermined probability threshold and the predetermined effectiveness criterion corresponding to the effective group can be obtained, thereby helping to analyze the characteristics of the effective group and helping the business party to make decisions.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 shows a scenario diagram of A/B testing according to the present disclosure.
Fig. 2 illustrates a flow chart of a population determination method according to an exemplary embodiment of the present disclosure.
FIG. 3 shows a scenario diagram of A/B testing according to an example embodiment of the present disclosure.
Fig. 4 shows a flow chart of a training method of a probabilistic model according to an exemplary embodiment of the present disclosure.
Fig. 5 shows another flowchart of a training method of a probabilistic model according to an exemplary embodiment of the present disclosure.
Fig. 6 illustrates a block diagram of a population determination device according to an exemplary embodiment of the present disclosure.
Fig. 7 shows a block diagram of a training apparatus of a probabilistic model according to an exemplary embodiment of the present disclosure.
Fig. 8 is a block diagram of an electronic device according to an example embodiment of the present disclosure.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The embodiments described in the following examples do not represent all embodiments consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Here, the expression "at least one of the items" in the present disclosure covers three parallel cases: "any one of the items", "a combination of any plurality of the items", and "all of the items". For example, "including at least one of M and N" covers the following three parallel cases: (1) including M; (2) including N; (3) including M and N. As another example, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
Typically, business parties use A/B testing to predict the revenue that will be generated by newly developed or designed functions and products (e.g., marketing campaigns, new User Interfaces (UIs) of applications, or new product functions), as well as to determine whether they are suitable for launch, whether further improvement is needed, etc.
FIG. 1 shows a scenario diagram of A/B testing according to the present disclosure.
As shown in FIG. 1, in order to perform an A/B test on a predetermined scheme to be launched (e.g., a new UI, a new application (APP), a new marketing campaign, a new product function, etc.), users who have not used the predetermined scheme may be selected as a target population T. Then, one batch of users is selected from the target population T as an experimental population A (also referred to as an experimental group), and another batch of users is selected as a control population B (also referred to as a control group). After the experimental population A and the control population B are determined, the predetermined scheme may be launched only for the experimental population A, i.e., the predetermined scheme is made available only to the experimental population A and not to the control population B. After a period of trial use, the evaluation index of the experimental population A (e.g., user conversion rate, average browsing duration, use duration, resource click rate, next-day retention rate, etc.) is compared with that of the control population B. If the evaluation index of the experimental population A is higher than that of the control population B, the predetermined scheme may be considered to be effective, i.e., to produce a positive benefit, for each user in the target population T.
The present disclosure provides a group determination method and apparatus that can obtain the above-mentioned overall test effect, can further identify for which type of group the predetermined scheme is effective and for which type it is ineffective, and can also identify whether the test result is affected by Simpson's paradox.
A probabilistic model according to an exemplary embodiment of the present disclosure may be used to predict the probability that a user belongs to an effective population. For example, users for whom a predetermined scheme (such as a product to be tested, a function, a UI, or a marketing campaign, e.g., issuing limited coupons) yields a benefit or a significant benefit may be collectively referred to as an effective population; conversely, the other users may be referred to as an ineffective population.
According to exemplary embodiments of the present disclosure, the probabilistic model may be built using various classification algorithms (e.g., Support Vector Machine (SVM) algorithms, logistic regression algorithms, decision tree algorithms, etc.).
For convenience of explanation, in an embodiment of the present disclosure, the first population may represent an effective population, the fourth population may represent an ineffective population, the second population may represent an experimental population, and the third population may represent a control population, but the present disclosure is not limited thereto, and various populations may also be represented by other terms.
A population determination method according to an exemplary embodiment of the present disclosure is described below in conjunction with fig. 2.
Fig. 2 illustrates a flow chart of a population determination method according to an exemplary embodiment of the present disclosure. As shown in fig. 2, in step S501, the characteristics of the user may be acquired. According to an example embodiment of the present disclosure, the user may include each user in the target group. The features may include portrait features and the like. For example, the portrait features may include at least one of: gender, age, income status (e.g., annual income, monthly income, etc.), occupation (e.g., type of occupation, field of occupation, etc.), place of residence, place of work, price of electronic products used (e.g., cell phone, computer, Bluetooth headset, Bluetooth speaker, etc.), number of fans (e.g., number of fans of a social platform account, number of fans of an e-commerce platform account), average length of time spent using the predetermined application product (e.g., average length of time per day spent using the predetermined social APP), number of times the predetermined application product is used within a first predetermined time (e.g., number of times the predetermined short video APP is used within seven days), number of works uploaded within a second predetermined time (e.g., number of works uploaded to the video platform within seven or thirty days), and amount of activity time within a third predetermined time (e.g., the amount of activity time spent using the predetermined application on the application platform within thirty days, e.g., the total time spent commenting, liking, forwarding, etc.).
The characteristics of the user may be obtained in a variety of ways. For example, with the user's permission, the characteristics of the user may be obtained from the profile that the user fills in when registering for or logging in to the predetermined application APP. As another example, with the user's permission, the characteristics of the user may be obtained from data records in the application's backend. As yet another example, the characteristics of the user may be obtained by requesting them from the user.
In step S502, the features may be input to a probability model, resulting in a predicted probability that the user belongs to a first group. According to an exemplary embodiment of the present disclosure, the probabilistic model may be obtained by using a training method performed by clustering a predetermined population. For example, but not limited to, the probabilistic model may be trained using a training method of the probabilistic model according to an exemplary embodiment of the present disclosure (e.g., the training method described with reference to fig. 3 to 5).
According to an exemplary embodiment of the present disclosure, a probability model may be utilized to predict a probability that a user belongs to a first group, i.e., a predicted probability that the user belongs to the first group.
In step S503, it may be determined whether the user belongs to a first group according to the predicted probability. According to an exemplary embodiment of the present disclosure, the predicted probability may be compared to a predetermined probability threshold; determining that the user belongs to the first group when the predicted probability is greater than or equal to a predetermined probability threshold.
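As a non-limiting sketch of steps S502 and S503, the following code scores a user with a logistic regression model and applies the "greater than or equal to" threshold rule. Logistic regression is one of the classification algorithms mentioned above; the example feature values, weights, bias, and the 0.5 default threshold are hypothetical.

```python
import math


def predict_probability(features, weights, bias):
    """Step S502: predicted probability that the user belongs to the first group."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def belongs_to_first_group(predicted_probability, probability_threshold=0.5):
    """Step S503: the user belongs to the first group when the probability reaches the threshold."""
    return predicted_probability >= probability_threshold


# Hypothetical, already normalized features of one user and hypothetical trained parameters.
user_features = [0.4, 1.2, -0.3]
model_weights = [0.8, 0.5, -1.0]
model_bias = -0.2

probability = predict_probability(user_features, model_weights, model_bias)
print(probability, belongs_to_first_group(probability))
```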
According to an example embodiment of the present disclosure, the first group within the target group may be determined based on the determination of whether each user belongs to the first group (e.g., the effective group). For example, it may be determined whether each user in the target group belongs to the first group, so that the first group within the target group is determined from the results of these determinations. In addition, the number of users in the first group and the number of users in the target group may be determined, and thus the size of the first group relative to the target group may be determined, thereby facilitating analysis of the characteristics of the first group, facilitating decisions by business parties, and the like.
For example, according to the population determination method of the exemplary embodiment of the present disclosure, the effective population (i.e., the first population) and the ineffective population (i.e., the fourth population) may be identified from the target population, the size of the effective population and/or the ineffective population relative to the target population may be obtained, and the predetermined probability threshold and the predetermined effectiveness criterion corresponding to the effective population may also be obtained. For example, by inputting the features of all 10 million users into the probabilistic model, a final conclusion may be reached: 800 of the 10 million users are estimated to belong to the effective population, that is, the effective population after the new strategy is launched is estimated to comprise 800 users, and the user conversion rate of the effective population can be increased by at least an additional 10%.
In this way, the probability model obtained by the training method performed by clustering the predetermined population can be used to predict the probability that a user belongs to the first population, and the population determination method according to the present disclosure can help determine whether the user belongs to the first population (e.g., an effective population). As a result, the actual scale of the first population can be obtained, whether the test result of the test scheme on the target population is affected by Simpson's paradox can be identified, and the influence of the test scheme on the target population can be analyzed comprehensively. The method therefore not only helps to identify the overall effect of the test scheme on the target population, but also helps to analyze the test effect in depth.
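As a non-limiting sketch, the size of the first group relative to the target group may be computed from the predicted probabilities of all users in the target group. The function name and the example probabilities are hypothetical.

```python
def first_group_share(predicted_probabilities, probability_threshold):
    """Return (number of users in the first group, share of the target group)."""
    if not predicted_probabilities:
        raise ValueError("the target group must contain at least one user")
    first_group_size = sum(
        1 for p in predicted_probabilities if p >= probability_threshold
    )
    return first_group_size, first_group_size / len(predicted_probabilities)


# Hypothetical predicted probabilities for a target group of five users.
size, share = first_group_share([0.9, 0.2, 0.7, 0.5, 0.1], probability_threshold=0.5)
print(size, share)
```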
FIG. 3 shows a scenario diagram of A/B testing according to an example embodiment of the present disclosure. Fig. 4 shows a flow chart of a training method of a probabilistic model according to an exemplary embodiment of the present disclosure. Fig. 5 illustrates another flowchart of a training method of a probabilistic model according to an exemplary embodiment of the present disclosure. According to an exemplary embodiment of the present disclosure, the probabilistic model may be obtained by using a training method performed by clustering a predetermined population.
A method of training a probabilistic model according to an exemplary embodiment of the present disclosure is described below with reference to fig. 3 to 5.
Referring to fig. 4, in step S301, a predetermined population may be clustered based on characteristics of the predetermined population to obtain a plurality of clustered populations.
According to an exemplary embodiment of the present disclosure, a predetermined population may be selected from the target population. The target population may include a large number of users who have not used the predetermined product, or the predetermined UI or predetermined function of the predetermined product. For example, based on big data analysis of the product to be tested, a batch of users who have never used the predetermined product, or the predetermined UI or predetermined function of the predetermined product, may be acquired as the target population of the product to be tested. Further, a predetermined population may be randomly selected from the target population for clustering. For example, the predetermined population may be randomly selected from the target population according to a preset ratio (e.g., 1:10) of the predetermined population to the target population.
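As a non-limiting sketch of this random selection, the following code draws a predetermined population whose size is one tenth of the target population. The function name and the fixed seed are assumptions made for illustration.

```python
import random


def sample_predetermined_population(target_users, ratio=(1, 10), seed=0):
    """Randomly select users so that predetermined : target matches `ratio`."""
    part, whole = ratio
    size = len(target_users) * part // whole
    # Sampling without replacement, so no user is selected twice.
    return random.Random(seed).sample(target_users, size)


target_population = [f"user_{i}" for i in range(1000)]  # hypothetical user ids
predetermined_population = sample_predetermined_population(target_population)
print(len(predetermined_population))
```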
According to an example embodiment of the present disclosure, the characteristics of the group or user may include portrait characteristics or the like, for example, the portrait characteristics may include at least one of: gender, age, income status (e.g., annual income, monthly income, etc.), occupation (e.g., type of occupation, field of occupation, etc.), place of residence, place of work, price of electronic products used (e.g., cell phone, computer, bluetooth headset, bluetooth speaker, etc.), number of fans (e.g., number of fans of social platform account, number of fans of e-commerce platform account), average length of time to use the predetermined application product (e.g., average length of time per day to use the predetermined social APP), number of times to use the predetermined application product within a first predetermined time (e.g., number of times to use the predetermined short video APP within seven days), number of work uploaded within a second predetermined time (e.g., number of work uploaded within seven or thirty days on the video platform), amount of activity time within a third predetermined time (e.g., amount of activity time to use the predetermined application on the application platform within thirty days, e.g., the sum of the amount of time to comment, like, forward, etc.).
The characteristics of the predetermined population may be obtained in a variety of ways. For example, with user permission, the characteristics of the predetermined group may be obtained from profiles each user of the predetermined group fills in when registering or logging in the predetermined application APP. For example, the characteristics of each user in a predetermined group may be obtained by applying a data record in the background, with user permission. For example, the characteristics of the predetermined population may be obtained by requesting the characteristics from each user in the predetermined population.
According to an exemplary embodiment of the present disclosure, the predetermined population may be clustered using various clustering algorithms based on characteristics of the predetermined population. The clustering algorithm may be any algorithm capable of clustering, such as a K-means clustering algorithm, a K-center clustering algorithm, a density-based clustering algorithm, a hierarchical clustering algorithm, and the like. A suitable clustering algorithm may be selected according to the requirements of the actual scenario, so as to cluster the predetermined population based on the characteristics. By clustering, a plurality of cluster populations can be obtained.
According to an exemplary embodiment of the present disclosure, the predetermined population C may include a second population A' (e.g., an experimental population) and a third population B' (e.g., a control population). Referring to fig. 3, the predetermined population C may be clustered based on characteristics of the predetermined population C to obtain a plurality of cluster populations, for example, the cluster populations K1, K2, K3, K4, K5, K6, K7, and K8 shown in fig. 3. As such, the second population A' (e.g., the experimental population) and the third population B' (e.g., the control population) may be clustered together, resulting in the plurality of cluster populations K1, K2, K3, K4, K5, K6, K7, and K8.
According to an exemplary embodiment of the present disclosure, the second population A' and the third population B' may be selected from the target population T to constitute the predetermined population C according to a predetermined traffic ratio of the second population A' to the third population B'. For example, a predetermined traffic ratio (e.g., 1:2) of the second population A' to the third population B' is set according to the A/B test requirements. A predetermined population C (e.g., 20,000 experimental users) may be formed by randomly selecting a second population A' (e.g., 10,000 experimental users) and a third population B' from a target population T (e.g., 10 million target users) at the predetermined traffic ratio of 1:2. However, the present disclosure is not limited thereto, and the second population A' and the third population B' may be selected to constitute the predetermined population C in other manners.
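As one possible illustration of the clustering step, the sketch below implements plain K-means (Lloyd's algorithm) in standard-library Python. The feature vectors (age, average daily usage hours), the choice of K-means, and all function names are assumptions made for the example; the disclosure leaves the algorithm open.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def k_means(features, k, iterations=20, seed=0):
    """Return one cluster index per feature vector."""
    rng = random.Random(seed)
    centers = rng.sample(features, k)  # initial centers drawn from the data
    assignments = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: attach each user to the nearest center.
        assignments = [
            min(range(k), key=lambda c: squared_distance(f, centers[c]))
            for f in features
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [f for f, a in zip(features, assignments) if a == c]
            if members:
                centers[c] = mean_vector(members)
    return assignments

# Hypothetical portrait features: [age, average daily usage in hours].
features = [[20, 1.0], [22, 1.2], [21, 0.9], [60, 8.0], [62, 8.5], [61, 7.9]]
assignments = k_means(features, k=2)
print(assignments)
```

In practice the features would usually be normalized first so that no single feature (such as age) dominates the distance.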
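A minimal sketch of forming the predetermined population from the target population at a given traffic ratio is shown below. The function name `select_groups`, the user IDs, and the sizes are hypothetical; the only element taken from the text is the example ratio of 1:2 between the second (experimental) and third (control) populations.

```python
import random

def select_groups(target_users, total_size, ratio=(1, 2), seed=None):
    """Randomly draw `total_size` users and split them experimental:control by `ratio`."""
    rng = random.Random(seed)
    chosen = rng.sample(target_users, total_size)  # without replacement
    experimental_parts, control_parts = ratio
    cut = total_size * experimental_parts // (experimental_parts + control_parts)
    return chosen[:cut], chosen[cut:]

# Hypothetical target population T.
target_population = [f"user_{i}" for i in range(10_000)]
second_population, third_population = select_groups(
    target_population, total_size=300, ratio=(1, 2), seed=7
)
print(len(second_population), len(third_population))
```

Because the sample is drawn without replacement and then cut once, no user can end up in both groups.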
As described above, unsupervised learning can be achieved by clustering of predetermined populations.
Referring to fig. 4, in step S302, a first population (e.g., an effective population) among the plurality of cluster populations may be determined. According to exemplary embodiments of the present disclosure, a predetermined scheme to be tested, such as a product, a function, a UI, or a marketing campaign (e.g., issuing limited coupons, etc.), may produce a benefit (e.g., generate more revenue or improve user experience) with respect to each user in the effective population, and thus these users may be collectively referred to as the effective population (i.e., the first population). Conversely, users for whom the predetermined scheme to be tested produces no benefit, or only an insignificant benefit, may be collectively referred to as an invalid population (i.e., a fourth population).
According to an exemplary embodiment of the present disclosure, the predetermined population may include an experimental population (i.e., a second population) and a control population (i.e., a third population), and thus, both experimental users belonging to the experimental population and control users belonging to the control population may be included in each cluster population. All of the experimental users in each cluster population may be considered to be experimental populations in the corresponding cluster population, and all of the control users in each cluster population may be considered to be control populations in the corresponding cluster population.
According to an exemplary embodiment of the present disclosure, before determining an effective population of the plurality of clustering populations (e.g., before or after clustering the predetermined population, or while clustering the predetermined population), each experimental user of the experimental population of the predetermined population may be provided with a predetermined scheme such as a product to be tested, a function, a UI, a marketing campaign (e.g., issuing a limited coupon, etc.), and the usage and user experience of the experimental population may be observed, so that the effective population of the plurality of clustering populations may be determined according to a comparison result of the experimental population and the control population after observing for a period of time.
According to an exemplary embodiment of the present disclosure, an effective population among the plurality of cluster populations may be determined according to a predetermined validity criterion. The predetermined validity criterion is set based on an evaluation index of the experimental population relative to the control population within a single cluster population. The predetermined validity criterion may include: in a single cluster population, the evaluation index of the experimental population is improved by at least a predetermined degree relative to the evaluation index of the control population.
According to an exemplary embodiment of the present disclosure, a cluster population in which a difference between an evaluation index of a second population (e.g., an experimental population) and an evaluation index of a third population (e.g., a control population) is greater than or equal to a predetermined threshold may be determined as the first population. The predetermined threshold may be a threshold corresponding to a predetermined degree of improvement in the predetermined validity criterion.
In an exemplary embodiment of the present disclosure, each cluster population may include users in the second population A' and may also include users in the third population B'. Thus, the second population in each cluster population may consist of users in the cluster population belonging to the second population A', and the third population in each cluster population may consist of users in the cluster population belonging to the third population B'. Therefore, a difference between the evaluation index of the second population and the evaluation index of the third population in the cluster population may be calculated for each cluster population. For example, the evaluation index may include at least one of: browsing duration (e.g., average browsing duration per day, longest browsing duration per day, etc.), resource click-through rate (e.g., a ratio of the number of users who clicked through a predetermined UI to the number of users of the experimental group after the product has been online for a predetermined period of time), next-day retention rate (e.g., a ratio of the number of users who returned the next day to the number of users of the experimental group after the product has been online for a predetermined period of time), and user conversion rate (e.g., a ratio of the number of users who registered or paid to the number of users of the experimental group after the product has been online for a predetermined period of time).
According to the training method of the exemplary embodiment of the present disclosure, one or more cluster populations of the plurality of cluster populations may be determined to be the first population in response to the one or more cluster populations satisfying the predetermined validity criterion. For example, a cluster population may be determined as the first population in response to the difference between the evaluation index of the second population and the evaluation index of the third population being greater than or equal to a predetermined threshold, and otherwise determined as the fourth population.
Referring to the flowchart illustrated in fig. 5, the training method according to an exemplary embodiment of the present disclosure may determine whether each cluster population is the first population.
In step S401, an evaluation index of the second population may be acquired. In step S402, an evaluation index of the third population may be acquired. For example, after each experimental user in a second group of the predetermined groups uses or experiences a predetermined scheme of a product, a function, a UI, a marketing campaign (e.g., issuing a limited coupon, etc.) to be tested for a predetermined period of time, the evaluation index of each experimental user in the second group and the evaluation index of each control user in a third group can be obtained by observing the use and user experience effects of the second group, respectively. Then, the evaluation index of the second population and the evaluation index of the third population in each cluster population may be acquired, respectively.
In step S403, the evaluation index of the second population may be compared with the evaluation index of the third population to determine whether a difference between the evaluation index of the second population and the evaluation index of the third population in the cluster population is greater than or equal to a predetermined threshold. If so, the cluster group is determined to be the first group (step S404), otherwise, the cluster group is determined to be the fourth group (step S405). As such, it may be determined whether each cluster population of the plurality of cluster populations is a first population.
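Steps S401 to S405 can be sketched as below, using the user conversion rate as the evaluation index. The record layout (`cluster`, `group`, `converted`), the helper `make_users`, the sample counts, and the use of an absolute difference in conversion rate are all assumptions for illustration; the sketch also assumes every cluster contains at least one experimental and one control user.

```python
def conversion_rate(users):
    """Share of users in the list who converted (e.g., registered or paid)."""
    return sum(u["converted"] for u in users) / len(users)

def classify_clusters(users, threshold=0.10):
    """Label each cluster 'first' (effective) or 'fourth' (invalid)."""
    clusters = {}
    for u in users:
        groups = clusters.setdefault(u["cluster"], {"experimental": [], "control": []})
        groups[u["group"]].append(u)
    labels = {}
    for cluster_id, groups in clusters.items():
        # S401/S402: evaluation index of the second and third populations.
        experimental_index = conversion_rate(groups["experimental"])
        control_index = conversion_rate(groups["control"])
        # S403 to S405: compare the difference with the predetermined threshold.
        difference = experimental_index - control_index
        labels[cluster_id] = "first" if difference >= threshold else "fourth"
    return labels

def make_users(cluster, group, total, converted):
    """Build hypothetical user records for one group inside one cluster."""
    return [
        {"cluster": cluster, "group": group, "converted": i < converted}
        for i in range(total)
    ]

users = (
    make_users("K1", "experimental", 20, 10) + make_users("K1", "control", 20, 4)
    + make_users("K5", "experimental", 20, 6) + make_users("K5", "control", 20, 5)
)
labels = classify_clusters(users)
print(labels)
```

Here K1 shows a 0.30 lift in conversion rate and is labeled as the first population, while K5 shows only 0.05 and is labeled as the fourth population.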
Here, the user conversion rate is taken as an example of the evaluation index, but the present disclosure is not limited thereto, and one or more evaluation indexes may be used to determine whether a cluster population is the first population (e.g., the effective population). For example, the predetermined validity criterion may include: in a single cluster population, the user conversion rate of the experimental population (i.e., the second population) is improved by at least a predetermined degree (e.g., a predetermined threshold of 10%) relative to the user conversion rate of the control population (i.e., the third population). In each of the cluster populations K1, K2, K3, and K4 shown in fig. 3, the user conversion rate of the experimental population is improved by 10% or more relative to that of the control population (i.e., the degree of improvement is equal to or higher than the predetermined degree, and the difference between the evaluation indexes is greater than or equal to the predetermined threshold), whereas in each of the cluster populations K5, K6, K7, and K8, the user conversion rate of the experimental population is improved by less than 5% relative to that of the control population (i.e., the degree of improvement is lower than the predetermined degree, and the difference between the evaluation indexes is smaller than the predetermined threshold). Therefore, the cluster populations K1, K2, K3, and K4 are identified as effective populations (i.e., the first population), and the cluster populations K5, K6, K7, and K8 are identified as invalid populations (i.e., the fourth population). It can be seen that the predetermined scheme is more effective for the cluster populations K1, K2, K3, and K4 than for the cluster populations K5, K6, K7, and K8.
According to an exemplary embodiment of the present disclosure, in order to facilitate identification of the valid population (i.e., the first population) and the invalid population (i.e., the fourth population), a valid flag (e.g., a flag value of 1) may be added to each user in the valid population, and an invalid flag (e.g., a flag value of 0) may be added to each user in the invalid population.
Referring again to fig. 4, after the first population among the plurality of cluster populations is determined, the proportion of the first population in the plurality of cluster populations may be determined as a preset probability (step S303). According to an exemplary embodiment of the present disclosure, after the first population is determined, the number of cluster populations determined to be the first population (e.g., four, namely K1 to K4) and the total number of cluster populations (e.g., eight, namely K1 to K8) may be determined; the proportion of the first population in the plurality of cluster populations is the ratio of these two numbers (e.g., 1/2). Accordingly, this proportion may be determined as the preset probability (e.g., 50%).
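The computation in step S303 amounts to a simple ratio. The sketch below reproduces the K1 to K8 example from the text; the dictionary layout and the label strings are assumptions.

```python
def preset_probability(cluster_labels):
    """Proportion of cluster populations that were determined to be the first population."""
    first_count = sum(1 for label in cluster_labels.values() if label == "first")
    return first_count / len(cluster_labels)

# K1 to K4 are effective (first population), K5 to K8 are invalid (fourth population).
cluster_labels = {
    "K1": "first", "K2": "first", "K3": "first", "K4": "first",
    "K5": "fourth", "K6": "fourth", "K7": "fourth", "K8": "fourth",
}
probability = preset_probability(cluster_labels)
print(probability)  # 0.5
```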
Referring to fig. 4, in step S304, features of a predetermined population may be input to a probability model, resulting in an estimated probability.
According to exemplary embodiments of the present disclosure, the probabilistic model may be built using various classification algorithms (e.g., Support Vector Machine (SVM) algorithms, logistic regression algorithms, decision tree algorithms, etc.). For example, the features of the predetermined population may be input to a probability model, and then the probability model is used to estimate the probability that each user in the predetermined population belongs to the first population, i.e., the estimated probability.
In step S305, parameters of the probabilistic model may be adjusted according to the estimated probability and the preset probability to train the probabilistic model. As described above, training of the probabilistic model may be implemented.
According to an example embodiment of the present disclosure, the training samples of the probabilistic model may include features of each of the predetermined populations and/or a proportion of the first population in the plurality of clustering populations. The preparation process of the training samples may be associated with predetermined validity criteria and/or predetermined thresholds. Thus, a predetermined validity criterion corresponding to the valid population (i.e., the first population) may be determined from the training samples of the probabilistic model. The predetermined validity criteria may include: the evaluation index of the experimental population (i.e., the second population) in the effective population is improved by at least a predetermined degree relative to the evaluation index of the control population (i.e., the third population), e.g., the user conversion rate of the experimental population is increased by at least a predetermined degree (e.g., the predetermined threshold is 10%) relative to the user conversion rate of the control population. Thus, the determined predetermined validity criteria may be provided to assist in analyzing characteristics of the valid population, to further assist business parties and the like in making decisions, e.g., whether to deliver a predetermined scheme to the valid population.
According to an exemplary embodiment of the present disclosure, the predetermined probability threshold may be set according to the training samples of the probabilistic model. For example, the proportion of the first population in the plurality of cluster populations, as included in the training samples of the probabilistic model, may be set as the predetermined probability threshold. For example, a user may be determined to belong to the first population in response to the predicted probability being greater than the predetermined probability threshold, and to the fourth population otherwise. Additionally, the predetermined probability threshold may be provided to assist in analyzing characteristics of the first population, and to further assist business parties and the like in making decisions, e.g., whether to deliver the predetermined scheme to the first population.
According to an exemplary embodiment of the present disclosure, a loss function of the probabilistic model may be determined according to the estimated probability and the preset probability, and parameters of the probabilistic model may be adjusted according to the loss function. For example, the loss function of the probabilistic model may be determined based on a loss value between the estimated probability and the preset probability. The parameters of the probabilistic model may then be adjusted according to the loss function, with the goal of minimizing the loss value between the estimated probability and the preset probability. The probabilistic model is trained by adjusting its parameters in this way. As described above, training of the probabilistic model may be achieved through supervised learning.
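The parameter adjustment described above can be illustrated with a small logistic regression trained by gradient descent. Several points are assumptions rather than details from the disclosure: the choice of logistic regression and cross-entropy loss, the standardized one-dimensional feature, the learning rate, and in particular the per-user training target, which is taken here to be the valid/invalid flag (1 or 0) because the text does not spell out how the preset probability is attached to each sample.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, x):
    """Estimated probability that a user with features `x` belongs to the first population."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def average_loss(weights, bias, features, targets):
    """Mean cross-entropy between estimated probabilities and target probabilities."""
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for x, t in zip(features, targets):
        p = predict(weights, bias, x)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(features)

def train(features, targets, learning_rate=0.5, epochs=500):
    """Adjust weights and bias by gradient descent to minimize the average loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, t in zip(features, targets):
            error = predict(weights, bias, x) - t  # gradient of the loss w.r.t. the logit
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias

# Hypothetical standardized feature and assumed per-user targets (valid flag).
features = [[-0.4], [-0.3], [-0.2], [0.2], [0.3], [0.4]]
targets = [0, 0, 0, 1, 1, 1]
weights, bias = train(features, targets)
print(predict(weights, bias, [0.4]), predict(weights, bias, [-0.4]))
```

The same loop accepts fractional targets between 0 and 1, so a different way of deriving per-sample targets from the preset probability could be substituted without changing the training code.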
Through the combination of unsupervised learning and supervised learning, the training of the probability model can be realized, so that whether the users belong to the first group (for example, the effective group) or not can be identified through the trained probability model, and the users belonging to the first group can be identified from the whole target group.
A population determining apparatus and a training apparatus of a probabilistic model according to an exemplary embodiment of the present disclosure are described below with reference to fig. 6 and 7.
Fig. 6 shows a block diagram of a population determination device 70 according to an exemplary embodiment of the present disclosure. The operations of the units in the group determination device 70 shown in fig. 6 can be understood by referring to the steps in the group determination method shown in fig. 2, and are not described again for brevity.
The group determination device 70 may include an acquisition unit 701, and the acquisition unit 701 may acquire the characteristics of the user. According to an exemplary embodiment of the present disclosure, the user may include each user in the target group. The features may include portrait features and the like. For example, the portrait features may include at least one of: gender, age, income status (e.g., annual income, monthly income, etc.), occupation (e.g., type of occupation, field of occupation, etc.), place of residence, place of work, price of electronic products used (e.g., cell phone, computer, Bluetooth headset, Bluetooth speaker, etc.), number of fans (e.g., number of fans of a social platform account, number of fans of an e-commerce platform account), average length of time spent using a predetermined application product (e.g., average length of time per day spent using a predetermined social APP), number of times the predetermined application product is used within a first predetermined time (e.g., number of times a predetermined short video APP is used within seven days), number of works uploaded within a second predetermined time (e.g., number of works uploaded to a video platform within seven or thirty days), and total active time within a third predetermined time (e.g., total active time spent using a predetermined application on an application platform within thirty days, such as the sum of the time spent commenting, liking, forwarding, etc.).
The characteristics of the user may be obtained in a variety of ways. For example, in case of user permission, the characteristics of the user may be obtained from the profile that the user fills in when registering or logging in the predetermined application APP. For example, the user's features may be obtained by applying a data record in the background, with user permission. For example, the user's features may be obtained by requesting the features from the user.
The population determining device 70 may comprise a prediction unit 702, and the prediction unit 702 may input the features to the probability model, resulting in a probability that the predicted user belongs to the first population. According to an exemplary embodiment of the present disclosure, the probabilistic model may be obtained by the training apparatus 60 according to an exemplary embodiment of the present disclosure performing a training method of the probabilistic model (e.g., the training method described with reference to fig. 3 to 5). Referring to the embodiment shown in fig. 7, the probabilistic model may be obtained by training the probabilistic model through the training device 60, and the training device 60 may train the probabilistic model by clustering the predetermined population.
The group determination device 70 may include a determination unit 703, and the determination unit 703 may determine whether the user belongs to the first group according to the predicted probability. According to an exemplary embodiment of the present disclosure, the predicted probability may be compared to a predetermined probability threshold; in response to the predicted probability being greater than or equal to the predetermined probability threshold, determining that the user belongs to the first group.
According to an example embodiment of the present disclosure, a first group of the target groups may be determined in response to determining whether the user belongs to the first group. For example, it may be determined whether each user in the target population belongs to the effective population (i.e., the first population), thereby determining the effective population in the target population according to the result of the final determination. In addition, the number of users of the effective group and the number of users of the target group can be determined, so that the size of the effective group relative to the target group can be determined, and therefore the analysis of the characteristics of the effective group is facilitated, and a business side and the like are facilitated to make decisions.
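The determination made by the determination unit 703, and the resulting size of the effective group relative to the target group, can be sketched as follows. The predicted probabilities, user IDs, and the threshold value of 0.5 are hypothetical; the "greater than or equal to" comparison follows the description of the determination unit above.

```python
def determine_first_group(predicted, probability_threshold=0.5):
    """Return the IDs of users whose predicted probability reaches the threshold."""
    return [
        user_id
        for user_id, probability in predicted.items()
        if probability >= probability_threshold
    ]

# Hypothetical outputs of the trained probabilistic model for five target users.
predicted = {"u1": 0.91, "u2": 0.12, "u3": 0.50, "u4": 0.47, "u5": 0.66}

first_group = determine_first_group(predicted)
# Size of the effective group relative to the whole target group.
share = len(first_group) / len(predicted)
print(first_group, share)
```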
According to the population determining apparatus 70 of the exemplary embodiment of the present disclosure, the valid population (i.e., the first population) and the invalid population (i.e., the fourth population) may be identified from the target population, the size of the valid population and/or the invalid population relative to the target population may be obtained, and the predetermined probability threshold and the predetermined validity criterion corresponding to the valid population may also be obtained. For example, by inputting the features of 10 million total users into the probabilistic model, the following conclusion may be reached: an estimated 800 of the 10 million total users belong to the effective group; that is, the effective group expected to be reached after the new strategy goes online is 800 users, and the user conversion rate of the effective group may be additionally improved by at least 10%.
In this way, the group determination apparatus 70 may help determine whether a user belongs to the first group (e.g., the effective group), may further obtain the actual size of the first group, and may identify whether Simpson's paradox is present in the test result of the test scheme on the target group, so that the influence of the test scheme on the target group can be analyzed comprehensively. Therefore, the apparatus not only helps to identify the overall effect of the test scheme on the target group, but also helps to analyze the test effect in depth.
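To illustrate what a Simpson's paradox check might look like, the sketch below compares per-cluster conversion rates with the pooled rates. The counts are invented solely to exhibit the paradox, and the function `has_simpsons_paradox` and its data layout are assumptions, not part of the disclosure.

```python
def rate(converted, total):
    return converted / total

def has_simpsons_paradox(clusters):
    """True if the per-cluster comparison is unanimous but the pooled comparison points the other way."""
    per_cluster = [
        rate(*c["experimental"]) > rate(*c["control"]) for c in clusters.values()
    ]
    exp_converted = sum(c["experimental"][0] for c in clusters.values())
    exp_total = sum(c["experimental"][1] for c in clusters.values())
    ctl_converted = sum(c["control"][0] for c in clusters.values())
    ctl_total = sum(c["control"][1] for c in clusters.values())
    overall = rate(exp_converted, exp_total) > rate(ctl_converted, ctl_total)
    return (all(per_cluster) and not overall) or (not any(per_cluster) and overall)

# Hypothetical (converted, total) counts per cluster and group.
clusters = {
    "X": {"experimental": (80, 100), "control": (700, 1000)},   # 0.80 vs 0.70
    "Y": {"experimental": (200, 1000), "control": (10, 100)},   # 0.20 vs 0.10
}
print(has_simpsons_paradox(clusters))
```

In this invented data the experimental group wins inside both clusters, yet its pooled conversion rate (280/1100) is lower than the control group's (710/1100), because the two groups are distributed very differently across the clusters.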
Fig. 7 shows a block diagram of a training apparatus 60 of a probabilistic model according to an exemplary embodiment of the present disclosure. The operations of the units in the training device 60 shown in fig. 7 can be understood by referring to the steps in the training method shown in fig. 4 to 5, and are not repeated herein for brevity.
According to an exemplary embodiment of the present disclosure, the probabilistic model may be obtained by training the probabilistic model through the training device 60, and the training device 60 may train the probabilistic model by clustering a predetermined population.
For example, but not limited to, the training apparatus 60 may include a clustering unit 601, and the clustering unit 601 may cluster the predetermined population based on the features of the predetermined population to obtain a plurality of cluster populations. According to an exemplary embodiment of the present disclosure, the training apparatus 60 may further include a selection unit (not shown) that may select the predetermined population from the target population. The target population may include a large number of users who have not used the predetermined product, or a predetermined UI or predetermined function of the predetermined product. For example, a batch of users who have never used the predetermined product, or a predetermined UI or predetermined function of the predetermined product, may be obtained through big data analysis of the product to be tested and taken as the target population for that product. Further, the predetermined population may be randomly selected from the target population for clustering. For example, the predetermined population may be randomly selected from the target population according to a preset ratio (e.g., 1:10) of the predetermined population to the target population.
According to an exemplary embodiment of the present disclosure, the characteristics of the group or the user may include portrait characteristics, which may include at least one of: gender, age, income status (e.g., annual income, monthly income, etc.), occupation (e.g., type of occupation, field of occupation, etc.), place of residence, place of work, price of electronic products used (e.g., cell phone, computer, Bluetooth headset, Bluetooth speaker, etc.), number of fans (e.g., number of fans of a social platform account, number of fans of an e-commerce platform account), average length of time spent using a predetermined application product (e.g., average length of time per day spent using a predetermined social APP), number of times the predetermined application product is used within a first predetermined time (e.g., number of times a predetermined short video APP is used within seven days), number of works uploaded within a second predetermined time (e.g., number of works uploaded to a video platform within seven or thirty days), and total active time within a third predetermined time (e.g., total active time spent using a predetermined application on an application platform within thirty days, such as the sum of the time spent commenting, liking, forwarding, etc.).
The characteristics of the predetermined population may be obtained in a variety of ways. For example, with user permission, the characteristics of the predetermined group may be obtained from profiles each user of the predetermined group fills in when registering or logging in the predetermined application APP. For example, the characteristics of each user in a predetermined group may be obtained by applying a data record in the background, with user permission. For example, the characteristics of the predetermined population may be obtained by requesting the characteristics from each user in the predetermined population.
According to an exemplary embodiment of the present disclosure, the predetermined population may be clustered using various clustering algorithms based on characteristics of the predetermined population. The clustering algorithm may be any algorithm capable of clustering, such as a K-means clustering algorithm, a K-center clustering algorithm, a density-based clustering algorithm, a hierarchical clustering algorithm, and the like. A suitable clustering algorithm may be selected according to the requirements of the actual scenario, so as to cluster the predetermined population based on the characteristics. By clustering, a plurality of cluster populations can be obtained.
The selection unit (not shown) may select the second population and the third population from the target population to compose the predetermined population according to a predetermined traffic ratio of the second population to the third population. According to an exemplary embodiment of the present disclosure, the predetermined population may include a second population and a third population. Referring to fig. 3, the predetermined population C may be clustered based on characteristics of the predetermined population C to obtain a plurality of cluster populations, for example, the cluster populations K1, K2, K3, K4, K5, K6, K7, and K8 shown in fig. 3. The predetermined population C may include a second population A' (i.e., an experimental population) and a third population B' (i.e., a control population). As such, the second population A' and the third population B' may be clustered together, resulting in the plurality of cluster populations K1, K2, K3, K4, K5, K6, K7, and K8. According to an exemplary embodiment of the present disclosure, the second population A' and the third population B' may be selected from the target population T to constitute the predetermined population C according to a predetermined traffic ratio of the second population A' to the third population B'.
As described above, unsupervised learning can be achieved by clustering of predetermined populations.
The training device 60 may include a population determination unit 602, and the population determination unit 602 may determine a first population among the plurality of cluster populations.
According to exemplary embodiments of the present disclosure, a predetermined scheme to be tested, such as a product, a function, a UI, or a marketing campaign (e.g., issuing limited coupons, etc.), may produce a benefit (e.g., generate more revenue or improve user experience) with respect to each user in the effective population, and thus these users may be collectively referred to as the effective population. Conversely, users for whom the predetermined scheme to be tested produces no benefit, or only an insignificant benefit, may be collectively referred to as an invalid population.
According to an exemplary embodiment of the present disclosure, the predetermined population may include an experimental population and a control population, and thus, both experimental users belonging to the experimental population and control users belonging to the control population may be included in each cluster population. All of the experimental users in each cluster population may be considered to be experimental populations in the corresponding cluster population, and all of the control users in each cluster population may be considered to be control populations in the corresponding cluster population.
According to an exemplary embodiment of the present disclosure, before determining an effective population of the plurality of clustering populations (e.g., before or after clustering the predetermined population, or while clustering the predetermined population), each experimental user of the experimental population of the predetermined population may be provided with a predetermined scheme such as a product to be tested, a function, a UI, a marketing campaign (e.g., issuing a limited coupon, etc.), and the usage and user experience of the experimental population may be observed, so that the effective population of the plurality of clustering populations may be determined according to a comparison result of the experimental population and the control population after observing for a period of time.
According to an exemplary embodiment of the present disclosure, an effective population among the plurality of cluster populations may be determined according to a predetermined validity criterion. The predetermined validity criterion is set based on an evaluation index of the experimental population relative to the control population within a single cluster population. The predetermined validity criterion may include: in a single cluster population, the evaluation index of the experimental population is improved by at least a predetermined degree relative to the evaluation index of the control population. For example, the evaluation index may include at least one of: browsing duration (e.g., average browsing duration per day, longest browsing duration per day, etc.), resource click-through rate (e.g., a ratio of the number of users who clicked through a predetermined UI to the number of users of the experimental group after the product has been online for a predetermined period of time), next-day retention rate (e.g., a ratio of the number of users who returned the next day to the number of users of the experimental group after the product has been online for a predetermined period of time), and user conversion rate (e.g., a ratio of the number of users who registered or paid to the number of users of the experimental group after the product has been online for a predetermined period of time).
The population determining unit 602 may determine one or more clustering populations of the plurality of clustering populations as valid populations in response to the one or more clustering populations satisfying a predetermined validity criterion. For example, the population determination unit 602 may determine, as the first population, a cluster population in which a difference between the evaluation index of the second population (e.g., experimental population) and the evaluation index of the third population (e.g., control population) is greater than or equal to a predetermined threshold value among the cluster populations.
According to an exemplary embodiment of the present disclosure, the population determining unit 602 may obtain the evaluation index of the experimental population and the evaluation index of the control population. For example, after each experimental user in the experimental group in the predetermined group uses or experiences a predetermined scheme such as a product, a function, a UI, a marketing campaign (e.g., issuing a limited coupon, etc.) to be tested for a predetermined period of time, the evaluation index of each experimental user in the experimental group and the evaluation index of each control user in the control group may be obtained by observing the usage and user experience effects of the experimental group, respectively. Then, the evaluation index of the experimental population and the evaluation index of the control population in each cluster population may be obtained separately.
The population determining unit 602 may compare the evaluation index of the experimental population with that of the control population to determine whether the former is improved by at least the predetermined degree relative to the latter. If so, the clustering population is determined to be an effective population; otherwise, it is determined to be an ineffective population. In this way, it may be determined whether each of the plurality of clustering populations is an effective population.
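The per-cluster comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the disclosed implementation: the function name `label_clusters`, the dictionary layout, the use of a mean over per-user evaluation indices, and the sample numbers are all hypothetical.

```python
from statistics import mean


def label_clusters(clusters, threshold):
    """Return {cluster_id: bool}; True marks an effective population.

    `clusters` maps a cluster id to a dict holding per-user evaluation
    indices for the "experimental" and "control" users in that cluster
    (a hypothetical layout chosen for this example).
    """
    labels = {}
    for cluster_id, groups in clusters.items():
        # Improvement of the experimental population over the control population.
        uplift = mean(groups["experimental"]) - mean(groups["control"])
        # Effective when the improvement reaches the predetermined threshold.
        labels[cluster_id] = uplift >= threshold
    return labels


# Illustrative values only (e.g., per-user click-through indicators averaged per user).
clusters = {
    "K1": {"experimental": [0.50, 0.70], "control": [0.25, 0.25]},
    "K2": {"experimental": [0.25, 0.25], "control": [0.25, 0.50]},
}
labels = label_clusters(clusters, threshold=0.25)
```

Here the Boolean label plays the role of the valid/invalid flag; whether the predetermined degree is an absolute difference, as assumed here, or a relative one is a design choice the sketch does not settle.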
According to an exemplary embodiment of the present disclosure, in order to facilitate identification of the valid population and the invalid population, the population determining unit 602 may add a valid flag (e.g., a flag of 1) to each user in the valid population and an invalid flag (e.g., a flag of 0) to each user in the invalid population.
The training apparatus 60 may include a preset probability determination unit 603, which may determine the proportion of the first population among the plurality of clustering populations as a preset probability. According to an exemplary embodiment of the present disclosure, after the first population among the plurality of clustering populations is determined, the number of first populations (e.g., four, for first populations K1-K4) and the total number of clustering populations (e.g., eight, for clustering populations K1-K8) may be determined; the proportion of the first population among the plurality of clustering populations is the ratio of these two numbers (e.g., 4/8 = 1/2). Accordingly, the proportion of effective populations among the plurality of clustering populations may be determined as the preset probability (e.g., 50%).
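As a small worked example of this ratio, the following sketch counts the clustering populations flagged as first populations and divides by the total number of clustering populations. The function name `preset_probability` and the Boolean-label dictionary are assumptions made for illustration.

```python
def preset_probability(labels):
    """Share of clustering populations flagged as the first (effective) population."""
    if not labels:
        raise ValueError("at least one clustering population is required")
    effective = sum(1 for is_effective in labels.values() if is_effective)
    return effective / len(labels)


# Eight clustering populations K1-K8, of which K1-K4 are first populations.
labels = {f"K{i}": i <= 4 for i in range(1, 9)}
p = preset_probability(labels)  # 4 / 8
```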
The training device 60 may include a probability estimation unit 604, and the probability estimation unit 604 may input the features of the predetermined population into the probability model as described above to obtain the estimated probability. According to exemplary embodiments of the present disclosure, the probabilistic model may be built using various classification algorithms (e.g., Support Vector Machine (SVM) algorithms, logistic regression algorithms, decision tree algorithms, etc.). For example, the features of the predetermined population may be input to a probability model, and then the probability model is used to estimate the probability that each user in the predetermined population belongs to the first population, i.e., the estimated probability.
The training apparatus 60 may include a training unit 605, which may adjust parameters of the probability model according to the estimated probability and the preset probability so as to train the probability model. According to an exemplary embodiment of the present disclosure, a loss function of the probability model may be determined according to the estimated probability and the preset probability, and the parameters of the probability model may be adjusted according to the loss function. For example, the loss function may be determined based on a loss value between the estimated probability and the preset probability, and the parameters of the probability model may be adjusted according to the loss function with the goal of minimizing that loss value. The probability model is trained by adjusting its parameters in this way. As described above, the training of the probability model may be achieved through supervised learning.
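One way to read the training step above is as gradient descent on a cross-entropy loss between each estimated probability and the preset probability. The sketch below is illustrative only: the logistic-regression form, the cross-entropy loss, the learning rate, the toy features, the starting parameters, and all function names are assumptions, since the disclosure does not fix a particular model or loss. In particular, using one shared preset probability as the target for every user is a simplification made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Estimated probability that a user with features x belongs to the first population."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def loss(weights, bias, features, preset):
    """Mean cross-entropy between the estimated probabilities and the preset probability."""
    total = 0.0
    for x in features:
        p = predict(weights, bias, x)
        total -= preset * math.log(p) + (1.0 - preset) * math.log(1.0 - p)
    return total / len(features)


def train(features, preset, weights, bias, lr=0.5, epochs=500):
    """Adjust the parameters by plain gradient descent to minimize the loss."""
    weights = list(weights)
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x in features:
            # Derivative of the cross-entropy with respect to the logit.
            err = predict(weights, bias, x) - preset
            for j, xj in enumerate(x):
                grad_w[j] += err * xj / n
            grad_b += err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Toy user features and arbitrary starting parameters (illustrative values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
preset = 0.5
w0, b0 = [0.8, -0.6], 0.4
w, b = train(features, preset, w0, b0)
before = loss(w0, b0, features, preset)
after = loss(w, b, features, preset)
```

With a single shared target, the fitted model tends toward predicting the preset probability for every user; a practical variant would need targets that differ across users or clusters, which this sketch does not attempt to specify.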
Through the combination of unsupervised learning and supervised learning, a probability model can be obtained through training, so that whether a user belongs to an effective group or an ineffective group can be conveniently identified, and the effective group and the ineffective group can be conveniently identified from the whole target group.
Fig. 8 is a block diagram of an electronic device 80 according to an example embodiment of the present disclosure.
Referring to fig. 8, the electronic device 80 comprises at least one memory 801 and at least one processor 802, the at least one memory 801 having stored therein a set of processor-executable instructions that, when executed by the at least one processor 802, perform a population determination method according to an exemplary embodiment of the present disclosure.
By way of example, the electronic device 80 may be a personal computer (PC), a tablet device, a personal digital assistant, a smart phone, or another device capable of executing the set of instructions described above. The electronic device 80 need not be a single electronic device, but can be any collection of devices or circuits that can execute the above instructions (or sets of instructions) individually or in combination. The electronic device 80 may also be part of an integrated control system or system manager, or may be configured as a portable electronic device that interfaces locally or remotely (e.g., via wireless transmission).
In the electronic device 80, the processor 802 may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a programmable logic device, a special purpose processor system, a microcontroller, or a microprocessor. By way of example, and not limitation, processors may also include analog processors, digital processors, microprocessors, multi-core processors, processor arrays, network processors, and the like.
The processor 802 may execute instructions or code stored in the memory 801, wherein the memory 801 may also store data. The instructions and data may also be transmitted or received over a network via a network interface device, which may employ any known transmission protocol.
The memory 801 may be integrated with the processor 802, for example, with RAM or flash memory disposed within an integrated circuit microprocessor or the like. Further, memory 801 may include a stand-alone device, such as an external disk drive, storage array, or any other storage device usable by a database system. The memory 801 and the processor 802 may be operatively coupled or may communicate with each other, such as through I/O ports, network connections, etc., so that the processor 802 can read files stored in the memory.
In addition, the electronic device 80 may also include a video display (such as a liquid crystal display) and a user interaction interface (such as a keyboard, mouse, touch input device, etc.). All components of the electronic device 80 may be connected to each other via a bus and/or a network.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer-readable storage medium storing instructions, wherein the instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform a group determination method or one or more steps in a group determination method according to the present disclosure. Examples of the computer-readable storage medium herein include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or compact disc memory, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a Secure Digital (SD) card or an extreme digital (XD) card), magnetic tape, a floppy disk, a magneto-optical data storage device, an optical data storage device, a hard disk, a solid-state disk, and any other device configured to store and provide a computer program and any associated data, data files, and data structures to a processor or computer in a non-transitory manner such that the processor or computer can execute the computer program.
The computer program in the computer-readable storage medium described above can be run in an environment deployed in a computer apparatus, such as a client, a host, a proxy device, a server, and the like, and further, in one example, the computer program and any associated data, data files, and data structures are distributed across a networked computer system such that the computer program and any associated data, data files, and data structures are stored, accessed, and executed in a distributed fashion by one or more processors or computers.
According to an exemplary embodiment of the present disclosure, there may also be provided a computer program product comprising a computer program which, when executed by a processor, implements the population determination method according to an exemplary embodiment of the present disclosure.
The group determination method and device according to the exemplary embodiments of the present disclosure can achieve at least one of the following technical effects. The probability model, obtained by a training method that clusters the predetermined population, can be used to predict the probability that a user belongs to the first population (for example, an effective population) and can thus help determine whether the user belongs to the first population. From this, the actual scale of the first population can be obtained, and it can be identified whether the test result of the test scheme on the target population exhibits Simpson's paradox, which facilitates a comprehensive analysis of the influence of the test scheme on the target population. The method can therefore help identify the overall effect of the test scheme on the target population and support a deeper analysis of the test effect, for example, which type of population the scheme is effective for, which type it is ineffective for, and how large the effective population is, which helps avoid Simpson's paradox.
Further, in the training process of the probabilistic model according to the exemplary embodiment of the present disclosure, the predetermined population may be divided into a plurality of clustering populations by clustering based on the features of the predetermined population, implementing the hierarchical processing of the predetermined population based on the features; then, the proportion of the first population in the plurality of clustering populations (for example, the effective population) is obtained by determining the first population in the plurality of clustering populations, and the training of the probability model is realized by using the proportion, so that the trained probability model can accurately predict the probability that the user belongs to the first population based on the characteristics of the user in the target population.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (17)

1. A method of population determination, comprising:
acquiring characteristics of a user;
inputting the characteristics into a probability model to obtain the predicted probability that the user belongs to a first group;
determining whether the user belongs to a first group based on the predicted probability,
wherein, the probability model is obtained by training through a training method, and the training method comprises the following steps:
clustering predetermined populations based on characteristics of the predetermined populations to obtain a plurality of clustering populations, wherein the predetermined populations comprise a second population and a third population;
determining a clustering population of which a difference between the evaluation index of the second population and the evaluation index of the third population is greater than or equal to a predetermined threshold as a first population;
determining a proportion of the first population in the plurality of clustered populations;
and training the probability model based on the occupation ratio.
2. The population determination method of claim 1, wherein said training said probabilistic model based on said occupation ratio comprises:
determining the ratio as a preset probability;
inputting the characteristics of the predetermined group into a probability prediction model to obtain an estimated probability;
and adjusting parameters of the probability model according to the estimated probability and the preset probability so as to train the probability model.
3. The population determination method of claim 1, wherein said training method further comprises: selecting the predetermined population from a target population.
4. The population determination method of claim 1, wherein said training method further comprises: selecting the second population and the third population from a target population, according to a predetermined flow ratio of the second population to the third population, to form the predetermined population.
5. The population determination method of claim 1, wherein the evaluation index comprises at least one of: browsing duration, resource click rate, next-day retention rate and user conversion rate.
6. The population determining method of claim 2, wherein said adjusting parameters of said probabilistic model based on said estimated probability and said preset probability comprises:
determining a loss function of the probability model according to the estimated probability and the preset probability;
and adjusting parameters of the probability model according to the loss function.
7. The population determination method of claim 1, wherein said determining whether said user belongs to a first population based on said predicted probability comprises:
comparing the predicted probability to a predetermined probability threshold;
determining that the user belongs to a first group when the predicted probability is greater than or equal to a predetermined probability threshold.
8. A population determining apparatus, comprising:
an acquisition unit configured to acquire a feature of a user;
a prediction unit configured to input the features into a probability model, resulting in a predicted probability that the user belongs to a first group;
a determination unit configured to determine whether the user belongs to a first group according to the predicted probability,
wherein the probabilistic model is derived by training the probabilistic model with a training device configured to:
clustering predetermined populations based on characteristics of the predetermined populations to obtain a plurality of clustering populations, wherein the predetermined populations comprise a second population and a third population;
determining a clustering population of which a difference between the evaluation index of the second population and the evaluation index of the third population is greater than or equal to a predetermined threshold as a first population;
determining a proportion of the first population in the plurality of clustered populations;
and training the probability model based on the occupation ratio.
9. The population determining apparatus of claim 8, wherein said training means comprises:
a clustering unit configured to cluster a predetermined population based on the characteristics of the predetermined population to obtain a plurality of clustering populations;
a population determination unit configured to determine a first population of the plurality of clustering populations;
a preset probability determination unit configured to determine a proportion of the first population in the plurality of clustering populations as a preset probability;
a probability estimation unit configured to input the features of the predetermined population to a probability prediction model, resulting in an estimated probability;
a training unit configured to adjust parameters of the probabilistic model according to the estimated probability and the preset probability to train the probabilistic model.
10. The population determining apparatus of claim 9, wherein said training apparatus further comprises: a selection unit configured to select the predetermined population from a target population.
11. The population determining apparatus of claim 9, wherein said training apparatus further comprises: a selection unit configured to select the second population and the third population from a target population, according to a predetermined flow ratio of the second population to the third population, to form the predetermined population.
12. The population determination device of claim 11, wherein said population determination unit is configured to: determining a cluster group of which a difference between the evaluation index of the second group and the evaluation index of the third group is greater than or equal to a predetermined threshold value as the first group.
13. The population determination device of claim 12, wherein said evaluation index comprises at least one of: browsing duration, resource click rate, next-day retention rate and user conversion rate.
14. The population determination device of claim 9, wherein said training unit is configured to:
determining a loss function of the probability model according to the estimated probability and the preset probability;
and adjusting parameters of the probability model according to the loss function.
15. The population determination device of claim 8, wherein said determination unit is configured to:
comparing the predicted probability to a predetermined probability threshold;
determining that the user belongs to a first group when the predicted probability is greater than or equal to a predetermined probability threshold.
16. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions,
wherein the processor is configured to execute the instructions to implement the population determination method of any one of claims 1 to 7.
17. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the population determination method of any one of claims 1 to 7.
CN202110916032.6A 2021-08-11 2021-08-11 Group determination method and device, electronic equipment and computer readable storage medium Active CN113362120B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110916032.6A CN113362120B (en) 2021-08-11 2021-08-11 Group determination method and device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110916032.6A CN113362120B (en) 2021-08-11 2021-08-11 Group determination method and device, electronic equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN113362120A CN113362120A (en) 2021-09-07
CN113362120B true CN113362120B (en) 2022-01-21

Family

ID=77522930

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110916032.6A Active CN113362120B (en) 2021-08-11 2021-08-11 Group determination method and device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113362120B (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3109801A1 (en) * 2015-06-26 2016-12-28 National University of Ireland, Galway Data analysis and event detection method and system
CN105894372B (en) * 2016-06-13 2018-03-16 腾讯科技(深圳)有限公司 The method and apparatus for predicting colony's credit
CN110245787B (en) * 2019-05-24 2023-11-17 创新先进技术有限公司 Target group prediction method, device and equipment
CN110704614B (en) * 2019-08-30 2023-09-19 中国平安人寿保险股份有限公司 Information processing method and device for predicting user group type in application
CN111835561B (en) * 2020-06-29 2024-07-02 中国平安财产保险股份有限公司 Abnormal user group detection method, device and equipment based on user behavior data
CN112381154A (en) * 2020-11-17 2021-02-19 深圳壹账通智能科技有限公司 Method and device for predicting user probability and computer equipment

Also Published As

Publication number Publication date
CN113362120A (en) 2021-09-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant