CN115049429A - Gain prediction method and device and computer equipment - Google Patents


Publication number
CN115049429A
Authority
CN
China
Prior art keywords
group
target
conversion
gain
value
Legal status
Pending
Application number
CN202210648120.7A
Other languages
Chinese (zh)
Inventor
曾梅花
黄舒曼
郑逸杰
张舸
王圣东
Current Assignee
Xiamen Airlines Co Ltd
Original Assignee
Xiamen Airlines Co Ltd
Application filed by Xiamen Airlines Co Ltd
Priority to CN202210648120.7A
Publication of CN115049429A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0207 Discounts or incentives, e.g. coupons or rebates
    • G06Q30/0224 Discounts or incentives, e.g. coupons or rebates based on user history
    • G06Q30/0226 Incentive systems for frequent usage, e.g. frequent flyer miles programs or point systems

Landscapes

  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Feedback Control In General (AREA)

Abstract

The application relates to a gain prediction method, a gain prediction device and computer equipment. The method comprises: determining an experimental group and a control group from a preset object group; acquiring initial characteristic data and a conversion result label for each object in the experimental group and the control group, wherein the initial characteristic data comprises data under at least two characteristic dimensions; calculating the net information value corresponding to each characteristic dimension; determining target characteristic data for each object in the experimental group and the control group; adjusting the conversion result label of each object in the experimental group and the control group into a gain label according to a conversion rule; and selecting a first number of objects from each of the experimental group and the control group as a training object group, and training a gain prediction model with the target characteristic data and the gain label of each object in the training object group to obtain a target gain prediction model, wherein the target gain prediction model is used for predicting the gain of an object after the object participates in the target service. The method can improve the accuracy of gain prediction.

Description

Gain prediction method and device and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a gain prediction method, an apparatus, and a computer device.
Background
With the development of computer technology, machine learning has emerged, and a gain prediction model can be built with it to predict the increment produced by an intervention (for example, participating in a target service), that is, the improvement in conversion with the intervention relative to conversion without it, so that the objects converted because of the intervention can be identified among all objects. A positive increment indicates that the intervention has a positive effect, a negative increment indicates that it has a negative effect, and the larger the absolute value of the increment, the stronger the effect.
The main existing method for building a gain prediction model is a single-model differential response model: a feature indicating the intervention is added to the feature variables of the experimental group and the control group, so that the training samples of both groups can be used to train the same model. However, this method is still based on a response model and models the gain only indirectly, so its ability to predict the gain is limited.
Disclosure of Invention
In view of the above, it is necessary to provide a method, an apparatus and a computer device for gain prediction, which can improve the accuracy of gain prediction.
In a first aspect, the present application provides a gain prediction method. The method comprises the following steps:
determining an experimental group and a control group from a preset object group, wherein the number of the objects in the experimental group is the same as that in the control group, the initial characteristic data distribution is consistent, and the objects in the experimental group participate in the target service;
acquiring initial characteristic data and conversion result labels of each object in an experimental group and a control group, wherein the initial characteristic data comprises data under at least two characteristic dimensions;
calculating the net information value corresponding to each feature dimension according to the initial feature data of each object in the experimental group and the control group;
screening out a target characteristic dimension from at least two characteristic dimensions based on the net information value corresponding to each characteristic dimension, and determining target characteristic data of each object in the experimental group and the control group under the target characteristic dimension;
according to a conversion rule, adjusting the conversion result label of each object in the experimental group and the control group into a gain label, wherein the gain label takes a first value, a second value or a third value, the second value is the opposite number of the first value, the first value indicates that the object is in the experimental group and its conversion result label is conversion, the second value indicates that the object is in the control group and its conversion result label is conversion, and the third value indicates that the conversion result label of the object, whether in the experimental group or the control group, is non-conversion;
and selecting a first number of objects from each of the experimental group and the control group as a training object group, and training the gain prediction model to be trained with the target characteristic data and the gain label of each object in the training object group to obtain a trained target gain prediction model, wherein the target gain prediction model is used for predicting the gain of an object after the object participates in the target service.
In a second aspect, the present application further provides a gain prediction apparatus. The device comprises:
the determining module is used for determining an experimental group and a control group from a preset object group, wherein the number of objects in the experimental group is the same as that in the control group, the initial characteristic data distribution is consistent, and the objects in the experimental group participate in the target service;
the acquisition module is used for acquiring initial characteristic data and conversion result labels of each object in the experimental group and the control group, wherein the initial characteristic data comprises data under at least two characteristic dimensions;
the calculation module is used for calculating the net information value corresponding to each characteristic dimension according to the initial characteristic data of each object in the experimental group and the control group;
the screening module is used for screening out a target characteristic dimension from at least two characteristic dimensions based on the net information value corresponding to each characteristic dimension, and determining target characteristic data of each object in the experimental group and the control group under the target characteristic dimension;
the conversion module is used for adjusting the conversion result labels of each object in the experimental group and the control group into gain labels according to the conversion rule, wherein the gain labels comprise a first value, a second value and a third value, the second value and the first value are opposite numbers, the first value indicates that the conversion result labels of the objects in the experimental group are converted, the second value indicates that the conversion result labels of the objects in the control group are converted, and the third value indicates that the conversion result labels of the objects in the experimental group and the control group are not converted;
and the training module is used for selecting a first number of objects from each of the experimental group and the control group as a training object group, and training the gain prediction model to be trained with the target characteristic data and the gain label of each object in the training object group to obtain a trained target gain prediction model, wherein the target gain prediction model is used for predicting the gain of an object after the object participates in the target service.
In a third aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps in the gain prediction method as described above when executing the computer program.
With the above gain prediction method, device and computer equipment, an experimental group and a control group are determined from a preset object group; the initial characteristic data acquired for each object in the two groups is screened to obtain target characteristic data; the conversion result label acquired for each object is adjusted into a gain label based on a conversion rule; a first number of objects is selected from each of the experimental group and the control group as a training object group; and a gain prediction model is trained with the target characteristic data and the gain label of each object in the training object group to obtain a target gain prediction model. The target gain prediction model can then predict the gain of objects whose characteristic data follows the same distribution as the target characteristic data, and the predicted gain value distinguishes objects that convert only under intervention from objects that convert even without intervention: the larger the predicted gain of an object, the more likely it is an object that converts only under intervention. The gain is thus modeled directly. Therefore, compared with existing methods that model the gain indirectly, the present application can improve the accuracy of gain prediction.
Drawings
FIG. 1 is a diagram of an application environment of the gain prediction method in one embodiment;
FIG. 2 is a flow diagram illustrating a method for gain prediction in one embodiment;
FIG. 3 is a schematic flow chart illustrating the step of calculating the net information value in one embodiment;
FIG. 4 is a flowchart illustrating the step of predicting the gain of the experimental group of a test object group in one embodiment;
FIG. 5 is a block diagram of a gain prediction apparatus in one embodiment;
FIG. 6 is a diagram illustrating an internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The gain prediction method provided by the embodiment of the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be placed on the cloud or other network server. The terminal 102 may independently execute the gain prediction method provided by the embodiment of the present application, and the terminal 102 and the server 104 may also cooperatively execute the gain prediction method provided by the embodiment of the present application.
When the terminal 102 executes the gain prediction method alone, the terminal 102 determines an experimental group and a control group from a preset object group, wherein the number of objects in the experimental group is the same as that in the control group, the initial characteristic data distribution is consistent, and the objects in the experimental group participate in the target service; acquires initial characteristic data and conversion result labels of each object in the experimental group and the control group, wherein the initial characteristic data comprises data under at least two characteristic dimensions; calculates the net information value corresponding to each feature dimension according to the initial feature data of each object in the experimental group and the control group; screens out a target characteristic dimension from the at least two characteristic dimensions based on the net information value corresponding to each characteristic dimension, and determines target characteristic data of each object in the experimental group and the control group under the target characteristic dimension; adjusts, according to a conversion rule, the conversion result label of each object in the experimental group and the control group into a gain label, wherein the gain label takes a first value, a second value or a third value, the second value is the opposite number of the first value, the first value indicates that the object is in the experimental group and its conversion result label is conversion, the second value indicates that the object is in the control group and its conversion result label is conversion, and the third value indicates that the conversion result label of the object, whether in the experimental group or the control group, is non-conversion; and selects a first number of objects from each of the experimental group and the control group as a training object group, and trains the gain prediction model to be trained with the target characteristic data and the gain label of each object in the training object group to obtain a trained target gain prediction model, wherein the target gain prediction model is used for predicting the gain of an object after the object participates in the target service.
When the terminal 102 and the server 104 cooperatively execute the gain prediction method, the terminal 102 determines an experimental group and a control group from a preset object group, wherein the number of objects in the experimental group is the same as that in the control group, the initial characteristic data distribution is consistent, and the objects in the experimental group participate in the target service; acquires initial feature data and conversion result labels of each object in the experimental group and the control group, wherein the initial feature data comprises data under at least two feature dimensions; and sends the initial feature data and the conversion result labels of each object in the experimental group and the control group to the server 104. The server 104 calculates the net information value corresponding to each feature dimension according to the initial feature data of each object in the experimental group and the control group; screens out a target characteristic dimension from the at least two characteristic dimensions based on the net information value corresponding to each characteristic dimension, and determines target characteristic data of each object in the experimental group and the control group under the target characteristic dimension; adjusts, according to a conversion rule, the conversion result label of each object in the experimental group and the control group into a gain label, wherein the gain label takes a first value, a second value or a third value, the second value is the opposite number of the first value, the first value indicates that the object is in the experimental group and its conversion result label is conversion, the second value indicates that the object is in the control group and its conversion result label is conversion, and the third value indicates that the conversion result label of the object, whether in the experimental group or the control group, is non-conversion; and selects a first number of objects from each of the experimental group and the control group as a training object group, and trains the gain prediction model to be trained with the target characteristic data and the gain label of each object in the training object group to obtain a trained target gain prediction model, wherein the target gain prediction model is used for predicting the gain of an object after the object participates in the target service.
The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices and portable wearable devices, and the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart car-mounted devices, and the like. The portable wearable device can be a smart watch, a smart bracelet, a head-mounted device, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster comprised of multiple servers.
It should be understood that terms such as "first," "second," "third," and "fourth" in the embodiments of the present application do not denote any order, quantity, or importance, and are used only to distinguish one component from another. The singular forms "a," "an," or "the" and similar referents do not denote a limitation of quantity, but rather denote the presence of at least one, unless the context clearly dictates otherwise.
In one embodiment, as shown in fig. 2, a gain prediction method is provided, which may be performed by a terminal or a server alone or by the terminal and the server in cooperation. The embodiment of the present application is described by taking the application of the method to the terminal in fig. 1 as an example, and includes the following steps:
step 202, determining an experimental group and a control group from a preset object group, wherein the number of the objects in the experimental group is the same as that in the control group, the initial characteristic data distribution is consistent, and the objects in the experimental group participate in the target service.
The preset object group is a group consisting of a plurality of objects that have characteristics related to the target service, and the number of objects participating in the target service is the number of all objects in the experimental group. The initial characteristic data is the data of each object that corresponds to the target service. The target service is a service intended to prompt objects to convert, and may be a resource-giving activity, for example, issuing materials in various forms, pushing information, or pushing resources. The resource may specifically be discount information, a coupon, electronic points, an electronic resource, or the like, which is not limited in this embodiment of the application.
Because the responses of the same object with and without intervention cannot be observed at the same time, a preset object group is selected and, according to the requirements of a controlled experiment (also called an A/B test), randomly divided into two groups that have the same number of objects and consistent distributions of initial characteristic data: an experimental group and a control group. The experimental group is made to participate in the target service, and the response of each object in both groups, that is, conversion or non-conversion, is obtained. The improvement in conversion of the experimental group relative to the control group is the gain of the preset object group, and the gain of the preset object group is used to represent the gain of each object after participating in the target service.
Specifically, before the gain prediction model is constructed, the terminal determines an experimental group and a control group from a preset object group to obtain initial characteristic data and a conversion result label of each object in the experimental group and the control group. The number of the objects in the experimental group is the same as that in the control group, the initial characteristic data distribution is consistent, and the objects in the experimental group participate in the target service.
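The random division described above can be sketched as follows. This is a minimal illustration, not the procedure of the application: the function name `split_groups`, the `seed` parameter, and the integer object identifiers are all hypothetical. Random assignment makes the characteristic data distributions of the two groups consistent only in expectation, so a separate balance check would still be needed in practice.

```python
import random

def split_groups(object_ids, seed=0):
    """Randomly split an even-sized object group into two equal halves.

    Returns (experimental, control). Names and the seed are illustrative
    assumptions, not taken from the source text.
    """
    ids = list(object_ids)
    if len(ids) % 2 != 0:
        raise ValueError("object group size must be even to form equal groups")
    rng = random.Random(seed)  # local generator, so results are reproducible
    rng.shuffle(ids)
    n = len(ids) // 2
    return ids[:n], ids[n:]

# Hypothetical preset object group of 2n = 10 objects.
experimental, control = split_groups(range(10), seed=42)
```

Only the objects in `experimental` would then be made to participate in the target service.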
Step 204, acquiring initial characteristic data and conversion result labels of each object in the experimental group and the control group, wherein the initial characteristic data comprises data under at least two characteristic dimensions.
The initial characteristic data is data corresponding to the characteristics of each object that are related to the target service. It includes data in at least two feature dimensions. The feature dimensions include basic attributes of the object (such as gender, age group, and place of residence), value attributes of the object (such as object level identification and service specification), historical target service participation characteristics of the object (that is, whether the object has participated in historical target services frequently, such as 5-6 times/year, or infrequently, such as 0-2 times/year), historical conversion characteristics of the object (such as monthly cumulative conversion count, yearly cumulative conversion count, and the quarter in which a conversion occurred), historical conversion-related characteristics of the object (that is, characteristics of other behaviors caused by the object's conversion), and the like, which is not limited in this embodiment.
The conversion result label is either conversion or non-conversion, where conversion refers to a conversion of object behavior. Taking as an example a target service in which company A pushes coupons to users, the conversion of object behavior can be considered along multiple dimensions such as participation, registration, ordering, and flying, namely: if the user clicks the activity page of the pushed coupon during the target service period, the user is considered converted, otherwise not converted; if the user registers an account during the target service period, the user is considered converted, otherwise not converted; if the number of orders placed by the user during the target service period is greater than a preset order threshold, the user is considered converted, otherwise not converted; or if the number of flights taken by the user during the target service period is greater than a preset flight threshold, the user is considered converted, otherwise not converted. In embodiments in other fields, the conversion of object behavior may also be a click on promotion information, a product transaction, or the like, which is not limited in this embodiment of the application.
If the conversion result label of an object in the experimental group is conversion, the object participated in the target service and converted; if it is non-conversion, the object participated in the target service and did not convert. If the conversion result label of an object in the control group is conversion, the object did not participate in the target service and converted; if it is non-conversion, the object did not participate in the target service and did not convert.
Specifically, the terminal obtains initial characteristic data and conversion result labels of each object in the experimental group and the control group from the database, wherein the initial characteristic data comprises data under at least two characteristic dimensions.
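As an illustration of the ordering criterion above, the sketch below derives a 0/1 conversion result label from an order count and a preset threshold. The function name `conversion_label`, the user identifiers, and the threshold value are hypothetical; the source does not specify any of them.

```python
def conversion_label(order_count, order_threshold):
    """Return 1 (conversion) if the orders placed during the target service
    period exceed the preset threshold, otherwise 0 (non-conversion)."""
    return 1 if order_count > order_threshold else 0

# Hypothetical order counts observed during the target service period.
orders_during_period = {"u1": 0, "u2": 3, "u3": 1}
labels = {
    user_id: conversion_label(count, order_threshold=1)
    for user_id, count in orders_during_period.items()
}
```

The comparison is strict, matching the wording "greater than a preset order threshold", so a user whose order count equals the threshold is labeled as not converted.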
Step 206, calculating the net information value corresponding to each feature dimension according to the initial feature data of each object in the experimental group and the control group; and screening out a target characteristic dimension from at least two characteristic dimensions based on the net information value corresponding to each characteristic dimension, and determining target characteristic data of each object in the experimental group and the control group under the target characteristic dimension.
The net information value corresponding to a feature dimension is the sum of the net information values of all feature intervals of that dimension, and measures the prediction strength of the feature dimension, that is, its influence on the gain. The larger the net information value of a feature dimension, the better that dimension can separate, among the objects whose conversion result label is conversion, the objects that convert only under intervention from the objects that convert even without intervention; in other words, the better it can identify sensitive objects. The target feature dimension is a feature dimension with high prediction strength. The target characteristic data is the characteristic data of each object under the target feature dimension; it is obtained by selecting, from the initial characteristic data of each object, the data corresponding to the target feature dimension.
Specifically, the terminal calculates the net information value corresponding to each feature dimension according to the initial feature data of each object in the experimental group and the control group; and screening out a target characteristic dimension from at least two characteristic dimensions based on the net information value corresponding to each characteristic dimension to obtain target characteristic data of each object in the experimental group and the control group under the target characteristic dimension.
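The excerpt above does not give the formula for the net information value, so the following sketch assumes the commonly used definition based on the net weight of evidence: for each feature interval, NWOE is the weight of evidence in the experimental group minus that in the control group, and the interval's contribution is NWOE multiplied by P(x|Y=1,T)·P(x|Y=0,C) − P(x|Y=0,T)·P(x|Y=1,C). The smoothing constant `eps`, the top-k selection rule, and all function names are illustrative assumptions and may differ from the procedure of FIG. 3.

```python
import math
from collections import Counter

def net_information_value(bins, groups, labels, eps=0.5):
    """NIV of one binned feature dimension (assumed NWOE-based definition).

    bins[i]   : feature interval of object i
    groups[i] : "T" (experimental) or "C" (control)
    labels[i] : 1 (conversion) or 0 (non-conversion)
    """
    counts = Counter(zip(groups, labels, bins))
    totals = Counter(zip(groups, labels))
    intervals = sorted(set(bins))
    k = len(intervals)

    def p(group, label, interval):
        # Smoothed P(interval | group, label); eps avoids log(0) on empty cells.
        return (counts[(group, label, interval)] + eps) / (totals[(group, label)] + eps * k)

    niv = 0.0
    for b in intervals:
        woe_t = math.log(p("T", 1, b) / p("T", 0, b))
        woe_c = math.log(p("C", 1, b) / p("C", 0, b))
        nwoe = woe_t - woe_c
        weight = p("T", 1, b) * p("C", 0, b) - p("T", 0, b) * p("C", 1, b)
        niv += weight * nwoe  # sum of per-interval net information values
    return niv

def select_target_dimensions(niv_by_dimension, top_k):
    """Keep the top_k dimensions with the largest NIV (one possible rule)."""
    ranked = sorted(niv_by_dimension, key=niv_by_dimension.get, reverse=True)
    return ranked[:top_k]
```

Under this definition each interval's contribution is non-negative, so a dimension whose interval distribution is identical across all four group/label cells scores zero, and larger values indicate stronger separation.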
Step 208, adjusting the conversion result label of each object in the experimental group and the control group into a gain label according to the conversion rule, wherein the gain label takes a first value, a second value or a third value, the second value is the opposite number of the first value, the first value indicates that the object is in the experimental group and its conversion result label is conversion, the second value indicates that the object is in the control group and its conversion result label is conversion, and the third value indicates that the conversion result label of the object, whether in the experimental group or the control group, is non-conversion.
The gain label takes a first value, a second value or a third value. The three values are different from one another, and the second value is the opposite number of the first value. The first value indicates that the object is in the experimental group and its conversion result label is conversion, the second value indicates that the object is in the control group and its conversion result label is conversion, and the third value indicates that the conversion result label of the object, whether in the experimental group or the control group, is non-conversion.
The conversion rule is used for adjusting the conversion result label of each object in the experimental group and the control group into a gain label, so that, based on the conversion rule, the gain of each object in the experimental group and the control group after participating in the target service can be represented by the gain of the preset object group.
Specifically, the terminal adjusts the conversion result label of each object in the experimental group and the control group into the gain label according to the conversion rule.
The following description will take the conversion rule expressed by the formula (1) as an example.
$$Z = \begin{cases} 1, & G = T \text{ and } Y = 1 \\ -1, & G = C \text{ and } Y = 1 \\ 0, & Y = 0 \end{cases} \qquad (1)$$
In formula (1), Z is the gain label, Z = 1 denotes the first value, Z = -1 denotes the second value, Z = 0 denotes the third value, T denotes the experimental group, C denotes the control group, G = T denotes that the object is from the experimental group, G = C denotes that the object is from the control group, Y denotes the conversion result label, Y = 1 denotes that the conversion result label is converted, and Y = 0 denotes that the conversion result label is not converted.
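The conversion rule of formula (1) can be illustrated with a short Python sketch. The function name `gain_label` and the string codes "T" and "C" for the two groups are illustrative assumptions, not part of the method as described.

```python
def gain_label(group: str, converted: int) -> int:
    """Return the gain label Z for one object under formula (1).

    group:     "T" for the experimental group, "C" for the control group
               (string codes assumed for illustration).
    converted: conversion result label Y, 1 = converted, 0 = not converted.
    """
    if group not in ("T", "C"):
        raise ValueError("group must be 'T' or 'C'")
    if converted not in (0, 1):
        raise ValueError("converted must be 0 or 1")
    if converted == 0:
        return 0  # third value: no conversion, in either group
    # first value for the experimental group, second value for the control group
    return 1 if group == "T" else -1


labels = [gain_label(g, y) for g, y in [("T", 1), ("C", 1), ("T", 0), ("C", 0)]]
```

Here `labels` holds the gain labels of one converted and one non-converted object from each group.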
Based on the above conversion rule, assume that the number of objects in the preset object group is 2n, that the first n objects are randomly allocated to participate in the target service as the experimental group, and that the other n objects do not participate in the target service and serve as the control group. Let i denote any object in the preset object group, $y_i$ the conversion result label of object i, $z_i$ the gain label of object i after conversion, Lift the gain of the preset object group, $E[y \mid t]$ the number of objects with $y_i = 1$ in the experimental group, and $E[y \mid c]$ the number of objects with $y_i = 1$ in the control group; then:
$$\mathrm{Lift} = E[y \mid t] - E[y \mid c] = \sum_{i=1}^{n} y_i - \sum_{i=n+1}^{2n} y_i = \sum_{i=1}^{2n} z_i$$
Under random allocation, the increment of the number of objects with $y_i = 1$ in the experimental group relative to the number of objects with $y_i = 1$ in the control group is the gain of the preset object group, which may also be called the gain brought by the participation of the experimental group in the target service. Since the control group is unchanged, the gain of the preset object group depends on the number of objects with $y_i = 1$ in the experimental group.
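As a minimal check of the relationship described above, the following Python sketch computes the gain of a small preset object group in two ways: as the difference between the numbers of converted objects in the two groups, and as the sum of the gain labels. The function names and the eight example objects are invented for illustration.

```python
def lift_from_counts(groups, conversions):
    """Converted objects in the experimental group minus those in the control group."""
    converted_t = sum(y for g, y in zip(groups, conversions) if g == "T")
    converted_c = sum(y for g, y in zip(groups, conversions) if g == "C")
    return converted_t - converted_c


def lift_from_gain_labels(groups, conversions):
    """Sum of gain labels: +1 (T, converted), -1 (C, converted), 0 otherwise."""
    return sum(y if g == "T" else -y for g, y in zip(groups, conversions))


# 2n = 8 objects: the first n = 4 form the experimental group, the rest the control group.
groups = ["T"] * 4 + ["C"] * 4
conversions = [1, 1, 0, 1, 1, 0, 0, 0]

lift_a = lift_from_counts(groups, conversions)
lift_b = lift_from_gain_labels(groups, conversions)
```

Both computations agree because each converted object contributes +1 or -1 to the sum according to its group, and non-converted objects contribute 0.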
Therefore, based on the above conversion rule, the terminal can directly establish a regression model for predicting the gain of each object in the preset object group after participating in the target service, and can thus obtain, from the target feature data of each object, the gain of that object after participating in the target service.
Step 210, selecting a first number of objects from each of the experimental group and the control group as a training object group, and training a gain prediction model to be trained with the target feature data and the gain label of each object in the training object group to obtain a trained target gain prediction model, wherein the target gain prediction model is used for predicting the gain of an object after participating in the target service.
The first number of objects is smaller than the number of objects in the experimental group or the control group; for example, 80% of the number of objects in the experimental group or the control group is taken as the first number of objects. The training object group comprises an experimental-group part and a control-group part. The gain prediction model to be trained may be constructed based on a tree model, which may be implemented by the XGBoost algorithm or another ensemble learning algorithm; this embodiment does not limit the choice. By building the gain prediction model to be trained on a tree model, the gain prediction model can quantify the gain directly, which makes the gain prediction more accurate.
The gain of an object after participating in the target service is the probability that the object converts after participating in the target service, and may also be regarded as the tendency of the object to perform a conversion behavior such as clicking and browsing, registering an account, or purchasing an item. For example, if the target service is to push a coupon to the object, the gain of the object after participating in the target service may be the probability that the object clicks the coupon activity page, the probability that the object registers an account, the probability that the number of orders placed by the object exceeds a preset order threshold, or the probability that the number of flights taken by the object exceeds a preset flight threshold, and so on. The gain prediction value of an object represents the gain of the object after participating in the target service: the larger the gain prediction value, the higher the probability that the object converts because of participating in the target service, that is, the more likely the object is one that converts only under intervention. The gain prediction value can therefore be used to distinguish objects that convert only under intervention from objects that convert without intervention.
The gain prediction model is used for predicting the gain of each object in the training object group after participating in the target service; its input is the target feature data of each object in the training object group, and its output is the gain prediction value of each object in the training object group. The gain prediction model is trained on the target feature data of each object in the training object group, with the training object group as the training sample and the gain label of each object in the training object group as the sample label. The target gain prediction model is the gain prediction model after training. It can be used to predict the gain, after participating in the target service, of each object in the training object group, and also of each object in other object groups whose feature data is distributed consistently with the target feature data.
Specifically, the terminal selects a first number of objects from each of the experimental group and the control group, and determines the set of objects selected from the two groups as the training object group; constructs a gain prediction model to be trained based on the tree model, the gain prediction model being used for predicting the gain of each object in the training object group after participating in the target service; inputs the target feature data of each object in the training object group into the gain prediction model to obtain the gain prediction value of each object in the experimental group and the control group; and trains the gain prediction model according to the gain prediction value and the gain label of each object in the experimental group and the control group to obtain a trained target gain prediction model, the target gain prediction model being used for predicting the gain of an object after participating in the target service.
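The selection of the training object group can be sketched as follows. This is a minimal illustration that assumes the first number of objects is 80% of each group and that objects are drawn by random sampling; the text only states that objects are selected, so the sampling strategy, the function name `select_training_group` and the integer object identifiers are assumptions.

```python
import random


def select_training_group(experimental, control, fraction=0.8, seed=0):
    """Pick the same fraction of objects from each group as the training object group."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible

    def pick(objects):
        k = int(len(objects) * fraction)  # the "first number of objects"
        return rng.sample(objects, k)     # sampling without replacement

    return pick(experimental), pick(control)


experimental_ids = list(range(0, 10))   # hypothetical object identifiers
control_ids = list(range(10, 20))
train_t, train_c = select_training_group(experimental_ids, control_ids)
```

The two returned lists together form the training object group; their target feature data and gain labels would then be passed to whichever tree-based regressor is used.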
In the above gain prediction method, an experimental group and a control group are determined from a preset object group; the acquired initial feature data of each object in the experimental group and the control group is screened to obtain target feature data; the acquired conversion result label of each object in the experimental group and the control group is adjusted into a gain label based on a conversion rule; a first number of objects is then selected from each of the experimental group and the control group as a training object group; and a gain prediction model is trained with the target feature data and the gain label of each object in the training object group to obtain a target gain prediction model. The target gain prediction model can then perform gain prediction on objects whose feature data is distributed consistently with the target feature data, and objects that convert only under intervention can be distinguished from objects that convert without intervention according to the magnitude of the gain prediction value: the larger the gain prediction value of an object, the more likely the object is one that converts only under intervention. The gain is thereby modeled directly. Compared with existing methods that establish a gain prediction model indirectly, modeling the gain directly can improve the accuracy of gain prediction.
In one embodiment, as shown in fig. 3, calculating the net information value corresponding to each feature dimension according to the initial feature data of each object in the experimental group and the control group includes:
Step 302, for any feature interval of each feature dimension, calculating the evidence weight of the experimental group and the evidence weight of the control group in the current feature interval, and calculating the difference between the two evidence weights.
Wherein each feature dimension comprises at least two feature intervals, for example, the feature dimension is the age range of the subject, and may comprise five feature intervals of 0-18 years old, 19-29 years old, 30-39 years old, 40-49 years old and over 50 years old, and may also comprise four feature intervals of 0-18 years old, 19-39 years old, 40-60 years old and over 60 years old.
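Assigning an object to a feature interval can be sketched with the standard-library `bisect` module, using the first age partition mentioned above. The interval labels, the function name `age_interval`, and the decision to place age 50 in the last interval are assumptions made for illustration.

```python
from bisect import bisect_left

# Inclusive upper bounds of the first four intervals; ages above 49 fall in the last one.
AGE_UPPER_BOUNDS = [18, 29, 39, 49]
AGE_INTERVALS = ["0-18", "19-29", "30-39", "40-49", "50+"]


def age_interval(age: int) -> str:
    """Return the label of the feature interval that contains the given age."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_left gives the index of the first upper bound that is >= age.
    return AGE_INTERVALS[bisect_left(AGE_UPPER_BOUNDS, age)]


examples = [age_interval(a) for a in (0, 18, 19, 49, 50)]
```

A different partition, such as the four-interval one, only requires changing the two lists.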
The evidence weight (weight of evidence, WOE) represents the difference between the ratio of the objects in the current feature interval whose conversion result label is converted to all objects whose conversion result label is converted, and the ratio of the objects in the current feature interval whose conversion result label is not converted to all objects whose conversion result label is not converted.
Specifically, for any feature interval of each feature dimension, the terminal calculates the evidence weight of the experimental group in the current feature interval and the evidence weight of the control group in the current feature interval, and calculates the difference between the two evidence weights. For example, with i denoting the current feature interval, $WOE_i^T$ denoting the evidence weight of the experimental group in the current feature interval i, and $WOE_i^C$ denoting the evidence weight of the control group in the current feature interval i, the difference between the two evidence weights is $WOE_i^T - WOE_i^C$.
Step 304, determining the coverage number of the objects covered by the current characteristic interval, and determining the net evidence weight corresponding to the current characteristic interval according to the ratio of the coverage number to the total number of the preset object group and the difference value between the two evidence weights.
The coverage number is the total number of objects in the experimental group and the control group whose initial feature data falls in the current feature interval. The total number is the number of objects in the preset object group, that is, the sum of the numbers of objects in the experimental group and the control group. The net evidence weight represents the difference between the evidence weight of the experimental group and the evidence weight of the control group in the current feature interval; the larger the net evidence weight, the higher the probability that the conversion result label of an object in the current feature interval is converted.
Specifically, the terminal determines the coverage number of objects covered by the current feature interval, and multiplies the ratio of the coverage number to the total number of the preset object group by the difference between the two evidence weights to obtain the net evidence weight corresponding to the current feature interval. For example, with count[i] denoting the coverage number of objects covered by the current feature interval i, sum(counts) denoting the total number of the preset object group, $WOE_i^T$ and $WOE_i^C$ denoting the evidence weights of the experimental group and the control group in the current feature interval, and $NWOE_i$ denoting the net evidence weight, then
$$NWOE_i = \frac{\mathrm{count}[i]}{\mathrm{sum}(\mathrm{counts})} \times \left( WOE_i^T - WOE_i^C \right)$$
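The net evidence weight calculation can be sketched as below. The function takes the two evidence weights as given inputs, since how they are computed is described separately; the function name, parameter names and example numbers are illustrative assumptions.

```python
def net_weight_of_evidence(woe_t, woe_c, count_i, total_count):
    """Net evidence weight of one feature interval.

    woe_t, woe_c: evidence weights of the experimental and control groups
                  in the current feature interval.
    count_i:      number of objects covered by the current feature interval.
    total_count:  total number of objects in the preset object group.
    """
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return (count_i / total_count) * (woe_t - woe_c)


# Illustrative numbers: 200 of 1000 objects fall in the interval.
nwoe = net_weight_of_evidence(woe_t=0.5, woe_c=0.25, count_i=200, total_count=1000)
```

The coverage ratio scales the difference between the two evidence weights, so sparsely populated intervals contribute less.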
Step 306, determining a first sample weight coefficient corresponding to the experimental group and a second sample weight coefficient corresponding to the control group, and calculating the net information value corresponding to the current feature interval according to the first sample weight coefficient, the second sample weight coefficient and the net evidence weight corresponding to the current feature interval.
The first sample weight coefficient represents, for the experimental group, the difference between the proportion of converted objects in the current feature interval among all converted objects and the proportion of non-converted objects in the current feature interval among all non-converted objects. The second sample weight coefficient represents the same difference for the control group.
Specifically, the terminal determines a first sample weight coefficient corresponding to the experimental group and a second sample weight coefficient corresponding to the control group, calculates the difference between the first sample weight coefficient and the second sample weight coefficient, and multiplies this difference by the net evidence weight corresponding to the current feature interval to obtain the net information value corresponding to the current feature interval. For example, with $W_i^T$ denoting the first sample weight coefficient, $W_i^C$ denoting the second sample weight coefficient, and $NIV_i$ denoting the net information value corresponding to the current feature interval i, then
$$NIV_i = \left( W_i^T - W_i^C \right) \times NWOE_i$$
Step 308, for each feature dimension, calculating the net information value corresponding to the current feature dimension based on the net information value corresponding to each feature interval under the current feature dimension.
Specifically, for each feature dimension, the terminal sums the net information values corresponding to the feature intervals under the current feature dimension to obtain the net information value corresponding to that feature dimension. For example, if n denotes the number of feature intervals, $NIV_i$ denotes the net information value corresponding to the i-th feature interval, and NIV denotes the net information value corresponding to the current feature dimension, then
$$NIV = \sum_{i=1}^{n} NIV_i$$
In this embodiment, the net evidence weight of the current feature interval is determined based on the evidence weights of the experimental group and the control group in any feature interval of each feature dimension, and the net information value corresponding to each feature dimension is obtained based on the net evidence weight of the current feature interval and the difference between the sample weight coefficients corresponding to the experimental group and the control group. The target feature dimension can thus be screened out from the at least two feature dimensions according to the net information value corresponding to each feature dimension, and the target feature data can be screened out from the initial feature data. Because the target feature dimension is a feature dimension with high predictive strength, that is, its predictive ability is stronger than that of the other feature dimensions, using the target feature data rather than the initial feature data as the input feature variables of the gain prediction model shortens the training time of the gain prediction model, improves training efficiency, and helps improve the accuracy of gain prediction.
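The per-dimension net information value and the subsequent screening can be sketched together. Each feature interval is represented as a tuple of the first sample weight coefficient, the second sample weight coefficient and the net evidence weight. The text does not fix a screening rule, so keeping the dimensions with the largest net information value is an assumption, as are the function names, dimension names and numbers.

```python
def net_information_value(intervals):
    """Sum the net information values of the feature intervals of one dimension.

    intervals: iterable of (w_t, w_c, nwoe) tuples, where w_t and w_c are the
    first and second sample weight coefficients and nwoe is the net evidence
    weight of the interval.
    """
    return sum((w_t - w_c) * nwoe for w_t, w_c, nwoe in intervals)


def screen_dimensions(niv_by_dimension, keep):
    """Return the `keep` dimensions with the largest net information value.

    Top-k selection is an assumed screening rule, used here for illustration.
    """
    ranked = sorted(niv_by_dimension, key=niv_by_dimension.get, reverse=True)
    return ranked[:keep]


# Hypothetical dimensions with two feature intervals each.
niv_by_dimension = {
    "age": net_information_value([(0.05, -0.075, 0.05), (-0.05, 0.075, -0.02)]),
    "city": net_information_value([(0.01, 0.0, 0.01), (-0.01, 0.0, -0.01)]),
    "membership_level": net_information_value([(0.2, -0.1, 0.1), (-0.2, 0.1, -0.1)]),
}
target_dimensions = screen_dimensions(niv_by_dimension, keep=2)
```

A threshold on the net information value would be an equally plausible screening rule and only changes `screen_dimensions`.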
In one embodiment, before calculating the net information value corresponding to each feature dimension according to the initial feature data of each object in the experimental group and the control group, the gain prediction method further includes preprocessing the initial feature data to obtain feature data that meets the data format requirement of the gain prediction model, the preprocessing including at least null value filling, outlier processing, categorical variable conversion, and binning of continuous variables.
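Two of the preprocessing steps named above can be sketched with the standard library. The text does not specify how nulls are filled or how categorical variables are converted, so median filling and integer codes are assumptions, as are the function names and sample values.

```python
from statistics import median


def fill_nulls_with_median(values):
    """Replace missing entries (None) with the median of the present values."""
    present = [v for v in values if v is not None]
    fill = median(present)
    return [fill if v is None else v for v in values]


def encode_categories(values):
    """Map each distinct category to an integer code in order of first appearance."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
        encoded.append(codes[v])
    return encoded, codes


ages = fill_nulls_with_median([25, None, 40, 31])
levels, mapping = encode_categories(["gold", "silver", "gold", "basic"])
```

Outlier processing and binning of continuous variables would follow the same pattern of small, column-wise transformations.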
In one embodiment, determining a first sample weight coefficient corresponding to the experimental group and a second sample weight coefficient corresponding to the control group comprises: determining a first ratio between the number of response objects in the current feature interval in the experimental group and the number of all response objects in the experimental group; determining a second ratio between the number of non-response objects in the current feature interval in the experimental group and the number of all non-response objects in the experimental group; determining the first sample weight coefficient corresponding to the experimental group based on the difference between the first ratio and the second ratio; determining a third ratio between the number of response objects in the current feature interval in the control group and the number of all response objects in the control group; determining a fourth ratio between the number of non-response objects in the current feature interval in the control group and the number of all non-response objects in the control group; and determining the second sample weight coefficient corresponding to the control group based on the difference between the third ratio and the fourth ratio.
A response object is an object whose conversion result label is converted, and a non-response object is an object whose conversion result label is not converted. The first ratio is the proportion of the response objects in the current feature interval in the experimental group among all response objects in the experimental group. The second ratio is the proportion of the non-response objects in the current feature interval in the experimental group among all non-response objects in the experimental group. The third ratio is the proportion of the response objects in the current feature interval in the control group among all response objects in the control group. The fourth ratio is the proportion of the non-response objects in the current feature interval in the control group among all non-response objects in the control group.
Specifically, the terminal determines a first ratio between the number of response objects in the current feature interval in the experimental group and the number of all response objects in the experimental group, determines a second ratio between the number of non-response objects in the current feature interval in the experimental group and the number of all non-response objects in the experimental group, and takes the difference between the first ratio and the second ratio to obtain the first sample weight coefficient corresponding to the experimental group. For example, with $y_{it}$ denoting the number of response objects in the current feature interval in the experimental group, $y_t$ the number of all response objects in the experimental group, $n_{it}$ the number of non-response objects in the current feature interval in the experimental group, $n_t$ the number of all non-response objects in the experimental group, and $W_i^T$ the first sample weight coefficient corresponding to the experimental group, the first ratio is $y_{it}/y_t$, the second ratio is $n_{it}/n_t$, and
$$W_i^T = \frac{y_{it}}{y_t} - \frac{n_{it}}{n_t}$$
The terminal likewise determines a third ratio between the number of response objects in the current feature interval in the control group and the number of all response objects in the control group, determines a fourth ratio between the number of non-response objects in the current feature interval in the control group and the number of all non-response objects in the control group, and takes the difference between the third ratio and the fourth ratio to obtain the second sample weight coefficient corresponding to the control group. For example, with $y_{ic}$ denoting the number of response objects in the current feature interval in the control group, $y_c$ the number of all response objects in the control group, $n_{ic}$ the number of non-response objects in the current feature interval in the control group, $n_c$ the number of all non-response objects in the control group, and $W_i^C$ the second sample weight coefficient corresponding to the control group, the third ratio is $y_{ic}/y_c$, the fourth ratio is $n_{ic}/n_c$, and
$$W_i^C = \frac{y_{ic}}{y_c} - \frac{n_{ic}}{n_c}$$
In this embodiment, the purpose of determining the first sample weight coefficient corresponding to the experimental group and the second sample weight coefficient corresponding to the control group can be achieved by determining the number of the responsive objects and the non-responsive objects in the current characteristic interval in the experimental group, the number of all the responsive objects and all the non-responsive objects in the experimental group, the number of the responsive objects and the non-responsive objects in the current characteristic interval in the control group, and the number of all the responsive objects and all the non-responsive objects in the control group.
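The sample weight coefficient of either group can be sketched as a single function applied to that group's counts. The function name, parameter names and counts are illustrative assumptions.

```python
def sample_weight_coefficient(resp_in_interval, resp_total,
                              nonresp_in_interval, nonresp_total):
    """Share of response objects in the interval minus share of non-response objects.

    The same function serves the experimental group (first coefficient) and
    the control group (second coefficient); only the counts differ.
    """
    return resp_in_interval / resp_total - nonresp_in_interval / nonresp_total


# Experimental group: 30 of 100 response objects and 100 of 400 non-response
# objects fall in the current feature interval.
w_t = sample_weight_coefficient(30, 100, 100, 400)
# Control group: 20 of 100 response objects and 110 of 400 non-response objects.
w_c = sample_weight_coefficient(20, 100, 110, 400)
```

A positive coefficient means the interval is over-represented among response objects of that group; a negative one means it is over-represented among non-response objects.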
In one embodiment, the gain prediction method further comprises: determining a conversion index based on the conversion result labels of each object in the experimental group and the control group, wherein the conversion index comprises at least one of a conversion increase rate, a conversion quantity increase amount, a first resource increase amount and a second resource increase amount; and evaluating the real gain brought by the target service based on the conversion index. The conversion increase rate is the difference between the conversion rate of the experimental group and the conversion rate of the control group, where the conversion rate of the experimental group is the ratio of the number of converted objects in the experimental group to the number of objects in the experimental group, and the conversion rate of the control group is the ratio of the number of converted objects in the control group to the number of objects in the control group. The conversion quantity increase amount is the product of the conversion increase rate and the number of converted objects in the experimental group. The first resource increase amount is the sum of a first product and a second product, where the first product is the product of the conversion quantity increase amount and the first value of the experimental group, and the second product is the product of a first difference and a second difference; the first difference is the difference between the number of converted objects in the experimental group and the conversion quantity increase amount, and the second difference is the difference between the first value of the experimental group and the first value of the control group. The second resource increase amount is the sum of the first product and a third product, where the third product is the product of the first difference, a first ratio and a third difference; the first ratio is the ratio of the first value of the experimental group to the second value of the control group, and the third difference is the difference between the second value of the experimental group and the second value of the control group.
The conversion index is a measure determined by comparing the conversion behaviors of all objects in the experimental group and the control group. The conversion increase rate is the increase rate of conversion for an object behavior, such as a user participation conversion increase rate, a promotion information click increase rate, or a product transaction conversion increase rate. The first resource increase amount is the resource increase of the experimental group relative to the control group based on the first value, the second resource increase amount is the resource increase of the experimental group relative to the control group based on the second value, and the first value and the second value are resource statistics in different dimensions. For example, if the target service is company A pushing coupons to users, the conversion increase rate of the target behavior is the passenger conversion increase rate, the first value is the average fare, and the second value is the average discount rate, then the conversion quantity increase amount is the passenger increase amount, the first resource increase amount is the passenger revenue increase based on the average fare, and the second resource increase amount is the passenger revenue increase based on the average discount rate. The embodiments of the present application are not limited in this respect.
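The four conversion indices can be sketched as a direct transcription of the definitions as worded in this embodiment, including the use of the number of converted objects in the experimental group when computing the conversion quantity increase amount. The function name, parameter names and input figures are made up for illustration; the first value is read as an average fare and the second value as an average discount rate, following the coupon example.

```python
def conversion_indices(n_t, conv_t, n_c, conv_c, v1_t, v1_c, v2_t, v2_c):
    """Compute the four conversion indices from group-level statistics.

    n_t, n_c:       numbers of objects in the experimental / control group
    conv_t, conv_c: numbers of converted objects in each group
    v1_t, v1_c:     first value of each group (e.g. average fare)
    v2_t, v2_c:     second value of each group (e.g. average discount rate)
    """
    increase_rate = conv_t / n_t - conv_c / n_c
    quantity_increase = increase_rate * conv_t            # as worded in the text
    first_difference = conv_t - quantity_increase
    first_product = quantity_increase * v1_t
    second_product = first_difference * (v1_t - v1_c)
    third_product = first_difference * (v1_t / v2_c) * (v2_t - v2_c)
    return {
        "conversion_increase_rate": increase_rate,
        "conversion_quantity_increase": quantity_increase,
        "first_resource_increase": first_product + second_product,
        "second_resource_increase": first_product + third_product,
    }


indices = conversion_indices(n_t=1000, conv_t=120, n_c=1000, conv_c=100,
                             v1_t=800.0, v1_c=850.0, v2_t=0.6, v2_c=0.65)
```

Returning all four indices in one dictionary keeps the shared intermediate terms (the first difference and the first product) computed once.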
The real gain brought by the target service is the real gain brought by the participation of the experimental group in the target service, and can be understood as the degree to which the objects in the experimental group, taken as a whole, tend to perform the conversion behavior. The conversion index of the experimental group expresses the real gain brought by the experimental group participating in the target service, and this real gain is evaluated according to the target service gain overall measurement criterion.
Specifically, the terminal may determine a conversion index based on the conversion result tags of each object in the experimental group and the control group before or after the gain prediction model is constructed, where the conversion index includes at least one of a conversion increase rate, a conversion quantity increase amount, a first resource increase amount, and a second resource increase amount; constructing a target service gain overall measurement criterion based on the conversion index; and evaluating the real gain brought by the participation of the experimental group in the target service based on the target service gain overall measurement criterion.
When the conversion index is the conversion increase rate, the target service gain overall measurement criterion may include: if the conversion increase rate is positive, the real gain brought by the experimental group participating in the target service is an improvement in the conversion behavior of the objects; the larger the positive value, the larger the real gain and the more the target service favors conversion, and the closer the positive value is to zero, the smaller the real gain and the less obvious the influence of the target service on the conversion behavior of the objects. If the conversion increase rate is negative, the real gain brought by the experimental group participating in the target service is a reduction in the conversion behavior of the objects; the larger the absolute value of the negative value, the greater the influence of the target service on the conversion behavior of the objects and the less favorable it is to improving that behavior.
When the conversion index is the conversion quantity increase amount, the target service gain overall measurement criterion may include: if the conversion quantity increase amount is positive, the real gain brought by the experimental group participating in the target service is an improvement in the conversion behavior of the objects, and the larger the positive value, the larger the real gain and the more the target service favors conversion. If the conversion quantity increase amount is negative, the real gain is a reduction in the conversion behavior of the objects, and the larger the absolute value of the negative value, the greater the influence of the target service on the conversion behavior of the objects and the less favorable it is to improving that behavior.
When the conversion index is the first resource increase amount, the target service gain overall measurement criterion may include: if the first resource increase amount is positive, the real gain brought by the experimental group participating in the target service is an improvement in the conversion behavior of the objects, and the larger the positive value, the larger the real gain and the more the target service favors conversion. If the first resource increase amount is negative, the real gain is a reduction in the conversion behavior of the objects, and the larger the absolute value of the negative value, the greater the influence of the target service on the conversion behavior of the objects and the less favorable it is to improving that behavior.
When the conversion index is the second resource increase amount, the target service gain overall measurement criterion may include: if the second resource increase amount is positive, the real gain brought by the experimental group participating in the target service is an improvement in the conversion behavior of the objects, and the larger the positive value, the larger the real gain and the more the target service favors conversion. If the second resource increase amount is negative, the real gain is a reduction in the conversion behavior of the objects, and the larger the absolute value of the negative value, the greater the influence of the target service on the conversion behavior of the objects and the less favorable it is to improving that behavior.
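The sign-based reading shared by the four criteria can be sketched as one small function. The returned wording and the handling of an index that is exactly zero are assumptions; the criteria only describe positive and negative values.

```python
def interpret_conversion_index(value: float) -> str:
    """Classify a conversion index by its sign, as in the measurement criteria."""
    if value > 0:
        return "improves conversion"
    if value < 0:
        return "reduces conversion"
    return "no measurable effect"  # assumed reading for a value of exactly zero


readings = [interpret_conversion_index(v) for v in (0.02, -3960.0, 0.0)]
```

The magnitude of the index, which the criteria use to compare the strength of the effect, is left to the caller.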
In this embodiment, after the conversion result labels of each object in the experimental group and the control group are obtained, before or after the gain prediction model is constructed, the conversion index may be determined based on the conversion result labels of each object in the experimental group and the control group, and based on the conversion index, the purpose of evaluating the real gain brought by the target service may be achieved.
In one embodiment, as shown in fig. 4, the gain prediction method further includes:
Step 402, selecting a second number of objects from each of the experimental group and the control group to determine a test object group; acquiring target feature data and a conversion result label of each object in the experimental group of the test object group; and inputting the target feature data of each object in the experimental group of the test object group into the target gain prediction model to obtain the gain prediction value corresponding to each object in the experimental group of the test object group.
The second number of objects is smaller than the number of objects in the experimental group or the control group; for example, 20% of the number of objects in the experimental group or the control group is taken as the second number of objects. The test object group includes an experimental group of the test object group and a control group of the test object group. The objects in the test object group may or may not overlap with those in the training object group; for example, the test object group may be the set of objects in the preset object group other than the training object group, which is not limited in this embodiment. Because only the experimental group participates in the target service, the target feature data of each object in the experimental group of the test object group is selected as the input of the target gain prediction model, which predicts the gain of each object in the experimental group of the test object group.
Specifically, the terminal selects a second number of objects from each of the experimental group and the control group and determines the set of selected objects as the test object group; the terminal then acquires the target feature data of each object in the experimental group of the test object group from the database and inputs it into the target gain prediction model to obtain the gain prediction value corresponding to each object in the experimental group of the test object group.
Step 404, dividing the experimental group of the test object group into a first set, a second set and a third set based on the magnitude of the gain prediction value corresponding to each object in the experimental group of the test object group; the minimum gain prediction value in the first set is larger than the maximum gain prediction value in the second set, and the minimum gain prediction value in the second set is larger than the maximum gain prediction value in the third set.
The first set, the second set, and the third set may be three object groups obtained by dividing the experimental group of the test object group in descending order of gain prediction value. In other embodiments, the experimental group of the test object group may be divided according to other division rules, and into two, four, five, or another finite number of object groups, which is not limited in this embodiment.
Specifically, based on the gain prediction value corresponding to each object in the experimental group of the test object group, the terminal sorts the objects of that experimental group in descending order of gain prediction value and divides them by quantile to obtain the first set, the second set and the third set; the minimum gain prediction value in the first set is larger than the maximum gain prediction value in the second set, and the minimum gain prediction value in the second set is larger than the maximum gain prediction value in the third set.
Step 406, calculating at least one of a first actual gain result corresponding to the first set, a second actual gain result corresponding to the second set, and a third actual gain result corresponding to the third set based on the target conversion index and the conversion result label of each object in the experimental group of the test object group.
The target conversion index is used to measure the effect of the target service and may be, for example, a conversion increase rate or a conversion quantity increase amount. The first actual gain result is the actual gain result of the first set under the target conversion index, the second actual gain result is the actual gain result of the second set under the target conversion index, and the third actual gain result is the actual gain result of the third set under the target conversion index.
Specifically, the terminal calculates at least one of a first actual gain result of the first set, a second actual gain result of the second set, and a third actual gain result of the third set according to the conversion result tag of each object in the experimental group of the test object group based on the target conversion index.
Step 408, evaluating the prediction accuracy of the target gain prediction model based on at least one of the first actual gain result, the second actual gain result and the third actual gain result.
Specifically, the terminal evaluates the prediction accuracy of the target gain prediction model based on at least one of the first actual gain result, the second actual gain result and the third actual gain result. For example, if the first actual gain result, the second actual gain result and the third actual gain result are sequentially decreased, it is determined that the prediction accuracy of the target gain prediction model meets the preset requirement; otherwise, determining that the prediction accuracy of the target gain prediction model does not meet the preset requirement.
In this embodiment, a second number of objects is selected from each of the experimental group and the control group to determine the test object group; the target gain prediction model is used to perform gain prediction on the experimental group of the test object group, and a first set, a second set and a third set are determined, wherein the minimum gain prediction value in the first set is greater than the maximum gain prediction value in the second set, and the minimum gain prediction value in the second set is greater than the maximum gain prediction value in the third set; at least one of a first actual gain result corresponding to the first set, a second actual gain result corresponding to the second set, and a third actual gain result corresponding to the third set is determined based on the target conversion index and the conversion result label of each object in the experimental group of the test object group; and the prediction accuracy of the target gain prediction model can then be evaluated based on at least one of the first actual gain result, the second actual gain result and the third actual gain result.
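The division in step 404 can be illustrated with a minimal Python sketch. It assumes the gain prediction values are already available as (object identifier, predicted gain) pairs; the function name split_by_predicted_gain and the sample values are hypothetical and are not taken from the application.

```python
from typing import List, Sequence, Tuple


def split_by_predicted_gain(
    predictions: Sequence[Tuple[str, float]], n_sets: int = 3
) -> List[List[str]]:
    """Rank objects by predicted gain (descending) and cut the ranking into n_sets bins."""
    if n_sets < 1:
        raise ValueError("n_sets must be at least 1")
    ranked = sorted(predictions, key=lambda item: item[1], reverse=True)
    n = len(ranked)
    sets: List[List[str]] = []
    for k in range(n_sets):
        # Integer arithmetic keeps the bin sizes within one object of each other.
        start, end = k * n // n_sets, (k + 1) * n // n_sets
        sets.append([object_id for object_id, _ in ranked[start:end]])
    return sets


# Hypothetical predicted gains for six objects in the experimental group.
example = [("a", 0.9), ("b", 0.1), ("c", 0.5), ("d", 0.7), ("e", 0.3), ("f", -0.2)]
first_set, second_set, third_set = split_by_predicted_gain(example)
```

Note that when several objects share the same gain prediction value at a bin boundary, this rank-based cut does not guarantee the strict inequality between sets described above; a practical implementation would need a tie-handling rule.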
In one embodiment, the target conversion index includes at least one of a target conversion increase rate, a target conversion quantity increase amount, a target first resource increase amount, and a target second resource increase amount. The target conversion increase rate is the difference between the conversion rate of the test group and the conversion rate of the reference group, where the conversion rate of the test group is the ratio of the number of conversion objects of the test group to the number of objects of the test group, and the conversion rate of the reference group is the ratio of the number of conversion objects of the reference group to the number of objects of the reference group. The target conversion quantity increase amount is the product of the target conversion increase rate and the number of conversion objects of the test group. The target first resource increase amount is the sum of a target first product and a target second product, where the target first product is the product of the target conversion quantity increase amount and the first value of the test group, and the target second product is the product of a target first difference and a target second difference; the target first difference is the difference between the number of conversion objects of the test group and the target conversion quantity increase amount, and the target second difference is the difference between the first value of the test group and the first value of the reference group. The target second resource increase amount is the sum of the target first product and a target third product, where the target third product is the product of the target first difference, a target first ratio and a target third difference; the target first ratio is the ratio of the first value of the test group to the second value of the test group, and the target third difference is the difference between the second value of the test group and the second value of the reference group.
Analogous to the conversion index, which is determined by comparing the conversion behavior of the objects in the experimental group and the control group, the target conversion index is a measurement index determined by comparing the conversion behavior of the objects in the test group and the reference group, and is used to evaluate the real gain brought by the participation of the experimental group of the test object group in the target service. The test group may be any one of the first set, the second set, and the third set; the reference group may be the experimental group of the test object group, or any one of the first set, the second set, and the third set other than the test group. The number of conversion objects is the number of objects whose conversion result label indicates conversion.
In this embodiment, the target conversion index is defined along several dimensions (target conversion increase rate, target conversion quantity increase amount, target first resource increase amount and target second resource increase amount), which quantifies the effect of participating in the target service on the experimental group of the test object group, so that the real gain brought by that participation can be evaluated. Combining target conversion indexes of several dimensions also allows the indexes to verify one another, which improves the accuracy of evaluating the real gain brought by the target service.
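The four target conversion indexes defined above can be written out as a short Python sketch. The GroupStats container and all function names are hypothetical; the numbers in the example are made up for illustration, with the first value read as an average fare and the second value as an average discount rate.

```python
import math
from dataclasses import dataclass


@dataclass
class GroupStats:
    n_objects: int       # number of objects in the group
    n_converted: int     # number of conversion objects in the group
    first_value: float   # assumed here to be, e.g., an average fare
    second_value: float  # assumed here to be, e.g., an average discount rate


def conversion_rate(group: GroupStats) -> float:
    return group.n_converted / group.n_objects


def conversion_increase_rate(test: GroupStats, ref: GroupStats) -> float:
    return conversion_rate(test) - conversion_rate(ref)


def conversion_quantity_increase(test: GroupStats, ref: GroupStats) -> float:
    return conversion_increase_rate(test, ref) * test.n_converted


def first_resource_increase(test: GroupStats, ref: GroupStats) -> float:
    lift = conversion_quantity_increase(test, ref)
    first_product = lift * test.first_value
    second_product = (test.n_converted - lift) * (test.first_value - ref.first_value)
    return first_product + second_product


def second_resource_increase(test: GroupStats, ref: GroupStats) -> float:
    lift = conversion_quantity_increase(test, ref)
    first_product = lift * test.first_value
    first_ratio = test.first_value / test.second_value
    third_product = (
        (test.n_converted - lift) * first_ratio * (test.second_value - ref.second_value)
    )
    return first_product + third_product


# Hypothetical test group and reference group.
test_group = GroupStats(n_objects=1000, n_converted=200, first_value=500.0, second_value=0.5)
ref_group = GroupStats(n_objects=1000, n_converted=100, first_value=400.0, second_value=0.4)
rate = conversion_increase_rate(test_group, ref_group)
assert math.isclose(rate, 0.1)
```

With these made-up inputs the conversion quantity increase amount is about 20 objects, and both resource increase amounts come out at about 28000 in the unit of the first value.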
In one embodiment, calculating at least one of a first actual gain result corresponding to the first set, a second actual gain result corresponding to the second set, and a third actual gain result corresponding to the third set, based on the target conversion index and the conversion result label of each object in the experimental group of the test object group, comprises: taking the first set as the test group and the experimental group of the test object group as the reference group, and calculating the first actual gain result corresponding to the target conversion index; taking the second set as the test group and the experimental group of the test object group as the reference group, and calculating the second actual gain result corresponding to the target conversion index; and taking the third set as the test group and the experimental group of the test object group as the reference group, and calculating the third actual gain result corresponding to the target conversion index.
Evaluating the prediction accuracy of the target gain prediction model based on at least one of the first actual gain result, the second actual gain result, and the third actual gain result, including: judging whether the first actual gain result, the second actual gain result and the third actual gain result are decreased in sequence; and if so, determining that the prediction accuracy of the target gain prediction model meets the preset requirement, and if not, training the target gain prediction model again.
The first actual gain result is the actual gain result, under the target conversion index, of the first set relative to the experimental group of the test object group; the second actual gain result is the actual gain result, under the target conversion index, of the second set relative to the experimental group of the test object group; and the third actual gain result is the actual gain result, under the target conversion index, of the third set relative to the experimental group of the test object group.
Specifically, the terminal takes the first set as the test group and the experimental group of the test object group as the reference group, calculates at least one of the target conversion increase rate, the target conversion quantity increase amount, the target first resource increase amount and the target second resource increase amount of the first set relative to the experimental group of the test object group, and determines the calculated value or values as the first actual gain result; takes the second set as the test group and the experimental group of the test object group as the reference group, calculates at least one of the target conversion increase rate, the target conversion quantity increase amount, the target first resource increase amount and the target second resource increase amount of the second set relative to the experimental group of the test object group, and determines the calculated value or values as the second actual gain result; and takes the third set as the test group and the experimental group of the test object group as the reference group, calculates at least one of the target conversion increase rate, the target conversion quantity increase amount, the target first resource increase amount and the target second resource increase amount of the third set relative to the experimental group of the test object group, and determines the calculated value or values as the third actual gain result.
The terminal judges whether the first actual gain result, the second actual gain result and the third actual gain result are decreased in sequence; and if so, determining that the prediction accuracy of the target gain prediction model meets the preset requirement, and if not, training the target gain prediction model again.
For example, when the target conversion index is the target conversion increase rate, the step of calculating the first actual gain result, the second actual gain result, and the third actual gain result includes: the terminal takes the first set as the test group and the experimental group of the test object group as the reference group, calculates the target conversion increase rate of the first set relative to the experimental group of the test object group, and determines it as the first actual gain result; takes the second set as the test group and the experimental group of the test object group as the reference group, calculates the target conversion increase rate of the second set relative to the experimental group of the test object group, and determines it as the second actual gain result; and takes the third set as the test group and the experimental group of the test object group as the reference group, calculates the target conversion increase rate of the third set relative to the experimental group of the test object group, and determines it as the third actual gain result.
The terminal calculates the target conversion increase rate of the first set relative to the experimental group of the test object group as follows: the terminal calculates the ratio of the number of conversion objects of the first set to the number of objects of the first set to obtain the conversion rate of the first set; then calculates the ratio of the number of conversion objects of the experimental group of the test object group to the number of objects of that experimental group to obtain the conversion rate of the experimental group of the test object group; and calculates the difference between the conversion rate of the first set and the conversion rate of the experimental group of the test object group to obtain the target conversion increase rate.
The terminal judges whether the target conversion increasing rate of the first set relative to the experimental group of the test object group, the target conversion increasing rate of the second set relative to the experimental group of the test object group and the target conversion increasing rate of the third set relative to the experimental group of the test object group are sequentially decreased; and if so, determining that the prediction accuracy of the target gain prediction model meets the preset requirement, and if not, training the target gain prediction model again.
In this embodiment, the first set, the second set, and the third set are respectively used as test groups, and the experimental group of the test object group is used as a reference group, so that the purpose of calculating the first actual gain result, the second actual gain result, and the third actual gain result corresponding to the target conversion index can be achieved.
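As a rough illustration of this embodiment with the target conversion increase rate as the target conversion index, the following Python sketch pools the three sets to stand in for the experimental group of the test object group (the reference group) and checks whether the three actual gain results decrease in sequence. The function names and the 0/1 label lists are hypothetical; a label of 1 is assumed to mean that the object converted.

```python
from typing import List, Sequence


def conversion_rate(labels: Sequence[int]) -> float:
    """Share of objects whose conversion result label is 1 (converted)."""
    return sum(labels) / len(labels)


def gains_vs_whole_group(sets: Sequence[Sequence[int]]) -> List[float]:
    """Conversion increase rate of each set, with all sets pooled as the reference group."""
    pooled = [label for labels in sets for label in labels]
    reference_rate = conversion_rate(pooled)
    return [conversion_rate(labels) - reference_rate for labels in sets]


def decreases_in_sequence(values: Sequence[float]) -> bool:
    """True when every value is strictly larger than the one that follows it."""
    return all(a > b for a, b in zip(values, values[1:]))


# Hypothetical conversion result labels for the first, second and third sets.
labels_by_set = [[1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0]]
gains = gains_vs_whole_group(labels_by_set)
model_ok = decreases_in_sequence(gains)  # False would trigger retraining
```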
In one embodiment, calculating at least one of a first actual gain result corresponding to the first set, a second actual gain result corresponding to the second set, and a third actual gain result corresponding to the third set, based on the target conversion index and the conversion result label of each object in the experimental group of the test object group, comprises: taking the first set as the test group and the second set or the third set as the reference group, and calculating the first actual gain result corresponding to the target conversion index; and taking the second set as the test group and the third set as the reference group, and calculating the second actual gain result corresponding to the target conversion index.
Evaluating the prediction accuracy of the target gain prediction model based on at least one of the first actual gain result, the second actual gain result, and the third actual gain result includes: when the first actual gain result and the second actual gain result are both positive values, determining that the prediction accuracy of the target gain prediction model meets the preset requirement; otherwise, retraining the target gain prediction model.
Wherein the first actual gain result is an actual gain result of the first set relative to the second set or the third set corresponding to the target conversion metric, and the second actual gain result is an actual gain result of the second set relative to the third set corresponding to the target conversion metric.
Specifically, the terminal takes the first set as the test group and the second set or the third set as the reference group, calculates at least one of the target conversion increase rate, the target conversion quantity increase amount, the target first resource increase amount and the target second resource increase amount of the first set relative to the second set or the third set, and determines the calculated value or values as the first actual gain result; and takes the second set as the test group and the third set as the reference group, calculates at least one of the target conversion increase rate, the target conversion quantity increase amount, the target first resource increase amount and the target second resource increase amount of the second set relative to the third set, and determines the calculated value or values as the second actual gain result.
And the terminal judges whether the first actual gain result and the second actual gain result are positive values, if so, the prediction accuracy of the target gain prediction model is determined to meet the preset requirement, and if not, the target gain prediction model is trained again.
For example, when the target conversion index is the target conversion increase rate, the step of calculating the first actual gain result and the second actual gain result includes: the terminal takes the first set as the test group and the second set as the reference group, calculates the target conversion increase rate of the first set relative to the second set, and determines it as the first actual gain result; and takes the second set as the test group and the third set as the reference group, calculates the target conversion increase rate of the second set relative to the third set, and determines it as the second actual gain result. The terminal calculates the target conversion increase rate of the first set relative to the second set as follows: the terminal calculates the ratio of the number of conversion objects of the first set to the number of objects of the first set to obtain the conversion rate of the first set; then calculates the ratio of the number of conversion objects of the second set to the number of objects of the second set to obtain the conversion rate of the second set; and calculates the difference between the conversion rate of the first set and the conversion rate of the second set to obtain the target conversion increase rate.
The terminal judges whether the target conversion increase rate of the first set relative to the second set and the target conversion increase rate of the second set relative to the third set are both positive values; if so, it determines that the prediction accuracy of the target gain prediction model meets the preset requirement; otherwise, it retrains the target gain prediction model.
In this embodiment, the first set and the second set are used as the test group and the reference group, respectively, and the second set and the third set are used as the test group and the reference group, respectively, so that the first actual gain result and the second actual gain result corresponding to the target conversion index can be calculated.
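The pairwise variant of the check can be sketched in Python as follows, again using the target conversion increase rate as the target conversion index. The function names are hypothetical and the label lists (1 for converted, 0 otherwise) are invented for illustration.

```python
from typing import Sequence, Tuple


def conversion_rate(labels: Sequence[int]) -> float:
    """Share of objects whose conversion result label is 1 (converted)."""
    return sum(labels) / len(labels)


def pairwise_gains(
    first: Sequence[int], second: Sequence[int], third: Sequence[int]
) -> Tuple[float, float]:
    """First set measured against the second set, second set against the third set."""
    first_result = conversion_rate(first) - conversion_rate(second)
    second_result = conversion_rate(second) - conversion_rate(third)
    return first_result, second_result


def meets_requirement(
    first: Sequence[int], second: Sequence[int], third: Sequence[int]
) -> bool:
    """True only when both actual gain results are strictly positive."""
    first_result, second_result = pairwise_gains(first, second, third)
    return first_result > 0 and second_result > 0


# Hypothetical labels: the sets ranked higher by the model convert more often.
ok = meets_requirement([1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0])
```

Compared with measuring every set against the whole experimental group, this variant only needs two comparisons, at the cost of not producing a third actual gain result.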
In one embodiment, the gain prediction method further comprises: acquiring to-be-intervened feature data of each object to be intervened in a group of objects to be intervened; inputting the to-be-intervened feature data into the target gain prediction model to obtain the gain prediction value corresponding to each object to be intervened in the group of objects to be intervened; determining a target number of interventions based on different intervention strategies; and dividing the group of objects to be intervened according to the target number of interventions and the gain prediction value corresponding to each object to be intervened, so as to obtain a target group of objects to be intervened.
The group of objects to be intervened is a group consisting of a plurality of objects to be intervened whose features correspond to the to-be-intervened feature data. The to-be-intervened feature data is feature data whose distribution is consistent with that of the target feature data. Taking a target service of issuing coupons as an example, the intervention strategy may be to issue a first number of coupons to each object, or to issue no more than a second number of coupons to each object. The target number of interventions is the number of times objects participate in the target service. The target group of objects to be intervened consists of the objects in the group of objects to be intervened that participate in the target service.
Specifically, the terminal acquires the to-be-intervened feature data of each object to be intervened in the group of objects to be intervened; inputs the to-be-intervened feature data into the target gain prediction model to obtain the gain prediction value corresponding to each object to be intervened; determines the target number of interventions based on different intervention strategies; and sorts the objects to be intervened in descending order of gain prediction value and divides the group of objects to be intervened according to the target number of interventions to obtain the target group of objects to be intervened, which then participates in the target service. For example, suppose the target service is issuing coupons to the group of objects to be intervened and the group contains 2000 objects. If the intervention strategy is to issue 1 coupon to each object and the target number of interventions is 1000, the target group of objects to be intervened may be the first 1000 objects in the group of objects to be intervened, or may be the 500th to 1500th objects in that group. If the intervention strategy is to issue no more than 3 coupons to each object and the target number of interventions is 1000, the target group of objects to be intervened may be the first 1000 objects in the group of objects to be intervened, each issued 1 coupon, or may be the first 500 objects in that group, each issued 2 coupons.
After the target service of the target to-be-intervened object group is finished, the terminal acquires the number of conversion objects of the intervened object group and the non-intervention object group, wherein the intervened object group is a set of all objects or part of objects of the target to-be-intervened object group, the non-intervention object group is an object group except the target to-be-intervened object group in the to-be-intervened object group, and the number of the non-intervention object group is the same as that of the intervened object group. The terminal determines at least one of a corresponding conversion improvement rate, a conversion quantity improvement amount, a first resource improvement amount and a second resource improvement amount based on the conversion object quantity of the intervened object group and the non-intervention object group, and evaluates the real gain brought by the target service participation of the target to-be-intervened object group based on the target service gain overall measurement criterion, namely, evaluates the real gain brought by the target service under the guidance of the target gain prediction model.
In this embodiment, the target gain prediction model is used to perform gain prediction on the group of objects to be intervened to obtain the gain prediction value of each object, and the target number of interventions is determined in combination with the intervention strategy, so that the target group of objects to be intervened can be selected from the group of objects to be intervened to participate in the target service.
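One way to read the selection described in this embodiment is sketched below in Python. It covers only the simplest case, in which every selected object receives the same number of coupons and the objects with the highest gain prediction values are chosen first; the function name select_targets, its parameters and the sample predictions are all hypothetical.

```python
from typing import Dict


def select_targets(
    predictions: Dict[str, float], total_interventions: int, coupons_per_object: int = 1
) -> Dict[str, int]:
    """Pick the top-ranked objects so that the coupons issued do not exceed the budget."""
    if coupons_per_object < 1:
        raise ValueError("coupons_per_object must be at least 1")
    n_targets = total_interventions // coupons_per_object
    # Sort object identifiers by predicted gain, highest first.
    ranked = sorted(predictions, key=lambda object_id: predictions[object_id], reverse=True)
    return {object_id: coupons_per_object for object_id in ranked[:n_targets]}


# Hypothetical group of 2000 objects with strictly decreasing predicted gains.
predicted = {f"u{i}": 1.0 - i / 2000 for i in range(2000)}
one_each = select_targets(predicted, total_interventions=1000, coupons_per_object=1)
two_each = select_targets(predicted, total_interventions=1000, coupons_per_object=2)
```

With a budget of 1000 interventions, one coupon each selects the first 1000 ranked objects and two coupons each selects the first 500, matching the two allocations mentioned in the example above. Other allocations mentioned in the text, such as choosing a middle band of the ranking, would need a different rule.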
In one embodiment, there is provided a gain prediction method comprising the steps of:
First, a preset object group is determined according to the characteristics of the target service and divided into an experimental group and a control group according to the requirements of an A/B experiment. The experimental group and the control group have the same number of samples and the same feature distribution. The target service is a service intended to promote the conversion of objects; it may be a resource gifting activity, for example pushing coupons to users, which is not limited in this embodiment of the present application.
Second, a database is selected, and the initial feature data and conversion result labels of the preset object group are obtained. A set of 80% of the objects in the preset object group is taken as a training set (namely, the training object group in the above embodiments), and the set of the remaining 20% of the objects is taken as a test set (namely, the test object group in the above embodiments). The training set is used to train the gain prediction model to obtain the target gain prediction model, and the test set is used to evaluate the prediction accuracy of the target gain prediction model. The initial feature data includes data in at least two feature dimensions. The acquired initial feature data is preprocessed, where the preprocessing includes at least null value filling, outlier handling, categorical variable conversion and binning of continuous variables, so as to obtain feature data that meets the data format requirements of the gain prediction model.
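The 80%/20% division can be sketched in Python as follows. The application does not say how the objects are assigned to the two sets, so the seeded random shuffle, the function name split_train_test and the sample identifiers are assumptions made only for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(
    object_ids: Sequence[str], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle the object identifiers reproducibly and cut them into training and test sets."""
    ids = list(object_ids)
    random.Random(seed).shuffle(ids)  # assumption: random assignment with a fixed seed
    cut = round(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


# Hypothetical experimental and control groups, split separately so that
# both the training set and the test set contain objects from each group.
experimental_ids = [f"e{i}" for i in range(10)]
control_ids = [f"c{i}" for i in range(10)]
exp_train, exp_test = split_train_test(experimental_ids)
ctl_train, ctl_test = split_train_test(control_ids)
```

Splitting the experimental group and the control group separately keeps the two groups the same size in both the training set and the test set, in line with the earlier description of selecting objects from each group respectively.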
Third, feature selection. Features for the target service are often high-dimensional and sparse, so feature selection is needed; at the same time, it is often necessary to observe which features have a large influence on the gain. The net information value of each feature dimension is calculated, the target feature dimensions are screened from the at least two feature dimensions based on the net information value corresponding to each feature dimension, and the target feature data of each object in the experimental group and the control group under the target feature dimensions is determined.
Fourth, a target service gain overall measurement criterion is constructed based on the conversion index, and the real gain brought by the historical target service is evaluated through the target service gain overall measurement criterion.
The conversion indexes are conversion increasing rate, conversion quantity increasing quantity, first resource increasing quantity and second resource increasing quantity.
The conversion increase rate of the experimental group is calculated as follows: conversion increase rate = experimental group conversion rate - control group conversion rate, where experimental group conversion rate = number of conversion objects in the experimental group / number of objects in the experimental group, and control group conversion rate = number of conversion objects in the control group / number of objects in the control group. In this case, the target service gain overall measurement criterion is: the higher the conversion increase rate, the more positive the effect of the target service; a value approaching 0 indicates that the effect is not obvious.
The conversion quantity improvement amount of the experimental group is calculated as follows: conversion quantity improvement amount = conversion improvement rate of the experimental group × number of converted objects in the experimental group. In this case, the overall measurement criterion for the target service gain is: a positive conversion quantity improvement amount is an increase, and a negative one is a decrease; the larger the absolute value, the larger the impact of the target service.
The first resource improvement amount of the experimental group is calculated as follows: first resource improvement amount = conversion quantity improvement amount of the experimental group × first value of the experimental group + (number of converted objects in the experimental group − conversion quantity improvement amount of the experimental group) × (first value of the experimental group − first value of the control group). In this case, the overall measurement criterion for the target service gain is: a positive first resource improvement amount is an increase, and a negative one is a decrease.
The second resource improvement amount of the experimental group is calculated as follows: second resource improvement amount = conversion quantity improvement amount of the experimental group × first value of the experimental group + (number of converted objects in the experimental group − conversion quantity improvement amount of the experimental group) × (first value of the experimental group / second value of the experimental group) × (second value of the experimental group − second value of the control group). In this case, the overall measurement criterion for the target service gain is: a positive second resource improvement amount is an increase, and a negative one is a decrease.
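The four conversion indexes can be computed together, as in the sketch below. It follows the definitions given in this application for the evaluation module (difference of conversion rates; product of the improvement rate and the number of converted objects; sum of products for the two resource amounts). The function name, the argument names, and the input numbers are hypothetical; the inputs are made-up illustration values and are not the Table 1 figures. The ratio in the second resource amount is taken here as first value / second value of the experimental group, which is an assumption where the text is not fully consistent.

```python
def conversion_indexes(n_exp, n_ctrl, conv_exp, conv_ctrl,
                       first_exp, first_ctrl, second_exp, second_ctrl):
    """Return the four conversion indexes for an experimental/control pair."""
    # Conversion improvement rate: difference of the two conversion rates.
    rate = conv_exp / n_exp - conv_ctrl / n_ctrl
    # Conversion quantity improvement amount, as worded in the text:
    # improvement rate times the number of converted experimental objects.
    quantity = rate * conv_exp
    first_product = quantity * first_exp
    remaining = conv_exp - quantity  # "first difference" in the text
    # First resource improvement amount: first product + second product.
    first_resource = first_product + remaining * (first_exp - first_ctrl)
    # Second resource improvement amount: first product + third product.
    # Assumption: the ratio uses the experimental group's second value.
    second_resource = first_product + remaining * (
        first_exp / second_exp) * (second_exp - second_ctrl)
    return {
        "rate": rate,
        "quantity": quantity,
        "first_resource": first_resource,
        "second_resource": second_resource,
    }


# Hypothetical inputs chosen only to make the arithmetic easy to follow.
idx = conversion_indexes(
    n_exp=1000, n_ctrl=1000, conv_exp=200, conv_ctrl=100,
    first_exp=10.0, first_ctrl=8.0, second_exp=0.5, second_ctrl=0.4,
)
```

Positive values are read as an improvement and negative values as a reduction, in line with the measurement criterion stated for each index.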
Take as an example the case where the target service is company A pushing coupons to users. The conversion improvement rate is then the improvement in the passenger-trip conversion rate, the conversion quantity improvement amount is the increase in the number of passenger trips, the first value is the average fare, the first resource improvement amount is the increase in flight revenue based on the average fare, the second value is the average discount rate, and the second resource improvement amount is the increase in flight revenue based on the average discount rate. The real gain of the experimental group participating in the target service is evaluated through the overall measurement criterion for the target service gain, giving the evaluation results shown in Table 1.
Table 1
| Index | Experimental group | Control group | Improvement |
| --- | --- | --- | --- |
| Number of objects | 67150 | 67150 | |
| Flight revenue | 10365590 | 3782592 | |
| Average fare | 790 | 597 | |
| Average discount rate | 0.272 | 0.246 | |
| Number of passenger trips | 13121 | 6336 | 6809 |
| Passenger-trip conversion rate | 0.1954 | 0.094 | 0.101 |
| Flight revenue improvement amount (average fare) | - | - | 6597332 |
| Flight revenue improvement amount (average discount rate) | - | - | 5379118 |
As shown in Table 1, the passenger-trip conversion improvement rate of the experimental group is 0.101, and according to the overall measurement criterion for the target service gain, the effect of the experimental group participating in the target service is determined to be positive, although not pronounced; the conversion quantity improvement amount of the experimental group is 6809, and the effect is determined to be positive according to the criterion; the first resource improvement amount of the experimental group is 6597332, and the effect is determined to be positive according to the criterion; and the second resource improvement amount of the experimental group is 5379118, and the effect is determined to be positive according to the criterion.
Fifth, the conversion result label of each object in the experimental group and the control group is converted into a gain label based on the conversion rule. The conversion rule is:
(Formula (1) is provided as image BDA0003686772440000211 in the original publication.)
In formula (1), Z is the gain label, T denotes the experimental group, C denotes the control group, G = T indicates that the object comes from the experimental group, G = C indicates that the object comes from the control group, Y is the conversion result label, Y = 1 indicates a converted object, and Y = 0 indicates an unconverted object.
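The conversion rule can be illustrated with a short sketch. Elsewhere in this application the gain label is described as taking a first value for converted objects of the experimental group, the opposite number of that value for converted objects of the control group, and a third value for unconverted objects. The concrete values 1, -1 and 0 used below are an assumption for illustration, as is the function name `gain_label`; the exact values of formula (1) are not reproduced in this text.

```python
def gain_label(group, converted, first_value=1, third_value=0):
    """Map (group G, conversion result Y) to a gain label Z.

    group: "T" for the experimental group, "C" for the control group.
    converted: True for Y = 1, False for Y = 0.
    first_value / third_value: assumed label values (1 and 0 here).
    """
    if group not in ("T", "C"):
        raise ValueError("group must be 'T' or 'C'")
    if not converted:
        return third_value       # unconverted, either group
    if group == "T":
        return first_value       # converted in the experimental group
    return -first_value          # converted in the control group (opposite number)


labels = [gain_label(g, y) for g, y in
          [("T", True), ("C", True), ("T", False), ("C", False)]]
```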
Sixth, a gain prediction model to be trained is constructed based on a tree model. The target feature data of each object in the training set is taken as input and the gain prediction value of each object in the training set is taken as output; the gain prediction value of each object is compared with its gain label, and the gain prediction model is trained in the direction of reducing the difference between the gain prediction value and the gain label, giving the trained target gain prediction model.
Seventh, the experimental group of the test set is predicted by the target gain prediction model to obtain gain prediction values, and the prediction accuracy of the target gain prediction model is evaluated based on the score quantiles of the gain prediction values and the target conversion indexes.
The experimental group of the test set is divided into three groups according to the score quantiles of the gain prediction values:
1. high-gain population a: gain prediction value ≥ 0.7 quantile;
2. medium-gain population b: 0.3 quantile < gain prediction value < 0.7 quantile;
3. low-gain population c: gain prediction value ≤ 0.3 quantile.
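The three-way split by score quantile can be sketched as follows. The function name `split_by_gain_quantiles`, the object identifiers, and the prediction values are hypothetical; the choice of the "inclusive" interpolation method of `statistics.quantiles` is also an assumption, since this application does not say how the quantiles are estimated.

```python
from statistics import quantiles


def split_by_gain_quantiles(predictions):
    """Split objects into high/medium/low gain populations.

    predictions: dict mapping object id -> gain prediction value.
    """
    # Deciles of the predicted gains; cuts[2] is the 0.3 quantile
    # and cuts[6] is the 0.7 quantile.
    cuts = quantiles(list(predictions.values()), n=10, method="inclusive")
    q30, q70 = cuts[2], cuts[6]
    high = {k for k, v in predictions.items() if v >= q70}
    low = {k for k, v in predictions.items() if v <= q30}
    medium = set(predictions) - high - low
    return high, medium, low


# Hypothetical predictions: object "u1" scores 0.1, ..., "u10" scores 1.0.
preds = {f"u{i}": i / 10 for i in range(1, 11)}
high, medium, low = split_by_gain_quantiles(preds)
```

The target conversion indexes are then computed separately for each population and compared, as described in the following paragraph.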
The prediction accuracy of the target gain prediction model is verified by comparing several target conversion indexes of each population, such as the passenger-trip conversion improvement rate and the flight revenue improvement amount. For example, if the passenger-trip conversion improvement rate of population a is 1 time higher than that of all objects in the experimental group of the test set, and 3.1 times higher than that of all objects of population c, the target gain prediction model is considered to have high prediction accuracy.
Eighth, when the evaluated prediction accuracy of the target gain prediction model meets the preset requirement, a target to-be-intervened object group is determined from the to-be-intervened object group according to the different intervention strategies and the gain prediction value corresponding to each to-be-intervened object, and the target to-be-intervened object group participates in the target service. For example, the target service is giving points to users. The number of each points denomination that can be distributed is calculated from the total points to be given and the size of each denomination; the to-be-intervened object group is then sorted by the model's gain prediction value and truncated according to the calculated quantities, dividing it into different groups that are assigned different points denominations, so as to improve the conversion rate of the target service at the same intervention cost.
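The points example can be sketched as a sort-and-truncate allocation. The function name `allocate_points`, the budget layout, and the rule that objects with higher predicted gain receive the larger denomination are all assumptions made for illustration; this application only states that the group is sorted, truncated, and divided into groups with different denominations.

```python
def allocate_points(predictions, budgets):
    """Assign points denominations to objects ranked by predicted gain.

    predictions: dict mapping object id -> gain prediction value.
    budgets: list of (denomination, total_points) pairs, listed in the
             order in which they are handed out (assumed: largest first).
    """
    # Sort objects from highest to lowest predicted gain.
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    allocation = {}
    start = 0
    for denomination, total_points in budgets:
        # Number of grants available for this denomination.
        count = total_points // denomination
        # Truncate the ranked list to the next `count` objects.
        for obj in ranked[start:start + count]:
            allocation[obj] = denomination
        start += count
    return allocation


# Hypothetical inputs: one 100-point grant and two 50-point grants.
preds = {"a": 0.9, "b": 0.5, "c": 0.7, "d": 0.1}
plan = allocate_points(preds, [(100, 100), (50, 100)])
```

Objects that fall below the truncation point (here, object "d") receive no points, which keeps the total intervention cost fixed.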
After the target to-be-intervened object group has participated in the target service, two groups with the same number of objects are selected as the experimental group and the control group, one from the target to-be-intervened object group and one from the objects in the to-be-intervened object group other than the target to-be-intervened object group; the corresponding conversion index is determined based on the number of converted objects in the experimental group and the control group, namely at least one of the conversion improvement rate, the conversion quantity improvement amount, the first resource improvement amount, and the second resource improvement amount; and the real gain brought by the target service under the guidance of the target gain prediction model is evaluated based on the overall measurement criterion for the target service gain.
In this embodiment, a target gain prediction model is established based on a tree model from the target feature data and gain labels of the objects, and target conversion indexes are determined. On the one hand, the real gain of an intervention activity can be evaluated quickly and clearly; on the other hand, the target gain prediction model can distinguish objects that convert only when intervened from objects that convert regardless of intervention, which provides data support for formulating different intervention strategies for subsequent target services and realizes a systematic mechanism for the target service from prediction beforehand to evaluation afterwards.
It should be understood that, although the steps in the flowcharts related to the embodiments as described above are sequentially displayed as indicated by arrows, the steps are not necessarily performed sequentially as indicated by the arrows. The steps are not limited to being performed in the exact order illustrated and, unless explicitly stated herein, may be performed in other orders. Moreover, at least a part of the steps in the flowcharts related to the embodiments described above may include multiple steps or multiple stages, which are not necessarily performed at the same time, but may be performed at different times, and the execution order of the steps or stages is not necessarily sequential, but may be rotated or alternated with other steps or at least a part of the steps or stages in other steps.
Based on the same inventive concept, the embodiment of the present application further provides a gain prediction apparatus for implementing the above-mentioned gain prediction method. The implementation scheme for solving the problem provided by the apparatus is similar to the implementation scheme described in the above method, so specific limitations in one or more embodiments of the gain prediction apparatus provided below may refer to the limitations of the gain prediction method in the foregoing, and details are not described herein again.
In one embodiment, as shown in fig. 5, there is provided a gain prediction apparatus 500, including: a determination module 502, an acquisition module 504, a calculation module 506, a screening module 508, a conversion module 510, and a training module 512, wherein:
The determining module 502 is configured to determine an experimental group and a control group from a preset object group, where the experimental group and the control group contain the same number of objects and have consistent initial feature data distributions, and the objects in the experimental group participate in the target service.
An obtaining module 504, configured to obtain initial feature data and a conversion result label of each object in the experimental group and the control group, where the initial feature data includes data in at least two feature dimensions.
A calculating module 506, configured to calculate a net information value corresponding to each feature dimension according to the initial feature data of each object in the experimental group and the control group.
And the screening module 508 is configured to screen a target feature dimension from the at least two feature dimensions based on the net information value corresponding to each feature dimension, and determine target feature data of each object in the experimental group and the control group under the target feature dimension.
The conversion module 510 is configured to adjust the conversion result label of each object in the experimental group and the control group into a gain label according to the conversion rule, where the gain label includes a first value, a second value, and a third value, the second value and the first value are opposite numbers, the first value indicates that the conversion result label of an object in the experimental group is converted, the second value indicates that the conversion result label of an object in the control group is converted, and the third value indicates that the conversion result label of an object in the experimental group or the control group is unconverted.
The training module 512 is configured to select a first number of objects from each of the experimental group and the control group as a training object group, and train the gain prediction model to be trained through the target feature data and the gain label of each object in the training object group to obtain a trained target gain prediction model, where the target gain prediction model is used to predict the gain of an object after it participates in the target service.
In one embodiment, the calculating module 506 is further configured to calculate, for any feature interval of each feature dimension, the evidence weights of the experimental group and the control group in the current feature interval respectively, and calculate the difference between the two evidence weights; determine the number of objects covered by the current feature interval, and determine the net evidence weight corresponding to the current feature interval according to the ratio of this covered number to the total number of objects in the preset object group and the difference between the two evidence weights; determine a first sample weight coefficient corresponding to the experimental group and a second sample weight coefficient corresponding to the control group, and calculate the net information value corresponding to the current feature interval according to the first sample weight coefficient, the second sample weight coefficient, and the net evidence weight corresponding to the current feature interval; and, for each feature dimension, calculate the net information value corresponding to that feature dimension based on the net information values corresponding to the feature intervals under the current feature dimension.
In one embodiment, the calculating module 506 is further configured to determine a first ratio between the number of responding objects of the experimental group in the current feature interval and the number of all responding objects in the experimental group; determine a second ratio between the number of non-responding objects of the experimental group in the current feature interval and the number of all non-responding objects in the experimental group; determine the first sample weight coefficient corresponding to the experimental group based on the difference between the first ratio and the second ratio; determine a third ratio between the number of responding objects of the control group in the current feature interval and the number of all responding objects in the control group; determine a fourth ratio between the number of non-responding objects of the control group in the current feature interval and the number of all non-responding objects in the control group; and determine the second sample weight coefficient corresponding to the control group based on the difference between the third ratio and the fourth ratio.
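The per-interval quantities described above can be sketched for a single feature interval. The sample weight coefficient follows the text directly (share of responders in the interval minus share of non-responders in the interval). The evidence weight is computed with the usual weight-of-evidence definition, the natural logarithm of the ratio of those two shares, which is assumed here because this passage does not restate it; the smoothing constant `eps` and all names and counts are likewise hypothetical. How the coefficients and the net evidence weight are combined into the net information value is not spelled out in this passage, so the sketch stops before that step.

```python
import math


def interval_statistics(resp_in_bin, resp_total,
                        nonresp_in_bin, nonresp_total, eps=1e-6):
    """Return (evidence weight, sample weight coefficient) for one group
    in one feature interval."""
    share_resp = resp_in_bin / resp_total            # first / third ratio
    share_nonresp = nonresp_in_bin / nonresp_total   # second / fourth ratio
    # Assumed standard weight of evidence; eps guards against empty bins.
    woe = math.log((share_resp + eps) / (share_nonresp + eps))
    # Sample weight coefficient: difference between the two ratios.
    weight = share_resp - share_nonresp
    return woe, weight


# Hypothetical counts for one feature interval.
woe_exp, w_exp = interval_statistics(30, 100, 10, 100)    # experimental group
woe_ctrl, w_ctrl = interval_statistics(20, 100, 20, 100)  # control group
woe_difference = woe_exp - woe_ctrl
```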
In one embodiment, the gain prediction apparatus 500 further comprises an evaluation module, configured to determine a conversion index based on the conversion result label of each object in the experimental group and the control group, where the conversion index includes at least one of a conversion improvement rate, a conversion quantity improvement amount, a first resource improvement amount, and a second resource improvement amount; and evaluate the real gain brought by the target service based on the conversion index. The conversion improvement rate is the difference between the conversion rate of the experimental group and the conversion rate of the control group, the conversion rate of the experimental group is the ratio of the number of converted objects in the experimental group to the number of objects in the experimental group, and the conversion rate of the control group is the ratio of the number of converted objects in the control group to the number of objects in the control group; the conversion quantity improvement amount is the product of the conversion improvement rate and the number of converted objects in the experimental group; the first resource improvement amount is the sum of a first product and a second product, the first product is the product of the conversion quantity improvement amount and the first value of the experimental group, and the second product is the product of a first difference and a second difference, where the first difference is the difference between the number of converted objects in the experimental group and the conversion quantity improvement amount, and the second difference is the difference between the first value of the experimental group and the first value of the control group; the second resource improvement amount is the sum of the first product and a third product, and the third product is the product of the first difference, a first ratio, and a third difference, where the first ratio is the ratio of the first value of the experimental group to the second value of the control group, and the third difference is the difference between the second value of the experimental group and the second value of the control group.
In one embodiment, the gain prediction apparatus 500 further comprises a testing module, configured to select a second number of objects from the experimental group and the control group, respectively, and determine a testing object group; acquiring target characteristic data and a conversion result label of each object in an experiment group of a test object group; inputting the target characteristic data of each object in the experimental group of the test object group into a target gain prediction model to obtain a gain prediction value corresponding to each object in the experimental group of the test object group; dividing the experimental groups of the test object group into a first set, a second set and a third set based on the magnitude of the gain prediction value corresponding to each object in the experimental groups of the test object group; the minimum gain prediction value in the first set is larger than the maximum gain prediction value in the second set, and the minimum gain prediction value in the second set is larger than the maximum gain prediction value in the third set; calculating at least one of a first actual gain result corresponding to the first set, a second actual gain result corresponding to the second set, and a third actual gain result corresponding to the third set based on the target conversion index and the conversion result label of each object in the experimental group of the test object group; and evaluating the prediction accuracy of the target gain prediction model based on at least one actual gain result of the first actual gain result, the second actual gain result and the third actual gain result.
In one embodiment, the target conversion index includes at least one of a target conversion improvement rate, a target conversion quantity improvement amount, a target first resource improvement amount, and a target second resource improvement amount. The target conversion improvement rate is the difference between the conversion rate of the test group and the conversion rate of the reference group, the conversion rate of the test group is the ratio of the number of converted objects in the test group to the number of objects in the test group, and the conversion rate of the reference group is the ratio of the number of converted objects in the reference group to the number of objects in the reference group; the target conversion quantity improvement amount is the product of the target conversion improvement rate and the number of converted objects in the test group; the target first resource improvement amount is the sum of a target first product and a target second product, the target first product is the product of the target conversion quantity improvement amount and the first value of the test group, and the target second product is the product of a target first difference and a target second difference, where the target first difference is the difference between the number of converted objects in the test group and the target conversion quantity improvement amount, and the target second difference is the difference between the first value of the test group and the first value of the reference group; the target second resource improvement amount is the sum of the target first product and a target third product, and the target third product is the product of the target first difference, a target first ratio, and a target third difference, where the target first ratio is the ratio of the first value of the test group to the second value of the test group, and the target third difference is the difference between the second value of the test group and the second value of the reference group.
In one embodiment, the test module is further configured to calculate a first actual gain result corresponding to the target conversion index by using the first set as a test set and using an experimental set of the test object group as a reference set; taking the second set as a test set, taking an experiment set of the test object group as a reference set, and calculating a second actual gain result corresponding to the target conversion index; and taking the third set as a test set, taking the experiment set of the test object group as a reference set, and calculating a third actual gain result corresponding to the target conversion index.
In one embodiment, the test module is further configured to evaluate the prediction accuracy of the target gain prediction model based on at least one of the first actual gain result, the second actual gain result, and the third actual gain result, including: judging whether the first actual gain result, the second actual gain result and the third actual gain result are decreased in sequence; and if so, determining that the prediction accuracy of the target gain prediction model meets the preset requirement, and if not, training the target gain prediction model again.
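The accuracy check can be sketched with the target conversion improvement rate as the target conversion index, taking each set as the test group and the whole experimental group of the test object group as the reference group. The function names and the conversion result labels below are hypothetical; treating "decreased in sequence" as strictly decreasing is an assumption.

```python
def conversion_rate(labels):
    """Share of converted objects; labels are 1 (converted) or 0 (unconverted)."""
    return sum(labels) / len(labels)


def actual_gain_results(first_set, second_set, third_set):
    """Target conversion improvement rate of each set, measured against the
    whole experimental group of the test object group as reference group."""
    reference_rate = conversion_rate(first_set + second_set + third_set)
    return tuple(conversion_rate(s) - reference_rate
                 for s in (first_set, second_set, third_set))


def accuracy_meets_requirement(results):
    """True when the three actual gain results decrease in sequence."""
    first, second, third = results
    return first > second > third


# Hypothetical conversion result labels for the three sets.
results = actual_gain_results([1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0])
ok = accuracy_meets_requirement(results)
```

If the check returns False, the text calls for training the target gain prediction model again.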
In one embodiment, the test module is further configured to calculate a first actual gain result corresponding to the target conversion index by using the first set as a test group and the second set or the third set as a reference group; and taking the second set as a test group and the third set as a reference group, and calculating a second actual gain result corresponding to the target conversion index.
The various modules in the gain prediction apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent of a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 6. The computer apparatus includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected by a system bus, and the communication interface, the display unit and the input device are connected by the input/output interface to the system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The input/output interface of the computer device is used for exchanging information between the processor and an external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a gain prediction method. The display unit of the computer equipment is used for forming a visual and visible picture, and can be a display screen, a projection device or a virtual reality imaging device, the display screen can be a liquid crystal display screen or an electronic ink display screen, the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
Those skilled in the art will appreciate that the architecture shown in fig. 6 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the above-described method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant countries and regions.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, Resistive Random Access Memory (ReRAM), Magnetic Random Access Memory (MRAM), Ferroelectric Random Access Memory (FRAM), Phase Change Memory (PCM), graphene memory, and the like. Volatile memory may include Random Access Memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM). The databases referred to in the various embodiments provided herein may include at least one of relational and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database and the like. The processors referred to in the embodiments provided herein may be, without limitation, general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments can be arbitrarily combined. For the sake of brevity, not all possible combinations of the technical features in the above embodiments are described, but any combination of these technical features should be considered to be within the scope of the present specification as long as it contains no contradiction.
The above-mentioned embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the protection scope of the present application should be subject to the appended claims.

Claims (10)

1. A method of gain prediction, the method comprising:
determining an experimental group and a control group from a preset object group, wherein the number of the objects in the experimental group is the same as that in the control group, the initial characteristic data distribution is consistent, and the objects in the experimental group participate in a target service;
acquiring initial characteristic data and conversion result labels of each object in the experimental group and the control group, wherein the initial characteristic data comprises data under at least two characteristic dimensions;
calculating the net information value corresponding to each feature dimension according to the initial feature data of each object in the experimental group and the control group;
screening a target characteristic dimension from the at least two characteristic dimensions based on the net information value corresponding to each characteristic dimension, and determining target characteristic data of each object in the experimental group and the control group under the target characteristic dimension;
according to a conversion rule, adjusting the conversion result label of each object in the experimental group and the control group into a gain label, wherein the gain label comprises a first value, a second value and a third value, the second value and the first value are opposite numbers, the first value indicates that the conversion result label of the object in the experimental group is conversion, the second value indicates that the conversion result label of the object in the control group is conversion, and the third value indicates that the conversion result labels of the objects in the experimental group and the control group are both non-conversion;
and selecting a first number of objects from each of the experimental group and the control group as a training object group, and training the gain prediction model to be trained through the target feature data and the gain label of each object in the training object group to obtain a trained target gain prediction model, wherein the target gain prediction model is used for predicting the gain of an object after it participates in the target service.
2. The method of claim 1, wherein calculating the net information value corresponding to each feature dimension according to the initial feature data of each object in the experimental group and the control group comprises:
for any feature interval of each feature dimension, calculating the evidence weight of the experimental group and the evidence weight of the control group in the current feature interval, and calculating the difference between the two evidence weights;
determining the coverage quantity of objects covered by the current feature interval, and determining the net evidence weight corresponding to the current feature interval according to the ratio of the coverage quantity to the total number of objects in the preset object group and the difference between the two evidence weights;
determining a first sample weight coefficient corresponding to the experimental group and a second sample weight coefficient corresponding to the control group, and calculating the net information value corresponding to the current feature interval according to the first sample weight coefficient, the second sample weight coefficient and the net evidence weight corresponding to the current feature interval;
and for each feature dimension, calculating the net information value corresponding to the current feature dimension based on the net information value corresponding to each feature interval in the current feature dimension.
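The steps of claim 2 can be sketched as follows. The claim names the inputs of each step but does not state the exact formulas, so this is only one possible reading: the evidence weight uses the standard weight-of-evidence definition with additive smoothing, and the two lines marked ASSUMED (how the coverage ratio, the difference and the sample weight coefficients are combined) are placeholders, as are all function and parameter names.

```python
import math
from typing import Dict, Tuple

# interval name -> (converted count, non-converted count) within one group
IntervalCounts = Dict[str, Tuple[int, int]]


def evidence_weight(counts: IntervalCounts, interval: str, eps: float = 0.5) -> float:
    """Weight of evidence: ln(share of converted / share of non-converted).

    eps is additive smoothing (an assumption) so empty cells never give log(0).
    """
    k = len(counts)
    total_conv = sum(conv for conv, _ in counts.values())
    total_non = sum(non for _, non in counts.values())
    conv, non = counts[interval]
    share_conv = (conv + eps) / (total_conv + eps * k)
    share_non = (non + eps) / (total_non + eps * k)
    return math.log(share_conv / share_non)


def net_information_value(exp: IntervalCounts, ctrl: IntervalCounts,
                          w_exp: float = 0.5, w_ctrl: float = 0.5) -> float:
    """Net information value of one feature dimension, summed over its intervals."""
    total_objects = (sum(c + n for c, n in exp.values())
                     + sum(c + n for c, n in ctrl.values()))
    niv = 0.0
    for interval in exp:  # both groups are assumed to share the same intervals
        diff = evidence_weight(exp, interval) - evidence_weight(ctrl, interval)
        covered = sum(exp[interval]) + sum(ctrl[interval])
        net_woe = (covered / total_objects) * diff  # ASSUMED combination
        niv += (w_exp + w_ctrl) * net_woe * diff    # ASSUMED combination
    return niv


exp_counts = {"low": (30, 70), "high": (10, 90)}
ctrl_counts = {"low": (10, 90), "high": (10, 90)}
print(net_information_value(exp_counts, ctrl_counts) > 0)  # True
```

Under these assumptions a feature dimension whose intervals behave identically in both groups scores zero, so ranking dimensions by this score and keeping the highest ones would be one way to carry out the screening step of claim 1.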
3. The method of claim 1, further comprising:
determining a conversion index based on the conversion result labels of each object in the experimental group and the control group, wherein the conversion index comprises at least one of a conversion improvement rate, a conversion quantity improvement amount, a first resource improvement amount and a second resource improvement amount;
evaluating the actual gain brought by the target service based on the conversion index;
wherein the conversion improvement rate is the difference between the conversion rate of the experimental group and the conversion rate of the control group, the conversion rate of the experimental group is the ratio of the number of converted objects of the experimental group to the number of objects of the experimental group, and the conversion rate of the control group is the ratio of the number of converted objects of the control group to the number of objects of the control group;
the conversion quantity improvement amount is the product of the conversion improvement rate and the number of converted objects of the experimental group;
the first resource improvement amount is the sum of a first product and a second product, wherein the first product is the product of the conversion quantity improvement amount and the first value of the experimental group, the second product is the product of a first difference and a second difference, the first difference is the difference between the number of converted objects of the experimental group and the conversion quantity improvement amount, and the second difference is the difference between the first value of the experimental group and the first value of the control group;
the second resource improvement amount is the sum of the first product and a third product, wherein the third product is the product of the first difference, a first ratio and a third difference, the first ratio is the ratio of the first value of the experimental group to the second value of the control group, and the third difference is the difference between the second value of the experimental group and the second value of the control group.
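The four conversion indices of claim 3 follow directly from the stated arithmetic, which a short sketch can make concrete. The names `GroupStats` and `conversion_indices` and the sample numbers are hypothetical; the per-group "first value" and "second value" are kept as opaque inputs because the claims do not say what they measure.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GroupStats:
    """Hypothetical per-group summary used by the claim 3 formulas."""
    n_objects: int       # number of objects in the group
    n_converted: int     # number of converted objects in the group
    first_value: float   # per-group "first value" named in the claim
    second_value: float  # per-group "second value" named in the claim


def conversion_indices(exp: GroupStats, ctrl: GroupStats) -> Dict[str, float]:
    """Compute the four conversion indices exactly as worded in claim 3."""
    rate = exp.n_converted / exp.n_objects - ctrl.n_converted / ctrl.n_objects
    quantity = rate * exp.n_converted
    first_product = quantity * exp.first_value
    first_diff = exp.n_converted - quantity
    second_product = first_diff * (exp.first_value - ctrl.first_value)
    first_ratio = exp.first_value / ctrl.second_value
    third_product = first_diff * first_ratio * (exp.second_value - ctrl.second_value)
    return {
        "conversion_improvement_rate": rate,
        "conversion_quantity_improvement": quantity,
        "first_resource_improvement": first_product + second_product,
        "second_resource_improvement": first_product + third_product,
    }


# Made-up illustrative numbers, not data from the patent.
exp_stats = GroupStats(n_objects=1000, n_converted=120, first_value=500.0, second_value=50.0)
ctrl_stats = GroupStats(n_objects=1000, n_converted=100, first_value=480.0, second_value=40.0)
indices = conversion_indices(exp_stats, ctrl_stats)
for name, value in indices.items():
    print(f"{name}: {value:.2f}")
```

With these sample inputs the improvement rate is about 0.02 (12% against 10%), and the remaining indices build on it term by term as in the claim.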
4. The method of claim 1, further comprising:
selecting a second number of objects from each of the experimental group and the control group to determine a test object group; acquiring target feature data and a conversion result label of each object in the experimental group of the test object group;
inputting the target feature data of each object in the experimental group of the test object group into the target gain prediction model to obtain a gain prediction value corresponding to each object in the experimental group of the test object group;
dividing the experimental group of the test object group into a first set, a second set and a third set based on the magnitude of the gain prediction value corresponding to each object in the experimental group of the test object group, wherein the minimum gain prediction value in the first set is larger than the maximum gain prediction value in the second set, and the minimum gain prediction value in the second set is larger than the maximum gain prediction value in the third set;
calculating at least one of a first actual gain result corresponding to the first set, a second actual gain result corresponding to the second set, and a third actual gain result corresponding to the third set based on a target conversion index and the conversion result label of each object in the experimental group of the test object group;
evaluating the prediction accuracy of the target gain prediction model based on at least one of the first actual gain result, the second actual gain result, and the third actual gain result.
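The division into three sets ordered by gain prediction value can be sketched as below. Equal-sized sets are an assumption (the claim only constrains the ordering between sets), and the function name and object identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def split_by_predicted_gain(predictions: Dict[str, float]) -> Tuple[List[str], List[str], List[str]]:
    """Rank objects by predicted gain (descending) and cut the ranking into three sets.

    Assumes distinct prediction values; a tie across a cut point would break the
    strict "minimum of one set > maximum of the next" condition of the claim.
    """
    ranked = sorted(predictions, key=predictions.get, reverse=True)
    cut = len(ranked) // 3
    return ranked[:cut], ranked[cut:2 * cut], ranked[2 * cut:]


# Hypothetical object ids with made-up gain prediction values.
predicted = {"a": 0.9, "b": 0.1, "c": 0.5, "d": 0.7, "e": 0.3, "f": -0.2}
first_set, second_set, third_set = split_by_predicted_gain(predicted)
print(first_set, second_set, third_set)  # ['a', 'd'] ['c', 'e'] ['b', 'f']
```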
5. The method of claim 4, wherein the target conversion index comprises at least one of a target conversion improvement rate, a target conversion quantity improvement amount, a target first resource improvement amount and a target second resource improvement amount;
the target conversion improvement rate is the difference between the conversion rate of a test group and the conversion rate of a reference group, the conversion rate of the test group is the ratio of the number of converted objects of the test group to the number of objects of the test group, and the conversion rate of the reference group is the ratio of the number of converted objects of the reference group to the number of objects of the reference group;
the target conversion quantity improvement amount is the product of the target conversion improvement rate and the number of converted objects of the test group;
the target first resource improvement amount is the sum of a target first product and a target second product, wherein the target first product is the product of the target conversion quantity improvement amount and the first value of the test group, the target second product is the product of a target first difference and a target second difference, the target first difference is the difference between the number of converted objects of the test group and the target conversion quantity improvement amount, and the target second difference is the difference between the first value of the test group and the first value of the reference group;
the target second resource improvement amount is the sum of the target first product and a target third product, wherein the target third product is the product of the target first difference, a target first ratio and a target third difference, the target first ratio is the ratio of the first value of the test group to the second value of the test group, and the target third difference is the difference between the second value of the test group and the second value of the reference group.
6. The method of claim 5, wherein calculating at least one of the first actual gain result corresponding to the first set, the second actual gain result corresponding to the second set, and the third actual gain result corresponding to the third set based on the target conversion index and the conversion result label of each object in the experimental group of the test object group comprises:
taking the first set as the test group and the experimental group of the test object group as the reference group, and calculating the first actual gain result corresponding to the target conversion index;
taking the second set as the test group and the experimental group of the test object group as the reference group, and calculating the second actual gain result corresponding to the target conversion index;
and taking the third set as the test group and the experimental group of the test object group as the reference group, and calculating the third actual gain result corresponding to the target conversion index.
7. The method of claim 6, wherein evaluating the prediction accuracy of the target gain prediction model based on at least one of the first actual gain result, the second actual gain result, and the third actual gain result comprises:
judging whether the first actual gain result, the second actual gain result and the third actual gain result decrease in sequence;
and if so, determining that the prediction accuracy of the target gain prediction model meets a preset requirement; if not, retraining the target gain prediction model.
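The evaluation of claims 6 and 7 can be sketched using the target conversion improvement rate as the target conversion index. Two readings are assumed here: the reference group is the union of the three sets (they partition the experimental group of the test object group), and "decrease in sequence" is taken as strictly decreasing. All function names and sample labels are hypothetical.

```python
from typing import List, Sequence


def conversion_rate(labels: Sequence[bool]) -> float:
    """Share of objects whose conversion result label is 'conversion'."""
    return sum(labels) / len(labels)


def actual_gains(sets: List[Sequence[bool]]) -> List[float]:
    """Target conversion improvement rate of each set against the whole experimental group."""
    reference = [label for s in sets for label in s]  # union of the three sets
    reference_rate = conversion_rate(reference)
    return [conversion_rate(s) - reference_rate for s in sets]


def meets_requirement(sets: List[Sequence[bool]]) -> bool:
    """True if the actual gain results decrease from the first set to the third."""
    gains = actual_gains(sets)
    return all(a > b for a, b in zip(gains, gains[1:]))


# Made-up conversion result labels for the first, second and third sets.
first = [True, True, True, False]
second = [True, False, True, False]
third = [False, False, True, False]
print(meets_requirement([first, second, third]))  # True
print(meets_requirement([third, second, first]))  # False
```

A result of `False` corresponds to the branch of claim 7 in which the target gain prediction model is retrained.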
8. The method of claim 5, wherein calculating at least one of the first actual gain result corresponding to the first set, the second actual gain result corresponding to the second set, and the third actual gain result corresponding to the third set based on the target conversion index and the conversion result label of each object in the experimental group of the test object group comprises:
taking the first set as the test group and the second set or the third set as the reference group, and calculating the first actual gain result corresponding to the target conversion index;
and taking the second set as the test group and the third set as the reference group, and calculating the second actual gain result corresponding to the target conversion index.
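The pairwise variant of claim 8 differs from claim 6 only in the choice of reference group. A minimal sketch, again using the target conversion improvement rate as the index and with hypothetical names and sample labels:

```python
from typing import Sequence, Tuple


def rate(labels: Sequence[bool]) -> float:
    """Share of objects whose conversion result label is 'conversion'."""
    return sum(labels) / len(labels)


def pairwise_gains(first: Sequence[bool], second: Sequence[bool], third: Sequence[bool],
                   first_reference_is_second: bool = True) -> Tuple[float, float]:
    """First set against the second (or third) set, and second set against the third set."""
    reference = second if first_reference_is_second else third
    first_gain = rate(first) - rate(reference)
    second_gain = rate(second) - rate(third)
    return first_gain, second_gain


# Made-up conversion result labels for the three sets.
set_1 = [True, True, True, False]
set_2 = [True, False, True, False]
set_3 = [False, False, True, False]
print(pairwise_gains(set_1, set_2, set_3))         # (0.25, 0.25)
print(pairwise_gains(set_1, set_2, set_3, False))  # (0.5, 0.25)
```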
9. A gain prediction apparatus, the apparatus comprising:
a determining module, a comparing module and a judging module, wherein the determining module is configured to determine an experimental group and a control group from a preset object group, the experimental group and the control group contain the same number of objects and have consistent initial feature data distributions, and the objects in the experimental group participate in a target service;
an obtaining module, configured to obtain initial feature data and a conversion result label of each object in the experimental group and the control group, wherein the initial feature data comprises data in at least two feature dimensions;
a calculation module, configured to calculate a net information value corresponding to each feature dimension according to the initial feature data of each object in the experimental group and the control group;
a screening module, configured to screen a target feature dimension from the at least two feature dimensions based on the net information value corresponding to each feature dimension, and determine target feature data of each object in the experimental group and the control group in the target feature dimension;
a conversion module, configured to adjust, according to a conversion rule, the conversion result label of each object in the experimental group and the control group into a gain label, wherein the gain label comprises a first value, a second value and a third value, the second value and the first value are opposite numbers, the first value indicates that the conversion result label of an object in the experimental group is conversion, the second value indicates that the conversion result label of an object in the control group is conversion, and the third value indicates that the conversion result label of an object, in either the experimental group or the control group, is non-conversion;
a training module, configured to: select a first number of objects from each of the experimental group and the control group to determine a training object group; construct, based on a tree model, a gain prediction model to be trained, wherein the gain prediction model is used for predicting the gain of each object in the training object group after the object participates in the target service; and train the gain prediction model to be trained using the target feature data and the gain label of each object in the training object group to obtain a trained target gain prediction model, wherein the target gain prediction model is used for predicting the gain of an object after the object participates in the target service.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 8.
CN202210648120.7A 2022-06-09 2022-06-09 Gain prediction method and device and computer equipment Pending CN115049429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210648120.7A CN115049429A (en) 2022-06-09 2022-06-09 Gain prediction method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN115049429A true CN115049429A (en) 2022-09-13

Family

ID=83161118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210648120.7A Pending CN115049429A (en) 2022-06-09 2022-06-09 Gain prediction method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN115049429A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116805253A (en) * 2023-08-18 2023-09-26 腾讯科技(深圳)有限公司 Intervention gain prediction method, device, storage medium and computer equipment
CN116805253B (en) * 2023-08-18 2023-11-24 腾讯科技(深圳)有限公司 Intervention gain prediction method, device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination