WO2022237477A1 - Data calibration method and apparatus, and computer device and readable storage medium

Info

Publication number
WO2022237477A1
Authority
WO
WIPO (PCT)
Prior art keywords
resource
factor
historical
conversion
historical business
Prior art date
Application number
PCT/CN2022/087839
Other languages
French (fr)
Chinese (zh)
Inventor
李少波
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2022237477A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0206 Price or cost determination based on market factors
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0269 Targeted advertisements based on user profile or attribute
    • G06Q30/0271 Personalized advertisement
    • G06Q30/0273 Determination of fees for advertising
    • G06Q30/0277 Online advertisement

Definitions

  • This application relates to the field of Internet technology, and in particular to data calibration technology in the field of advertising.
  • The advertising platform needs to consider the benefits of all parties (including users, advertisers and the advertising platform itself), and then estimate the display cost of the advertisement to be placed (ECPM, Effective Cost per Mille), that is, the estimated advertising platform
  • ECPM: Effective Cost per Mille
  • oCPA: Optimized Cost per Action
  • targetCpa: target Cost per Action
  • Conversion refers to the completion of the optimization behavior expected by the advertiser, such as completing an application registration through an advertisement.
  • The ECPM estimation for oCPA needs to estimate the relationship between the exposure amount and the conversion number, and then determine the ECPM according to targetCpa combined with the estimated relationship.
  • the embodiments of the present application provide a data calibration method, device, computer equipment, and readable storage medium, which can reduce the deviation between the estimated value and the actual value of the advertising display fee, and improve the estimated accuracy.
  • An embodiment of the present application provides a data calibration method executed by a computer device, including:
  • acquiring resource attribute information of a target business resource under N resource attribute types; N is a positive integer;
  • clustering each historical business resource in a historical business resource set based on S combination types to obtain H historical business resource subsets; the resource attribute types corresponding to each combination type belong to the N resource attribute types; the historical resource attribute combinations of the historical business resources in one historical business resource subset are the same, and a historical resource attribute combination is associated with the resource attribute types corresponding to one combination type; S is a positive integer less than or equal to N; H is a positive integer;
  • performing aggregate statistical processing on each of the H historical business resource subsets to obtain an aggregated data set; the aggregated data set includes the conversion number and joint factor corresponding to each of the H historical business resource subsets; the conversion number is determined from the conversion number of each historical business resource in the corresponding historical business resource subset, and the joint factor is determined according to the estimated conversion rate and industry factor of each historical business resource in the corresponding historical business resource subset;
  • obtaining, according to the resource attribute information, the effective conversion number and effective joint factor for the target business resource from the aggregated data set, and determining the calibration coefficient according to the effective conversion number and effective joint factor;
  • calibrating the estimated joint factor of the target business resource according to the calibration coefficient; the estimated joint factor is determined according to the estimated conversion rate and the industry factor of the target business resource.
  • In one aspect, an embodiment of the present application provides a data calibration device, including:
  • an acquisition module, configured to acquire resource attribute information of a target business resource under N resource attribute types; N is a positive integer;
  • a division module, configured to cluster the historical business resources in the historical business resource set based on S combination types to obtain H historical business resource subsets; the resource attribute types corresponding to each combination type belong to the N resource attribute types; the historical resource attribute combinations of the historical business resources in one historical business resource subset are the same, and a historical resource attribute combination is associated with the resource attribute types corresponding to one combination type; S is a positive integer less than or equal to N; H is a positive integer;
  • an aggregation statistics module, configured to perform aggregate statistical processing on each of the H historical business resource subsets to obtain an aggregated data set; the aggregated data set includes the conversion number and joint factor corresponding to each of the H historical business resource subsets; the conversion number is determined from the conversion number of each historical business resource in the corresponding historical business resource subset, and the joint factor is determined according to the estimated conversion rate and industry factor of each historical business resource in the corresponding historical business resource subset;
  • an effective data determination module, configured to obtain the effective conversion number and effective joint factor for the target business resource from the aggregated data set according to the resource attribute information;
  • a calibration module, configured to determine the calibration coefficient according to the effective conversion number and the effective joint factor, and to calibrate the estimated joint factor of the target business resource according to the calibration coefficient; the estimated joint factor is determined according to the estimated conversion rate and industry factor of the target business resource.
  • An embodiment of the present application provides a computer device, including: a processor, a memory, and a network interface;
  • The processor is connected to the memory and the network interface, wherein the network interface is used to provide a data communication function, the memory is used to store a computer program, and the processor is used to call the computer program to execute the method in the embodiment of the present application.
  • An embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program is adapted to be loaded by a processor and execute the method in the embodiment of the present application.
  • In one aspect, embodiments of the present application provide a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium; the processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device executes the method in the embodiment of the present application.
  • The embodiment of the present application can obtain the effective conversion number and the effective joint factor determined according to the historical business resources related to the target business resource, use them to calibrate the estimated joint factor of the target business resource, and then adjust the estimated display fee of the target business resource according to the calibrated estimated joint factor.
  • The estimated joint factor is determined according to the estimated conversion rate of the target business resource and the industry factor.
  • The estimated conversion rate and the industry factor are the main factors for calculating the display fee of the target business resource.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • Fig. 2a is a schematic diagram of a data calibration scenario provided by an embodiment of the present application.
  • Fig. 2b is a schematic diagram of a data calibration scenario provided by an embodiment of the present application.
  • Fig. 3 is a schematic flow chart of a data calibration method provided by an embodiment of the present application.
  • Fig. 4a is a schematic diagram of a scenario for dividing historical service resource sets provided by an embodiment of the present application.
  • Fig. 4b is a schematic diagram of a scenario for determining an effective conversion number and an effective joint factor provided by an embodiment of the present application.
  • Fig. 5 is a schematic flow chart of a data calibration method provided by an embodiment of the present application.
  • Fig. 6 is a schematic flow chart of an analysis method for determining the combination factor of display costs provided by the embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of a data calibration device provided by an embodiment of the present application.
  • Fig. 8 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • the system architecture may include a service server 100 and a user terminal cluster.
  • The user terminal cluster may include user terminal 200a, user terminal 200b, user terminal 200c, ..., user terminal 200n. Communication connections may exist between the user terminals in the cluster; for example, there is a communication connection between user terminal 200a and user terminal 200b, and between user terminal 200a and user terminal 200c.
  • Any user terminal in the user terminal cluster may have a communication connection with the service server 100; for example, there is a communication connection between user terminal 200a and the service server 100.
  • the above-mentioned communication connection is not limited to the connection method, and may be directly or indirectly connected by wired communication, wireless communication, or other methods, which are not limited in this application.
  • Each user terminal in the user terminal cluster can install a target application, and when the target application runs on each user terminal, it can perform data interaction with the service server 100 respectively.
  • The target application may include one or more applications with the function of displaying data information such as text, image, audio or video (such as a game application, video editing application, social application, instant messaging application, live broadcast application, short video application, video application, music application, shopping application, novel application, payment application, browser, etc.).
  • the service server 100 may respond to the promotion request for the service resource, and send the service resource to one or more user terminals in the above-mentioned user terminal cluster.
  • the business resources may be resources used to disseminate product or service information to consumers or users, such as job recruitment advertisements, commodity sales advertisements, movie promotion advertisements, game recommendation advertisements, and the like.
  • After receiving the service resource sent by the service server 100, the one or more user terminals can load and display the service resource through the target application and collect the user's operation behavior on the service resource; the one or more user terminals then return the display information and operation behavior information of the service resource to the service server 100 as service data information.
  • the display information may include information such as the display times and display duration of the service resource, and the operation behavior may include no operation behavior, early closing behavior, resource clicking behavior, and the like.
  • the service server 100 receives service data information returned by one or more user terminals, and records it in a log for the service resource.
  • The service server 100 may respond to promotion requests for multiple service resources and send multiple service resources to one or more user terminals in the user terminal cluster in the same time period.
  • Within one day, the service server 100 can send service resource A to user terminal 200a, user terminal 200b and user terminal 200c in the user terminal cluster, and send service resource B to user terminal 200b and user terminal 200n in the user terminal cluster.
  • the target applications installed in multiple user terminals receiving the same service resource may be different.
  • The target application in user terminal 200a may be short video application X, the target application in user terminal 200b may be social application Y, and the target application in user terminal 200c may be live broadcast application Z; user terminal 200a, user terminal 200b and user terminal 200c can all receive the service resource A transmitted from the service server 100 and then display it through their respective target applications.
  • Short video application X, social application Y and live broadcast application Z belong to different types of promotion applications, which can be called different site collections: short video application X can be site collection 1, social application Y can be site collection 2, and live broadcast application Z can be site collection 3.
  • When the service server 100 responds to the promotion request of a target service resource (the target service resource is the service resource sent to the user terminals above), it predicts the ECPM of the target service resource according to the conversion bid of the target service resource, and then sends the target service resource to the corresponding user terminals. The predicted ECPM often differs from the actual ECPM. Therefore, in the initial promotion stage of the target service resource, the service server 100 performs data calibration on the estimated joint factor that affects the ECPM of the target service resource, and then adjusts the ECPM of the target service resource using the calibrated estimated joint factor, so that the adjusted ECPM is closer to the actual value.
  • the specific process of data calibration is as follows:
  • The business server 100 obtains the resource attribute information of the target business resource under N resource attribute types, clusters the historical business resources in the historical business resource set based on S combination types to obtain H historical business resource subsets, and performs aggregate statistical processing on the H historical business resource subsets to obtain the aggregated data set. The business server 100 then obtains, according to the resource attribute information, the effective conversion number and effective joint factor for the target business resource from the aggregated data set, determines the calibration coefficient according to the effective conversion number and the effective joint factor, and finally calibrates the estimated joint factor of the target business resource according to the calibration coefficient.
  • N is a positive integer.
  • The resource attribute types in each combination type belong to the N resource attribute types; the historical resource attribute combinations of the historical business resources in one historical business resource subset are the same; a historical resource attribute combination is associated with the resource attribute types corresponding to one combination type; S is a positive integer less than or equal to N; H is a positive integer.
  • The aggregated data set includes the conversion number and joint factor corresponding to each of the H historical business resource subsets; the conversion number is determined from the conversion numbers of the historical business resources in the corresponding historical business resource subset, and the joint factor is determined according to the estimated conversion rate and industry factor of each historical business resource in the corresponding historical business resource subset. The estimated joint factor is determined according to the estimated conversion rate and the industry factor of the target business resource.
  • the method provided in the embodiment of the present application can be executed by a computer device, and the computer device includes but is not limited to a terminal or a server.
  • The server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • The above-mentioned devices may be nodes in a distributed system; the distributed system may be a blockchain system, which may be formed by connecting multiple nodes through network communication.
  • A peer-to-peer (P2P) network can be formed between the nodes; the P2P protocol is an application-layer protocol that runs on top of the Transmission Control Protocol (TCP).
  • any form of computer equipment such as servers, user terminals and other electronic equipment, can become a node in the blockchain system by joining the peer-to-peer network.
  • POS Point Of Sales, point of sale
  • the calibration of the estimated combination factor of the target service resource W by the service server is used as an example for illustration.
  • FIG. 2a-FIG. 2b are schematic diagrams of a data calibration scenario provided by an embodiment of the present application.
  • The data calibration scenario can be carried out in the service server 100 shown in FIG. 1, in a user terminal (one or more of the user terminals), or jointly by the user terminal cluster and the service server, which is not limited here.
  • the embodiment of the present application takes the user terminal cluster and the service server 100 as an example for illustration.
  • The service server 100 sends historical service resource A1, historical service resource A2 and historical service resource A3 in the historical service resource set 300 to the user terminals in the user terminal cluster (user terminal 200a, user terminal 200b, user terminal 200c, ...).
  • Historical business resource A1 can be a lipstick sale advertisement of brand x that advertiser A wants to promote, historical business resource A2 can be a game application recommendation advertisement of brand y that advertiser B wants to promote, and historical business resource A3 can be a game application recommendation advertisement of brand y that advertiser A wants to promote. In short, each historical business resource in the historical business resource set can be an advertising resource carrying product or service information that an advertiser wants to spread to consumers or users.
  • one historical service resource can be sent to different user terminals, and one user terminal can receive different historical service resources.
  • historical service resource A1 can be sent to user terminal 200a and user terminal 200b, and user terminal 200a can receive historical service resources.
  • each user terminal displays the received historical service resources through its installed target application.
  • the target application refers to the application that can display the advertisement resources sent by the service server 100, and may include one or more different types of applications.
  • The target application may include application B1, application B2, application B3, ..., application Bn, where application B1 may be a live broadcast application, application B2 may be a social application, application B3 may be a short video application, and so on.
  • the target application in each user terminal is a carrier for displaying service resources.
  • the same application can be installed in different user terminals.
  • user terminal 200a can be installed with application B1 and application B2, and user terminal 200b can also be installed with application B1.
  • the service server 100 also receives service data information for each historical service resource returned by each user terminal.
  • the service data information may include display information of historical service resources and operation behavior information for the historical service resources.
  • Suppose user a has a binding relationship with user terminal 200a. After user terminal 200a receives historical service resource A1, it displays historical service resource A1 through application B1 that user a is using. User a browses historical service resource A1, is very interested in the lipstick it promotes, clicks on the purchase link it provides, and jumps to the shopping interface.
  • the user terminal 200a will record the number and time of displaying the historical service resource A1 through the application B1, and also record operations such as clicking or closing the historical service resource A1 by the user a, and send it to the service server 100 together as service data information.
  • The service server 100 summarizes and counts the service data information of the historical service resources sent by each user terminal in the user terminal cluster, and records it in the log. For example, if historical service resource A1 has been displayed once in the target application of each of 100 user terminals, the exposure corresponding to historical business resource A1 in the log is 100; if 50 users click the purchase link provided by historical business resource A1, the corresponding click volume in the log is 50, and the corresponding click rate is 50%.
  • the historical business resource A1 has completed a conversion.
  • the conversion of historical business resources refers to the specific optimization goal selected by the advertiser in the process of delivering business resources.
  • the conversion of the above-mentioned historical business resources A2 may be the user completing the registration of the game application.
  • User terminals may not be able to collect data such as lipstick purchase data and game application registration data through the target application. Therefore, the number of conversions of historical business resources mostly depends on feedback from the advertiser. After receiving the conversion data, the business server 100 also writes it to the log.
  • the conversion rate corresponding to the historical business resource A1 is the percentage of the number of conversions to the number of clicks, that is, 20%.
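  • The log arithmetic in this example can be sketched as follows. The function name `rates_from_log` is hypothetical, and the conversion count of 10 is an assumption inferred from the stated 20% conversion rate over 50 clicks; the text does not state that count directly.

```python
def rates_from_log(exposures: int, clicks: int, conversions: int) -> tuple[float, float]:
    """Return (click rate, conversion rate) from logged counts.

    Click rate = clicks / exposures; conversion rate = conversions / clicks,
    matching the definitions used in the text.
    """
    click_rate = clicks / exposures if exposures else 0.0
    conversion_rate = conversions / clicks if clicks else 0.0
    return click_rate, conversion_rate


# Historical business resource A1: 100 exposures and 50 clicks (from the text);
# 10 conversions is an assumed value consistent with the stated 20% rate.
ctr, cvr = rates_from_log(exposures=100, clicks=50, conversions=10)
print(f"click rate = {ctr:.0%}, conversion rate = {cvr:.0%}")
```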
  • the advertising platform completes the promotion of business resources for the advertiser through the service server 100, and naturally needs to charge the advertiser.
  • The advertiser gives a conversion bid for the business resource (targetCpa, target Cost per Action), that is, the cost the advertiser expects to pay for one conversion of the business resource; the total cost the advertiser expects to pay is therefore conversion number * targetCpa.
  • the service server 100 charges based on the exposure amount*ECPM. In order to ensure that the actual charge is the same as the advertiser's expected total charge, the value of ECPM should be (conversion number/exposure amount)*targetCpa.
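  • The balance between the platform's charge and the advertiser's expected total cost can be checked numerically. The sketch below follows the per-exposure definition used in this passage (no factor of 1000 is applied); the function name `ideal_ecpm` and the sample numbers are illustrative assumptions.

```python
def ideal_ecpm(conversions: int, exposures: int, target_cpa: float) -> float:
    """ECPM that makes exposures * ECPM equal conversions * targetCpa."""
    if exposures <= 0:
        raise ValueError("exposures must be positive")
    return target_cpa * conversions / exposures


# Illustrative numbers only: 10 conversions out of 100 exposures, targetCpa = 30.
exposures, conversions, target_cpa = 100, 10, 30.0
ecpm = ideal_ecpm(conversions, exposures, target_cpa)

platform_charge = exposures * ecpm             # what the platform bills
advertiser_budget = conversions * target_cpa   # what the advertiser expects to pay
print(ecpm, platform_charge, advertiser_budget)
```

  • In practice the conversion number is unknown before delivery, which is why it has to be estimated.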
  • the business server 100 can obtain the specific value of the exposure and targetCpa, but the conversion number depends on the feedback from the advertiser, and the business server 100 cannot determine the specific value of the conversion number before putting in business resources.
  • Therefore, the business server 100 uses different prediction models to predict the estimated click rate, estimated conversion rate, industry factor and other factors of the business resource, obtains the estimated conversion number as exposure * estimated click rate * estimated conversion rate * industry factor * other factors, and further obtains the ECPM of the business resource.
  • the service server 100 will calibrate the estimated joint factor when deploying service resources.
  • The estimated joint factor is the product of the estimated conversion rate and the industry factor. The new ECPM is then calculated from the calibrated estimated joint factor, and the advertiser is charged new ECPM * exposure, so that the charge is closer to the total cost that the advertiser expects to pay for the business resource.
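  • A minimal sketch of how a calibrated joint factor changes the estimated ECPM is given below. It assumes the calibration coefficient multiplies the estimated joint factor and it omits the "other factors"; the function name `estimated_ecpm` and all numeric values are hypothetical.

```python
def estimated_ecpm(p_ctr: float, p_cvr: float, industry_factor: float,
                   target_cpa: float, calibration: float = 1.0) -> float:
    """Estimate ECPM per exposure from model predictions.

    Estimated conversions per exposure = pCTR * joint factor, where
    joint factor = pCVR * industry factor. The calibration coefficient
    rescales the joint factor (assumption: multiplicative calibration).
    """
    joint_factor = p_cvr * industry_factor
    calibrated_joint_factor = calibration * joint_factor
    return p_ctr * calibrated_joint_factor * target_cpa


# Hypothetical model outputs for one business resource.
before = estimated_ecpm(p_ctr=0.5, p_cvr=0.25, industry_factor=1.0, target_cpa=30.0)
after = estimated_ecpm(p_ctr=0.5, p_cvr=0.25, industry_factor=1.0, target_cpa=30.0,
                       calibration=0.8)
print(before, after)
```

  • A coefficient below 1 lowers the ECPM when the models have been over-predicting conversions, and a coefficient above 1 raises it.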
  • The service server 100 obtains the resource attribute information of the target service resource under N resource attribute types. Assuming that the N resource attribute types include advertiser and brand, as shown in Figure 2b, the service server 100 obtains the resource attribute information of target service resource C1 associated with advertiser and brand; assume that the advertiser of target business resource C1 is B and that it belongs to brand x.
  • the service server 100 divides the historical service resource set based on the S combination types to obtain H historical service resource subsets.
  • Each combination type includes one or more resource attribute types, and the resource attribute types in each combination type belong to the above N resource attribute types. As shown in Figure 2b, it can be assumed that the S combination types include [advertiser] and [brand]. One of the S combination types could also be [advertiser, brand]; combination types can be set according to the actual situation, and only the two combination types [advertiser] and [brand] are explained here.
  • the business server 100 will first divide the historical business resource set 300 according to the combination type of [advertiser].
  • The service server 100 can obtain historical service resource subset 311 (including advertiser A's historical business resources) and historical service resource subset 312 (including advertiser B's historical business resources). Similarly, by dividing the historical business resource set 300 based on the combination type [brand], the business server 100 can obtain historical business resource subset 321 and historical business resource subset 322, where historical business resource subset 321 includes historical business resource A1 belonging to brand x, and historical business resource subset 322 includes historical business resource A2 and historical business resource A3 belonging to brand y.
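  • The division by combination type can be sketched as a grouping step. The dictionary layout, the `partition` function and the key format are assumptions made for illustration; the attribute values follow the A1, A2 and A3 example.

```python
from collections import defaultdict

# Hypothetical records mirroring the example: resource id -> attribute values.
HISTORICAL = {
    "A1": {"advertiser": "A", "brand": "x"},
    "A2": {"advertiser": "B", "brand": "y"},
    "A3": {"advertiser": "A", "brand": "y"},
}

# S = 2 combination types, each a tuple of resource attribute types.
COMBINATION_TYPES = [("advertiser",), ("brand",)]


def partition(historical, combination_types):
    """Group resources that share the same attribute combination.

    Returns {(combination_type, attribute_values): [resource ids]}, i.e. the
    H historical business resource subsets.
    """
    subsets = defaultdict(list)
    for combo in combination_types:
        for resource_id, attrs in historical.items():
            key = (combo, tuple(attrs[name] for name in combo))
            subsets[key].append(resource_id)
    return dict(subsets)


subsets = partition(HISTORICAL, COMBINATION_TYPES)
for key, members in subsets.items():
    print(key, members)
```

  • A combination type such as [advertiser, brand] would be expressed here as the tuple `("advertiser", "brand")` and needs no change to the function.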
  • After dividing the historical service resource set 300 according to each combination type, the service server 100 performs aggregate statistical processing on each historical service resource subset to obtain an aggregated data set 400.
  • The aggregated data set includes the conversion number and joint factor corresponding to each historical business resource subset. As shown in Figure 2b, m1 and n1 are respectively the conversion number and joint factor corresponding to historical business resource subset 311, m2 and n2 are respectively the conversion number and joint factor corresponding to historical business resource subset 312, m3 and n3 are respectively the conversion number and joint factor corresponding to historical business resource subset 321, and m4 and n4 are respectively the conversion number and joint factor corresponding to historical business resource subset 322.
  • the conversion number m1 can be the sum of the corresponding conversion numbers of the historical business resource A1 and the historical business resource A2, and the joint factor n1 is based on the estimated conversion rate and industry factor of the historical business resource A1 and the estimated historical business resource A2
  • the conversion rate and industry factors are determined; among them, the estimated conversion rate refers to the estimated value of the probability of conversion after the historical business resource is clicked, and the industry factor is the adjusted ECPM that can be used by business resources belonging to a specific industry factor.
  • the conversion number m2, the conversion number m3 and the conversion number m4 can also be determined according to the conversion numbers corresponding to the historical business resources included in their respective corresponding historical business resource subsets, the joint factor n2, the joint factor n3 and the joint factor n4 It may also be determined according to the estimated conversion rates and industry factors corresponding to the historical business resources included in their respective corresponding historical business resource subsets.
  • the business server 100 will obtain the corresponding conversion number and joint factor in the aggregated data set 400 according to the resource attribute information.
  • the advertiser of the target business resource C1 is B.
  • the target business resource C1 corresponds to the historical business resource subset 312, and the business server 100 will obtain the conversion number m2 and the joint factor n2 corresponding to the historical business resource subset 312.
  • The business server 100 will also determine that the target business resource C1 corresponds to the historical business resource subset 321, and then obtain the conversion number m3 and the joint factor n3 corresponding to the historical business resource subset 321. From the conversion number m2, the joint factor n2, the conversion number m3 and the joint factor n3, the business server 100 then determines the effective conversion number and effective joint factor, and determines the calibration coefficient according to them.
  • the service server 100 calibrates the estimated combination factor of the target business resource C1 according to the calibration coefficient, wherein the estimated combination factor is determined according to the estimated conversion rate and the industry factor of the target business resource C1.
  • In this way, the overall error introduced by the estimated conversion rate and the industry factor of the target business resource C1 can be reduced, so that a more accurate estimate is obtained when calculating the ECPM, improving the accuracy of the ECPM estimate.
  • FIG. 3 is a schematic flowchart of a data calibration method provided in an embodiment of the present application.
  • The method is executed by the computer device in FIG. 1, which may be the service server 100 in FIG. 1 or a user terminal in the user terminal cluster in FIG. 1 (for example, user terminal 200c or user terminal 200n).
  • the data calibration method may include the following steps S101-S105.
  • Step S101 acquiring resource attribute information of the target service resource under N resource attribute types; N is a positive integer.
  • the target business resource may be an advertisement resource that the advertiser wants to disseminate product or service information to consumers or users, for example, an oCPA advertisement, and the presentation form of the oCPA advertisement may include text, pictures, videos, and so on.
  • The essence of oCPA advertising is to pay according to user behavior. The advertiser selects a specific optimization goal (such as activation of a mobile application or order placement on a shopping website) and returns the advertising conversion data to the advertising platform in a timely and accurate manner. The computer device then uses a prediction model to estimate the fee that should be charged for each exposure, that is, to obtain the estimated ECPM, and finally deducts the fee according to the exposure and the ECPM.
  • The computer device will also adjust the estimated joint factor of the oCPA advertisement according to the data of other advertisements related to it, to further adjust the ECPM.
  • the consumption data is the total fee charged by the advertising platform to the advertiser, and the consumption data is equal to the product of the exposure and the ECPM;
  • the expected total fee is the fee that the advertiser expects to pay, and the expected total fee is equal to the product of the targetCpa and the number of conversions.
  • the estimated joint factor is the product of the estimated conversion rate and the industry factor
  • the ECPM of the oCPA advertisement can be obtained according to the estimated joint factor, the estimated click rate, other factors and targetCpa.
  • the estimated conversion rate is the predicted probability that the oCPA advertisement will be converted after being clicked, that is, the probability that the oCPA advertisement will achieve the advertiser's optimization goal after being clicked.
  • Industry factors are usually designed for a specific industry, or for a specific group of people within an industry, to enhance the effect for that industry or group; examples include the PCVR compensating factor for e-commerce and the strengthening factor for the high-conversion crowd of direct e-commerce.
  • the estimated click-through rate is the predicted probability of the oCPA advertisement being clicked after being exposed.
  • other factors refer to other factors that can affect ECPM, such as billing ratio factors, price adjustment factors, risk control factors, and so on.
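  • The definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and sample numbers are hypothetical, and the ECPM formula itself is not reproduced because the text only lists its inputs.

```python
def estimated_joint_factor(pcvr: float, industry_factor: float) -> float:
    """Estimated joint factor = estimated conversion rate * industry factor."""
    return pcvr * industry_factor


def consumption(exposures: int, ecpm: float) -> float:
    """Total fee charged by the platform; the text defines it as exposure * ECPM."""
    return exposures * ecpm


def expected_total_fee(target_cpa: float, conversions: int) -> float:
    """Fee the advertiser expects to pay: targetCpa * number of conversions."""
    return target_cpa * conversions


# Hypothetical sample values, chosen only to exercise the three definitions.
joint = estimated_joint_factor(pcvr=0.02, industry_factor=1.5)
charged = consumption(exposures=1000, ecpm=0.12)
expected = expected_total_fee(target_cpa=30.0, conversions=4)
```

  • Comparing `charged` with `expected` shows why the estimated joint factor matters: if it is overestimated, the ECPM and therefore the consumption data drift above the expected total fee.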
  • The N resource attribute types may include one or more of advertiser, product brand, product, site collection, and new and old advertisement, and may also include other resource attribute types, such as same group, same region, same orientation, etc. The five resource attribute types listed above are used as the example here.
  • The computer device obtains the resource attribute information of the target business resource under the N resource attribute types; for example, it obtains the advertiser information, product brand information, product information, site collection information, and new and old advertisement information of the target business resource.
  • For example, the advertiser of lipstick advertisement N0 is Xiaoming, the product brand is x, the product is lipstick, the site collection is 27, and it is a new advertisement.
  • The strategy used to distinguish new and old advertisements can be that an advertisement exposed only today, with no earlier exposure, is a new advertisement and all other advertisements are old; or that an advertisement exposed within the most recent 2 days is a new advertisement and all other advertisements are old. The definition of new and old advertisements is not limited here.
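  • A minimal sketch of such a new/old rule follows. It assumes that "new" is decided from the date of the first exposure and a configurable window in days; the function name and parameters are hypothetical, and other definitions are equally possible.

```python
from datetime import date, timedelta


def is_new_ad(first_exposure: date, today: date, window_days: int = 1) -> bool:
    """Return True if the first exposure falls within the last `window_days` days.

    window_days=1 means "first exposed today"; window_days=2 means
    "first exposed today or yesterday". This reading is an assumption.
    """
    age = today - first_exposure
    return timedelta(0) <= age < timedelta(days=window_days)
```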
  • Step S102 clustering the historical business resources in the historical business resource set based on the S combination types to obtain H historical business resource subsets; the resource attribute types corresponding to each combination type belong to the N resource attribute types; the historical resource attribute combinations of the historical business resources in one historical business resource subset are the same, and a historical resource attribute combination is associated with the resource attribute types corresponding to one combination type; S is a positive integer less than or equal to N; H is a positive integer.
  • the set of historical business resources includes several historical business resources, wherein the historical business resources may be other advertising resources that have been placed by the above-mentioned advertising platforms.
  • Suppose the S combination types include a combination type Mi, where i is a positive integer less than or equal to S, and the historical business resource set includes a historical business resource Td, where d is a positive integer less than or equal to the total number of historical business resources in the historical business resource set. The computer device can then cluster the historical business resources based on the S combination types to obtain the H historical business resource subsets as follows: determine the resource attribute types contained in the combination type Mi as the target resource attribute types; determine the historical resource attribute information of the historical business resource Td that is associated with the target resource attribute types as the historical resource attribute combination of the historical business resource Td; and, within the historical business resource set, add the historical business resources with the same historical resource attribute combination to the same historical business resource subset, obtaining one or more historical business resource subsets corresponding to the combination type Mi.
  • the resource attribute types in the N resource attribute types can be divided into dimensional attribute types and granular attribute types, and different dimensional attribute types can be combined with all granular attribute types to obtain a combined type.
  • For example, among advertiser, commodity brand, commodity, site collection, and new and old advertisement, the types advertiser, commodity brand, and commodity are dimension attribute types, selected from the perspectives of the ownership, audience, and content of an advertising resource. The ownership of an advertising resource refers to the advertiser who created the advertisement and reflects the group to which the advertising resource belongs; all advertising resources under the same advertiser therefore share a degree of similarity. The audience of an advertising resource refers to the people the advertisement targets; the commodity brand can be used to represent this audience, and the audiences of advertising resources of the same commodity brand are similar to a certain extent. The content of an advertising resource refers to the commodity it promotes and embodies the essence of the advertising resource itself. Therefore, the historical business resource set can be divided along the three dimensions of advertiser, commodity brand, and commodity respectively.
  • site sets, new and old advertisements can be used as granular attribute types, which can be used to further divide the historical business resource collection obtained through the division of dimension attribute types.
  • Accordingly, from advertiser, product brand, product, site collection, and new and old advertisement, the combination types [advertiser, site collection, new and old advertisement], [product brand, site collection, new and old advertisement], and [product, site collection, new and old advertisement] can be obtained.
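  • The construction of combination types from dimension and granular attribute types can be sketched as follows. The constant and function names are hypothetical; the attribute type names are taken from the example above.

```python
DIMENSION_TYPES = ["advertiser", "product brand", "product"]
GRANULAR_TYPES = ["site collection", "new and old advertisement"]


def build_combination_types(dimension_types, granular_types):
    """Pair each dimension attribute type with all granular attribute types."""
    return [(dim, *granular_types) for dim in dimension_types]


COMBINATION_TYPES = build_combination_types(DIMENSION_TYPES, GRANULAR_TYPES)
```

  • With three dimension attribute types this yields S = 3 combination types, each containing one dimension attribute type followed by both granular attribute types.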
  • FIG. 4a is a schematic diagram of a scenario of clustering historical service resource collections provided by an embodiment of the present application.
  • The historical business resource set 400 includes historical business resource N1, historical business resource N2, and historical business resource N3. The advertiser of historical business resource N1 is Xiaojia, its site set is 27, and it is a new advertisement; the advertiser of historical business resource N2 is Xiao B, its site set is 29, and it is a new advertisement; the advertiser of historical business resource N3 is Xiaojia, its site set is 27, and it is a new advertisement.
  • The computer device uses the three resource attribute types in [advertiser, site collection, new and old advertisement] as the target resource attribute types, and determines the historical resource attribute information of each historical business resource that is associated with the target resource attribute types as its historical resource attribute combination. The historical resource attribute combination of historical business resource N1 is then [Xiaojia, 27, new], that of historical business resource N2 is [Xiao B, 29, new], and that of historical business resource N3 is [Xiaojia, 27, new].
  • Because historical business resource N1 and historical business resource N3 have the same historical resource attribute combination, as shown in Figure 4a, dividing by [advertiser, site collection, new and old advertisement] yields the historical business resource subset 4001, which includes N1 and N3, and the historical business resource subset 4002, which includes N2. Performing this division on the historical service resource set for each combination type yields a total of H historical service resource subsets.
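  • The clustering in step S102 amounts to grouping by a key built from the combination type. The sketch below replays the Figure 4a example with the attribute combinations [Xiaojia, 27, new], [Xiao B, 29, new] and [Xiaojia, 27, new]; the dictionary field names (`id`, `advertiser`, `site_set`, `is_new`) and the function name are hypothetical.

```python
from collections import defaultdict


def cluster_by_combination_types(resources, combination_types):
    """Group resource ids by (combination type, historical attribute combination)."""
    subsets = defaultdict(list)
    for combo_type in combination_types:
        for res in resources:
            # The historical resource attribute combination for this combination type.
            combo = tuple(res[attr] for attr in combo_type)
            subsets[(combo_type, combo)].append(res["id"])
    return dict(subsets)


resources = [
    {"id": "N1", "advertiser": "Xiaojia", "site_set": 27, "is_new": True},
    {"id": "N2", "advertiser": "Xiao B", "site_set": 29, "is_new": True},
    {"id": "N3", "advertiser": "Xiaojia", "site_set": 27, "is_new": True},
]
combo_type = ("advertiser", "site_set", "is_new")
subsets = cluster_by_combination_types(resources, [combo_type])
```

  • Here `subsets` holds two groups, one with N1 and N3 and one with N2, mirroring subsets 4001 and 4002. Passing several combination types produces the full collection of H subsets.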
  • Step S103 performing aggregation statistical processing on each of the H historical business resource subsets to obtain an aggregated data set; the aggregated data set includes the conversion number and joint factor corresponding to each of the H historical business resource subsets; the conversion number is determined based on the conversion number of each historical business resource in the corresponding historical business resource subset, and the joint factor is determined based on the estimated conversion rate and industry factor of each historical business resource in the corresponding historical business resource subset.
  • Each historical business resource corresponds to a conversion number, an estimated conversion rate, and an industry factor, and its joint factor can be determined according to the estimated conversion rate and the industry factor. Aggregating the conversion numbers of the historical business resources in a historical business resource subset gives the total conversion number of that subset; aggregating the joint factors of the historical business resources in the subset gives the total joint factor of that subset. The total conversion number and total joint factor are then used as the conversion number and joint factor corresponding to the historical business resource subset and added to the aggregated data set.
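  • The aggregation in step S103 can be sketched as two sums per subset. The sketch assumes that each resource's joint factor is the product of its estimated conversion rate and industry factor, as stated for the estimated joint factor earlier; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HistoricalResource:
    conversions: int        # conversion number of the resource
    pcvr: float             # estimated conversion rate
    industry_factor: float  # industry factor


def aggregate_subset(resources):
    """Return (total conversion number, total joint factor) for one subset."""
    total_conversions = sum(r.conversions for r in resources)
    # Joint factor per resource: estimated conversion rate * industry factor.
    total_joint_factor = sum(r.pcvr * r.industry_factor for r in resources)
    return total_conversions, total_joint_factor


def build_aggregated_data_set(subsets):
    """Map each subset key to its (conversion number, joint factor) pair."""
    return {key: aggregate_subset(members) for key, members in subsets.items()}
```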
  • Step S104 according to the resource attribute information, obtain the effective conversion number and effective combination factor for the target service resource in the aggregated data set.
  • The computer device extracts S resource attribute combinations from the resource attribute information based on the S combination types, and then looks up, in the aggregated data set, the conversion number and joint factor corresponding to each of the S resource attribute combinations.
  • Suppose the S resource attribute combinations include a resource attribute combination Za, where a is a positive integer less than or equal to S. The lookup can then proceed as follows: among the historical resource attribute combinations corresponding to the H historical business resource subsets, find the historical business resource subset whose historical resource attribute combination is the same as the resource attribute combination Za and take it as the matching subset; then obtain the conversion number and joint factor corresponding to the matching subset from the aggregated data set as the conversion number and joint factor corresponding to the resource attribute combination Za.
  • Take the combination type [advertiser, site collection, new and old advertisement] above as an example, and assume that the resource attribute combination extracted from the resource attribute information of the target business resource based on this combination type is [Xiaojia, 27, new]. Of the historical business resource subset 4001 and the historical business resource subset 4002 obtained above based on [advertiser, site collection, new and old advertisement], the historical resource attribute combination of the historical business resource subset 4001 is the same as this resource attribute combination, so the computer device takes the historical service resource subset 4001 as the matching subset and obtains its conversion number and joint factor from the aggregated data set as the conversion number and joint factor corresponding to the resource attribute combination. Finally, the computer device determines the effective conversion number and effective joint factor for the target business resource from the conversion numbers and joint factors corresponding to the S resource attribute combinations.
  • The computer device can determine the effective conversion number and effective joint factor for the target business resource from the conversion numbers and joint factors corresponding to the S resource attribute combinations as follows: determine the priority of the conversion number and joint factor corresponding to each of the S resource attribute combinations according to the priorities of the S combination types; determine the validity of the conversion number and joint factor corresponding to each of the S resource attribute combinations according to the consumption data corresponding to that combination; among the conversion numbers and joint factors corresponding to the S resource attribute combinations, take those that are valid as candidate conversion numbers and candidate joint factors; and use the candidate conversion number and candidate joint factor with the highest priority as the effective conversion number and effective joint factor for the target business resource.
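  • The lookup of matching subsets can be sketched as a dictionary access keyed by combination type and attribute combination. The layout of `aggregated` (a dict from `(combination type, attribute combination)` to `(conversion number, joint factor)`) and all field names are assumptions made for illustration.

```python
def lookup_aggregates(aggregated, combination_types, target_attrs):
    """For each combination type, fetch the aggregates of the matching subset."""
    results = {}
    for combo_type in combination_types:
        # Resource attribute combination of the target under this combination type.
        combo = tuple(target_attrs[attr] for attr in combo_type)
        match = aggregated.get((combo_type, combo))
        if match is not None:
            results[combo_type] = match
    return results


combo_type = ("advertiser", "site_set", "is_new")
aggregated = {
    (combo_type, ("Xiaojia", 27, True)): (5, 4.0),  # e.g. subset 4001
    (combo_type, ("Xiao B", 29, True)): (1, 2.0),   # e.g. subset 4002
}
target = {"advertiser": "Xiaojia", "site_set": 27, "is_new": True}
found = lookup_aggregates(aggregated, [combo_type], target)
```

  • The numeric values are placeholders; only the matching logic is the point. A combination type with no matching subset is simply omitted from the result.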
  • the priority of the combination type can be set according to the actual situation, for example, the priority can be set as [advertiser], [commodity brand], [commodity] from high to low.
  • The validity of the conversion number and joint factor corresponding to a resource attribute combination can be determined according to whether the total consumption data of the historical service resource subset corresponding to the resource attribute combination is greater than four times the targetCpa.
  • FIG. 4b is a schematic diagram of a determination scenario of an effective conversion number and an effective combination factor provided by an embodiment of the present application.
  • The historical business resource subset corresponding to the candidate conversion number m1 and the candidate joint factor n1 is the historical business resource subset Z1, whose combination type is [advertiser]; the historical business resource subset corresponding to the candidate conversion number m2 and the candidate joint factor n2 is the historical business resource subset Z2, whose combination type is [commodity brand]; and the historical business resource subset corresponding to the candidate conversion number m3 and the candidate joint factor n3 is the historical business resource subset Z3, whose combination type is [commodity].
  • When determining the effective conversion number and effective joint factor, the computer device first determines whether the total consumption data of all historical business resources in the historical business resource subset Z1 is greater than four times the total targetCpa of all historical business resources in Z1. If yes, the computer device uses the candidate conversion number m1 and the candidate joint factor n1 as the effective conversion number and the effective joint factor. If not, the computer device determines whether the total consumption data of all historical business resources in the historical business resource subset Z2 is greater than four times the total targetCpa of all historical business resources in Z2; if yes, it uses the candidate conversion number m2 and the candidate joint factor n2 as the effective conversion number and the effective joint factor. If not, the computer device determines whether the total consumption data of all historical business resources in the historical business resource subset Z3 is greater than four times the total targetCpa of all historical business resources in Z3; if yes, it uses the candidate conversion number m3 and the candidate joint factor n3 as the effective conversion number and the effective joint factor.
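  • The priority-ordered validity check can be sketched as a simple loop. The candidate dictionary layout and the function name are hypothetical, and returning `None` when no candidate passes the check is an assumption, since the text does not say what happens in that case.

```python
def select_effective(candidates, multiple=4.0):
    """Pick the first valid candidate, scanning from highest to lowest priority.

    A candidate is valid when the total consumption of its subset exceeds
    `multiple` times the subset's total targetCpa.
    """
    for cand in candidates:
        if cand["total_consumption"] > multiple * cand["total_target_cpa"]:
            return cand["conversions"], cand["joint_factor"]
    return None  # assumption: no valid candidate, so no calibration data


# Ordered by priority: [advertiser], [commodity brand], [commodity].
candidates = [
    {"conversions": 2, "joint_factor": 1.5, "total_consumption": 100.0, "total_target_cpa": 50.0},
    {"conversions": 9, "joint_factor": 6.0, "total_consumption": 900.0, "total_target_cpa": 100.0},
    {"conversions": 20, "joint_factor": 18.0, "total_consumption": 5000.0, "total_target_cpa": 200.0},
]
effective = select_effective(candidates)
```

  • In this sample the advertiser-level subset fails the check (100 is not greater than 4 x 50), so the brand-level values are selected even though the commodity-level subset is also valid.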
  • Step S105 determining a calibration coefficient according to the effective conversion number and the effective joint factor, and calibrating the estimated joint factor of the target service resource according to the calibration coefficient; the estimated joint factor is determined based on the estimated conversion rate and industry factor of the target service resource.
  • the calibration coefficient can be calculated by the following formula (1):
  • cali_rate is the calibration coefficient
  • Conv_valid is the effective conversion number
  • PCVRMulFactor_valid is the valid joint factor.
  • the calibration process can refer to the following formula (2):
  • New_PCVRMulFactor = old_PCVRMulFactor * cali_rate    formula (2)
  • New_PCVRMulFactor is the estimated joint factor after calibration
  • old_PCVRMulFactor is the estimated joint factor before calibration.
  • the estimated joint factor before calibration is determined based on the estimated conversion rate and industry factor of the target business resource.
  • the estimated conversion rate and industry factors of the target business resources can be predicted through corresponding prediction models.
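  • A minimal sketch of the calibration step follows. Formula (2) is taken from the text. For formula (1), the sketch assumes the ratio form cali_rate = Conv_valid / PCVRMulFactor_valid, which is consistent with the symbol definitions but is an assumption rather than a quotation; falling back to 1.0 when the effective joint factor is not positive is likewise an assumption.

```python
def calibration_coefficient(conv_valid: float, pcvr_mul_factor_valid: float) -> float:
    """ASSUMED form of formula (1): effective conversions / effective joint factor."""
    if pcvr_mul_factor_valid <= 0:
        return 1.0  # assumption: leave the estimate unchanged without usable data
    return conv_valid / pcvr_mul_factor_valid


def calibrate(old_pcvr_mul_factor: float, cali_rate: float) -> float:
    """Formula (2): New_PCVRMulFactor = old_PCVRMulFactor * cali_rate."""
    return old_pcvr_mul_factor * cali_rate


cali_rate = calibration_coefficient(conv_valid=6, pcvr_mul_factor_valid=4.0)
new_factor = calibrate(old_pcvr_mul_factor=0.02, cali_rate=cali_rate)
```

  • Under this assumed form, a coefficient above 1 means the related historical resources converted more often than their joint factors predicted, so the target's estimated joint factor is scaled up.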
  • The above calibration process may be applied only in the initial promotion stage of the target business resource.
  • After receiving a calibration request for the target business resource, the computer device first determines the promotion stage of the target business resource; if the target business resource is in the initial promotion stage, the computer device executes the above calibration process in response to the calibration request.
  • the initial promotion stage can be defined as a promotion stage in which the number of conversions of target business resources is less than or equal to 2 or the consumption is less than or equal to 2 times the targetCpa.
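  • The stage check can be written directly from this definition. The function and parameter names are hypothetical.

```python
def in_initial_promotion_stage(conversions: int, consumption: float, target_cpa: float) -> bool:
    """Initial stage: conversions <= 2 or consumption <= 2 * targetCpa."""
    return conversions <= 2 or consumption <= 2 * target_cpa
```

  • A calibration request would then be served only when this function returns `True` for the target business resource.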
  • The resource attribute information of the target business resource under N resource attribute types can be obtained; the historical business resources in the historical business resource set are then clustered based on the S combination types to obtain H historical business resource subsets; aggregate statistical processing is performed on each of the H historical business resource subsets to obtain an aggregated data set; the business server 100 then obtains, according to the resource attribute information, the effective conversion number and effective joint factor for the target business resource from the aggregated data set, determines the calibration coefficient according to them, and finally calibrates the estimated joint factor of the target business resource according to the calibration coefficient.
  • N is a positive integer.
  • the resource attribute types corresponding to each combination type belong to N resource attribute types; the historical resource attribute combinations of each historical business resource in a historical business resource subset are the same, and a historical resource attribute combination corresponds to a resource of a combination type Attribute types are associated; S is a positive integer less than or equal to N; H is a positive integer.
  • The aggregated data set includes the conversion number and joint factor corresponding to each of the H historical business resource subsets; the conversion number is determined based on the conversion number of each historical business resource in the corresponding historical business resource subset, and the joint factor is determined based on the estimated conversion rate and industry factor of each historical business resource in the corresponding historical business resource subset. The estimated joint factor is determined from the estimated conversion rate and the industry factor of the target business resource.
  • In this way, the effective conversion number and effective joint factor determined from the historical business resources related to the target business resource can be obtained and used to calibrate the estimated joint factor of the target business resource; adjusting the estimated display cost of the target business resource according to the calibrated estimated joint factor reduces the deviation between the estimated display cost and the actual value and improves the prediction accuracy.
  • FIG. 5 is a schematic flowchart of a data calibration method provided by an embodiment of the present application.
  • The method is executed by the computer device in FIG. 1, which may be the service server 100 in FIG. 1 or a user terminal in the user terminal cluster in FIG. 1 (for example, user terminal 200c or user terminal 200n).
  • the data calibration method may include the following steps S201-S208.
  • Step S201 acquiring resource attribute information of the target service resource under N resource attribute types; N is a positive integer.
  • Step S202 clustering the historical business resources in the historical business resource set based on the S combination types, to obtain H historical business resource subsets; the H historical business resource subsets include the historical business resource subset Nj, j is a positive integer less than or equal to H.
  • For step S201 and step S202, reference may be made to the description of step S101 and step S102 in the above embodiment corresponding to FIG. 3, which will not be repeated here.
  • Step S203 determining the first unit conversion number and the first unit joint factor corresponding to the historical business resource subset Nj; the first unit conversion number is determined according to the conversion number of each historical business resource in the historical business resource subset Nj within the first unit duration, and the first unit joint factor is determined according to the estimated conversion rate and industry factor of each historical business resource in the historical business resource subset Nj within the first unit duration.
  • The computer device determines each historical business resource in the historical business resource subset Nj as a historical business resource to be counted and obtains its log information; from the log information, it obtains the conversion number, estimated conversion rate, and industry factor of each historical business resource to be counted within the first unit duration; it multiplies the estimated conversion rate of each historical business resource to be counted within the first unit duration by the industry factor to obtain the joint factor of that resource within the first unit duration; it then sums the joint factors of the historical business resources to be counted within the first unit duration to obtain the first unit joint factor corresponding to the historical business resource subset Nj. Likewise, the computer device sums the conversion numbers of the historical business resources to be counted within the first unit duration to obtain the first unit conversion number corresponding to the historical business resource subset Nj.
  • The first unit duration may be one minute, one hour, two hours, etc., which is not limited here.
  • The end time of the first unit duration is usually the current system time. For example, when the first unit duration is one hour and the current system time is 9:00, the conversion numbers, estimated conversion rates, and industry factors of the historical business resources to be counted within the first unit duration are those in the period from 8:00 to 9:00.
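  • The first-unit statistics can be sketched as a filter over log records followed by two sums. The log record layout (`time`, `conversions`, `pcvr`, `industry_factor`) is hypothetical, as is the choice of a half-open window that includes the start time and excludes the current system time.

```python
from datetime import datetime, timedelta


def first_unit_stats(log_records, now, first_unit=timedelta(hours=1)):
    """Return (first unit conversion number, first unit joint factor) for one subset."""
    start = now - first_unit
    conversions = 0
    joint_factor = 0.0
    for rec in log_records:
        if start <= rec["time"] < now:
            conversions += rec["conversions"]
            # Joint factor of one record: estimated conversion rate * industry factor.
            joint_factor += rec["pcvr"] * rec["industry_factor"]
    return conversions, joint_factor


now = datetime(2024, 5, 10, 9, 0)
logs = [
    {"time": datetime(2024, 5, 10, 7, 30), "conversions": 5, "pcvr": 0.5, "industry_factor": 2.0},
    {"time": datetime(2024, 5, 10, 8, 15), "conversions": 1, "pcvr": 0.25, "industry_factor": 2.0},
    {"time": datetime(2024, 5, 10, 8, 45), "conversions": 2, "pcvr": 0.5, "industry_factor": 1.0},
]
stats = first_unit_stats(logs, now)
```

  • Only the 8:15 and 8:45 records fall in the 8:00 to 9:00 window, so the 7:30 record does not contribute.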
  • Step S204 determining the second unit conversion number and the second unit joint factor corresponding to the historical business resource subset Nj; the second unit conversion number is determined according to the conversion number of each historical business resource in the historical business resource subset Nj within the second unit duration, and the second unit joint factor is determined according to the estimated conversion rate and industry factor of each historical business resource in the historical business resource subset Nj within the second unit duration; the second unit duration is greater than the first unit duration.
  • The computer device determines the second unit duration of the historical service resource subset Nj; the second unit duration can be one hour, two hours, the whole day, etc., but must be greater than the first unit duration. The computer device then divides the second unit duration based on the first unit duration to obtain at least two statistical periods, each no longer than the first unit duration. For example, the first unit duration is one hour and the second unit duration is the whole day, where the whole day refers to the period between 0:00 today and the current system time. If the current system time is 7:00, the second unit duration is the period from 0:00 to 7:00 today.
  • Dividing the second unit duration by the first unit duration then gives seven statistical periods, 0:00-1:00, 1:00-2:00, 2:00-3:00, 3:00-4:00, 4:00-5:00, 5:00-6:00, and 6:00-7:00, each one hour long.
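  • The division into statistical periods can be sketched as follows. The function name is hypothetical; the last period is allowed to be shorter than the first unit duration, matching the requirement that each period be less than or equal to it.

```python
from datetime import datetime, timedelta


def split_into_periods(start, end, first_unit):
    """Split [start, end) into consecutive periods no longer than first_unit."""
    periods = []
    cursor = start
    while cursor < end:
        nxt = min(cursor + first_unit, end)
        periods.append((cursor, nxt))
        cursor = nxt
    return periods


day_start = datetime(2024, 5, 10, 0, 0)
current = datetime(2024, 5, 10, 7, 0)
periods = split_into_periods(day_start, current, timedelta(hours=1))
```

  • For the 0:00 to 7:00 example, `periods` starts with (0:00, 1:00) and ends with (6:00, 7:00).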
  • the computer device will obtain the conversion number, estimated conversion rate, and industry factor of the historical business resources to be counted within each statistical period from the above log information; for each statistical period, the historical business resources to be counted within the statistical period The estimated conversion rate and the industry factor are multiplied to generate the joint factor of the historical business resources to be counted in the statistical period; then, the joint factors of the historical business resources to be counted in the statistical period are summed , to obtain the statistical period joint factor of the historical service resource subset Nj within the statistical period. For each statistical period, the computer equipment sums the conversion numbers of the historical business resources to be counted within the statistical period, and obtains the conversion numbers of the historical business resource subset Nj in each statistical period.
  • According to the time decay strategy mentioned above, the statistical period conversion number and the statistical period joint factor of the historical business resource subset Nj in each statistical period are processed to obtain the second unit conversion number and the second unit joint factor corresponding to the historical business resource subset Nj.
  • Specifically, the at least two statistical periods include a statistical period Lk, where k is a positive integer less than or equal to the total number of the at least two statistical periods, and the start time of the statistical period Lk is earlier than that of the statistical period Lk+1. The process may be as follows: according to the time decay factor, the total number of the at least two statistical periods, and the position of the statistical period Lk when the periods are ordered by start time, the statistical period conversion number and the statistical period joint factor within the statistical period Lk are each attenuated to obtain an attenuated conversion number and an attenuated joint factor; the attenuated conversion numbers of all statistical periods are summed to obtain the second unit conversion number corresponding to the historical business resource subset Nj; the attenuated joint factors of all statistical periods are summed to obtain the second unit joint factor corresponding to the historical service resource subset Nj.
  • the above determination process can be referred to the following formulas (3) and (4):
  • Here, Conv_advertiser_day is the second unit conversion number of the historical business resource subset Nj, that is, the total conversion number of the day; I is the current moment (hour); lambda is the time decay coefficient, whose value can be set according to the actual situation, for example 0.05; Conv_advertiser_hour is the statistical period conversion number within the statistical period Lk; PCVRMulFactor_advertiser_day is the second unit joint factor of the historical business resource subset Nj, that is, the total joint factor of the day, namely the sum of all (PCVR * industry factor) values; PCVRMulFactor_advertiser_hour_k is the statistical period joint factor within the statistical period Lk.
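The time-decayed aggregation can be illustrated with the sketch below. Formulas (3) and (4) are not reproduced in this text, so the weight used here, exp(-lambda * (I - k)), is an assumption: it is one common exponential decay that depends on the decay coefficient, the total number of periods I, and the order k of the period, as the description requires. The function name `decayed_sum` and the sample hourly values are hypothetical.

```python
import math

def decayed_sum(values, lam):
    """Sum per-period values, down-weighting older periods.

    Assumed weight for the k-th period (1-based, ordered by start time):
    exp(-lam * (I - k)), where I is the total number of periods.
    """
    total = len(values)
    return sum(v * math.exp(-lam * (total - k))
               for k, v in enumerate(values, start=1))

# Hypothetical hourly statistics for one subset Nj between 0:00 and 7:00.
conv_hour = [3, 0, 2, 1, 4, 2, 5]
joint_hour = [2.5, 0.4, 1.8, 1.1, 3.6, 2.2, 4.9]

conv_day = decayed_sum(conv_hour, lam=0.05)    # second unit conversion number
joint_day = decayed_sum(joint_hour, lam=0.05)  # second unit joint factor
print(round(conv_day, 4), round(joint_day, 4))
```

Under this assumed weighting the most recent period keeps its full value and each earlier hour is discounted by a further factor of exp(-lambda).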
  • Step S205, obtain the first unit consumption data of the historical business resource subset Nj; according to the first unit consumption data, the first unit conversion number, the first unit joint factor, the second unit conversion number, and the second unit joint factor, determine the conversion number and the joint factor corresponding to the historical business resource subset Nj.
  • the first unit consumption data refers to the sum of the consumption data of each historical service resource in the historical service resource subset Nj within the first unit duration.
  • Conv_advertiser refers to the conversion number corresponding to the historical service resource subset Nj
  • PCVRMulFactor_advertiser refers to the joint factor corresponding to the historical service resource subset Nj.
  • the process for the computer device to determine whether the first unit consumption data belongs to the sufficient consumption data may be as follows: obtaining the conversion transaction value data, and determining the sufficient data threshold according to the conversion transaction value data; if the first unit consumption data is greater than the sufficient data threshold , it is determined that the first unit consumption data belongs to sufficient consumption data; if the first unit consumption data is less than or equal to the sufficient data threshold, it is determined that the first unit consumption data belongs to insufficient consumption data.
  • the conversion transaction value data is the above-mentioned conversion bid targetCpa.
  • the sufficient data threshold may be equal to 4 times targetCpa.
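The sufficiency check can be sketched as follows, assuming (as the description of the sufficient judgment unit states) that the first unit statistics are used when the first unit consumption data is sufficient and the second unit statistics are used otherwise. The function name `select_stats` and the tuple layout are hypothetical.

```python
def select_stats(first_unit_cost, target_cpa, first_unit, second_unit,
                 multiplier=4.0):
    """Pick (conversions, joint_factor) for one subset Nj.

    first_unit / second_unit are (conversions, joint_factor) tuples.
    The threshold multiplier of 4 follows the example value in the text.
    """
    sufficient_threshold = multiplier * target_cpa
    if first_unit_cost > sufficient_threshold:
        # Sufficient consumption: the recent (first unit) statistics are used.
        return first_unit
    # Insufficient consumption: fall back to the longer (second unit) window.
    return second_unit

print(select_stats(500.0, 100.0, (6, 5.2), (40, 37.5)))  # sufficient
print(select_stats(400.0, 100.0, (2, 1.9), (40, 37.5)))  # not sufficient
```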
  • Step S206 when the conversion numbers and joint factors corresponding to the H historical business resource subsets are obtained, generate an aggregated data set including the respective conversion numbers and joint factors corresponding to the H historical business resource subsets.
  • Step S207 according to the resource attribute information, obtain the effective conversion number and effective combination factor for the target service resource in the aggregated data set.
  • Step S208 determine a calibration coefficient according to the effective conversion number and the effective combination factor, and calibrate the estimated combination factor of the target service resource according to the calibration coefficient; the estimated combination factor is based on the target service The resource's estimated conversion rate and industry factors are determined.
  • For the specific implementation of steps S207 to S208, reference may be made to steps S104 to S105 in the embodiment corresponding to FIG. 3 above, which will not be repeated here.
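As a rough illustration of step S208, the sketch below assumes that the calibration coefficient is the ratio of the effective conversion number to the effective joint factor, and that calibration multiplies the estimated joint factor by this coefficient. The exact formula is given in the embodiment corresponding to FIG. 3 and is not reproduced in this passage, so both the ratio and the function names are assumptions.

```python
def calibration_coefficient(effective_conversions, effective_joint_factor):
    """Assumed form: observed conversions / summed predicted joint factor."""
    if effective_joint_factor <= 0:
        # Hypothetical fallback: leave the estimate unchanged without data.
        return 1.0
    return effective_conversions / effective_joint_factor

def calibrate(pcvr, industry_factor, coefficient):
    """Scale the estimated joint factor (pcvr * industry factor)."""
    estimated_joint_factor = pcvr * industry_factor
    return estimated_joint_factor * coefficient

coef = calibration_coefficient(30, 40.0)
print(coef, calibrate(0.5, 2.0, coef))
```

With this assumed form, a coefficient below 1 indicates that the predictions overestimated conversions, so the estimated joint factor is scaled down.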
  • As described above, when determining the aggregated data set, the first unit conversion number, the first unit joint factor, the second unit conversion number, and the second unit joint factor corresponding to each historical business resource subset Nj can be obtained. Then, by judging whether the first unit consumption data is sufficient consumption data, the conversion number and the joint factor of the historical business resource subset are selected from these four values, and the aggregated data set is obtained.
  • In this way, the timeliness and effectiveness of the conversion numbers and joint factors in the aggregated data set can be improved, thereby improving the accuracy of the calibration coefficient and, ultimately, the prediction accuracy.
  • FIG. 6 is a schematic flowchart of an analysis method for determining a combination factor of display costs provided by an embodiment of the present application.
  • The method is executed by the computer device in FIG. 1; the computer device may be the service server 100 in FIG. 1 or a user terminal (for example, user terminal 200c or user terminal 200n).
  • the data calibration method may include the following steps S301-S308.
  • Step S301 in the set of historical business resources, according to the division granularity of historical business resources, the historical business resources whose expected consumption data is smaller than the actual consumption data are regarded as resources to be processed, and the resources to be processed are added to the set of resources to be processed;
  • the resource set to be processed includes resource Sr to be processed, and r is a positive integer less than or equal to the total number of resources to be processed in the resource set to be processed.
  • The computer device groups together the historical business resources in the historical business resource set that have the same granularity information, where the granularity information is the information associated with the division granularity of the historical business resources; it then takes the historical business resources whose expected consumption data is smaller than the actual consumption data as resources to be processed and adds them to the set of resources to be processed.
  • the granularity information of the historical service resources included in each resource to be processed is the same.
  • the granularity of historical business resources can be resource granularity, account granularity, group granularity, etc.
  • For example, suppose the division granularity is group granularity and the historical business resource set contains historical business resource O1 of group A, historical business resource O2 of another group, historical business resource O3 of group A, and historical business resource O4 of group A. The historical business resources after division are then {historical business resource O1, historical business resource O3, historical business resource O4} and {historical business resource O2}. If only historical business resource O3 has expected consumption data greater than its actual consumption data, two resources to be processed are finally obtained: one includes historical business resource O1 and historical business resource O4, and the other includes historical business resource O2.
  • the expected consumption data refers to the expected total cost GMV (GMV, Guaranteed Minimum Value) mentioned in the embodiment corresponding to Figure 3 above, that is, the cost that the advertiser expects to pay. If the advertiser bids targetCpa according to the conversion, then according to the formula ( 7) GMV can be calculated:
  • GMV = targetCpa * conversion number    Formula (7)
  • the actual consumption data refers to the consumption (Cost) mentioned in the above-mentioned embodiment corresponding to Figure 3, and the consumption data can be calculated according to formula (8):
  • ECPM is the display fee.
  • Ideally, Cost = GMV, which benefits both the advertising platform and the advertiser: the advertising platform charges neither more nor less than expected.
  • GMV < Cost means that the expected consumption data is less than the actual consumption data. In this situation the platform overcharges, resulting in over-costs, which is unfavorable to advertisers. Therefore, when analyzing the joint factor of the display fee, the computer device can select the historical service resources whose expected consumption data is smaller than the actual consumption data in the historical service resource set.
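Step S301 can be sketched as follows. The sketch follows the group-granularity example above: each historical business resource is kept only if its expected consumption (GMV = targetCpa × conversion number) is below its actual consumption, and the kept resources are grouped by their granularity information. The record fields, the group label "B", and the function name `pending_resources` are hypothetical.

```python
from collections import defaultdict

# Hypothetical historical business resources.
records = [
    {"id": "O1", "group": "A", "target_cpa": 10.0, "conversions": 2, "cost": 30.0},
    {"id": "O2", "group": "B", "target_cpa": 8.0, "conversions": 1, "cost": 12.0},
    {"id": "O3", "group": "A", "target_cpa": 10.0, "conversions": 5, "cost": 40.0},
    {"id": "O4", "group": "A", "target_cpa": 10.0, "conversions": 1, "cost": 15.0},
]

def pending_resources(records, granularity_key):
    """Group resources with GMV < Cost by the chosen division granularity."""
    pending = defaultdict(list)
    for rec in records:
        gmv = rec["target_cpa"] * rec["conversions"]  # expected consumption
        if gmv < rec["cost"]:                          # platform overcharged
            pending[rec[granularity_key]].append(rec["id"])
    return dict(pending)

print(pending_resources(records, "group"))
```

Here O3 is excluded because its expected consumption (50) exceeds its actual consumption (40), leaving one resource to be processed for each group.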
  • Step S302 according to the price adjustment factor, risk control factor, estimated conversion rate, actual conversion rate, estimated cost ratio factor, actual billing ratio factor, estimated click rate, actual click rate and industry of the resource Sr to be processed factor to determine the control value, conversion rate ratio, billing ratio factor ratio, click-through rate ratio and industry factor of the resource Sr to be processed.
  • targetCpa is the conversion bid of the advertiser
  • all_factor is a comprehensive factor, including (price adjustment factor*risk control factor)*gsp_factor*industry_factor.
  • (price adjustment factor * risk control factor) can be regarded as a whole, that is, the regulation factor applied to targetCpa; gsp_factor refers to the billing ratio factor, usually caused by gsp; industry_factor refers to the industry factor, and different industries have different industry factors, for example, the industry factors in direct-operated e-commerce include an e-commerce industry factor and a crowd weighting factor; pcvr refers to the estimated conversion rate; pctr refers to the estimated click-through rate.
  • The computer device first determines the estimated conversion rate, estimated cost ratio factor, and estimated click rate of the resource Sr to be processed, and then obtains its actual price adjustment factor, actual risk control factor, actual conversion rate, actual billing ratio factor, actual click-through rate, and actual industry factor.
  • the estimated conversion rate, the estimated cost ratio factor, and the estimated click-through rate can be determined through respective corresponding prediction models, and the prediction model can be a deep learning model.
  • The computer device can determine the control value of the resource Sr to be processed as the product of the actual price adjustment factor and the actual risk control factor; the conversion rate ratio of the resource Sr to be processed as the ratio of the estimated conversion rate to the actual conversion rate; the billing ratio factor ratio of the resource Sr to be processed as the ratio of the estimated cost ratio factor to the actual billing ratio factor; the click rate ratio of the resource Sr to be processed as the ratio of the estimated click rate to the actual click rate; and the industry factor value of the resource Sr to be processed according to the industry factor.
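The five quantities of step S302 can be computed as in the sketch below. The class name `ResourceStats`, its field names, and the sample values are hypothetical; the products and ratios follow the description above.

```python
from dataclasses import dataclass

@dataclass
class ResourceStats:
    """Hypothetical per-resource inputs for one resource Sr to be processed."""
    price_adjust_factor: float
    risk_control_factor: float
    pcvr: float            # estimated conversion rate
    cvr: float             # actual conversion rate
    est_gsp_factor: float  # estimated cost ratio factor
    act_gsp_factor: float  # actual billing ratio factor
    pctr: float            # estimated click rate
    ctr: float             # actual click rate
    industry_factor: float

def analysis_values(s):
    """Control value, the three estimate/actual ratios, and the industry factor."""
    return {
        "control_value": s.price_adjust_factor * s.risk_control_factor,
        "conversion_rate_ratio": s.pcvr / s.cvr,
        "billing_ratio_factor_ratio": s.est_gsp_factor / s.act_gsp_factor,
        "click_rate_ratio": s.pctr / s.ctr,
        "industry_factor": s.industry_factor,
    }

stats = ResourceStats(1.0, 1.0, 0.75, 0.5, 0.5, 0.5, 0.25, 0.5, 1.2)
values = analysis_values(stats)
print(values)
```

A ratio above 1 means the corresponding quantity was overestimated; in this sample the conversion rate is overestimated by a factor of 1.5.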
  • Step S303 acquiring a value range.
  • the ideal values of the above control value, conversion rate ratio, billing ratio factor ratio, click-through rate ratio and industry factor are all 1, so the range including 1 can be divided to obtain the value range.
  • One feasible set of division points is [0, 0.5, 0.7, 0.9, 1.0, 1.1, 1.3, ∞], where ∞ denotes infinity.
  • The resulting value intervals are [0, 0.5), [0.5, 0.7), [0.7, 0.9), [0.9, 1.0), [1.0, 1.1), [1.1, 1.3), and [1.3, ∞).
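Locating the value interval to which a given value belongs can be sketched with a binary search over the division points. The names `EDGES` and `interval_of` are hypothetical, and the sketch assumes non-negative input values.

```python
import bisect
import math

# Division points from the example above; the last interval is open-ended.
EDGES = [0.0, 0.5, 0.7, 0.9, 1.0, 1.1, 1.3, math.inf]

def interval_of(value):
    """Return the half-open interval [lo, hi) that contains value."""
    if value < EDGES[0]:
        raise ValueError("value must be non-negative")
    index = bisect.bisect_right(EDGES, value) - 1
    index = min(index, len(EDGES) - 2)  # keep very large values in the last interval
    return EDGES[index], EDGES[index + 1]

print(interval_of(0.95))  # (0.9, 1.0)
print(interval_of(1.0))   # (1.0, 1.1)
print(interval_of(2.4))   # (1.3, inf)
```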
  • Step S304, in the set of resources to be processed, obtain the resources to be processed whose conversion number is greater than or equal to the first conversion threshold and less than the second conversion threshold as the first conversion resource; the second conversion threshold is greater than the first conversion threshold.
  • the first conversion threshold and the second conversion threshold corresponding to different historical business resource division granularities may be different, and in different industrial advertising scenarios, the first conversion threshold and the second conversion threshold may also be different.
  • Step S305 according to the control value, conversion rate ratio, billing ratio factor ratio, click-through rate ratio, industry factor and the value range corresponding to the first conversion resource, determine the first control analysis ratio, the first conversion rate analysis Ratio, First Billing Ratio Factor Analysis Ratio, First Click Rate Analysis Ratio, and First Industry Factor Analysis Ratio.
  • The process by which the computer device determines the first control analysis ratio, the first conversion rate analysis ratio, the first billing ratio factor analysis ratio, the first click-through rate analysis ratio, and the first industry factor analysis ratio from the control value, conversion rate ratio, billing ratio factor ratio, click-through rate ratio, industry factor, and value intervals corresponding to the first conversion resource may be as follows: the computer device determines the first resource quantity, namely the number of resources to be processed in the first conversion resource whose control value belongs to the value interval Gb, and takes the ratio of the first resource quantity to the total quantity of resources to be processed in the first conversion resource as the first control analysis ratio corresponding to the value interval Gb; the computer device determines the second resource quantity, namely the number of resources to be processed in the first conversion resource whose conversion rate ratio belongs to the value interval Gb, and the ratio of the second resource quantity to the total quantity of resources to be
  • Step S306 in the set of resources to be processed, obtain the resources to be processed whose conversion number is greater than or equal to the second conversion threshold, as the second conversion resources.
  • Step S307 according to the control value, conversion rate ratio, billing ratio factor ratio, click-through rate ratio, industry factor and the value range corresponding to the second conversion resource, determine the second control analysis ratio and the second conversion rate analysis Ratio, Second Billing Ratio Factor Analysis Ratio, Second Click Rate Analysis Ratio, and Second Industry Factor Analysis Ratio.
  • For the specific implementation of step S306 and step S307, reference may be made to step S304 and step S305 above, which will not be repeated here.
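Steps S304 to S307 can be sketched as follows: the resources to be processed are split by conversion number into the first and second conversion resources, and for each split the share of resources falling into each value interval is computed. The function names, the record layout, and the sample data are hypothetical; the thresholds 1 and 6 are the resource-granularity example values used later in the text, and values are assumed to be non-negative.

```python
import bisect
from collections import Counter

EDGES = [0.0, 0.5, 0.7, 0.9, 1.0, 1.1, 1.3, float("inf")]

def interval_shares(values):
    """Share of values in each interval [EDGES[i], EDGES[i+1])."""
    last = len(EDGES) - 2
    counts = Counter(min(bisect.bisect_right(EDGES, v) - 1, last) for v in values)
    total = len(values)
    return [counts[i] / total if total else 0.0 for i in range(len(EDGES) - 1)]

def split_by_conversions(resources, first_threshold, second_threshold):
    """First conversion resource: [first, second); second: >= second."""
    first = [r for r in resources
             if first_threshold <= r["conversions"] < second_threshold]
    second = [r for r in resources if r["conversions"] >= second_threshold]
    return first, second

resources = [
    {"conversions": 1, "conversion_rate_ratio": 1.4},
    {"conversions": 3, "conversion_rate_ratio": 1.2},
    {"conversions": 5, "conversion_rate_ratio": 0.95},
    {"conversions": 5, "conversion_rate_ratio": 1.5},
    {"conversions": 8, "conversion_rate_ratio": 1.05},
    {"conversions": 0, "conversion_rate_ratio": 1.0},
]
first, second = split_by_conversions(resources, 1, 6)
first_shares = interval_shares([r["conversion_rate_ratio"] for r in first])
second_shares = interval_shares([r["conversion_rate_ratio"] for r in second])
print(first_shares)
print(second_shares)
```

The same `interval_shares` call can be repeated for the control value, billing ratio factor ratio, click-through rate ratio, and industry factor to obtain the remaining analysis ratios.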
  • Step S308, analyze the first control analysis ratio, the first conversion rate analysis ratio, the first billing ratio factor analysis ratio, the first click-through rate analysis ratio, the first industry factor analysis ratio, the second control analysis ratio, the second conversion rate analysis ratio, the second billing ratio factor analysis ratio, the second click-through rate analysis ratio, and the second industry factor analysis ratio to determine the impact factors used to adjust the expected display revenue; the impact factors include the estimated conversion rate and the industry factor, which are jointly used to generate the estimated joint factor.
  • Steps S301 to S308 are illustrated below using the analysis of an advertisement set in a direct-operated e-commerce scenario at resource granularity, account granularity, and group granularity.
  • The price adjustment factor * risk control factor can be taken as a whole. First, take out all the advertisements with GMV < Cost (that is, the above-mentioned resources to be processed), which are divided into the following two situations:
  • Advertisements with conversions greater than or equal to 1 and less than 6 i.e. the first conversion resource:
  • the average value of price adjustment factor * risk control factor (that is, the above-mentioned control value): 1.0055626343375126.
  • Ads with conversions greater than or equal to 6 that is, the second conversion resource:
  • the average value of price adjustment factor * risk control factor (that is, the above-mentioned control value): 1.0151019986530374.
  • Advertisements with conversions greater than or equal to 1 and less than 6 i.e. the first conversion resource:
  • Ads with conversions greater than or equal to 6 that is, the second conversion resource:
  • Regarding the billing ratio factor gsp_factor: to investigate the accuracy of gsp_factor, divide the estimated gsp_factor (the above-mentioned estimated cost ratio factor) by the actual statistical gsp_factor (the above-mentioned actual billing ratio factor) to obtain the billing ratio factor ratio, and determine how close this ratio is to 1. The closer the billing ratio factor ratio is to 1, the more accurate the estimated gsp_factor is.
  • Advertisements with conversions greater than or equal to 1 and less than 6 i.e. the first conversion resource:
  • the average ratio of billing ratio factor 1.005334324931539.
  • Ads with conversions greater than or equal to 6 that is, the second conversion resource:
  • the average ratio of the billing ratio factor 0.9984545757710465.
  • Advertisements with conversions greater than or equal to 1 and less than 6 i.e. the first conversion resource:
  • Ads with conversions greater than or equal to 6 that is, the second conversion resource:
  • Advertisements with conversions greater than or equal to 1 and less than 6 i.e. the first conversion resource:
  • Ads with conversions greater than or equal to 6 that is, the second conversion resource:
  • For the industry factor, the proportion falling in [1.1, 1.3) and [1.3, ∞) is relatively large, and the average value of the industry factor is clearly greater than 1, which leads to over-costs. From the proportions and average values of the other factors across the value intervals, it can be determined that their impact on ECPM is small, which can be understood as a higher prediction accuracy.
  • The price adjustment factor * risk control factor can be taken as a whole. First, take out all accounts with GMV < Cost (one or more advertisements corresponding to the same account, that is, the resource to be processed above), which can be divided into the following two cases:
  • the average value of price adjustment factor * risk control factor (that is, the above-mentioned control value): 1.0270681225915883.
  • the average value of price adjustment factor * risk control factor (that is, the above-mentioned control value): 1.0163434608061954.
  • Regarding the billing ratio factor gsp_factor: to investigate the accuracy of gsp_factor, divide the estimated gsp_factor (the above-mentioned estimated cost ratio factor) by the actual statistical gsp_factor (the above-mentioned actual billing ratio factor) to obtain the billing ratio factor ratio, and determine how close this ratio is to 1. The closer the billing ratio factor ratio is to 1, the more accurate the estimated gsp_factor is.
  • the average ratio of billing ratio factor 1.0063416604925397.
  • the average ratio of billing ratio factor 1.0013801378217713.
  • The price adjustment factor * risk control factor can be taken as a whole. First, take out all groups with GMV < Cost (one or more advertisements corresponding to the same group, that is, the resource to be processed above), which can be divided into the following two situations:
  • a group whose conversion number is greater than or equal to 1 and less than 10 that is, the conversion number of an advertisement corresponding to a group is greater than or equal to 1 and less than 10.
  • the average value of price adjustment factor * risk control factor (that is, the above-mentioned control value): 1.034216183755146.
  • the average value of price adjustment factor * risk control factor (that is, the above-mentioned control value): 1.0296622775279651.
  • Regarding the billing ratio factor gsp_factor: to investigate the accuracy of gsp_factor, divide the estimated gsp_factor (the above-mentioned estimated cost ratio factor) by the actual statistical gsp_factor (the above-mentioned actual billing ratio factor) to obtain the billing ratio factor ratio, and determine how close this ratio is to 1. The closer the billing ratio factor ratio is to 1, the more accurate the estimated gsp_factor is.
  • the average ratio of billing ratio factor 1.0015587163937008.
  • the ratio of the billing ratio factor falls within the corresponding value range as shown in Table 26 below:
  • the main factor causing the low GMV/COST is the overestimation of pcvr, and the deviations of other factors are relatively small.
  • The factors that have a greater impact on ECPM for advertisements in a specific industry scenario can be analyzed at different granularities; the product of the identified factors can then be used as the joint factor, and the joint factor can be calibrated by the method provided in the embodiment corresponding to FIG. 3 above, so as to obtain a more accurate ECPM value and improve the prediction accuracy of ECPM.
  • FIG. 7 is a schematic structural diagram of a data calibration device provided by an embodiment of the present application.
  • the above-mentioned data calibration device may be a computer program (including program code) running on a computer device, for example, the data calibration device is an application software; the device may be used to execute the corresponding steps in the method provided by the embodiment of the present application.
  • the data calibration device may include: an acquisition module 11 , a division module 12 , an aggregation statistics module 13 , a valid data determination module 14 and a calibration module 15 .
  • An acquisition module 11 configured to acquire resource attribute information of target business resources under N resource attribute types; N is a positive integer;
  • the division module 12 is used to cluster the historical business resources in the historical business resource set based on S combination types to obtain H historical business resource subsets; the resource attribute type corresponding to each combination type belongs to the N resource attribute types; the historical resource attribute combinations of the historical business resources in one historical business resource subset are the same, and a historical resource attribute combination is associated with the resource attribute type corresponding to a combination type; S is a positive integer less than or equal to N; H is a positive integer;
  • the aggregation statistics module 13 is used to perform aggregation statistical processing on each of the H historical business resource subsets to obtain an aggregated data set; the aggregated data set includes the conversion number and the joint factor corresponding to each of the H historical business resource subsets; the conversion number is determined according to the conversion numbers of the historical business resources in the corresponding historical business resource subset, and the joint factor is determined according to the estimated conversion rates and industry factors of the historical business resources in the corresponding historical business resource subset;
  • Effective data determination module 14 is used for according to resource attribute information, obtains the effective transformation number and the effective joint factor for target business resource in aggregate data set;
  • the calibration module 15 is used to determine the calibration coefficient according to the effective conversion number and the effective joint factor, and calibrate the estimated joint factor of the target business resource according to the calibration coefficient; the estimated joint factor is determined based on the estimated conversion rate and industry factor of the target business resource.
  • For the specific function implementation of the acquisition module 11, the division module 12, the aggregation statistics module 13, the effective data determination module 14, and the calibration module 15, reference may be made to the specific description of steps S101 to S105 in the embodiment corresponding to FIG. 3, which will not be repeated here.
  • the S combination types include the combination type Mi, i is a positive integer less than or equal to S;
  • the historical business resource set includes the historical business resource Td, and d is a positive integer less than or equal to the total number of historical business resources in the historical business resource set ;
  • the dividing module 12 may include: a combination determining unit 121 and a subset determining unit 122 .
  • a combination determining unit 121 configured to determine the resource attribute type contained in the combination type Mi as the target resource attribute type
  • the combination determining unit 121 is further configured to determine the historical resource attribute information associated with the historical business resource Td in the historical business resource set and the target resource attribute type as the historical resource attribute combination of the historical business resource Td;
  • the subset determining unit 122 is configured to add the historical business resources in the historical business resource set that have the same historical resource attribute combination to the same historical business resource subset, and obtain one or more historical business resource subsets corresponding to the combination type Mi;
  • the subset determination unit 122 is further configured to use one or more historical service resource subsets corresponding to each combination type to form H historical service resource subsets.
  • For the specific function implementation of the combination determining unit 121 and the subset determining unit 122, reference may be made to the specific description of step S102 in the embodiment corresponding to FIG. 3, which will not be repeated here.
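The behavior of the division module can be sketched as follows: for each combination type, historical business resources that share the same values for the attribute types in that combination are placed in the same subset. The attribute type names ("advertiser", "industry"), the record layout, and the function name `cluster_by_combinations` are hypothetical.

```python
from collections import defaultdict

def cluster_by_combinations(resources, combination_types):
    """Map (combination type, attribute values) -> list of resource ids."""
    subsets = {}
    for combo in combination_types:
        groups = defaultdict(list)
        for res in resources:
            # Historical resource attribute combination of this resource.
            key = tuple(res["attrs"][attr_type] for attr_type in combo)
            groups[key].append(res["id"])
        for key, ids in groups.items():
            subsets[(combo, key)] = ids
    return subsets

# N = 2 resource attribute types, S = 2 combination types.
resources = [
    {"id": "T1", "attrs": {"advertiser": "a1", "industry": "ecom"}},
    {"id": "T2", "attrs": {"advertiser": "a1", "industry": "ecom"}},
    {"id": "T3", "attrs": {"advertiser": "a2", "industry": "ecom"}},
]
combination_types = [("industry",), ("advertiser", "industry")]
subsets = cluster_by_combinations(resources, combination_types)
for key, ids in subsets.items():
    print(key, ids)
```

In this sample the two combination types yield H = 3 historical business resource subsets, and the same resource may appear in one subset per combination type.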
  • the H historical business resource subsets include the historical business resource subset Nj, j is a positive integer less than or equal to H;
  • the aggregation statistics module 13 may include: a first determination unit 131 , a second determination unit 132 , a consumption acquisition unit 133 , a sufficient determination unit 134 and a data set generation unit 135 .
  • the first determining unit 131 is configured to determine the first unit conversion number and the first unit joint factor corresponding to the historical business resource subset Nj; the first unit conversion number is determined according to the conversion numbers of the historical business resources in the historical business resource subset Nj within the first unit duration, and the first unit joint factor is determined according to the estimated conversion rate and industry factor of each historical business resource in the historical business resource subset Nj within the first unit duration;
  • the second determination unit 132 is configured to determine the second unit conversion number and the second unit joint factor corresponding to the historical business resource subset Nj; the second unit conversion number is determined according to the conversion numbers of the historical business resources in the historical business resource subset Nj within the second unit duration, and the second unit joint factor is determined according to the estimated conversion rate and industry factor of each historical business resource in the historical business resource subset Nj within the second unit duration; the second unit duration is greater than the first unit duration;
  • a consumption acquisition unit 133 configured to acquire the first unit consumption data of the historical service resource subset Nj;
  • the sufficient judgment unit 134 is configured to, if the first unit consumption data belongs to sufficient consumption data, use the first unit conversion number as the conversion number corresponding to the historical business resource subset Nj and use the first unit joint factor as the joint factor corresponding to the historical business resource subset Nj;
  • the sufficient judgment unit 134 is further configured to, if the first unit consumption data belongs to insufficient consumption data, use the second unit conversion number as the conversion number corresponding to the historical business resource subset Nj and use the second unit joint factor as the joint factor corresponding to the historical business resource subset Nj;
  • the data set generation unit 135 is configured to generate an aggregated data set including the conversion numbers and the joint factors corresponding to the H historical business resource subsets respectively when the respective conversion numbers and joint factors corresponding to the H historical business resource subsets are obtained.
  • For the specific function implementation of the first determination unit 131, the second determination unit 132, the consumption acquisition unit 133, the sufficient judgment unit 134, and the data set generation unit 135, reference may be made to the specific description of steps S203 to S206 in the embodiment corresponding to FIG. 5, which will not be repeated here.
  • the first determination unit 131 may include: a first information acquisition subunit 1311 and a first data determination subunit 1312 .
  • the first information acquisition subunit 1311 is configured to determine each historical business resource in the historical business resource subset Nj as a historical business resource to be counted, and obtain log information of the historical business resource to be counted;
  • the first information obtaining subunit 1311 is further configured to obtain the conversion number, estimated conversion rate and industry factor of the historical business resource to be counted within the first unit duration from the log information;
  • the first data determination subunit 1312 is configured to multiply the estimated conversion rate of each historical business resource to be counted within the first unit duration by its industry factor to obtain the joint factor of that historical business resource within the first unit duration, and to sum the joint factors of the historical business resources to be counted within the first unit duration to obtain the first unit joint factor corresponding to the historical business resource subset Nj;
  • the first data determination subunit 1312 is further configured to sum the conversion numbers of the historical business resources to be counted within the first unit duration to obtain the first unit conversion number corresponding to the historical business resource subset Nj.
  • the specific function implementation manners of the first information acquiring subunit 1311 and the first data determining subunit 1312 can refer to the specific description of step S203 in the embodiment corresponding to FIG. 5 , which will not be repeated here.
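The first-unit aggregation described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the `ResourceLog` record and its field names are hypothetical stand-ins for whatever the log information actually contains.

```python
from dataclasses import dataclass


@dataclass
class ResourceLog:
    """Figures for one historical business resource within the first unit
    duration (hypothetical log schema, assumed for illustration)."""
    conversions: int            # conversion number read from the log
    est_conversion_rate: float  # estimated conversion rate
    industry_factor: float      # industry factor


def aggregate_subset(logs):
    """Return (first unit conversion number, first unit joint factor) for Nj."""
    # Conversion numbers are summed directly.
    unit_conversions = sum(log.conversions for log in logs)
    # Each joint factor is estimated conversion rate x industry factor,
    # and the per-resource joint factors are then summed.
    unit_joint_factor = sum(
        log.est_conversion_rate * log.industry_factor for log in logs
    )
    return unit_conversions, unit_joint_factor


subset_nj = [
    ResourceLog(conversions=3, est_conversion_rate=0.02, industry_factor=1.5),
    ResourceLog(conversions=5, est_conversion_rate=0.04, industry_factor=1.0),
]
conversions, joint_factor = aggregate_subset(subset_nj)
```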
  • the second determination unit 132 may include: a second information acquisition subunit 1321 , a second data determination subunit 1322 , and an attenuation processing subunit 1323 .
  • the second information acquisition subunit 1321 is configured to determine each historical business resource in the historical business resource subset Nj as the historical business resource to be counted, and obtain the log information of the historical business resource to be counted;
  • the second information acquisition subunit 1321 is also used to determine the second unit duration;
  • the second information acquisition subunit 1321 is also used to divide the second unit duration based on the first unit duration to obtain at least two statistical periods; the duration of each statistical period is less than or equal to the first unit duration;
  • the second information obtaining subunit 1321 is also used to obtain the conversion number, estimated conversion rate and industry factor of each historical business resource to be counted within each statistical period from the log information;
  • the second data determination subunit 1322 is configured to, for each statistical period, multiply the estimated conversion rate of each historical business resource to be counted within the statistical period by its industry factor to obtain the joint factor of that historical business resource within the statistical period, and sum the joint factors of the historical business resources to be counted within the statistical period to obtain the statistical period joint factor of the historical business resource subset Nj within the statistical period;
  • the second data determination subunit 1322 is also configured to, for each statistical period, sum the conversion numbers of the historical business resources to be counted within the statistical period to obtain the statistical period conversion number of the historical business resource subset Nj within the statistical period;
  • the attenuation processing subunit 1323 is configured to process the statistical period conversion numbers and statistical period joint factors of the historical business resource subset Nj in each statistical period according to the time decay strategy, to obtain the second unit conversion number and the second unit joint factor corresponding to the historical business resource subset Nj.
  • for the implementation of specific functions of the second information acquisition subunit 1321, the second data determination subunit 1322 and the attenuation processing subunit 1323, please refer to the specific description of step S204 in the embodiment corresponding to FIG. 5, which will not be repeated here.
  • the at least two statistical periods include a statistical period Lk, where k is a positive integer less than or equal to the total number of the at least two statistical periods; the start time of the statistical period Lk is earlier than that of the statistical period Lk+1;
  • the attenuation processing subunit is specifically configured to: attenuate the statistical period conversion number and the statistical period joint factor within the statistical period Lk according to the time attenuation factor, the total number of the at least two statistical periods, and the position of the statistical period Lk when the at least two statistical periods are ordered by start time, to obtain an attenuated conversion number and an attenuated joint factor; sum the attenuated conversion numbers of all statistical periods to obtain the second unit conversion number corresponding to the historical business resource subset Nj; and sum the attenuated joint factors of all statistical periods to obtain the second unit joint factor corresponding to the historical business resource subset Nj.
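A sketch of the time decay step is given below. The text states only that the attenuation depends on the time attenuation factor, the total number of statistical periods and the position of Lk; the exponential weight `decay_factor ** (total - k)` used here is an assumption chosen so that the most recent period keeps full weight, not a formula taken from the application.

```python
def decay_aggregate(period_stats, decay_factor):
    """Combine per-period statistics into second-unit statistics.

    period_stats: list of (period conversion number, period joint factor),
                  ordered from the earliest statistical period L1 to the latest.
    decay_factor: time attenuation factor in (0, 1].
    """
    total = len(period_stats)
    decayed_conversions = 0.0
    decayed_joint_factor = 0.0
    for k, (conversions, joint_factor) in enumerate(period_stats, start=1):
        # Assumed weighting: older periods (small k) are attenuated more.
        weight = decay_factor ** (total - k)
        decayed_conversions += weight * conversions
        decayed_joint_factor += weight * joint_factor
    return decayed_conversions, decayed_joint_factor


periods = [(4, 0.08), (2, 0.04), (6, 0.10)]
second_unit_conversions, second_unit_joint_factor = decay_aggregate(periods, 0.5)
```

Because the same weight is applied to both quantities in a period, the ratio between conversions and joint factor within each period is preserved; only the relative influence of older periods changes.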
  • the first unit consumption data is the sum of the consumption data of each historical business resource in the historical business resource subset Nj within the first unit duration
  • the data calibration device 1 may further include: a consumption determination module 16 .
  • the consumption determination module 16 is used to obtain the conversion transaction value data, and determine the sufficient data threshold according to the conversion transaction value data;
  • the consumption determination module 16 is further configured to determine that the first unit consumption data belongs to sufficient consumption data if the first unit consumption data is greater than the sufficient data threshold;
  • the consumption determination module 16 is further configured to determine that the first unit consumption data belongs to insufficient consumption data if the first unit consumption data is less than or equal to the sufficient data threshold.
  • the specific function implementation of the consumption determination module 16 can refer to the specific description of step S205 in the embodiment corresponding to FIG. 5 , which will not be repeated here.
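The sufficiency check and the resulting choice between first-unit and second-unit statistics can be sketched as below. How the sufficient data threshold is derived from the conversion transaction value data is not spelled out in this passage, so `sufficient_threshold` (a multiple of the mean value) and its `multiplier` parameter are purely hypothetical.

```python
def sufficient_threshold(conversion_values, multiplier=10.0):
    """Hypothetical rule: threshold = multiplier x mean conversion transaction value."""
    return multiplier * sum(conversion_values) / len(conversion_values)


def pick_unit_stats(first_unit, second_unit, first_unit_consumption, threshold):
    """Select the (conversion number, joint factor) pair for subset Nj.

    Consumption strictly greater than the threshold counts as sufficient, so
    the first-unit statistics are used; otherwise the decayed second-unit
    statistics are used instead.
    """
    if first_unit_consumption > threshold:
        return first_unit
    return second_unit


threshold = sufficient_threshold([20.0, 30.0])
chosen_sufficient = pick_unit_stats((8, 0.07), (8.0, 0.14), 300.0, threshold)
chosen_insufficient = pick_unit_stats((8, 0.07), (8.0, 0.14), 250.0, threshold)
```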
  • the valid data determination module 14 may include: a combination extraction unit 141 , a data search unit 142 and a valid determination unit 143 .
  • a combination extraction unit 141 configured to extract S resource attribute combinations from the resource attribute information based on the S combination types
  • a data search unit 142 configured to search the conversion numbers and joint factors corresponding to each of the S resource attribute combinations in the aggregated data set;
  • the effective determination unit 143 is configured to determine the effective conversion number and the effective combination factor for the target service resource among the conversion numbers and the combination factors corresponding to the S resource attribute combinations.
  • for the implementation of specific functions of the combination extraction unit 141, the data search unit 142 and the validity determination unit 143, refer to the specific description of step S104 in the embodiment corresponding to FIG. 3, and details are not repeated here.
  • the S resource attribute combinations include the resource attribute combination Za, and a is a positive integer less than or equal to S;
  • the data searching unit 142 may include: a matching determining subunit 1421 and a data obtaining subunit 1422 .
  • the matching determination subunit 1421 is configured to search, among the H historical business resource subsets, for the subset whose historical resource attribute combination is the same as the resource attribute combination Za, and use it as the matching subset;
  • the data obtaining subunit 1422 is configured to obtain the conversion number and the combination factor corresponding to the matching subset in the aggregated data set as the conversion number and the combination factor corresponding to the resource attribute combination Za.
  • the specific function implementation manners of the matching determination subunit 1421 and the data acquisition subunit 1422 can refer to the specific description of step S104 in the embodiment corresponding to FIG. 3 , and details are not repeated here.
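One way to picture the lookup is a dictionary keyed by the attribute combination, as in the sketch below. The attribute type names ("industry", "site") and the dictionary layout are assumptions made for illustration; the application does not prescribe a storage format for the aggregated data set.

```python
def combination_key(combination_type, attributes):
    """Build a hashable key from the attribute values a combination type selects."""
    return tuple((attr_type, attributes[attr_type]) for attr_type in combination_type)


def lookup(aggregated, combination_type, attributes):
    """Return (conversion number, joint factor) of the matching subset, or None."""
    return aggregated.get(combination_key(combination_type, attributes))


# Aggregated data set: one entry per historical business resource subset.
aggregated = {
    (("industry", "games"),): (40.0, 0.90),
    (("industry", "games"), ("site", "feed")): (12.0, 0.30),
}
target = {"industry": "games", "site": "feed"}
match = lookup(aggregated, ("industry", "site"), target)
miss = lookup(aggregated, ("site",), target)
```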
  • the validity determination unit 143 may include: a priority determination subunit 1431 , a validity determination subunit 1432 and a valid data determination subunit 1433 .
  • the priority determination subunit 1431 is configured to determine the priority of the conversion numbers and joint factors corresponding to each of the S resource attribute combinations according to the priorities of the S combination types;
  • the validity determining subunit 1432 is configured to determine the validity of the conversion numbers and joint factors corresponding to each of the S resource attribute combinations according to the consumption data corresponding to each of the S resource attribute combinations;
  • the valid data determination subunit 1433 is configured to take the valid conversion numbers and joint factors, among the conversion numbers and joint factors corresponding to the S resource attribute combinations, as candidate conversion numbers and candidate joint factors, and to select the candidate conversion number and candidate joint factor with the highest priority as the effective conversion number and effective joint factor for the target business resource.
  • for the implementation of specific functions of the priority determination subunit 1431, the validity determination subunit 1432 and the valid data determination subunit 1433, please refer to the specific description of step S104 in the embodiment corresponding to FIG. 3, which will not be repeated here.
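The priority-and-validity selection can be sketched as follows. The validity rule used here (consumption above a minimum) and the convention that a smaller `priority` value means higher priority are both assumptions; the passage says only that validity is determined from consumption data and that priority follows the combination types.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Statistics found for one resource attribute combination (illustrative)."""
    priority: int        # assumed convention: lower value = higher priority
    consumption: float   # consumption data of the matching subset
    conversions: float   # conversion number from the aggregated data set
    joint_factor: float  # joint factor from the aggregated data set


def select_effective(candidates, min_consumption):
    """Return (effective conversion number, effective joint factor), or None."""
    # Keep only candidates whose consumption data makes them valid.
    valid = [c for c in candidates if c.consumption > min_consumption]
    if not valid:
        return None
    # Among the valid candidates, take the one with the highest priority.
    best = min(valid, key=lambda c: c.priority)
    return best.conversions, best.joint_factor


candidates = [
    Candidate(priority=1, consumption=50.0, conversions=2, joint_factor=0.05),
    Candidate(priority=2, consumption=500.0, conversions=20, joint_factor=0.40),
    Candidate(priority=3, consumption=900.0, conversions=40, joint_factor=0.90),
]
effective = select_effective(candidates, min_consumption=100.0)
```

In this example the finest-grained combination (priority 1) is skipped because its data is too sparse, and the next combination in priority order supplies the effective values.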
  • the data calibration device 1 may also include: a request receiving module 17 and a stage determining module 18.
  • a request receiving module 17, configured to receive a calibration request for a target business resource;
  • a stage determination module 18, configured to determine the promotion stage of the target business resource;
  • the stage determination module 18 is further configured to, if the promotion stage of the target business resource is the initial promotion stage, respond to the calibration request for the target business resource by executing the step of acquiring resource attribute information of the target business resource under N resource attribute types.
  • the specific function implementation manners of the request receiving module 17 and the stage determining module 18 can refer to the specific description of step S105 in the embodiment corresponding to FIG. 3 , which will not be repeated here.
  • the data calibration device 1 may further include: an impact factor determination module 19 .
  • the impact factor determination module 19 is used to, in the set of historical business resources, according to the division granularity of historical business resources, use the historical business resources whose expected consumption data is smaller than the actual consumption data as resources to be processed, and add the resources to be processed to the set of resources to be processed ;
  • the resource set to be processed includes the resource Sr to be processed, and r is a positive integer less than or equal to the total number of resources to be processed in the resource set to be processed;
  • the impact factor determination module 19 is also used to determine the control value, conversion rate ratio, billing ratio factor ratio, click-through rate ratio and industry factor of the resource Sr to be processed, according to the price adjustment factor, risk control factor, estimated conversion rate, actual conversion rate, estimated billing ratio factor, actual billing ratio factor, estimated click-through rate, actual click-through rate and industry factor of the resource Sr to be processed;
  • the impact factor determination module 19 is also used to obtain the value range when the control value, conversion rate ratio, billing ratio factor ratio, click-through rate ratio and industry factor corresponding to each resource to be processed are determined;
  • the impact factor determination module 19 is also used to obtain resources to be processed whose conversion number is greater than or equal to the first conversion threshold and whose conversion number is less than the second conversion threshold in the set of resources to be processed, as the first conversion resource; the second conversion threshold is greater than first conversion threshold;
  • the impact factor determination module 19 is also used to determine the first regulation analysis ratio, the first conversion rate analysis ratio, the first billing ratio factor analysis ratio, the first click-through rate analysis ratio and the first industry factor analysis ratio according to the control value, conversion rate ratio, billing ratio factor ratio, click-through rate ratio, industry factor and value interval corresponding to the first conversion resource;
  • the impact factor determination module 19 is also used to acquire resources to be processed whose conversion number is greater than or equal to the second conversion threshold in the resource set to be processed, as the second converted resource;
  • the influence factor determination module 19 is also used to determine the second regulation analysis ratio, the second conversion rate analysis ratio, the second billing ratio factor analysis ratio, the second click rate analysis ratio and the second industry factor analysis ratio;
  • the impact factor determination module 19 is also used to analyze and process the first regulation analysis ratio, the first conversion rate analysis ratio, the first billing ratio factor analysis ratio, the first click-through rate analysis ratio, the first industry factor analysis ratio, the second regulation analysis ratio, the second conversion rate analysis ratio, the second billing ratio factor analysis ratio, the second click-through rate analysis ratio and the second industry factor analysis ratio, to determine the impact factors used to adjust the expected display revenue; the impact factors include the estimated conversion rate and the industry factor, which are used together to generate the estimated joint factor.
  • FIG. 8 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the data calibration device 1 in the above-mentioned embodiment corresponding to Figure 7 can be applied to the above-mentioned computer equipment 1000, and the above-mentioned computer equipment 1000 can include: a processor 1001, a network interface 1004 and a memory 1005, in addition, the above-mentioned computer equipment 1000 also includes: a user interface 1003 , and at least one communication bus 1002 . Wherein, the communication bus 1002 is used to realize connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the network interface 1004 may include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the memory 1005 can be a high-speed RAM memory, or a non-volatile memory, such as at least one disk memory.
  • the memory 1005 may also be at least one storage device located away from the aforementioned processor 1001 .
  • the memory 1005 as a computer-readable storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 can provide a network communication function; the user interface 1003 is mainly used to provide an input interface for the user; and the processor 1001 can be used to call the device control application program stored in the memory 1005 to implement the data calibration method provided in the embodiments of this application.
  • the computer equipment 1000 described in the embodiment of the present application can perform the data calibration method described in the previous embodiments, and can also perform the functions of the data calibration device 1 described in the embodiment corresponding to FIG. 7 above, which will not be repeated here. In addition, the description of the beneficial effect of adopting the same method will not be repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, and the above-mentioned computer-readable storage medium stores the computer program executed by the data calibration device 1 mentioned above.
  • when a computer loads and executes the above-mentioned computer program, it can perform the data calibration method described in any of the above-mentioned embodiments, so details will not be repeated here.
  • in addition, the description of the beneficial effect of adopting the same method will not be repeated here.
  • the above-mentioned computer-readable storage medium may be the data calibration device provided in any of the foregoing embodiments or an internal storage unit of the above-mentioned computer equipment, such as a hard disk or memory of the computer equipment.
  • the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk equipped on the computer device, a smart memory card (smart media card, SMC), a secure digital (secure digital, SD) card, Flash card (flash card), etc.
  • the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the computer device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.

Abstract

A data calibration method and apparatus, and a computer device and a readable storage medium. The data calibration method comprises: acquiring resource attribute information of a target service resource under N resource attribute types; performing clustering on a historical service resource set on the basis of S combination types, so as to obtain H historical service resource subsets; performing aggregation statistics processing on the basis of the H historical service resource subsets, so as to obtain an aggregated data set; according to the resource attribute information, acquiring, from the aggregated data set, an effective conversion quantity and an effective joint factor for the target service resource; and determining a calibration coefficient according to the effective conversion quantity and the effective joint factor, and calibrating an estimated joint factor of the target service resource according to the calibration coefficient, wherein the estimated joint factor is determined according to an estimated conversion rate and an industry factor of the target service resource. By means of the method, a deviation between an estimated value and an actual value of advertisement display costs can be reduced, thereby improving the estimation accuracy.

Description

A data calibration method and apparatus, computer device, and readable storage medium
This application claims priority to Chinese Patent Application No. 202110513300X, filed with the China Patent Office on May 11, 2021 and entitled "A data calibration method, apparatus, computer device, and readable storage medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of Internet technologies, and in particular to data calibration technology in the advertising field.
Background
With the rise of the Internet, online advertising has become the main source of profit for major advertising platforms and advertisers. Before delivering an advertisement, an advertising platform needs to weigh the benefits to all parties (users, advertisers, and the advertising platform) and, on that basis, estimate the display cost of the advertisement to be delivered (ECPM, Effective Cost per Mille), that is, the advertising fee the platform is expected to earn for one exposure of the advertisement.
At present, for optimized cost per action (oCPA) advertisements, the advertiser usually needs to provide a conversion bid (targetCpa, target Cost per Action), that is, the fee the advertiser expects to pay for one conversion of the advertisement. A conversion means that an optimization action expected by the advertiser has been completed, for example, a user registering for an application through the advertisement. Estimating the ECPM of an oCPA advertisement requires estimating the relationship between the number of exposures and the number of conversions, and then determining the ECPM from targetCpa combined with the estimated relationship. However, the ECPM estimate obtained with existing techniques often deviates from the actual value, so the estimation accuracy is low.
Summary
The embodiments of this application provide a data calibration method and apparatus, a computer device, and a readable storage medium, which can reduce the deviation between the estimated value and the actual value of the advertisement display cost and improve the estimation accuracy.
In one aspect, an embodiment of this application provides a data calibration method, executed by a computer device, including:
acquiring resource attribute information of a target business resource under N resource attribute types, where N is a positive integer;
clustering the historical business resources in a historical business resource set based on each of S combination types to obtain H historical business resource subsets, where the resource attribute types corresponding to each combination type all belong to the N resource attribute types; the historical business resources in one historical business resource subset share the same historical resource attribute combination, and one historical resource attribute combination is associated with the resource attribute types corresponding to one combination type; S is a positive integer less than or equal to N; and H is a positive integer;
performing aggregation statistics processing on each of the H historical business resource subsets to obtain an aggregated data set, where the aggregated data set includes the conversion number and the joint factor corresponding to each of the H historical business resource subsets; the conversion number is determined from the conversion numbers of the historical business resources in the corresponding historical business resource subset, and the joint factor is determined from the estimated conversion rates and industry factors of the historical business resources in the corresponding historical business resource subset;
acquiring, from the aggregated data set according to the resource attribute information, an effective conversion number and an effective joint factor for the target business resource, and determining a calibration coefficient according to the effective conversion number and the effective joint factor; and
calibrating an estimated joint factor of the target business resource according to the calibration coefficient, where the estimated joint factor is determined according to the estimated conversion rate and the industry factor of the target business resource.
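The clustering step of the method can be illustrated with the short sketch below. It assumes each historical business resource is a dictionary of attribute values and that a combination type is simply a tuple of attribute type names; both the data layout and the attribute names ("industry", "site") are hypothetical.

```python
from collections import defaultdict


def cluster_by_combination_types(resources, combination_types):
    """Group resources so that each subset shares one attribute combination.

    Every combination type is applied to the whole historical set, so one
    resource lands in one subset per combination type.
    """
    subsets = defaultdict(list)
    for combination_type in combination_types:
        for resource in resources:
            key = tuple((t, resource[t]) for t in combination_type)
            subsets[key].append(resource["id"])
    return dict(subsets)


history = [
    {"id": "a", "industry": "games", "site": "feed"},
    {"id": "b", "industry": "games", "site": "feed"},
    {"id": "c", "industry": "games", "site": "search"},
]
# N = 2 attribute types, S = 2 combination types.
subsets = cluster_by_combination_types(
    history, [("industry",), ("industry", "site")]
)
```

Here the two combination types yield H = 3 subsets: one coarse subset for the industry alone and two finer subsets for each industry and site pair.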
In one aspect, an embodiment of this application provides a data calibration apparatus, including:
an acquisition module, configured to acquire resource attribute information of a target business resource under N resource attribute types, where N is a positive integer;
a division module, configured to cluster the historical business resources in a historical business resource set based on each of S combination types to obtain H historical business resource subsets, where the resource attribute types corresponding to each combination type all belong to the N resource attribute types; the historical business resources in one historical business resource subset share the same historical resource attribute combination, and one historical resource attribute combination is associated with the resource attribute types corresponding to one combination type; S is a positive integer less than or equal to N; and H is a positive integer;
an aggregation statistics module, configured to perform aggregation statistics processing on each of the H historical business resource subsets to obtain an aggregated data set, where the aggregated data set includes the conversion number and the joint factor corresponding to each of the H historical business resource subsets; the conversion number is determined from the conversion numbers of the historical business resources in the corresponding historical business resource subset, and the joint factor is determined from the estimated conversion rates and industry factors of the historical business resources in the corresponding historical business resource subset;
a valid data determination module, configured to acquire, from the aggregated data set according to the resource attribute information, an effective conversion number and an effective joint factor for the target business resource; and
a calibration module, configured to determine a calibration coefficient according to the effective conversion number and the effective joint factor, and to calibrate an estimated joint factor of the target business resource according to the calibration coefficient, where the estimated joint factor is determined according to the estimated conversion rate and the industry factor of the target business resource.
In one aspect, an embodiment of this application provides a computer device, including a processor, a memory, and a network interface;
the processor is connected to the memory and the network interface, where the network interface is used to provide a data communication function, the memory is used to store a computer program, and the processor is used to call the computer program to execute the method in the embodiments of this application.
In one aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program, where the computer program is adapted to be loaded by a processor to execute the method in the embodiments of this application.
In one aspect, an embodiment of this application provides a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the method in the embodiments of this application.
According to the resource attribute information of a target business resource, the embodiments of this application can obtain an effective conversion number and an effective joint factor determined from historical business resources related to the target business resource, use them to calibrate the estimated joint factor of the target business resource, and then adjust the estimated display cost of the target business resource according to the calibrated estimated joint factor. The estimated joint factor is determined according to the estimated conversion rate and the industry factor of the target business resource, both of which are main factors in calculating the display cost of the target business resource. Calibrating the estimated joint factor reduces the overall calculation error that an inaccurate estimated conversion rate or industry factor introduces into the display cost of the target business resource, thereby reducing the deviation between the estimated and actual display cost and improving the estimation accuracy.
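The calibration itself can be sketched as below. This passage does not give the formula for the calibration coefficient, so the ratio of effective conversions to effective joint factor (observed over predicted) is an assumed reading, and the fallback to 1.0 when no prediction mass is available is likewise an illustrative choice.

```python
def calibration_coefficient(effective_conversions, effective_joint_factor):
    """Assumed form: observed conversions divided by the aggregated joint factor."""
    if effective_joint_factor <= 0:
        # No usable history: leave the estimate unchanged (illustrative fallback).
        return 1.0
    return effective_conversions / effective_joint_factor


def calibrate_joint_factor(est_conversion_rate, industry_factor, coefficient):
    """Scale the estimated joint factor of the target business resource."""
    estimated_joint_factor = est_conversion_rate * industry_factor
    return coefficient * estimated_joint_factor


coefficient = calibration_coefficient(20, 25.0)
calibrated = calibrate_joint_factor(0.05, 1.2, coefficient)
```

Under this reading, a coefficient below 1 indicates that the historical estimates over-predicted conversions for similar resources, so the target resource's estimated joint factor is scaled down before the display cost is recomputed.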
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of this application;
FIG. 2a is a schematic diagram of a data calibration scenario provided by an embodiment of this application;
FIG. 2b is a schematic diagram of a data calibration scenario provided by an embodiment of this application;
FIG. 3 is a schematic flowchart of a data calibration method provided by an embodiment of this application;
FIG. 4a is a schematic diagram of a scenario for dividing a historical business resource set provided by an embodiment of this application;
FIG. 4b is a schematic diagram of a scenario for determining an effective conversion number and an effective joint factor provided by an embodiment of this application;
FIG. 5 is a schematic flowchart of a data calibration method provided by an embodiment of this application;
FIG. 6 is a schematic flowchart of an analysis method for determining the joint factor of the display cost provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of a data calibration apparatus provided by an embodiment of this application;
FIG. 8 is a schematic structural diagram of a computer device provided by an embodiment of this application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without creative efforts fall within the protection scope of this application.
Refer to Fig. 1, which is a schematic diagram of a system architecture provided by an embodiment of the present application. The system architecture may include a service server 100 and a user terminal cluster. The user terminal cluster may include user terminal 200a, user terminal 200b, user terminal 200c, ..., and user terminal 200n. Communication connections may exist between user terminals in the cluster; for example, between user terminal 200a and user terminal 200b, and between user terminal 200a and user terminal 200c. In addition, any user terminal in the cluster may have a communication connection with the service server 100; for example, between user terminal 200a and the service server 100. The connection manner is not limited: the connection may be direct or indirect over wired communication, direct or indirect over wireless communication, or established in another manner, which is not limited in the present application.
A target application may be installed on each user terminal in the user terminal cluster. When running on a user terminal, the target application can exchange data with the service server 100. The target application may include one or more applications capable of displaying data such as text, images, audio, or video (for example, a game application, video editing application, social application, instant messaging application, live streaming application, short video application, video application, music application, shopping application, novel application, payment application, or browser). The service server 100 may respond to a promotion request for a service resource by sending the service resource to one or more user terminals in the user terminal cluster. A service resource may be a resource used to convey product or service information to consumers or users, such as a job recruitment advertisement, a product sales advertisement, a movie promotion advertisement, or a game recommendation advertisement. After receiving the service resource sent by the service server 100, each of the one or more user terminals may load and display it through the target application, and then collect the operation behavior that the user performs on the service resource through the user terminal. The user terminals return the display information and operation behavior information of the service resource to the service server 100 as service data information. The display information may include the number of times and the duration for which the service resource is displayed; the operation behavior may include no operation, early closing, resource clicking, and so on. The service server 100 receives the service data information returned by the one or more user terminals and records it in a log for the service resource.
It should be noted that the service server 100 may respond to promotion requests for multiple service resources and send the multiple service resources to one or more user terminals in the user terminal cluster within the same time period. For example, within one day the service server 100 may send service resource A to user terminal 200a, user terminal 200b, and user terminal 200c, send service resource B to user terminal 200b and user terminal 200n, and send service resource C to user terminal 200c. The target applications installed on the user terminals that receive the same service resource may differ. For example, if the target application on user terminal 200a is short video application X, the target application on user terminal 200b is social application Y, and the target application on user terminal 200c is live streaming application Z, then each of these user terminals can receive service resource A from the service server 100 and display it through its own target application. Short video application X, social application Y, and live streaming application Z are different types of promotion applications and may each be referred to as a different site set; for example, short video application X may be site set 1, social application Y may be site set 2, and live streaming application Z may be site set 3.
Correspondingly, when responding to a promotion request for a target service resource (that is, a service resource sent to user terminals as described above), the service server 100 predicts the ECPM of the target service resource from its conversion bid and then sends the target service resource to the corresponding user terminals. The predicted ECPM often differs from the actual ECPM. Therefore, in the initial promotion stage of the target service resource, the service server 100 calibrates the estimated joint factor that affects the ECPM of the target service resource, and then adjusts the ECPM using the calibrated estimated joint factor so that the adjusted ECPM is closer to the actual value. The data calibration process is as follows:
The service server 100 obtains resource attribute information of the target service resource under N resource attribute types, where N is a positive integer. It then clusters the historical service resources in a historical service resource set according to each of S combination types to obtain H historical service resource subsets, and performs aggregate statistical processing on each of the H subsets to obtain an aggregated data set. Next, according to the resource attribute information, the service server 100 obtains from the aggregated data set an effective conversion number and an effective joint factor for the target service resource, determines a calibration coefficient from the effective conversion number and the effective joint factor, and finally calibrates the estimated joint factor of the target service resource according to the calibration coefficient. Every resource attribute type in each combination type belongs to the N resource attribute types; the historical service resources in one subset share the same historical resource attribute combination; each historical resource attribute combination is associated with the resource attribute types of one combination type; S is a positive integer less than or equal to N; and H is a positive integer. The aggregated data set includes the conversion number and joint factor corresponding to each of the H historical service resource subsets. The conversion number of a subset is determined from the conversion numbers of the historical service resources in that subset, and the joint factor of a subset is determined from the estimated conversion rates and industry factors of the historical service resources in that subset. The estimated joint factor is determined from the estimated conversion rate and the industry factor of the target service resource.
It can be understood that the method provided in the embodiments of the present application may be executed by a computer device, which includes but is not limited to a terminal or a server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
It can be understood that the above devices (such as the service server 100, user terminal 200a, user terminal 200b, user terminal 200c, ..., and user terminal 200n) may be nodes in a distributed system. The distributed system may be a blockchain system formed by multiple nodes connected through network communication. The nodes may form a peer-to-peer (P2P) network, where the P2P protocol is an application-layer protocol running on top of the Transmission Control Protocol (TCP). In the distributed system, any form of computer device, such as a server or a user terminal, can become a node in the blockchain system by joining the peer-to-peer network.
The user terminal 200a, user terminal 200b, user terminal 200c, ..., and user terminal 200n in Fig. 1 may include a mobile phone, a tablet computer, a notebook computer, a handheld computer, a smart speaker, a mobile internet device (MID), a point of sale (POS) machine, a wearable device (for example, a smart watch or a smart bracelet), and the like.
To make the above data calibration process easier to understand, the following description takes as an example the service server calibrating the estimated joint factor of a target service resource W.
Refer to Fig. 2a and Fig. 2b together, which are schematic diagrams of a data calibration scenario provided by an embodiment of the present application. The data calibration scenario may be implemented in the service server 100 shown in Fig. 1, in a user terminal (one or more of user terminal 200a, user terminal 200b, user terminal 200c, ..., and user terminal 200n shown in Fig. 1), or jointly by the user terminal cluster and the service server, which is not limited here. The embodiments of the present application are described taking joint execution by the user terminal cluster and the service server 100 as an example.
As shown in Fig. 2a, the service server 100 sends historical service resource A1, historical service resource A2, and historical service resource A3 in the historical service resource set 300 to one or more of user terminal 200a, user terminal 200b, user terminal 200c, ..., and user terminal 200n. Historical service resource A1 may be a lipstick sales advertisement for brand x that advertiser A wants to promote, historical service resource A2 may be a game application recommendation advertisement for brand y that advertiser B wants to promote, and historical service resource A3 may be a game application recommendation advertisement for brand y that advertiser A wants to promote. In short, each historical service resource in the set may be an advertising resource carrying product or service information that an advertiser wants to convey to consumers or users. One historical service resource may be sent to different user terminals, and one user terminal may receive different historical service resources; for example, historical service resource A1 may be sent to user terminal 200a and user terminal 200b, and user terminal 200a may also receive historical service resource A2 and historical service resource A3. Each user terminal then displays the received historical service resources through its installed target application. The target application is an application that can display the advertising resources sent by the service server 100 and may include one or more applications of different types. As shown in Fig. 2a, the target application may include application B1, application B2, application B3, ..., and application Bn, where application B1 may be a live streaming application, application B2 may be a social application, application B3 may be a short video application, and so on. The target application on each user terminal is the carrier for displaying service resources, and the same application may be installed on different user terminals; for example, user terminal 200a may have application B1 and application B2 installed, and user terminal 200b may also have application B1 installed.
As shown in Fig. 2a, the service server 100 also receives the service data information that each user terminal returns for each historical service resource. The service data information may include display information of the historical service resource and operation behavior information for it. For example, suppose user a is bound to user terminal 200a. After receiving historical service resource A1, user terminal 200a displays it through application B1, which user a is using. After viewing historical service resource A1, user a is interested in the lipstick it promotes, clicks the purchase link it provides, and is taken to the shopping interface. User terminal 200a records the number of times and the time at which historical service resource A1 is displayed through application B1, as well as operations such as clicking or closing that user a performs on it, and sends them together to the service server 100 as service data information. The service server 100 aggregates the service data information sent by the user terminals in the cluster and records it in a log. For example, if historical service resource A1 has been displayed once by the target application on each of 100 user terminals, the exposure of A1 in the log is 100; if 50 users clicked the purchase link provided by A1, the click count of A1 in the log is 50 and the corresponding click-through rate is 50%. Correspondingly, if user a buys the lipstick in the shopping interface, historical service resource A1 is considered to have completed one conversion. A conversion of a historical service resource refers to the specific optimization goal selected by the advertiser in the delivery process; for example, a conversion of historical service resource A2 may be a user completing registration for the game application. It should be noted that a user terminal may be unable to collect data such as lipstick purchases or game application registrations through the target application, so the conversion number of a historical service resource mostly depends on data returned by the advertiser; when the service server 100 receives the conversion number, it also writes it to the log. Assuming the advertiser reports that the conversion number of historical service resource A1 is 10, the conversion rate of A1 is the conversion number as a percentage of the click count, that is, 10/50 = 20%.
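The figures in this example can be reproduced with a short calculation. The following is a minimal sketch for illustration only; the function names are hypothetical and are not part of the application.

```python
def click_through_rate(clicks: int, exposures: int) -> float:
    """Share of exposures that led to a click."""
    return clicks / exposures if exposures else 0.0


def conversion_rate(conversions: int, clicks: int) -> float:
    """Share of clicks that led to a conversion."""
    return conversions / clicks if clicks else 0.0


# Log entries for historical service resource A1 from the example above.
exposures, clicks, conversions = 100, 50, 10
ctr = click_through_rate(clicks, exposures)
cvr = conversion_rate(conversions, clicks)
print(f"CTR = {ctr:.0%}, CVR = {cvr:.0%}")  # prints "CTR = 50%, CVR = 20%"
```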
It can be understood that, because the advertising platform promotes service resources for the advertiser through the service server 100, it needs to charge the advertiser. For a service resource charged by conversion number, such as an oCPA advertisement, the advertiser gives a conversion bid (targetCpa, target cost per action), that is, the fee the advertiser expects to pay for one conversion of the service resource. The expected total fee the advertiser is willing to pay is then conversion number * targetCpa. The service server 100, however, charges on the basis of exposure * ECPM. To make the actual charge equal to the advertiser's expected total fee, the ECPM should be (conversion number / exposure) * targetCpa. The service server 100 can obtain the exact exposure and targetCpa, but the conversion number depends on data returned by the advertiser and cannot be known before the service resource is delivered. Therefore, the service server 100 uses different prediction models to predict the estimated click-through rate, estimated conversion rate, industry factor, and other factors of the service resource, obtains the estimated conversion number as exposure * estimated click-through rate * estimated conversion rate * industry factor * other factors, and from it obtains the ECPM of the service resource.
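The billing relation described above can be checked numerically. The sketch below is an illustration under assumed inputs: the function names and the sample numbers are hypothetical, and the "other factors" are collapsed into a single multiplier.

```python
import math


def target_ecpm(conversions: float, exposures: float, target_cpa: float) -> float:
    """ECPM that makes exposure * ECPM equal conversions * targetCpa."""
    return conversions / exposures * target_cpa


def estimated_ecpm(p_ctr: float, p_cvr: float, industry_factor: float,
                   other_factors: float, target_cpa: float) -> float:
    """ECPM from predicted factors; the exposure cancels out of
    (estimated conversion number / exposure) * targetCpa."""
    return p_ctr * p_cvr * industry_factor * other_factors * target_cpa


# Hypothetical numbers, for illustration only.
exposures, conversions, target_cpa = 100, 10, 5.0
ecpm = target_ecpm(conversions, exposures, target_cpa)

# Charging by exposure reproduces the advertiser's expected total fee.
assert math.isclose(exposures * ecpm, conversions * target_cpa)

# If the predictions match reality (CTR 50%, CVR 20%, neutral factors),
# the estimated ECPM equals the target ECPM.
assert math.isclose(estimated_ecpm(0.5, 0.2, 1.0, 1.0, target_cpa), ecpm)
```

Any error in a predicted factor scales the estimated ECPM proportionally, which is why the product of estimated conversion rate and industry factor is the quantity being calibrated.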
During delivery, the ECPM calculated from the predicted factors as described above may deviate from the actually expected ECPM, because the product of the estimated conversion rate and the industry factor may differ considerably from the product of the actual conversion rate and the industry factor. The service server 100 therefore calibrates the estimated joint factor when delivering a service resource, where the estimated joint factor is the product of the estimated conversion rate and the industry factor. A new ECPM is then calculated from the calibrated estimated joint factor, and the advertiser is charged on the basis of new ECPM * exposure, which brings the charge closer to the total fee the advertiser expects to pay for the service resource.
The following description takes as an example the service server 100 using the historical service resources in the above historical service resource set to calibrate the estimated joint factor of a newly delivered target service resource.
The service server 100 obtains the resource attribute information of the target service resource associated with N resource attribute types. Assume the N resource attribute types include advertiser and brand. As shown in Fig. 2b, the service server 100 obtains the resource attribute information of target service resource C1 associated with advertiser and brand; assume the advertiser of C1 is B and C1 belongs to brand x. The service server 100 divides the historical service resource set according to S combination types to obtain H historical service resource subsets. Each combination type includes one or more resource attribute types, all of which belong to the N resource attribute types. As shown in Fig. 2b, assume the S combination types include [advertiser] and [brand]. One of the S combination types could also be [advertiser, brand]; combination types can be set according to actual needs, and only [advertiser] and [brand] are used here for illustration. The service server 100 first divides the historical service resource set 300 by the combination type [advertiser]. As described above, the advertiser of historical service resource A1 and historical service resource A3 is A, and the advertiser of historical service resource A2 is B. Based on [advertiser], the service server 100 therefore obtains historical service resource subset 311 (the historical service resources whose advertiser is A) and historical service resource subset 312 (the historical service resources whose advertiser is B). Similarly, dividing the historical service resource set 300 by the combination type [brand] yields historical service resource subset 321, which includes historical service resource A1 belonging to brand x, and historical service resource subset 322, which includes historical service resource A2 and historical service resource A3 belonging to brand y.
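The division in this example can be sketched in a few lines. This is an illustrative sketch only: the dictionary layout and the function name `partition` are assumptions made for the example, not the data structures of the application.

```python
from collections import defaultdict

# Historical service resources from the example, keyed by attribute type.
historical = {
    "A1": {"advertiser": "A", "brand": "x"},
    "A2": {"advertiser": "B", "brand": "y"},
    "A3": {"advertiser": "A", "brand": "y"},
}

# S = 2 combination types; each is a tuple of resource attribute types.
combination_types = [("advertiser",), ("brand",)]


def partition(resources, combination_types):
    """Group resource ids by (combination type, attribute values)."""
    subsets = defaultdict(list)
    for combo in combination_types:
        for rid, attrs in resources.items():
            key = (combo, tuple(attrs[t] for t in combo))
            subsets[key].append(rid)
    return dict(subsets)


subsets = partition(historical, combination_types)
for key, members in subsets.items():
    print(key, members)
```

The four resulting groups correspond to subsets 311, 312, 321, and 322. Adding `("advertiser", "brand")` to `combination_types` would produce further subsets keyed by both attributes.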
After dividing the historical service resource set 300 according to each combination type, the service server 100 performs aggregate statistical processing on each historical service resource subset to obtain an aggregated data set 400, which includes the conversion number and joint factor corresponding to each subset. As shown in Fig. 2b, m1 and n1 are the conversion number and joint factor of historical service resource subset 311, m2 and n2 are those of subset 312, m3 and n3 are those of subset 321, and m4 and n4 are those of subset 322. Conversion number m1 may be the sum of the conversion numbers of historical service resource A1 and historical service resource A3, the members of subset 311, and joint factor n1 is determined from the estimated conversion rates and industry factors of A1 and A3. The estimated conversion rate is the estimated probability that a historical service resource is converted after being clicked, and the industry factor is an ECPM adjustment factor available to service resources belonging to a specific industry. Similarly, conversion numbers m2, m3, and m4 can be determined from the conversion numbers of the historical service resources in their respective subsets, and joint factors n2, n3, and n4 can be determined from the estimated conversion rates and industry factors of those historical service resources.
The service server 100 then obtains the corresponding conversion numbers and joint factors from the aggregated data set 400 according to the resource attribute information. For example, because the advertiser of target service resource C1 is B, among the subsets obtained through the combination type [advertiser], C1 corresponds to historical service resource subset 312, and the service server 100 obtains its conversion number m2 and joint factor n2. Likewise, because the brand of C1 is x, the service server determines that C1 corresponds to historical service resource subset 321 and obtains its conversion number m3 and joint factor n3. From m2, n2, m3, and n3, the service server 100 determines the effective conversion number and the effective joint factor, and from these determines the calibration coefficient. Finally, the service server 100 calibrates the estimated joint factor of C1 according to the calibration coefficient, where the estimated joint factor is determined from the estimated conversion rate and industry factor of C1. Calibrating the estimated joint factor of C1 reduces the overall error introduced by its estimated conversion rate and industry factor, which yields a more accurate estimate when the ECPM is calculated and improves the accuracy of the ECPM estimate.
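The lookup step for target service resource C1 can be sketched as follows. The numeric values standing in for m1 to m4 and n1 to n4 are made up, and the key layout and the function name `candidates` are assumptions for illustration. How the effective values are selected from the candidates and how the calibration coefficient is computed are not specified in this passage, so the sketch stops at the candidate pairs.

```python
# Aggregated data set 400: subset key -> (conversion number, joint factor).
# The numeric values stand in for m1..m4 and n1..n4 and are made up.
aggregated = {
    (("advertiser",), ("A",)): (12, 0.9),  # subset 311: m1, n1
    (("advertiser",), ("B",)): (30, 1.4),  # subset 312: m2, n2
    (("brand",), ("x",)): (8, 0.5),        # subset 321: m3, n3
    (("brand",), ("y",)): (34, 1.8),       # subset 322: m4, n4
}

combination_types = [("advertiser",), ("brand",)]
target_c1 = {"advertiser": "B", "brand": "x"}


def candidates(target_attrs, combination_types, aggregated):
    """Return the (conversion number, joint factor) pairs matching the target."""
    found = []
    for combo in combination_types:
        key = (combo, tuple(target_attrs[t] for t in combo))
        if key in aggregated:
            found.append(aggregated[key])
    return found


pairs = candidates(target_c1, combination_types, aggregated)
print(pairs)  # the pairs for subsets 312 and 321, i.e. (m2, n2) and (m3, n3)
```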
Refer to Fig. 3, which is a schematic flowchart of a data calibration method provided by an embodiment of the present application. The method is executed by a computer device in Fig. 1, which may be the service server 100 or a user terminal in the user terminal cluster (including user terminal 200a, user terminal 200b, user terminal 200c, and user terminal 200n). As shown in Fig. 3, the data calibration method may include the following steps S101 to S105.
Step S101: obtain resource attribute information of the target service resource under N resource attribute types, where N is a positive integer.
Specifically, the target service resource may be an advertising resource carrying product or service information that an advertiser wants to convey to consumers or users, for example an oCPA advertisement, which may be presented as text, pictures, video, and so on. An oCPA advertisement is in essence paid for according to user behavior. The advertiser selects a specific optimization goal in the delivery process (for example, activation of a mobile application or placing an order on a shopping website), provides the conversion bid targetCpa it is willing to pay for that goal, and returns advertisement conversion data to the advertising platform promptly and accurately. Using prediction models, the computer device estimates the fee to charge for each exposure, that is, the estimated ECPM, and finally deducts fees according to exposure and ECPM.
To bring the consumption data closer to the advertiser's expected total cost and to protect the interests of both the advertiser and the advertising platform, the computer device also adjusts the estimated joint factor of the oCPA advertisement according to data from other advertisements related to it, and thereby adjusts the ECPM. The consumption data is the total fee that the advertising platform charges the advertiser and equals the product of the exposure count and the ECPM. The expected total cost is the fee that the advertiser expects to pay and equals the product of targetCpa and the number of conversions. The estimated joint factor is the product of the estimated conversion rate and the industry factor; the ECPM of the oCPA advertisement can be obtained from the estimated joint factor, the estimated click-through rate, other factors, and targetCpa. The estimated conversion rate is the predicted probability that the oCPA advertisement is converted after being clicked, that is, the probability that the advertiser's optimization goal is achieved after the click. The industry factor is usually a factor designed for a specific industry, or for a specific group of people within an industry, to strengthen the effect for that industry or group; for example, in the direct-sales e-commerce industry, the industry factors mainly include a direct-sales e-commerce PCVR compensation factor and a direct-sales e-commerce high-conversion-population enhancement factor. The estimated click-through rate is the predicted probability that the oCPA advertisement is clicked after being exposed. The other factors are further factors that can affect the ECPM, such as a billing ratio factor, a price adjustment factor, and a risk control factor.
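The relationships stated above can be written down as a minimal Python sketch. All names are illustrative, and the ECPM is taken as an input because the text does not give its exact formula; consumption is computed as the plain product described in the text.

```python
from dataclasses import dataclass


@dataclass
class AdEstimate:
    """Per-advertisement predictions (hypothetical container)."""
    pcvr: float             # estimated conversion rate
    industry_factor: float  # industry factor

    @property
    def joint_factor(self) -> float:
        # Estimated joint factor = estimated conversion rate * industry factor.
        return self.pcvr * self.industry_factor


def consumption(exposures: int, ecpm: float) -> float:
    # Consumption data = exposure count * ECPM, as stated in the text.
    return exposures * ecpm


def expected_total_cost(target_cpa: float, conversions: int) -> float:
    # Expected total cost = targetCpa * number of conversions.
    return target_cpa * conversions
```

Calibration aims to keep `consumption(...)` close to `expected_total_cost(...)` for the same advertisement.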
Specifically, the N resource attribute types may include one or more of advertiser, product brand, product, site set, and new/old advertisement, and may also include other resource attribute types such as same group, same region, or same targeting. The description here takes these five resource attribute types as an example. Acquiring the resource attribute information of the target business resource under the N resource attribute types may then mean acquiring its advertiser information, product brand information, product information, site set information, and new/old advertisement information. For example, for lipstick advertisement N0, the advertiser is Xiaoming, the product brand is x, the product is lipstick, the site set is 27, and it is a new advertisement. As for how new and old advertisements are distinguished, one possible policy is that an advertisement exposed only today and never before is new and all others are old; another is that an advertisement exposed within the most recent 2 days is new and all others are old. The definition of new and old advertisements is not limited here.
Step S102: cluster the historical business resources in a historical business resource set based on each of S combination types, to obtain H historical business resource subsets. The resource attribute types corresponding to each combination type all belong to the N resource attribute types. All historical business resources in one historical business resource subset have the same historical resource attribute combination, and a historical resource attribute combination is associated with the resource attribute types corresponding to one combination type. S is a positive integer less than or equal to N, and H is a positive integer.
Specifically, the historical business resource set includes several historical business resources, which may be other advertising resources that have been delivered on the above advertising platform. Assume that the S combination types include a combination type Mi, where i is a positive integer less than or equal to S, and that the historical business resource set includes a historical business resource Td, where d is a positive integer less than or equal to the total number of historical business resources in the set. The process by which the computer device clusters the historical business resources based on each of the S combination types to obtain the H historical business resource subsets may be as follows: determine the resource attribute types contained in the combination type Mi as target resource attribute types; determine the historical resource attribute information of the historical business resource Td that is associated with the target resource attribute types as the historical resource attribute combination of Td; within the historical business resource set, add the historical business resources that have the same historical resource attribute combination to the same historical business resource subset, obtaining one or more historical business resource subsets corresponding to Mi; and form the H historical business resource subsets from the one or more subsets corresponding to each combination type.
Specifically, the N resource attribute types can be divided into dimension attribute types and granularity attribute types, and combining each dimension attribute type with all of the granularity attribute types yields one combination type. For example, among advertiser, product brand, product, site set, and new/old advertisement, the first three are dimension attribute types chosen from the perspectives of the ownership, audience, and content of an advertising resource. Ownership refers to the advertiser that created the advertisement; the advertiser reflects the group to which the advertising resource belongs, and all advertising resources under the same advertiser are similar to some extent because they share that advertiser. Audience refers to the people the advertisement is aimed at; among the resource attribute types of an advertising resource, the product brand can be used to characterize its audience, since advertising resources for the same product brand target populations with a certain similarity. Content refers to the product promoted by the advertising resource, which captures the essence of the advertising resource itself. The historical business resources in the historical business resource set can therefore be divided along each of the three dimensions of advertiser, product brand, and product. Site set and new/old advertisement serve as granularity attribute types, which further divide the groups obtained from the dimension attribute types. The combination types obtainable from advertiser, product brand, product, site set, and new/old advertisement are therefore [advertiser, site set, new/old advertisement], [product brand, site set, new/old advertisement], and [product, site set, new/old advertisement].
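The way combination types are built from dimension and granularity attribute types can be sketched as follows. This is an illustration only; the function name and the English attribute labels are assumptions, not identifiers from the application.

```python
def build_combination_types(dimension_types, granularity_types):
    """Pair each dimension attribute type with all granularity attribute types."""
    return [[dim, *granularity_types] for dim in dimension_types]


# Attribute labels are illustrative stand-ins for the five types in the text.
combination_types = build_combination_types(
    ["advertiser", "product_brand", "product"],
    ["site_set", "new_or_old"],
)
```

With three dimension attribute types, the sketch yields three combination types (S = 3), matching the three bracketed combinations listed above.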
Specifically, the clustering of the historical business resource set according to a combination type Mi is illustrated below with [advertiser, site set, new/old advertisement]. Please also refer to FIG. 4a, which is a schematic diagram of a scenario of clustering a historical business resource set provided by an embodiment of the present application. As shown in FIG. 4a, the historical business resource set 400 includes historical business resources N1, N2, and N3. N1 has advertiser Xiaojia and site set 27 and is a new advertisement; N2 has advertiser Xiaoyi and site set 29 and is a new advertisement; N3 has advertiser Xiaojia and site set 27 and is a new advertisement. The computer device takes the three resource attribute types in [advertiser, site set, new/old advertisement] as target resource attribute types, and determines the historical resource attribute information of each historical business resource that is associated with them as its historical resource attribute combination. The historical resource attribute combination of N1 is [Xiaojia, 27, new], that of N2 is [Xiaoyi, 29, new], and that of N3 is [Xiaojia, 27, new]. Because N1 and N3 have the same historical resource attribute combination, as shown in FIG. 4a the computer device obtains, under [advertiser, site set, new/old advertisement], a historical business resource subset 4001 containing N1 and N3 and a historical business resource subset 4002 containing N2. Performing this division on the historical business resource set for each combination type yields H historical business resource subsets in total.
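The grouping in the FIG. 4a example can be sketched in Python using the attribute combinations listed for N1, N2, and N3. The dictionary layout and field names are hypothetical; the application does not specify a data format.

```python
from collections import defaultdict


def cluster_by_combination(resources, combination_type):
    """Group resources whose values for the combination type's attributes match."""
    subsets = defaultdict(list)
    for res in resources:
        # The historical resource attribute combination is the tuple of values
        # for the target resource attribute types.
        key = tuple(res[attr] for attr in combination_type)
        subsets[key].append(res["id"])
    return dict(subsets)


history = [
    {"id": "N1", "advertiser": "Xiaojia", "site_set": 27, "new_or_old": "new"},
    {"id": "N2", "advertiser": "Xiaoyi", "site_set": 29, "new_or_old": "new"},
    {"id": "N3", "advertiser": "Xiaojia", "site_set": 27, "new_or_old": "new"},
]
subsets = cluster_by_combination(history, ("advertiser", "site_set", "new_or_old"))
```

Here `subsets` holds two groups, corresponding to subsets 4001 and 4002. Running the same function once per combination type and collecting all groups would give the H subsets.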
Step S103: perform aggregation statistics on each of the H historical business resource subsets to obtain an aggregated data set. The aggregated data set includes the conversion number and the joint factor corresponding to each of the H historical business resource subsets. The conversion number is determined from the conversion numbers of the historical business resources in the corresponding subset, and the joint factor is determined from the estimated conversion rates and industry factors of the historical business resources in the corresponding subset.
Specifically, each historical business resource has a conversion number, an estimated conversion rate, and an industry factor, and its joint factor can be determined from the estimated conversion rate and the industry factor. The computer device aggregates the conversion numbers of the historical business resources in a historical business resource subset to obtain the total conversion number of that subset, and aggregates the joint factors of the historical business resources in the subset to obtain the total joint factor of that subset. The total conversion number and the total joint factor are then added to the aggregated data set as the conversion number and the joint factor corresponding to that subset.
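A minimal sketch of this aggregation for one subset follows. It assumes that "aggregation" means summation, as in step S203, and that each resource's joint factor is the product of its estimated conversion rate and industry factor; the field names are hypothetical.

```python
def aggregate_subset(resources):
    """Return (total conversion number, total joint factor) for one subset."""
    total_conversions = sum(r["conversions"] for r in resources)
    # Joint factor of a resource = estimated conversion rate * industry factor.
    total_joint_factor = sum(r["pcvr"] * r["industry_factor"] for r in resources)
    return total_conversions, total_joint_factor


subset_4001 = [
    {"conversions": 3, "pcvr": 0.5, "industry_factor": 2.0},   # illustrative values
    {"conversions": 5, "pcvr": 0.25, "industry_factor": 4.0},
]
conv_total, joint_total = aggregate_subset(subset_4001)
```

The numbers are made up for illustration; only the structure of the computation comes from the text.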
Step S104: according to the resource attribute information, obtain from the aggregated data set an effective conversion number and an effective joint factor for the target business resource.
Specifically, the computer device extracts S resource attribute combinations from the resource attribute information based on the S combination types, and then looks up in the aggregated data set the conversion number and joint factor corresponding to each of them. Assume that the S resource attribute combinations include a resource attribute combination Za, where a is a positive integer less than or equal to S. The lookup may be as follows: among the historical resource attribute combinations corresponding to the H historical business resource subsets, find the subset whose historical resource attribute combination is the same as Za and take it as the matching subset; then obtain from the aggregated data set the conversion number and joint factor corresponding to the matching subset, as the conversion number and joint factor corresponding to Za.
Take the combination type [advertiser, site set, new/old advertisement] as an example, and assume that the resource attribute combination extracted from the resource attribute information of the target business resource for this combination type is [Xiaojia, 27, new]. Of the historical business resource subsets 4001 and 4002 obtained above for [advertiser, site set, new/old advertisement], subset 4001 has a historical resource attribute combination identical to this resource attribute combination. The computer device therefore takes subset 4001 as the matching subset and obtains from the aggregated data set the conversion number and joint factor corresponding to it, as the conversion number and joint factor corresponding to this resource attribute combination. Finally, the computer device determines the effective conversion number and effective joint factor for the target business resource from the conversion numbers and joint factors corresponding to the S resource attribute combinations.
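The lookup of matching subsets can be sketched as a dictionary access. The keying scheme (combination type plus attribute values) and all names are assumptions made for illustration, and the aggregated numbers are made up.

```python
def extract_combination(target_attrs, combination_type):
    """Resource attribute combination of the target for one combination type."""
    return tuple(target_attrs[attr] for attr in combination_type)


def lookup_candidates(aggregated, target_attrs, combination_types):
    """Map each combination type to the (conversions, joint factor) of its matching subset."""
    found = {}
    for combo_type in combination_types:
        key = (combo_type, extract_combination(target_attrs, combo_type))
        if key in aggregated:  # a matching subset exists
            found[combo_type] = aggregated[key]
    return found


combo = ("advertiser", "site_set", "new_or_old")
aggregated = {
    (combo, ("Xiaojia", 27, "new")): (8, 2.0),  # subset 4001, illustrative numbers
    (combo, ("Xiaoyi", 29, "new")): (3, 1.5),   # subset 4002, illustrative numbers
}
target = {"advertiser": "Xiaojia", "site_set": 27, "new_or_old": "new"}
matches = lookup_candidates(aggregated, target, [combo])
```

For the target [Xiaojia, 27, new], the sketch returns the entry for subset 4001.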
Specifically, the process by which the computer device determines the effective conversion number and effective joint factor for the target business resource from the conversion numbers and joint factors corresponding to the S resource attribute combinations may be as follows: determine the priority of the conversion number and joint factor corresponding to each of the S resource attribute combinations according to the priorities of the S combination types; determine the validity of the conversion number and joint factor corresponding to each of the S resource attribute combinations according to the consumption data corresponding to that combination; take the valid conversion numbers and joint factors as candidate conversion numbers and candidate joint factors; and take the candidate conversion number and candidate joint factor with the highest priority as the effective conversion number and effective joint factor for the target business resource. The priority of the combination types can be set according to the actual situation; for example, the priority from high to low may be [advertiser], [product brand], [product]. The validity of the conversion number and joint factor corresponding to a resource attribute combination can be determined by whether the total consumption data of the corresponding historical business resource subset is greater than four times targetCpa.
For ease of understanding, please also refer to FIG. 4b, which is a schematic diagram of a scenario of determining an effective conversion number and an effective joint factor provided by an embodiment of the present application. As shown in FIG. 4b, the candidate conversion number m1 and candidate joint factor n1 correspond to the historical business resource subset Z1, whose combination type is [advertiser]; the candidate conversion number m2 and candidate joint factor n2 correspond to the historical business resource subset Z2, whose combination type is [product brand]; and the candidate conversion number m3 and candidate joint factor n3 correspond to the historical business resource subset Z3, whose combination type is [product].
Assume that the priority of the combination types from high to low is [advertiser], [product brand], [product]. When determining the effective conversion number and effective joint factor, the computer device first determines whether the total consumption data of all historical business resources in subset Z1 is greater than four times the total targetCpa of all historical business resources in Z1. If so, the computer device takes the candidate conversion number m1 and candidate joint factor n1 as the effective conversion number and effective joint factor. If not, the computer device determines whether the total consumption data of all historical business resources in subset Z2 is greater than four times the total targetCpa of all historical business resources in Z2. If so, it takes the candidate conversion number m2 and candidate joint factor n2 as the effective conversion number and effective joint factor. If not, the computer device determines whether the total consumption data of all historical business resources in subset Z3 is greater than four times the total targetCpa of all historical business resources in Z3, and if so, takes the candidate conversion number m3 and candidate joint factor n3 as the effective conversion number and effective joint factor.
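The priority fallback described for FIG. 4b can be sketched as a loop over combination types in priority order. The field names and numbers are hypothetical, and returning `None` when no candidate is valid is an assumption, since the text does not say what happens in that case.

```python
def select_effective(candidates, priority):
    """Return (conversions, joint factor) of the highest-priority valid candidate."""
    for combo_type in priority:
        stats = candidates.get(combo_type)
        if stats is None:
            continue  # no matching subset for this combination type
        # Validity check from the text: total consumption > 4 * total targetCpa.
        if stats["consumption"] > 4 * stats["target_cpa"]:
            return stats["conversions"], stats["joint_factor"]
    return None  # assumption: no valid candidate means no calibration data


candidates = {
    # Z1: 100 <= 4 * 50, so not valid
    "advertiser": {"conversions": 1, "joint_factor": 0.8,
                   "consumption": 100.0, "target_cpa": 50.0},
    # Z2: 900 > 4 * 200, so valid
    "product_brand": {"conversions": 12, "joint_factor": 10.0,
                      "consumption": 900.0, "target_cpa": 200.0},
    # Z3: valid as well, but lower priority than Z2
    "product": {"conversions": 40, "joint_factor": 35.0,
                "consumption": 5000.0, "target_cpa": 300.0},
}
effective = select_effective(candidates, ["advertiser", "product_brand", "product"])
```

In this made-up example Z1 fails the validity check, so the values of Z2 are selected even though Z3 is also valid.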
Step S105: determine a calibration coefficient according to the effective conversion number and the effective joint factor, and calibrate the estimated joint factor of the target business resource according to the calibration coefficient. The estimated joint factor is determined from the estimated conversion rate and the industry factor of the target business resource.
Specifically, the calibration coefficient may be calculated by the following formula (1):
cali_rate = Conv_valid / PCVRMulFactor_valid      Formula (1)
where cali_rate is the calibration coefficient, Conv_valid is the effective conversion number, and PCVRMulFactor_valid is the effective joint factor.
Specifically, the calibration process may follow formula (2):
New_PCVRMulFactor = old_PCVRMulFactor * cali_rate      Formula (2)
where New_PCVRMulFactor is the estimated joint factor after calibration and old_PCVRMulFactor is the estimated joint factor before calibration. The estimated joint factor before calibration is determined from the estimated conversion rate and the industry factor of the target business resource, both of which can be predicted by corresponding prediction models.
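Formulas (1) and (2) translate directly into code. The sketch below keeps the symbol names from the formulas; the guard against a non-positive effective joint factor is an added assumption, not something the text specifies.

```python
def calibration_rate(conv_valid: float, pcvr_mul_factor_valid: float) -> float:
    """Formula (1): cali_rate = Conv_valid / PCVRMulFactor_valid."""
    if pcvr_mul_factor_valid <= 0:
        # Assumption: the ratio is undefined without a positive joint factor.
        raise ValueError("effective joint factor must be positive")
    return conv_valid / pcvr_mul_factor_valid


def calibrate(old_pcvr_mul_factor: float, cali_rate: float) -> float:
    """Formula (2): New_PCVRMulFactor = old_PCVRMulFactor * cali_rate."""
    return old_pcvr_mul_factor * cali_rate


cali_rate = calibration_rate(12, 10.0)            # more conversions than predicted
new_pcvr_mul_factor = calibrate(0.05, cali_rate)  # estimate is scaled up
```

A coefficient above 1 means the related historical resources converted more often than their joint factors predicted, so the target's estimate is raised; a coefficient below 1 lowers it.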
Optionally, the above calibration process may be applied only in the initial promotion stage of the target business resource. In that case, after receiving a calibration request for the target business resource, the computer device first determines the promotion stage of the target business resource, and responds to the calibration request and executes the above calibration process only if that stage is the initial promotion stage. The reason is that in the initial promotion stage, when delivery has just started, the target business resource has no historical data for the current day (or the data is insufficient), so its calibration cannot rely on its own data and must take the data of other historical business resources into account. The initial promotion stage may be defined as the stage in which the conversion number of the target business resource is less than or equal to 2, or its consumption is less than or equal to 2 times targetCpa.
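The stage check can be sketched as a single predicate built from the definition given above. The function and parameter names are illustrative.

```python
def is_initial_promotion_stage(conversions: int, consumption: float,
                               target_cpa: float) -> bool:
    """True if conversions <= 2 or consumption <= 2 * targetCpa."""
    return conversions <= 2 or consumption <= 2 * target_cpa
```

A caller would run the calibration in steps S101 to S105 only when this predicate is true for the target business resource.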
With the method provided in the embodiments of the present application, resource attribute information of the target business resource under N resource attribute types can be acquired; the historical business resources in the historical business resource set are then clustered based on each of S combination types to obtain H historical business resource subsets, and aggregation statistics are performed on each of the H subsets to obtain an aggregated data set. The service server 100 then obtains, according to the resource attribute information, the effective conversion number and effective joint factor for the target business resource from the aggregated data set, determines a calibration coefficient from them, and finally calibrates the estimated joint factor of the target business resource according to the calibration coefficient. Here N is a positive integer. The resource attribute types corresponding to each combination type all belong to the N resource attribute types; all historical business resources in one historical business resource subset have the same historical resource attribute combination, and a historical resource attribute combination is associated with the resource attribute types corresponding to one combination type; S is a positive integer less than or equal to N, and H is a positive integer. The aggregated data set includes the conversion number and joint factor corresponding to each of the H historical business resource subsets; the conversion number is determined from the conversion numbers of the historical business resources in the corresponding subset, and the joint factor is determined from the estimated conversion rates and industry factors of the historical business resources in the corresponding subset. The estimated joint factor is determined from the estimated conversion rate and the industry factor of the target business resource.
With the method provided in the present application, the effective conversion number and effective joint factor determined from historical business resources related to the target business resource can be obtained based on the resource attribute information of the target business resource. The estimated joint factor of the target business resource is then calibrated with this effective conversion number and effective joint factor, and the estimated display fee of the target business resource is adjusted according to the calibrated estimated joint factor. This reduces the deviation between the estimated and actual display fee and improves the estimation accuracy.
Further, please refer to FIG. 5, which is a schematic flowchart of a data calibration method provided by an embodiment of the present application. The method is executed by the computer device in FIG. 1. The computer device may be the service server 100 in FIG. 1, or may be a user terminal in the user terminal cluster in FIG. 1 (including the user terminal 200a, the user terminal 200b, the user terminal 200c, and the user terminal 200n). As shown in FIG. 5, the data calibration method may include the following steps S201 to S208.
Step S201: acquire resource attribute information of a target business resource under N resource attribute types, where N is a positive integer.
Step S202: cluster the historical business resources in a historical business resource set based on each of S combination types, to obtain H historical business resource subsets. The H historical business resource subsets include a historical business resource subset Nj, where j is a positive integer less than or equal to H.
Specifically, for the implementation of step S201 and step S202, refer to the description of step S101 and step S102 in the embodiment corresponding to FIG. 3 above; details are not repeated here.
Step S203: determine a first unit conversion number and a first unit joint factor corresponding to the historical business resource subset Nj. The first unit conversion number is determined from the conversion numbers of the historical business resources in the subset Nj within a first unit duration, and the first unit joint factor is determined from the estimated conversion rates and industry factors of the historical business resources in the subset Nj within the first unit duration.
Specifically, the computer device determines each historical business resource in the historical business resource subset Nj as a historical business resource to be counted and obtains its log information. From the log information, it obtains the conversion number, estimated conversion rate, and industry factor of each historical business resource to be counted within the first unit duration. It multiplies the estimated conversion rate of a historical business resource to be counted within the first unit duration by its industry factor to obtain the joint factor of that resource within the first unit duration, and then sums these joint factors over all historical business resources to be counted to obtain the first unit joint factor corresponding to the subset Nj. The computer device also sums the conversion numbers of the historical business resources to be counted within the first unit duration to obtain the first unit conversion number corresponding to the subset Nj.
The first unit duration may be one minute, one hour, two hours, and so on; it is not limited here. The end of the first unit duration is usually the current system time. For example, when the first unit duration is one hour and the current system time is 9:00, the conversion number, estimated conversion rate, and industry factor of a historical business resource to be counted within the most recent hour are those for the period from 8:00 to 9:00.
Step S204: determine the second unit conversion number and the second unit joint factor corresponding to the historical business resource subset Nj. The second unit conversion number is determined from the conversion number of each historical business resource in the subset Nj within the second unit duration, and the second unit joint factor is determined from the estimated conversion rate and the industry factor of each historical business resource in the subset Nj within the second unit duration. The second unit duration is greater than the first unit duration.
Specifically, the computer device determines the second unit duration of the historical business resource subset Nj. The second unit duration may be one hour, two hours, the whole day, and so on, but it must be greater than the first unit duration. The computer device then divides the second unit duration based on the first unit duration to obtain at least two statistical periods, each of which is no longer than the first unit duration. For example, the first unit duration is one hour and the second unit duration is the whole day, where the whole day means the span from 0:00 today to the current system time. If the current system time is 7:00, the second unit duration is the span from 0:00 to 7:00 today. Dividing it by the first unit duration yields seven statistical periods, 0:00-1:00, 1:00-2:00, 2:00-3:00, 3:00-4:00, 4:00-5:00, 5:00-6:00, and 6:00-7:00, each one hour long.
The computer device then obtains, from the log information, the conversion number, estimated conversion rate, and industry factor of each historical business resource to be counted within each statistical period. For each statistical period, it multiplies the estimated conversion rate by the industry factor to obtain the joint factor of each historical business resource to be counted in that period, and sums these joint factors to obtain the statistical period joint factor of the subset Nj in that period. For each statistical period, it also sums the conversion numbers of the historical business resources to be counted to obtain the statistical period conversion number of the subset Nj in that period. Finally, according to a time decay strategy, it processes the statistical period conversion numbers and statistical period joint factors of the subset Nj over all statistical periods to obtain the second unit conversion number and the second unit joint factor corresponding to the subset Nj.
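The division of the second unit duration into statistical periods can be sketched as below. The function name `split_into_periods` and the sample date are hypothetical; only the one-hour first unit duration and the 0:00 to 7:00 span come from the example in the text.

```python
from datetime import datetime, timedelta


def split_into_periods(start, end, unit):
    """Split [start, end) into consecutive periods, each no longer than `unit`."""
    periods = []
    cursor = start
    while cursor < end:
        # The last period is truncated at `end`, so it may be shorter than `unit`.
        nxt = min(cursor + unit, end)
        periods.append((cursor, nxt))
        cursor = nxt
    return periods


# Illustrative date; the current system time is 7:00 as in the example above.
now = datetime(2022, 4, 20, 7, 0)
midnight = now.replace(hour=0, minute=0)
periods = split_into_periods(midnight, now, timedelta(hours=1))
```

With these inputs the split produces one period per hour from 0:00-1:00 through 6:00-7:00.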
Specifically, assume the at least two statistical periods include a statistical period Lk, where k is a positive integer less than or equal to the total number of statistical periods, and the start time of Lk is earlier than that of Lk+1. The process of applying the time decay strategy to the statistical period conversion numbers and statistical period joint factors of the subset Nj to obtain the second unit conversion number and the second unit joint factor may then be as follows. According to the time decay factor, the total number of statistical periods, and the forward position of the start time of Lk among the statistical periods, decay the statistical period conversion number and the statistical period joint factor of Lk to obtain a decayed conversion number and a decayed joint factor. Sum the decayed conversion numbers over all statistical periods to obtain the second unit conversion number corresponding to the subset Nj, and sum the decayed joint factors over all statistical periods to obtain the second unit joint factor corresponding to the subset Nj.
If the first unit duration is one hour and the second unit duration is the whole day, the above determination process can be expressed by the following formulas (3) and (4):
Figure PCTCN2022087839-appb-000001 (Formula (3))
Figure PCTCN2022087839-appb-000002 (Formula (4))
Here, Conv_advertiser_day is the second unit conversion number of the historical business resource subset Nj, that is, the total conversion number for the day; I is the current time (in hours); lambda is the time decay coefficient, whose value can be chosen according to the actual situation, for example 0.05; Conv_advertiser_hour_k is the statistical period conversion number within the statistical period Lk; PCVRMulFactor_advertiser_day is the second unit joint factor of the subset Nj, that is, the total joint factor for the day, which is the sum of (PCVR * industry factor); and PCVRMutor_advertiser_hour_k is the statistical period joint factor within the statistical period Lk.
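A minimal sketch of the time-decayed aggregation follows. It assumes an exponential weight exp(-lambda * (K - k)), where K is the total number of statistical periods and k is the forward position of period Lk; this weight form is an assumption chosen to match the verbal description (decay factor, total number of periods, forward position), and the exact expressions in formulas (3) and (4) may differ. The function name `decayed_sum` is hypothetical.

```python
import math


def decayed_sum(per_period_values, decay=0.05):
    """Sum per-period values with older periods weighted less.

    ASSUMPTION: the weight is exp(-decay * (K - k)); the source only states that
    the decay depends on the decay factor, the number of periods K, and the
    forward position k of each period.
    """
    total = len(per_period_values)
    return sum(
        value * math.exp(-decay * (total - k))
        for k, value in enumerate(per_period_values, start=1)
    )


# Illustrative hourly conversion numbers and joint factors, oldest period first.
conv_day = decayed_sum([2, 0, 5], decay=0.05)
joint_day = decayed_sum([0.04, 0.01, 0.09], decay=0.05)
```

Under this assumed weighting the most recent period keeps its full value, and the same function is applied to both the statistical period conversion numbers and the statistical period joint factors.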
Step S205: obtain the first unit consumption data of the historical business resource subset Nj, and determine the conversion number and the joint factor corresponding to the subset Nj according to the first unit consumption data, the first unit conversion number, the first unit joint factor, the second unit conversion number, and the second unit joint factor.
Specifically, if the first unit consumption data is sufficient consumption data, the first unit conversion number is used as the conversion number corresponding to the historical business resource subset Nj, and the first unit joint factor is used as the joint factor corresponding to the subset Nj. If the first unit consumption data is insufficient consumption data, the second unit conversion number is used as the conversion number corresponding to the subset Nj, and the second unit joint factor is used as the joint factor corresponding to the subset Nj. The first unit consumption data is the sum of the consumption data of each historical business resource in the subset Nj within the first unit duration. The above process can be expressed by formulas (5) and (6):
Figure PCTCN2022087839-appb-000003 (Formula (5))
Figure PCTCN2022087839-appb-000004 (Formula (6))
Here, Conv_advertiser is the conversion number corresponding to the historical business resource subset Nj, and PCVRMulFactor_advertiser is the joint factor corresponding to the subset Nj.
Optionally, the computer device may determine whether the first unit consumption data is sufficient consumption data as follows: obtain the conversion transaction value data and determine a sufficient data threshold from it; if the first unit consumption data is greater than the sufficient data threshold, the first unit consumption data is sufficient consumption data; if it is less than or equal to the threshold, it is insufficient consumption data. The conversion transaction value data is the conversion bid targetCpa mentioned above, and the sufficient data threshold may be equal to 4 times targetCpa.
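The selection between first unit and second unit statistics can be sketched as follows. The function name, the tuple layout `(conversion number, joint factor)`, and the sample values are assumptions for illustration; the 4 * targetCpa threshold is the optional value given in the text.

```python
def select_subset_stats(first_unit_cost, target_cpa, first_unit, second_unit, multiplier=4.0):
    """Pick the (conversion number, joint factor) pair to use for a subset Nj.

    first_unit / second_unit are (conversion number, joint factor) tuples.
    """
    threshold = multiplier * target_cpa
    # Strictly greater than the threshold counts as sufficient consumption;
    # equal to or below the threshold counts as insufficient.
    if first_unit_cost > threshold:
        return first_unit
    return second_unit


# Illustrative values: target_cpa = 100, so the threshold is 400.
recent = (3, 0.10)       # first unit statistics
whole_day = (20, 0.90)   # second unit statistics
chosen_sufficient = select_subset_stats(500.0, 100.0, recent, whole_day)
chosen_insufficient = select_subset_stats(400.0, 100.0, recent, whole_day)
```

The design intent described in the text is that recent statistics are preferred when enough spend has accumulated to make them reliable, and the longer, time-decayed window is used otherwise.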
Step S206: when the conversion numbers and joint factors corresponding to each of the H historical business resource subsets have been obtained, generate an aggregated data set that includes the conversion number and joint factor corresponding to each of the H subsets.
Step S207: according to the resource attribute information, obtain the effective conversion number and the effective joint factor for the target business resource from the aggregated data set.
Step S208: determine a calibration coefficient according to the effective conversion number and the effective joint factor, and calibrate the estimated joint factor of the target business resource according to the calibration coefficient. The estimated joint factor is determined from the estimated conversion rate and the industry factor of the target business resource.
Specifically, for the implementation of steps S207 to S208, refer to steps S104 to S105 in the embodiment corresponding to FIG. 3 above; details are not repeated here.
With the method provided by this embodiment of the application, when the aggregated data set is determined during calibration of the estimated joint factor of the target business resource, the first unit conversion number, first unit joint factor, second unit conversion number, and second unit joint factor of each historical business resource subset Nj can be obtained. By judging whether the first unit consumption data is sufficient consumption data, the conversion number and joint factor of each subset are then selected from these four values to form the aggregated data set. This improves the timeliness and validity of the conversion numbers and joint factors in the aggregated data set, which improves the accuracy of the calibration coefficient and ultimately the accuracy of the estimate.
Further, refer to FIG. 6, which is a schematic flowchart of an analysis method for determining the joint factor of the display fee provided by an embodiment of this application. The method is executed by the computer device in FIG. 1, which may be the service server 100 in FIG. 1 or a user terminal in the user terminal cluster in FIG. 1 (including user terminal 200a, user terminal 200b, user terminal 200c, and user terminal 200n). As shown in FIG. 6, the data calibration method may include the following steps S301 to S308.
Step S301: in the historical business resource set, according to the historical business resource division granularity, take the historical business resources whose expected consumption data is smaller than their actual consumption data as resources to be processed, and add them to a set of resources to be processed. The set includes a resource to be processed Sr, where r is a positive integer less than or equal to the total number of resources to be processed in the set.
Specifically, the computer device groups together the historical business resources in the historical business resource set that share the same granularity information, where the granularity information is the information associated with the historical business resource division granularity. From the grouped historical business resources, it then takes those whose expected consumption data is smaller than their actual consumption data as resources to be processed and adds them to the set of resources to be processed. The historical business resources included in each resource to be processed share the same granularity information. The historical business resource division granularity may be resource granularity, account granularity, group granularity, and so on. For example, suppose the division granularity is group granularity and the historical business resource set includes historical business resource O1 of group A, historical business resource O2 of group B, historical business resource O3 of group A, and historical business resource O4 of group A. The grouped historical business resources are then {O1, O3, O4} and {O2}. If O3 is the only resource whose expected consumption data is greater than its actual consumption data, two resources to be processed are finally obtained: one includes historical business resources O1 and O4, and the other includes historical business resource O2.
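The grouping and filtering in step S301 can be sketched as below, reproducing the O1 to O4 example. The dictionary layout, the keys `id`, `group`, `gmv`, and `cost`, and the numeric values are assumptions introduced for illustration only.

```python
from collections import defaultdict


def build_pending_resources(resources, granularity_key):
    """Group resources by granularity and keep those with expected < actual consumption."""
    pending = defaultdict(list)
    for res in resources:
        # Expected consumption (GMV) smaller than actual consumption (Cost).
        if res["gmv"] < res["cost"]:
            pending[res[granularity_key]].append(res["id"])
    return dict(pending)


# Illustrative values: only O3 has expected consumption above actual consumption.
resources = [
    {"id": "O1", "group": "A", "gmv": 80.0, "cost": 100.0},
    {"id": "O2", "group": "B", "gmv": 50.0, "cost": 70.0},
    {"id": "O3", "group": "A", "gmv": 120.0, "cost": 100.0},
    {"id": "O4", "group": "A", "gmv": 90.0, "cost": 95.0},
]
pending = build_pending_resources(resources, "group")
```

Changing `granularity_key` to a hypothetical account or resource identifier field would give the account-granularity or resource-granularity variants of the same step.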
Specifically, the expected consumption data is the expected total cost GMV (Guaranteed Minimum Value) mentioned in the embodiment corresponding to FIG. 3 above, that is, the fee the advertiser expects to pay. If the advertiser bids with the conversion bid targetCpa, GMV can be calculated by formula (7):
GMV = targetCpa * conversion number      Formula (7)
The actual consumption data is the consumption (Cost) mentioned in the embodiment corresponding to FIG. 3 above, which can be calculated by formula (8):
Cost = ECPM * exposure count      Formula (8)
Here, ECPM is the display fee. Ideally Cost = GMV, which benefits both the advertising platform and the advertiser: the platform neither overcharges nor undercharges. However, as advertising systems grow more complex and pass through more versions, it often happens that GMV < Cost, that is, the expected consumption data is smaller than the actual consumption data. In this case the platform overcharges and the advertiser's cost is exceeded, which is unfavorable to the advertiser. Therefore, when analyzing the joint factor of the display fee, the computer device can select the historical business resources in the historical business resource set whose expected consumption data is smaller than the actual consumption data.
Step S302: according to the price adjustment factor, risk control factor, estimated conversion rate, actual conversion rate, estimated billing ratio factor, actual billing ratio factor, estimated click-through rate, actual click-through rate, and industry factor of the resource to be processed Sr, determine the control value, conversion rate ratio, billing ratio factor ratio, click-through rate ratio, and industry factor of Sr.
Specifically, the above ECPM can be calculated by formula (9):
ECPM = targetCpa * all_factor * pcvr * pctr      Formula (9)
Here, targetCpa is the advertiser's conversion bid, and all_factor is a composite factor equal to (price adjustment factor * risk control factor) * gsp_factor * industry_factor. The product (price adjustment factor * risk control factor) can be treated as a whole, namely the control factor applied to targetCpa. gsp_factor is the billing ratio factor, usually caused by gsp. industry_factor is the industry factor, which differs between industries; in direct-operated e-commerce, for example, it includes the e-commerce industry factor and the crowd weighting factor. pcvr is the estimated conversion rate, and pctr is the estimated click-through rate. Comparing formulas (7), (8), and (9) shows that GMV < Cost results from the product (price adjustment factor * risk control factor) * gsp_factor * industry_factor * pcvr * pctr being too high. The more accurate each factor is, the smaller its effect on ECPM. However, the factors have different characteristics and are estimated with different accuracy, so the accuracy of each factor must be analyzed from the data of the corresponding resources to be processed.
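Formulas (7), (8), and (9) can be combined in a short sketch to show how an inflated factor product leads to GMV < Cost. The function names and all numeric values are illustrative assumptions; only the three formulas themselves come from the text.

```python
def ecpm(target_cpa, adjust_factor, risk_factor, gsp_factor, industry_factor, pcvr, pctr):
    """Formula (9): ECPM = targetCpa * all_factor * pcvr * pctr."""
    all_factor = (adjust_factor * risk_factor) * gsp_factor * industry_factor
    return target_cpa * all_factor * pcvr * pctr


def gmv(target_cpa, conversions):
    """Formula (7): expected consumption."""
    return target_cpa * conversions


def cost(ecpm_value, exposures):
    """Formula (8): actual consumption."""
    return ecpm_value * exposures


# Illustrative values: 2000 exposures yielding 3 conversions at a bid of 100.
display_fee = ecpm(100.0, 1.0, 1.0, 1.0, 1.0, pcvr=0.02, pctr=0.1)
expected = gmv(100.0, 3)
actual = cost(display_fee, 2000)
over_cost = expected < actual  # True means the resource would be treated as over-cost
```

In this made-up case the estimated rates imply more conversions than were observed, so the actual consumption exceeds the expected consumption.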
Specifically, the computer device first determines the estimated conversion rate, estimated billing ratio factor, and estimated click-through rate of the resource to be processed Sr, and then obtains from the log information the actual price adjustment factor, actual risk control factor, actual conversion rate, actual billing ratio factor, actual click-through rate, and actual industry factor of Sr. The estimated conversion rate, estimated billing ratio factor, and estimated click-through rate can each be determined by a corresponding prediction model, which may be a deep learning model. The computer device can then determine the control value of Sr as the product of the actual price adjustment factor and the actual risk control factor; the conversion rate ratio of Sr as the ratio of the estimated conversion rate to the actual conversion rate; the billing ratio factor ratio of Sr as the ratio of the estimated billing ratio factor to the actual billing ratio factor; the click-through rate ratio of Sr as the ratio of the estimated click-through rate to the actual click-through rate; and the industry factor value of Sr from the industry factor.
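The five per-resource quantities of step S302 can be sketched as follows. The dictionary keys and the sample values are hypothetical; the estimated values would come from the prediction models and the actual values from the log information.

```python
def deviation_metrics(res):
    """Compute the control value and the estimated/actual ratios for one resource Sr."""
    return {
        # Product of the actual price adjustment factor and actual risk control factor.
        "control_value": res["adjust_factor"] * res["risk_factor"],
        # Estimated value divided by actual value; 1.0 means a perfect estimate.
        "cvr_ratio": res["pcvr"] / res["cvr"],
        "gsp_ratio": res["est_gsp_factor"] / res["gsp_factor"],
        "ctr_ratio": res["pctr"] / res["ctr"],
        "industry_factor": res["industry_factor"],
    }


# Illustrative values only.
metrics = deviation_metrics({
    "adjust_factor": 1.1, "risk_factor": 0.9,
    "pcvr": 0.03, "cvr": 0.02,
    "est_gsp_factor": 0.8, "gsp_factor": 1.0,
    "pctr": 0.05, "ctr": 0.05,
    "industry_factor": 1.2,
})
```

A ratio above 1 indicates an overestimate, which pushes ECPM up; a ratio below 1 indicates an underestimate.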
Step S303: obtain the value intervals.
Specifically, the ideal value of the control value, conversion rate ratio, billing ratio factor ratio, click-through rate ratio, and industry factor is 1 in each case, so a range containing 1 can be divided to obtain the value intervals. One feasible set of division points is [0, 0.5, 0.7, 0.9, 1.0, 1.1, 1.3, ∞], where ∞ denotes infinity. The resulting value intervals are [0, 0.5), [0.5, 0.7), [0.7, 0.9), [0.9, 1.0), [1.0, 1.1), [1.1, 1.3), and [1.3, ∞).
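Mapping a value to its half-open interval can be sketched with the standard-library `bisect` module. The names `EDGES` and `interval_index` are hypothetical; the division points are the ones listed above.

```python
import bisect
import math

# Division points from the text; interval b is [EDGES[b], EDGES[b + 1]).
EDGES = [0.0, 0.5, 0.7, 0.9, 1.0, 1.1, 1.3, math.inf]


def interval_index(value):
    """Return the index b of the half-open interval that contains `value`."""
    if value < EDGES[0]:
        raise ValueError("value below the first division point")
    # bisect_right places a value equal to an edge into the interval that starts there.
    index = bisect.bisect_right(EDGES, value) - 1
    # Clamp so that an infinite value still maps to the last interval.
    return min(index, len(EDGES) - 2)


idx_low = interval_index(0.95)   # falls in [0.9, 1.0)
idx_one = interval_index(1.0)    # falls in [1.0, 1.1)
idx_high = interval_index(2.5)   # falls in [1.3, inf)
```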
Step S304: in the set of resources to be processed, obtain the resources to be processed whose conversion number is greater than or equal to a first conversion threshold and less than a second conversion threshold as first conversion resources. The second conversion threshold is greater than the first conversion threshold.
Specifically, the first and second conversion thresholds may differ for different historical business resource division granularities, and may also differ for advertising scenarios in different industries.
Step S305: according to the control values, conversion rate ratios, billing ratio factor ratios, click-through rate ratios, and industry factors corresponding to the first conversion resources, together with the value intervals, determine the first control analysis ratio, first conversion rate analysis ratio, first billing ratio factor analysis ratio, first click-through rate analysis ratio, and first industry factor analysis ratio.
Specifically, assume the value intervals include a value interval Gb, where b is a positive integer less than or equal to the total number of value intervals. The process by which the computer device determines the first control analysis ratio, first conversion rate analysis ratio, first billing ratio factor analysis ratio, first click-through rate analysis ratio, and first industry factor analysis ratio may then be as follows. The computer device determines the first resource quantity, that is, the number of resources to be processed among the first conversion resources whose control value falls in Gb, and takes the ratio of the first resource quantity to the total number of resources to be processed among the first conversion resources as the first control analysis ratio corresponding to Gb. It determines the second resource quantity, that is, the number of resources to be processed among the first conversion resources whose conversion rate ratio falls in Gb, and takes the ratio of the second resource quantity to that total as the first conversion rate analysis ratio corresponding to Gb. It determines the third resource quantity, that is, the number whose billing ratio factor ratio falls in Gb, and takes the ratio of the third resource quantity to that total as the first billing ratio factor analysis ratio corresponding to Gb. It determines the fourth resource quantity, that is, the number whose click-through rate ratio falls in Gb, and takes the ratio of the fourth resource quantity to that total as the first click-through rate analysis ratio corresponding to Gb. It determines the fifth resource quantity, that is, the number whose industry factor falls in Gb, and takes the ratio of the fifth resource quantity to that total as the first industry factor analysis ratio corresponding to Gb.
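The per-interval analysis ratios can be sketched as the share of resources whose metric falls in each interval. The function name `interval_shares` and the sample values are assumptions; the same function would be called once for each of the five metrics.

```python
import bisect
import math
from collections import Counter

EDGES = [0.0, 0.5, 0.7, 0.9, 1.0, 1.1, 1.3, math.inf]


def interval_shares(values):
    """Return, for each interval Gb, the share of `values` that fall inside it."""
    n_intervals = len(EDGES) - 1
    if not values:
        return [0.0] * n_intervals
    counts = Counter(
        min(bisect.bisect_right(EDGES, v) - 1, n_intervals - 1) for v in values
    )
    # Resource quantity in Gb divided by the total number of resources in the tier.
    return [counts.get(b, 0) / len(values) for b in range(n_intervals)]


# Illustrative conversion rate ratios of four first conversion resources.
shares = interval_shares([0.95, 1.0, 1.05, 1.4])
```

A distribution concentrated in the intervals around 1 suggests an accurate factor, while mass in the upper intervals suggests the factor is overestimated.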
Step S306: in the set of resources to be processed, obtain the resources to be processed whose conversion number is greater than or equal to the second conversion threshold as second conversion resources.
Step S307: according to the control values, conversion rate ratios, billing ratio factor ratios, click-through rate ratios, and industry factors corresponding to the second conversion resources, together with the value intervals, determine the second control analysis ratio, second conversion rate analysis ratio, second billing ratio factor analysis ratio, second click-through rate analysis ratio, and second industry factor analysis ratio.
Specifically, for the implementation of steps S306 and S307, refer to steps S304 and S305 above; details are not repeated here.
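The split into first and second conversion resources in steps S304 and S306 can be sketched as below. The function name, the `conversions` key, and the threshold values passed in the usage lines are illustrative assumptions; the text notes that the thresholds depend on granularity and industry scenario.

```python
def split_by_conversions(pending, first_threshold, second_threshold):
    """Split resources to be processed into first and second conversion resources."""
    if second_threshold <= first_threshold:
        raise ValueError("second threshold must be greater than the first")
    # First tier: first_threshold <= conversions < second_threshold.
    first = [r for r in pending if first_threshold <= r["conversions"] < second_threshold]
    # Second tier: conversions >= second_threshold.
    second = [r for r in pending if r["conversions"] >= second_threshold]
    return first, second


# Illustrative resources and thresholds.
pending = [
    {"id": "S1", "conversions": 0},
    {"id": "S2", "conversions": 1},
    {"id": "S3", "conversions": 5},
    {"id": "S4", "conversions": 6},
]
first_tier, second_tier = split_by_conversions(pending, 1, 6)
```

Resources below the first threshold fall into neither tier, since with so few conversions their actual rates are too noisy to compare against the estimates.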
Step S308: analyze the first control analysis ratio, first conversion rate analysis ratio, first billing ratio factor analysis ratio, first click-through rate analysis ratio, first industry factor analysis ratio, second control analysis ratio, second conversion rate analysis ratio, second billing ratio factor analysis ratio, second click-through rate analysis ratio, and second industry factor analysis ratio to determine the influencing factors used to adjust the expected display revenue. The influencing factors include the estimated conversion rate and the industry factor, which are jointly used to generate the estimated joint factor.
To better understand steps S301 to S308, the following takes as an example the analysis of an advertisement set in a direct-operated e-commerce scenario at resource granularity, account granularity, and group granularity.
First, the deviation of each factor is analyzed at resource granularity as follows:
Price adjustment factor * risk control factor: to study the range of the price adjustment factor, the product price adjustment factor * risk control factor can be treated as a whole. First, take all advertisements with GMV < Cost (the resources to be processed above) and divide them into the following two cases:
Advertisements with a conversion number greater than or equal to 1 and less than 6 (the first conversion resources):
Mean of price adjustment factor * risk control factor (the control value above): 1.0055626343375126.
The proportion of control values falling in each value interval is shown in Table 1:
Table 1
Figure PCTCN2022087839-appb-000005
转化数大于等于6的广告(即第二转化资源):Ads with conversions greater than or equal to 6 (that is, the second conversion resource):
调价因子*风控因子(即上述调控值)的均值:1.0151019986530374。The average value of price adjustment factor * risk control factor (that is, the above-mentioned control value): 1.0151019986530374.
调控值落在对应取值区间的占比如下表2所示:The proportion of the control value falling in the corresponding value range is shown in Table 2 below:
表2Table 2
Figure PCTCN2022087839-appb-000006
Figure PCTCN2022087839-appb-000006
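The selection of loss-making advertisements and their split by conversion count can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `Ad` record, its field names, and the function name `split_loss_making_ads` are hypothetical, and only the GMV < Cost filter and the thresholds 1 and 6 come from the text.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Ad:
    """Hypothetical record for one advertisement (field names are assumptions)."""
    ad_id: str
    gmv: float         # gross merchandise value attributed to the ad
    cost: float        # actual spend of the ad
    conversions: int   # conversion count recorded for the ad


def split_loss_making_ads(ads: List[Ad], threshold: int = 6) -> Tuple[List[Ad], List[Ad]]:
    """Return (first conversion resources, second conversion resources)."""
    # Resources to be processed: advertisements with GMV < Cost.
    pending = [ad for ad in ads if ad.gmv < ad.cost]
    # First conversion resources: 1 <= conversions < threshold.
    first = [ad for ad in pending if 1 <= ad.conversions < threshold]
    # Second conversion resources: conversions >= threshold.
    second = [ad for ad in pending if ad.conversions >= threshold]
    return first, second


ads = [
    Ad("a", gmv=80.0, cost=100.0, conversions=3),
    Ad("b", gmv=90.0, cost=120.0, conversions=8),
    Ad("c", gmv=200.0, cost=100.0, conversions=5),  # GMV >= Cost, excluded
    Ad("d", gmv=10.0, cost=50.0, conversions=0),    # no conversions, in neither group
]
first, second = split_loss_making_ads(ads)
print([ad.ad_id for ad in first], [ad.ad_id for ad in second])
```

Advertisements with zero conversions fall into neither group here, because the text only defines the two cases starting from a conversion count of 1.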
Estimated conversion rate pcvr: to investigate the accuracy of the pcvr estimate, pcvr is compared with the actual conversion rate of the advertisement to obtain the conversion rate ratio, and how close that ratio is to 1 is determined. The closer the conversion rate ratio is to 1, the closer pcvr is to the actual conversion rate and the more accurate pcvr is.
First, take all advertisements with GMV < Cost and divide them into the following two cases:
Advertisements with a conversion count greater than or equal to 1 and less than 6 (that is, the first conversion resources):
Mean of the conversion rate ratio: 1.2929644284281734.
The proportion of conversion rate ratios falling within each value interval is shown in Table 3 below:
Table 3
Figure PCTCN2022087839-appb-000007
Advertisements with a conversion count greater than or equal to 6 (that is, the second conversion resources):
Mean of the conversion rate ratio: 1.0258310512901163.
The proportion of conversion rate ratios falling within each value interval is shown in Table 4 below:
Table 4
Figure PCTCN2022087839-appb-000008
Billing ratio factor gsp_factor: to investigate the accuracy of gsp_factor, the estimated gsp_factor (the above-mentioned estimated billing ratio factor) is divided by the actually measured gsp_factor (the above-mentioned actual billing ratio factor) to obtain the billing ratio factor ratio, and how close that ratio is to 1 is determined. The closer the billing ratio factor ratio is to 1, the more accurate the estimated gsp_factor is.
First, take all advertisements with GMV < Cost and divide them into the following two cases:
Advertisements with a conversion count greater than or equal to 1 and less than 6 (that is, the first conversion resources):
Mean of the billing ratio factor ratio: 1.005334324931539.
The proportion of billing ratio factor ratios falling within each value interval is shown in Table 5 below:
Table 5
Figure PCTCN2022087839-appb-000009
Advertisements with a conversion count greater than or equal to 6 (that is, the second conversion resources):
Mean of the billing ratio factor ratio: 0.9984545757710465.
The proportion of billing ratio factor ratios falling within each value interval is shown in Table 6 below:
Table 6
Figure PCTCN2022087839-appb-000010
Estimated click-through rate pctr: to investigate the accuracy of the pctr estimate, the ratio of the estimated pctr to the actual click-through rate is calculated to obtain the click-through rate ratio, and how close that ratio is to 1 is determined. The closer the click-through rate ratio is to 1, the more accurate the estimated pctr is.
First, take all advertisements with GMV < Cost and divide them into the following two cases:
Advertisements with a conversion count greater than or equal to 1 and less than 6 (that is, the first conversion resources):
Mean of the click-through rate ratio: 1.030886270371121.
The proportion of click-through rate ratios falling within each value interval is shown in Table 7 below:
Table 7
Figure PCTCN2022087839-appb-000011
Advertisements with a conversion count greater than or equal to 6 (that is, the second conversion resources):
Mean of the click-through rate ratio: 1.0017268891863382.
The proportion of click-through rate ratios falling within each value interval is shown in Table 8 below:
Table 8
Figure PCTCN2022087839-appb-000012
Industry factor: after the industry factor (e-commerce industry factor * audience weighting factor) is calculated, how close it is to 1 is determined.
First, take all advertisements with GMV < Cost and divide them into the following two cases:
Advertisements with a conversion count greater than or equal to 1 and less than 6 (that is, the first conversion resources):
Mean of the industry factor: 1.1719941122774267.
The proportion of industry factor values falling within each value interval is shown in Table 9 below:
Table 9
Figure PCTCN2022087839-appb-000013
Advertisements with a conversion count greater than or equal to 6 (that is, the second conversion resources):
Mean of the industry factor: 1.1646948281407095.
The proportion of industry factor values falling within each value interval is shown in Table 10 below:
Table 10
Figure PCTCN2022087839-appb-000014
From the data above, it is clear that the pcvr estimate is markedly inaccurate for advertisements with low conversion counts: the share of conversion rate ratios falling in [1.3, ∞) is as high as 0.35306299, and the mean ratio is also greater than 1. This indicates that pcvr tends to exceed the actual conversion rate and causes a degree of cost overrun, that is, actual spend far exceeds the expected total spend. For advertisements with higher conversion counts, however, the pcvr estimate is relatively accurate: the shares falling in [0.7, 0.9), [0.9, 1.0), and [1.0, 1.1) are relatively high, and the mean ratio is close to 1. For the industry factor, in both low-conversion and high-conversion advertisements a large share of values falls in [1.1, 1.3) and [1.3, ∞), and the mean is clearly greater than 1, which also leads to cost overrun. Judging by their interval shares and means, the other factors have little impact on ECPM, which can be understood as their estimates being relatively accurate.
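The per-factor statistics reported in the tables above (the mean of the estimate-to-actual ratio and the share of ratios in each value interval) can be reproduced with a sketch like the one below. It assumes the interval edges 0.7, 0.9, 1.0, 1.1, and 1.3 named in the text; the lowest bucket [0, 0.7), the function name `ratio_stats`, and the sample values are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

# Edges from 0.7 upward are named in the text; the lowest bucket [0.0, 0.7)
# is an assumption made so that every ratio falls into some interval.
EDGES: List[float] = [0.0, 0.7, 0.9, 1.0, 1.1, 1.3, float("inf")]


def ratio_stats(
    estimated: Sequence[float],
    actual: Sequence[float],
    edges: Sequence[float] = EDGES,
) -> Tuple[float, Dict[str, float]]:
    """Return (mean ratio, share of ratios per half-open interval [lo, hi))."""
    # Ratio of the estimated value to the actual value; skip zero actuals.
    ratios = [e / a for e, a in zip(estimated, actual) if a > 0]
    if not ratios:
        raise ValueError("no resource with a positive actual value")
    mean = sum(ratios) / len(ratios)
    shares: Dict[str, float] = {}
    for lo, hi in zip(edges, edges[1:]):
        count = sum(1 for r in ratios if lo <= r < hi)
        shares[f"[{lo}, {hi})"] = count / len(ratios)
    return mean, shares


# Made-up estimated and actual conversion rates for four advertisements.
pcvr = [0.02, 0.03, 0.05, 0.04]
actual_cvr = [0.02, 0.02, 0.05, 0.02]
mean_ratio, interval_shares = ratio_stats(pcvr, actual_cvr)
print(mean_ratio, interval_shares)
```

The same function applies to gsp_factor and pctr by passing their estimated and actual values; a mean well above 1 together with a large share in [1.3, inf) signals the kind of overestimation described for pcvr.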
Next, the deviation of each factor is analyzed at account granularity as follows:
Price adjustment factor * risk control factor: to study the range of the price adjustment factor, the product price adjustment factor * risk control factor can be treated as a whole. First, take all accounts with GMV < Cost (the one or more advertisements belonging to the same account together constitute one above-mentioned resource to be processed) and divide them into the following two cases:
Accounts with a conversion count greater than or equal to 1 and less than 10 (that is, the conversion count of the advertisements belonging to the account is greater than or equal to 1 and less than 10):
Mean of price adjustment factor * risk control factor (that is, the above-mentioned regulation value): 1.0270681225915883.
The proportion of regulation values falling within each value interval is shown in Table 11 below:
Table 11
Figure PCTCN2022087839-appb-000015
Figure PCTCN2022087839-appb-000016
Accounts with a conversion count greater than or equal to 10:
Mean of price adjustment factor * risk control factor (that is, the above-mentioned regulation value): 1.0163434608061954.
The proportion of regulation values falling within each value interval is shown in Table 12 below:
Table 12
Figure PCTCN2022087839-appb-000017
Estimated conversion rate pcvr: to investigate the accuracy of the pcvr estimate for the advertisements under the same account, the estimated pcvr is compared with the actual conversion rate to obtain the conversion rate ratio, and how close that ratio is to 1 is determined. The closer the conversion rate ratio is to 1, the closer pcvr is to the actual conversion rate and the more accurate pcvr is.
First, take all accounts with GMV < Cost and divide them into the following two cases:
Accounts with a conversion count greater than or equal to 1 and less than 10:
Mean of the conversion rate ratio: 1.5795300001182175.
The proportion of conversion rate ratios falling within each value interval is shown in Table 13 below:
Table 13
Figure PCTCN2022087839-appb-000018
Accounts with a conversion count greater than or equal to 10:
Mean of the conversion rate ratio: 1.1302273285339024.
The proportion of conversion rate ratios falling within each value interval is shown in Table 14 below:
Table 14
Figure PCTCN2022087839-appb-000019
Billing ratio factor gsp_factor: to investigate the accuracy of gsp_factor, the estimated gsp_factor (the above-mentioned estimated billing ratio factor) is divided by the actually measured gsp_factor (the above-mentioned actual billing ratio factor) to obtain the billing ratio factor ratio, and how close that ratio is to 1 is determined. The closer the billing ratio factor ratio is to 1, the more accurate the estimated gsp_factor is.
First, take all accounts with GMV < Cost and divide them into the following two cases:
Accounts with a conversion count greater than or equal to 1 and less than 10:
Mean of the billing ratio factor ratio: 1.0063416604925397.
The proportion of billing ratio factor ratios falling within each value interval is shown in Table 15 below:
Table 15
Figure PCTCN2022087839-appb-000020
Accounts with a conversion count greater than or equal to 10:
Mean of the billing ratio factor ratio: 1.0013801378217713.
The proportion of billing ratio factor ratios falling within each value interval is shown in Table 16 below:
Table 16
Figure PCTCN2022087839-appb-000021
Estimated click-through rate pctr: to investigate the accuracy of the pctr estimate, the estimated pctr is divided by the actual click-through rate to obtain the click-through rate ratio, and how close that ratio is to 1 is determined. The closer the click-through rate ratio is to 1, the more accurate the estimated pctr is.
First, take all accounts with GMV < Cost and divide them into the following two cases:
Accounts with a conversion count greater than or equal to 1 and less than 10:
Mean of the click-through rate ratio: 1.030300842027766.
The proportion of click-through rate ratios falling within each value interval is shown in Table 17 below:
Table 17
Figure PCTCN2022087839-appb-000022
Accounts with a conversion count greater than or equal to 10:
Mean of the click-through rate ratio: 1.0189707247565336.
The proportion of click-through rate ratios falling within each value interval is shown in Table 18 below:
Table 18
Figure PCTCN2022087839-appb-000023
Industry factor: after the industry factor (e-commerce industry factor * audience weighting factor) is calculated, how close it is to 1 is determined.
First, take all accounts with GMV < Cost and divide them into the following two cases:
Accounts with a conversion count greater than or equal to 1 and less than 10:
Mean of the industry factor: 1.057864621100866.
The proportion of industry factor values falling within each value interval is shown in Table 19 below:
Table 19
Figure PCTCN2022087839-appb-000024
Accounts with a conversion count greater than or equal to 10:
Mean of the industry factor: 1.0942537338421843.
The proportion of industry factor values falling within each value interval is shown in Table 20 below:
Table 20
Figure PCTCN2022087839-appb-000025
From the data above, it is clear that the pcvr estimate is markedly inaccurate for accounts with low conversion counts, and that it is an overestimate, which causes a degree of cost overrun; for accounts with higher conversion counts the situation is slightly better, but a considerable deviation remains. The industry factor likewise shows some deviation at both low and high conversion counts, and its deviation is clearly larger than that of the other factors.
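Moving from resource granularity to account granularity means treating all advertisements of one account as a single resource to be processed. The sketch below assumes that an account's GMV, cost, and conversion count are the sums over its advertisements; the text does not spell out the aggregation rule, so that choice, the row layout, and the function name are illustrative assumptions. Only the GMV < Cost filter and the thresholds 1 and 10 come from the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (account_id, gmv, cost, conversions) for one advertisement; hypothetical layout.
AdRow = Tuple[str, float, float, int]


def split_loss_making_accounts(
    rows: Iterable[AdRow], threshold: int = 10
) -> Tuple[List[str], List[str]]:
    """Return (accounts with 1 <= conversions < threshold, accounts with conversions >= threshold)."""
    # ASSUMPTION: account-level figures are sums over the account's advertisements.
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0, 0])
    for account_id, gmv, cost, conversions in rows:
        total = totals[account_id]
        total[0] += gmv
        total[1] += cost
        total[2] += conversions

    low: List[str] = []
    high: List[str] = []
    for account_id, (gmv, cost, conversions) in sorted(totals.items()):
        if gmv >= cost:
            continue  # only accounts with GMV < Cost are analyzed
        if 1 <= conversions < threshold:
            low.append(account_id)
        elif conversions >= threshold:
            high.append(account_id)
    return low, high


rows = [
    ("acc1", 50.0, 80.0, 4),
    ("acc1", 30.0, 40.0, 3),
    ("acc2", 100.0, 150.0, 6),
    ("acc2", 20.0, 30.0, 6),
    ("acc3", 300.0, 100.0, 20),  # GMV >= Cost, excluded
]
low, high = split_loss_making_accounts(rows)
print(low, high)
```

The same grouping works at group granularity by keying the rows on a group identifier instead of an account identifier.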
Finally, the deviation of each factor is analyzed at group granularity as follows:
Price adjustment factor * risk control factor: to study the range of the price adjustment factor, the product price adjustment factor * risk control factor can be treated as a whole. First, take all groups with GMV < Cost (the one or more advertisements belonging to the same group together constitute one above-mentioned resource to be processed) and divide them into the following two cases:
Groups with a conversion count greater than or equal to 1 and less than 10 (that is, the conversion count of the advertisements belonging to the group is greater than or equal to 1 and less than 10):
Mean of price adjustment factor * risk control factor (that is, the above-mentioned regulation value): 1.034216183755146.
The proportion of regulation values falling within each value interval is shown in Table 21 below:
Table 21
Figure PCTCN2022087839-appb-000026
Groups with a conversion count greater than or equal to 10:
Mean of price adjustment factor * risk control factor (that is, the above-mentioned regulation value): 1.0296622775279651.
The proportion of regulation values falling within each value interval is shown in Table 22 below:
Table 22
Figure PCTCN2022087839-appb-000027
Estimated conversion rate pcvr: to investigate the accuracy of the pcvr estimate for the advertisements under the same group, the estimated pcvr is compared with the actual conversion rate to obtain the conversion rate ratio, and how close that ratio is to 1 is determined. The closer the conversion rate ratio is to 1, the closer pcvr is to the actual conversion rate and the more accurate pcvr is.
First, take all groups with GMV < Cost and divide them into the following two cases:
Groups with a conversion count greater than or equal to 1 and less than 10:
Mean of the conversion rate ratio: 1.5079355403667978.
The proportion of conversion rate ratios falling within each value interval is shown in Table 23 below:
Table 23
Figure PCTCN2022087839-appb-000028
Groups with a conversion count greater than or equal to 10:
Mean of the conversion rate ratio: 1.1648124771184638.
The proportion of conversion rate ratios falling within each value interval is shown in Table 24 below:
Table 24
Figure PCTCN2022087839-appb-000029
Billing ratio factor gsp_factor: to investigate the accuracy of gsp_factor, the estimated gsp_factor (the above-mentioned estimated billing ratio factor) is divided by the actually measured gsp_factor (the above-mentioned actual billing ratio factor) to obtain the billing ratio factor ratio, and how close that ratio is to 1 is determined. The closer the billing ratio factor ratio is to 1, the more accurate the estimated gsp_factor is.
First, take all groups with GMV < Cost and divide them into the following two cases:
Groups with a conversion count greater than or equal to 1 and less than 10:
Mean of the billing ratio factor ratio: 1.007077182551767.
The proportion of billing ratio factor ratios falling within each value interval is shown in Table 25 below:
Table 25
Figure PCTCN2022087839-appb-000030
Groups with a conversion count greater than or equal to 10:
Mean of the billing ratio factor ratio: 1.0015587163937008.
The proportion of billing ratio factor ratios falling within each value interval is shown in Table 26 below:
Table 26
Figure PCTCN2022087839-appb-000031
Estimated click-through rate pctr: to investigate the accuracy of the pctr estimate, the estimated pctr is divided by the actual click-through rate to obtain the click-through rate ratio, and how close that ratio is to 1 is determined. The closer the click-through rate ratio is to 1, the more accurate the estimated pctr is.
First, take all groups with GMV < Cost and divide them into the following two cases:
Groups with a conversion count greater than or equal to 1 and less than 10:
Mean of the click-through rate ratio: 1.0260427036634276.
The proportion of click-through rate ratios falling within each value interval is shown in Table 27 below:
Table 27
Figure PCTCN2022087839-appb-000032
Groups with a conversion count greater than or equal to 10:
Mean of the click-through rate ratio: 1.0238124641852109.
The proportion of click-through rate ratios falling within each value interval is shown in Table 28 below:
Table 28
Figure PCTCN2022087839-appb-000033
Industry factor: after the industry factor (e-commerce industry factor * audience weighting factor) is calculated, how close it is to 1 is determined.
First, take all groups with GMV < Cost and divide them into the following two cases:
Groups with a conversion count greater than or equal to 1 and less than 10:
Mean of the industry factor: 0.9840837406933998.
The proportion of industry factor values falling within each value interval is shown in Table 29 below:
Table 29
Figure PCTCN2022087839-appb-000034
Groups with a conversion count greater than or equal to 10:
Mean of the industry factor: 1.0061536337623453.
The proportion of industry factor values falling within each value interval is shown in Table 30 below:
Table 30
Figure PCTCN2022087839-appb-000035
At group granularity, the main factor causing a low GMV/COST is the overestimation of pcvr; the deviations of the other factors are relatively small.
The analysis of factor deviations at the three granularities above shows that the factors with the greatest impact on ECPM are the PCVR estimate and the industry factor. To bring the advertising cost close to the advertiser's bid, PCVR and the industry factor need to be calibrated jointly: the product of PCVR and the industry factor is taken as the joint factor, which is then calibrated by the method in the embodiment corresponding to FIG. 3 above.
With the method provided by the embodiments of this application, the factors that have a large impact on ECPM for advertisements in a specific industry scenario can be identified at different granularities; the product of the identified factors can then be taken as the joint factor and calibrated by the method provided in the embodiment corresponding to FIG. 3 above, which yields a more accurate ECPM value and improves ECPM estimation accuracy.
Further, refer to FIG. 7, which is a schematic structural diagram of a data calibration apparatus provided by an embodiment of this application. The data calibration apparatus may be a computer program (including program code) running on a computer device; for example, the data calibration apparatus is application software. The apparatus may be used to perform the corresponding steps of the method provided by the embodiments of this application. As shown in FIG. 7, the data calibration apparatus may include an acquisition module 11, a division module 12, an aggregation statistics module 13, an effective data determination module 14, and a calibration module 15.
The acquisition module 11 is configured to acquire resource attribute information of a target business resource under N resource attribute types; N is a positive integer;
The division module 12 is configured to cluster the historical business resources in a historical business resource set based on each of S combination types to obtain H historical business resource subsets; the resource attribute types corresponding to each combination type all belong to the N resource attribute types; the historical business resources in one historical business resource subset have the same historical resource attribute combination, and a historical resource attribute combination is associated with the resource attribute types corresponding to one combination type; S is a positive integer less than or equal to N; H is a positive integer;
The aggregation statistics module 13 is configured to perform aggregation statistics on each of the H historical business resource subsets to obtain an aggregated data set; the aggregated data set includes the conversion count and the joint factor corresponding to each of the H historical business resource subsets; the conversion count is determined from the conversion counts of the historical business resources in the corresponding subset, and the joint factor is determined from the estimated conversion rates and industry factors of the historical business resources in the corresponding subset;
The effective data determination module 14 is configured to obtain, from the aggregated data set and according to the resource attribute information, an effective conversion count and an effective joint factor for the target business resource;
The calibration module 15 is configured to determine a calibration coefficient according to the effective conversion count and the effective joint factor, and to calibrate the estimated joint factor of the target business resource according to the calibration coefficient; the estimated joint factor is determined from the estimated conversion rate and the industry factor of the target business resource.
For the specific implementation of the functions of the acquisition module 11, the division module 12, the aggregation statistics module 13, the effective data determination module 14, and the calibration module 15, refer to the description of steps S101 to S105 in the embodiment corresponding to FIG. 3; details are not repeated here.
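What the calibration module does can be illustrated with a small sketch. The joint factor as the product of the estimated conversion rate and the industry factor follows the text. The formula for the calibration coefficient is NOT given in this section, which only states that it is determined from the effective conversion count and the effective joint factor; the ratio used below is a hypothetical choice for illustration, and all function names are assumptions.

```python
def joint_factor(pcvr: float, industry_factor: float) -> float:
    """Joint factor: product of the estimated conversion rate and the industry factor."""
    return pcvr * industry_factor


def calibration_coefficient(effective_conversions: float, effective_joint_factor: float) -> float:
    """Hypothetical coefficient: observed conversions relative to the aggregated joint factor.

    ASSUMPTION: the source does not state this formula. A ratio below 1 would
    shrink an overestimated joint factor; a ratio above 1 would enlarge it.
    """
    if effective_joint_factor <= 0:
        return 1.0  # no usable history: leave the estimate unchanged
    return effective_conversions / effective_joint_factor


def calibrate(pcvr: float, industry_factor: float, coefficient: float) -> float:
    """Apply the calibration coefficient to the estimated joint factor."""
    return coefficient * joint_factor(pcvr, industry_factor)


# Made-up history: 40 observed conversions against an aggregated joint factor of 50.
coefficient = calibration_coefficient(effective_conversions=40.0, effective_joint_factor=50.0)
calibrated = calibrate(pcvr=0.02, industry_factor=1.25, coefficient=coefficient)
print(coefficient, calibrated)
```

Under this assumed formula, a history in which the joint factor overstated conversions yields a coefficient below 1, which lowers the estimated joint factor of the target business resource and therefore its ECPM.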
The S combination types include a combination type Mi, where i is a positive integer less than or equal to S; the historical business resource set includes a historical business resource Td, where d is a positive integer less than or equal to the total number of historical business resources in the historical business resource set;
Referring again to FIG. 7, the division module 12 may include a combination determination unit 121 and a subset determination unit 122.
The combination determination unit 121 is configured to determine the resource attribute types contained in the combination type Mi as target resource attribute types;
The combination determination unit 121 is further configured to determine the historical resource attribute information of the historical business resource Td that is associated with the target resource attribute types as the historical resource attribute combination of the historical business resource Td;
The subset determination unit 122 is configured to add, within the historical business resource set, the historical business resources that have the same historical resource attribute combination to the same historical business resource subset, to obtain one or more historical business resource subsets corresponding to the combination type Mi;
The subset determination unit 122 is further configured to form the H historical business resource subsets from the one or more historical business resource subsets corresponding to each combination type.
For the specific implementation of the functions of the combination determination unit 121 and the subset determination unit 122, refer to the description of step S102 in the embodiment corresponding to FIG. 3; details are not repeated here.
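The behaviour of the two units above can be sketched as a grouping operation: for each combination type Mi, every historical business resource Td is keyed by its attribute values for the attribute types in Mi, and resources with the same key land in the same subset. The attribute names (`industry`, `placement`), the dictionary representation, and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Resource = Dict[str, str]                             # attribute type -> attribute value
SubsetKey = Tuple[Tuple[str, ...], Tuple[str, ...]]   # (combination type, attribute combination)


def build_subsets(
    resources: Sequence[Resource],
    combination_types: Sequence[Sequence[str]],
) -> Dict[SubsetKey, List[int]]:
    """Group resource indices by (combination type, historical resource attribute combination)."""
    subsets: Dict[SubsetKey, List[int]] = defaultdict(list)
    for combo in combination_types:                   # combination type Mi
        for index, resource in enumerate(resources):  # historical business resource Td
            # Attribute values of Td for the target resource attribute types of Mi.
            attribute_combo = tuple(resource[attr] for attr in combo)
            subsets[(tuple(combo), attribute_combo)].append(index)
    return dict(subsets)


resources = [
    {"industry": "ecommerce", "placement": "feed"},
    {"industry": "ecommerce", "placement": "banner"},
    {"industry": "games", "placement": "feed"},
]
combination_types = [("industry",), ("industry", "placement")]  # S = 2, N = 2
subsets = build_subsets(resources, combination_types)
print(len(subsets))  # H, the total number of subsets over all combination types
```

Each resource appears once per combination type, so coarse subsets (keyed on fewer attribute types) and fine subsets (keyed on more) coexist in the result.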
The H historical business resource subsets include a historical business resource subset Nj, where j is a positive integer less than or equal to H;
Referring again to FIG. 7, the aggregation statistics module 13 may include a first determination unit 131, a second determination unit 132, a consumption acquisition unit 133, a sufficiency judgment unit 134, and a data set generation unit 135.
The first determination unit 131 is configured to determine a first unit conversion count and a first unit joint factor corresponding to the historical business resource subset Nj; the first unit conversion count is determined from the conversion counts, within a first unit duration, of the historical business resources in the subset Nj, and the first unit joint factor is determined from the estimated conversion rates and industry factors, within the first unit duration, of the historical business resources in the subset Nj;
The second determination unit 132 is configured to determine a second unit conversion count and a second unit joint factor corresponding to the historical business resource subset Nj; the second unit conversion count is determined from the conversion counts, within a second unit duration, of the historical business resources in the subset Nj, and the second unit joint factor is determined from the estimated conversion rates and industry factors, within the second unit duration, of the historical business resources in the subset Nj; the second unit duration is longer than the first unit duration;
The consumption acquisition unit 133 is configured to acquire first unit consumption data of the historical business resource subset Nj;
The sufficiency judgment unit 134 is configured to, if the first unit consumption data is sufficient consumption data, take the first unit conversion count as the conversion count corresponding to the historical business resource subset Nj and take the first unit joint factor as the joint factor corresponding to the historical business resource subset Nj;
The sufficiency judgment unit 134 is further configured to, if the first unit consumption data is insufficient consumption data, take the second unit conversion count as the conversion count corresponding to the historical business resource subset Nj and take the second unit joint factor as the joint factor corresponding to the historical business resource subset Nj;
The data set generation unit 135 is configured to, when the conversion count and the joint factor corresponding to each of the H historical business resource subsets have been obtained, generate the aggregated data set including the conversion count and the joint factor corresponding to each of the H historical business resource subsets.
For the specific implementation of the functions of the first determination unit 131, the second determination unit 132, the consumption acquisition unit 133, the sufficiency judgment unit 134, and the data set generation unit 135, refer to the description of steps S203 to S206 in the embodiment corresponding to FIG. 5; details are not repeated here.
Referring again to FIG. 7, the first determination unit 131 may include: a first information acquisition subunit 1311 and a first data determination subunit 1312.
The first information acquisition subunit 1311 is configured to determine each historical business resource in the historical business resource subset Nj as a historical business resource to be counted, and to acquire log information of the historical business resources to be counted;
The first information acquisition subunit 1311 is further configured to obtain, from the log information, the conversion count, estimated conversion rate, and industry factor of each historical business resource to be counted within the first unit duration;
The first data determination subunit 1312 is configured to multiply the estimated conversion rate of a historical business resource to be counted within the first unit duration by its industry factor to obtain the joint factor of that resource within the first unit duration, and to sum the joint factors of all historical business resources to be counted within the first unit duration to obtain the first unit joint factor corresponding to the historical business resource subset Nj;
The first data determination subunit 1312 is further configured to sum the conversion counts of all historical business resources to be counted within the first unit duration to obtain the first unit conversion count corresponding to the historical business resource subset Nj.
For the specific functional implementation of the first information acquisition subunit 1311 and the first data determination subunit 1312, refer to the description of step S203 in the embodiment corresponding to FIG. 5; details are not repeated here.
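As an illustrative sketch only, the aggregation performed by the first data determination subunit 1312 could be written as follows. The record type `ResourceStats` and the function name `aggregate_first_unit` are hypothetical and do not appear in this application; the per-resource values are assumed to have already been extracted from the log information.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class ResourceStats:
    """Hypothetical per-resource record for one unit duration."""
    conversions: int        # conversion count within the unit duration
    estimated_cvr: float    # estimated conversion rate within the unit duration
    industry_factor: float  # industry factor within the unit duration


def aggregate_first_unit(resources: Iterable[ResourceStats]) -> Tuple[int, float]:
    """Return (first unit conversion count, first unit joint factor) for one subset Nj."""
    resources = list(resources)
    # Joint factor of one resource = estimated conversion rate * industry factor.
    joint_factor = sum(r.estimated_cvr * r.industry_factor for r in resources)
    # Conversion count of the subset = sum of the per-resource conversion counts.
    conversions = sum(r.conversions for r in resources)
    return conversions, joint_factor
```

Both outputs are plain sums over the subset, so a subset with a single resource simply returns that resource's own conversion count and joint factor.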
Referring again to FIG. 7, the second determination unit 132 may include: a second information acquisition subunit 1321, a second data determination subunit 1322, and a decay processing subunit 1323.
The second information acquisition subunit 1321 is configured to determine each historical business resource in the historical business resource subset Nj as a historical business resource to be counted, and to acquire log information of the historical business resources to be counted;
The second information acquisition subunit 1321 is further configured to determine the second unit duration;
The second information acquisition subunit 1321 is further configured to divide the second unit duration based on the first unit duration to obtain at least two statistical periods; the duration of each statistical period is less than or equal to the first unit duration;
The second information acquisition subunit 1321 is further configured to obtain, from the log information, the conversion count, estimated conversion rate, and industry factor of each historical business resource to be counted within each statistical period;
The second data determination subunit 1322 is configured to, for each statistical period, multiply the estimated conversion rate of each historical business resource to be counted within that statistical period by its industry factor to generate the joint factor of that resource within the statistical period, and to sum the joint factors of all historical business resources to be counted within the statistical period to obtain the period joint factor of the historical business resource subset Nj for that statistical period;
The second data determination subunit 1322 is further configured to, for each statistical period, sum the conversion counts of all historical business resources to be counted within that statistical period to obtain the period conversion count of the historical business resource subset Nj for that statistical period.
The decay processing subunit 1323 is configured to process, according to a time decay strategy, the period conversion count and period joint factor of the historical business resource subset Nj in each statistical period, to obtain the second unit conversion count and the second unit joint factor corresponding to the historical business resource subset Nj.
For the specific functional implementation of the second information acquisition subunit 1321, the second data determination subunit 1322, and the decay processing subunit 1323, refer to the description of step S204 in the embodiment corresponding to FIG. 5; details are not repeated here.
Here, the at least two statistical periods include a statistical period Lk, where k is a positive integer less than or equal to the total number of the at least two statistical periods; the start time of the statistical period Lk is earlier than that of the statistical period Lk+1.
The decay processing subunit 1323 is specifically configured to: apply decay processing to the period conversion count and the period joint factor of the statistical period Lk, according to a time decay factor, the total number of the at least two statistical periods, and the position of the statistical period Lk when the statistical periods are ordered by ascending start time, to obtain a decayed conversion count and a decayed joint factor; sum the decayed conversion counts of all statistical periods to obtain the second unit conversion count corresponding to the historical business resource subset Nj; and sum the decayed joint factors of all statistical periods to obtain the second unit joint factor corresponding to the historical business resource subset Nj.
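The passage above names the inputs of the decay processing (the time decay factor, the total number of statistical periods, and the position of Lk in ascending start-time order) but does not give the formula in this section. The sketch below therefore assumes an exponential weight of `decay ** (total - k)`, so that the most recent period keeps full weight and older periods are discounted; this weight, the default `decay=0.5`, and the function name `decayed_totals` are assumptions made for illustration, not the method defined by the application.

```python
from typing import Sequence, Tuple


def decayed_totals(
    period_conversions: Sequence[float],
    period_joint_factors: Sequence[float],
    decay: float = 0.5,
) -> Tuple[float, float]:
    """Return (second unit conversion count, second unit joint factor).

    Both sequences are ordered by ascending start time, so index 0 is the
    oldest statistical period. The weight decay ** (total - k) is an assumed
    form of the time decay strategy.
    """
    total = len(period_conversions)
    if total != len(period_joint_factors):
        raise ValueError("both sequences must cover the same statistical periods")

    conversions = 0.0
    joint_factor = 0.0
    pairs = zip(period_conversions, period_joint_factors)
    for k, (conv, joint) in enumerate(pairs, start=1):
        weight = decay ** (total - k)  # k == total (latest period) gives weight 1
        conversions += weight * conv
        joint_factor += weight * joint
    return conversions, joint_factor
```

Applying the same weight to the conversion count and to the joint factor of a period keeps their ratio within that period unchanged, which is why a single weight is computed per period.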
Here, the first unit consumption data is the sum of the consumption data of the historical business resources in the historical business resource subset Nj within the first unit duration.
Referring again to FIG. 7, the data calibration apparatus 1 may further include: a consumption determination module 16.
The consumption determination module 16 is configured to acquire conversion transaction value data and to determine a sufficient data threshold according to the conversion transaction value data;
The consumption determination module 16 is further configured to determine that the first unit consumption data is sufficient consumption data if the first unit consumption data is greater than the sufficient data threshold;
The consumption determination module 16 is further configured to determine that the first unit consumption data is insufficient consumption data if the first unit consumption data is less than or equal to the sufficient data threshold.
For the specific functional implementation of the consumption determination module 16, refer to the description of step S205 in the embodiment corresponding to FIG. 5; details are not repeated here.
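The sufficiency check of the consumption determination module 16 and the resulting choice made by the sufficiency judgment unit 134 can be sketched as below. The function name `pick_subset_stats` and the tuple layout are hypothetical; how the threshold is derived from the conversion transaction value data is not shown here, so the threshold is simply passed in.

```python
from typing import Tuple

Stats = Tuple[float, float]  # (conversion count, joint factor)


def pick_subset_stats(
    first_unit: Stats,
    second_unit: Stats,
    first_unit_consumption: float,
    sufficient_threshold: float,
) -> Stats:
    """Choose the statistics that represent a subset Nj in the aggregated data set."""
    # Strictly greater than the threshold counts as sufficient consumption data.
    if first_unit_consumption > sufficient_threshold:
        return first_unit
    # Less than or equal to the threshold: fall back to the longer second unit duration.
    return second_unit
```

Falling back to the second unit duration trades recency for volume: the longer window is used only when the short window has too little consumption to be trusted.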
Referring again to FIG. 7, the valid data determination module 14 may include: a combination extraction unit 141, a data lookup unit 142, and a validity determination unit 143.
The combination extraction unit 141 is configured to extract S resource attribute combinations from the resource attribute information based on the S combination types;
The data lookup unit 142 is configured to look up, in the aggregated data set, the conversion count and joint factor corresponding to each of the S resource attribute combinations;
The validity determination unit 143 is configured to determine, among the conversion counts and joint factors corresponding to the S resource attribute combinations, the effective conversion count and effective joint factor for the target business resource.
For the specific functional implementation of the combination extraction unit 141, the data lookup unit 142, and the validity determination unit 143, refer to the description of step S104 in the embodiment corresponding to FIG. 3; details are not repeated here.
Here, the S resource attribute combinations include a resource attribute combination Za, where a is a positive integer less than or equal to S.
Referring again to FIG. 7, the data lookup unit 142 may include: a match determination subunit 1421 and a data acquisition subunit 1422.
The match determination subunit 1421 is configured to search, among the historical resource attribute combinations corresponding to the H historical business resource subsets, for the historical business resource subset whose historical resource attribute combination is the same as the resource attribute combination Za, and to take that subset as the matching subset;
The data acquisition subunit 1422 is configured to obtain, from the aggregated data set, the conversion count and joint factor corresponding to the matching subset, as the conversion count and joint factor corresponding to the resource attribute combination Za.
For the specific functional implementation of the match determination subunit 1421 and the data acquisition subunit 1422, refer to the description of step S104 in the embodiment corresponding to FIG. 3; details are not repeated here.
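One possible way to picture the combination extraction and lookup described above is a dictionary keyed by the combination type together with the attribute values. The key layout, the attribute type names in the usage note, and the function name `lookup_combinations` are assumptions made for this sketch; the application does not prescribe a storage format for the aggregated data set.

```python
from typing import Dict, Sequence, Tuple

ComboType = Tuple[str, ...]             # attribute type names in a combination type
ComboKey = Tuple[ComboType, Tuple[str, ...]]
Stats = Tuple[float, float]             # (conversion count, joint factor)


def lookup_combinations(
    resource_attributes: Dict[str, str],
    combination_types: Sequence[ComboType],
    aggregated: Dict[ComboKey, Stats],
) -> Dict[ComboType, Stats]:
    """Extract one attribute combination per combination type and look it up."""
    found: Dict[ComboType, Stats] = {}
    for combo_type in combination_types:
        # Resource attribute combination Za: the target's values for this combination type.
        values = tuple(resource_attributes[name] for name in combo_type)
        key = (combo_type, values)
        # A hit means a historical subset has the same attribute combination.
        if key in aggregated:
            found[combo_type] = aggregated[key]
    return found
```

For example, with hypothetical attribute types `"industry"` and `"site"`, a target resource with `{"industry": "games", "site": "feed"}` would be looked up once under `("industry", "site")` and once under `("industry",)`; combinations with no matching historical subset are simply absent from the result.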
Referring again to FIG. 7, the validity determination unit 143 may include: a priority determination subunit 1431, a validity determination subunit 1432, and a valid data determination subunit 1433.
The priority determination subunit 1431 is configured to determine, according to the priorities of the S combination types, the priority of the conversion count and joint factor corresponding to each of the S resource attribute combinations;
The validity determination subunit 1432 is configured to determine, according to the consumption data corresponding to each of the S resource attribute combinations, whether the conversion count and joint factor corresponding to each of the S resource attribute combinations are valid;
The valid data determination subunit 1433 is configured to take the valid conversion counts and joint factors among those corresponding to the S resource attribute combinations as candidate conversion counts and candidate joint factors, and to take the candidate conversion count and candidate joint factor with the highest priority as the effective conversion count and effective joint factor for the target business resource.
For the specific functional implementation of the priority determination subunit 1431, the validity determination subunit 1432, and the valid data determination subunit 1433, refer to the description of step S104 in the embodiment corresponding to FIG. 3; details are not repeated here.
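The selection performed by the three subunits above can be sketched as a filter followed by a maximum. Two points are assumptions of this sketch rather than statements of the application: validity is modelled as "consumption greater than a threshold", and a larger `priority` value is taken to mean a higher priority. The names `Candidate` and `select_effective` are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Candidate(NamedTuple):
    """Hypothetical record for one resource attribute combination."""
    priority: int        # assumption: larger value means higher priority
    conversions: float   # conversion count from the aggregated data set
    joint_factor: float  # joint factor from the aggregated data set
    consumption: float   # consumption data of the combination


def select_effective(
    candidates: Sequence[Candidate],
    sufficient_threshold: float,
) -> Optional[Tuple[float, float]]:
    """Return (effective conversion count, effective joint factor), or None."""
    # Validity check (assumed form): keep combinations with enough consumption.
    valid = [c for c in candidates if c.consumption > sufficient_threshold]
    if not valid:
        return None  # no combination is usable for calibration
    # Among the valid candidates, take the one with the highest priority.
    best = max(valid, key=lambda c: c.priority)
    return best.conversions, best.joint_factor
```

Filtering before ranking means a high-priority combination with too little consumption is skipped in favour of a lower-priority one that has enough data.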
Referring again to FIG. 7, the data calibration apparatus 1 may further include: a request receiving module 17 and a stage determination module 18.
The request receiving module 17 is configured to receive a calibration request for the target business resource;
The stage determination module 18 is configured to determine the promotion stage of the target business resource;
The stage determination module 18 is further configured to, if the promotion stage of the target business resource is an initial promotion stage, respond to the calibration request for the target business resource by performing the step of acquiring the resource attribute information of the target business resource under the N resource attribute types.
For the specific functional implementation of the request receiving module 17 and the stage determination module 18, refer to the description of step S105 in the embodiment corresponding to FIG. 3; details are not repeated here.
Referring again to FIG. 7, the data calibration apparatus 1 may further include: an influence factor determination module 19.
The influence factor determination module 19 is configured to, in the historical business resource set and according to the historical business resource division granularity, take each historical business resource whose expected consumption data is less than its actual consumption data as a resource to be processed, and add the resources to be processed to a to-be-processed resource set; the to-be-processed resource set includes a resource to be processed Sr, where r is a positive integer less than or equal to the total number of resources to be processed in the set;
The influence factor determination module 19 is further configured to determine a regulation value, a conversion rate ratio, a billing ratio factor ratio, a click-through rate ratio, and an industry factor for the resource to be processed Sr, according to the price adjustment factor, risk control factor, estimated conversion rate, actual conversion rate, estimated billing ratio factor, actual billing ratio factor, estimated click-through rate, actual click-through rate, and industry factor of the resource to be processed Sr;
The influence factor determination module 19 is further configured to obtain a value interval once the regulation value, conversion rate ratio, billing ratio factor ratio, click-through rate ratio, and industry factor corresponding to each resource to be processed have been determined;
The influence factor determination module 19 is further configured to obtain, from the to-be-processed resource set, the resources to be processed whose conversion count is greater than or equal to a first conversion threshold and less than a second conversion threshold, as first conversion resources; the second conversion threshold is greater than the first conversion threshold;
The influence factor determination module 19 is further configured to determine a first regulation analysis ratio, a first conversion rate analysis ratio, a first billing ratio factor analysis ratio, a first click-through rate analysis ratio, and a first industry factor analysis ratio, according to the value interval and the regulation values, conversion rate ratios, billing ratio factor ratios, click-through rate ratios, and industry factors corresponding to the first conversion resources;
The influence factor determination module 19 is further configured to obtain, from the to-be-processed resource set, the resources to be processed whose conversion count is greater than or equal to the second conversion threshold, as second conversion resources;
The influence factor determination module 19 is further configured to determine a second regulation analysis ratio, a second conversion rate analysis ratio, a second billing ratio factor analysis ratio, a second click-through rate analysis ratio, and a second industry factor analysis ratio, according to the value interval and the regulation values, conversion rate ratios, billing ratio factor ratios, click-through rate ratios, and industry factors corresponding to the second conversion resources;
The influence factor determination module 19 is further configured to analyze the first regulation analysis ratio, the first conversion rate analysis ratio, the first billing ratio factor analysis ratio, the first click-through rate analysis ratio, the first industry factor analysis ratio, the second regulation analysis ratio, the second conversion rate analysis ratio, the second billing ratio factor analysis ratio, the second click-through rate analysis ratio, and the second industry factor analysis ratio, to determine the influence factors used to adjust the expected display revenue; the influence factors include the estimated conversion rate and the industry factor, which are used together to generate the estimated joint factor.
For the specific functional implementation of the influence factor determination module 19, refer to the description of steps S301 to S308 in the embodiment corresponding to FIG. 6; details are not repeated here.
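The filtering and bucketing steps of the influence factor determination module 19 can be sketched as follows. The dictionary field names and the function name `split_by_conversions` are hypothetical, and the later computation of the analysis ratios over the value interval is not covered because its details are given in the method embodiment rather than here.

```python
from typing import Dict, List, Sequence, Tuple

Resource = Dict[str, float]


def split_by_conversions(
    resources: Sequence[Resource],
    first_threshold: float,
    second_threshold: float,
) -> Tuple[List[Resource], List[Resource]]:
    """Return (first conversion resources, second conversion resources)."""
    if second_threshold <= first_threshold:
        raise ValueError("the second conversion threshold must exceed the first")

    # Resources to be processed: expected consumption below actual consumption.
    pending = [
        r for r in resources
        if r["expected_consumption"] < r["actual_consumption"]
    ]
    # First bucket: first_threshold <= conversions < second_threshold.
    first = [
        r for r in pending
        if first_threshold <= r["conversions"] < second_threshold
    ]
    # Second bucket: conversions >= second_threshold.
    second = [r for r in pending if r["conversions"] >= second_threshold]
    return first, second
```

Resources to be processed whose conversion count is below the first threshold fall into neither bucket, which matches the two ranges stated above.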
Further, refer to FIG. 8, which is a schematic structural diagram of a computer device provided by an embodiment of the present application. As shown in FIG. 8, the data calibration apparatus 1 in the embodiment corresponding to FIG. 7 may be applied to the computer device 1000. The computer device 1000 may include a processor 1001, a network interface 1004, and a memory 1005; in addition, the computer device 1000 further includes a user interface 1003 and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display and a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM, or a non-volatile memory, for example at least one disk memory. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in FIG. 8, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the computer device 1000 shown in FIG. 8, the network interface 1004 can provide a network communication function, the user interface 1003 mainly provides an input interface for the user, and the processor 1001 can be used to invoke the device control application program stored in the memory 1005 to implement the data calibration method provided by the embodiments of the present application.
It should be understood that the computer device 1000 described in the embodiments of the present application can perform the data calibration method described in the foregoing embodiments, and can also implement the data calibration apparatus 1 described in the embodiment corresponding to FIG. 7; details are not repeated here. The beneficial effects of using the same method are likewise not described again.
In addition, it should be noted that an embodiment of the present application further provides a computer-readable storage medium that stores the computer program executed by the aforementioned data calibration apparatus 1. When the processor loads and executes the computer program, it can perform the data calibration method described in any of the foregoing embodiments; details are therefore not repeated here. The beneficial effects of using the same method are likewise not described again. For technical details not disclosed in the computer-readable storage medium embodiments of the present application, refer to the description of the method embodiments of the present application.
The computer-readable storage medium may be an internal storage unit of the data calibration apparatus provided by any of the foregoing embodiments or of the computer device, such as a hard disk or memory of the computer device. The computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the computer device. The computer-readable storage medium is used to store the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
The above discloses only preferred embodiments of the present application, which of course do not limit the scope of rights of the present application. Equivalent changes made according to the claims of the present application therefore still fall within the scope covered by the present application.

Claims (16)

  1. A data calibration method, performed by a computer device, comprising:
    acquiring resource attribute information of a target business resource under N resource attribute types, N being a positive integer;
    clustering the historical business resources in a historical business resource set based on each of S combination types, to obtain H historical business resource subsets, wherein the resource attribute types corresponding to each combination type all belong to the N resource attribute types; the historical business resources in one historical business resource subset have the same historical resource attribute combination, and one historical resource attribute combination is associated with the resource attribute types corresponding to one combination type; S is a positive integer less than or equal to N; and H is a positive integer;
    performing aggregation statistics processing on each of the H historical business resource subsets to obtain an aggregated data set, wherein the aggregated data set comprises a conversion count and a joint factor corresponding to each of the H historical business resource subsets; the conversion count is determined from the respective conversion counts of the historical business resources in the corresponding historical business resource subset, and the joint factor is determined from the respective estimated conversion rates and industry factors of the historical business resources in the corresponding historical business resource subset;
    obtaining, from the aggregated data set and according to the resource attribute information, an effective conversion count and an effective joint factor for the target business resource; and
    determining a calibration coefficient according to the effective conversion count and the effective joint factor, and calibrating an estimated joint factor of the target business resource according to the calibration coefficient, wherein the estimated joint factor is determined from an estimated conversion rate and an industry factor of the target business resource.
  2. The method according to claim 1, wherein the S combination types comprise a combination type Mi, i being a positive integer less than or equal to S; the historical business resource set comprises a historical business resource Td, d being a positive integer less than or equal to the total number of historical business resources in the historical business resource set;
    the clustering the historical business resources in a historical business resource set based on each of S combination types, to obtain H historical business resource subsets, comprises:
    determining the resource attribute types contained in the combination type Mi as target resource attribute types;
    determining the historical resource attribute information of the historical business resource Td in the historical business resource set that is associated with the target resource attribute types as the historical resource attribute combination of the historical business resource Td;
    adding, in the historical business resource set, the historical business resources having the same historical resource attribute combination to the same historical business resource subset, to obtain one or more historical business resource subsets corresponding to the combination type Mi; and
    forming the H historical business resource subsets from the one or more historical business resource subsets corresponding to each combination type.
  3. The method according to claim 1, wherein the H historical business resource subsets include a historical business resource subset Nj, j being a positive integer less than or equal to H;
    the performing aggregate statistical processing based on each of the H historical business resource subsets to obtain an aggregated data set comprises:
    determining a first unit conversion number and a first unit joint factor corresponding to the historical business resource subset Nj, wherein the first unit conversion number is determined according to the conversion numbers of the historical business resources in the historical business resource subset Nj within a first unit duration, and the first unit joint factor is determined according to the estimated conversion rates and industry factors of the historical business resources in the historical business resource subset Nj within the first unit duration;
    determining a second unit conversion number and a second unit joint factor corresponding to the historical business resource subset Nj, wherein the second unit conversion number is determined according to the conversion numbers of the historical business resources in the historical business resource subset Nj within a second unit duration, the second unit joint factor is determined according to the estimated conversion rates and industry factors of the historical business resources in the historical business resource subset Nj within the second unit duration, and the second unit duration is greater than the first unit duration;
    acquiring first unit consumption data of the historical business resource subset Nj;
    if the first unit consumption data is sufficient consumption data, using the first unit conversion number as the conversion number corresponding to the historical business resource subset Nj, and using the first unit joint factor as the joint factor corresponding to the historical business resource subset Nj;
    if the first unit consumption data is insufficient consumption data, using the second unit conversion number as the conversion number corresponding to the historical business resource subset Nj, and using the second unit joint factor as the joint factor corresponding to the historical business resource subset Nj; and
    when the conversion numbers and joint factors respectively corresponding to the H historical business resource subsets have been obtained, generating an aggregated data set that includes the conversion numbers and joint factors respectively corresponding to the H historical business resource subsets.
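The choice between the short-window and long-window statistics described in this claim might look like the sketch below. The names `WindowStats` and `build_aggregate`, the string keys, and the spend threshold of 500 are hypothetical; the claim itself only requires that sufficient consumption selects the first-unit values and insufficient consumption selects the second-unit values.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    conversions: float   # conversion number for the window
    joint_factor: float  # sum of (estimated conversion rate * industry factor)


def build_aggregate(subsets, is_sufficient):
    """Pick first-unit or second-unit statistics for every subset.

    subsets maps a subset key to (first_unit, second_unit, first_unit_spend).
    is_sufficient is a caller-supplied predicate on the first-unit spend.
    """
    aggregate = {}
    for key, (first_unit, second_unit, spend) in subsets.items():
        # Enough spend in the short window: trust the fresher statistics.
        # Otherwise fall back to the longer window.
        aggregate[key] = first_unit if is_sufficient(spend) else second_unit
    return aggregate


subsets = {
    "games|A": (WindowStats(12, 10.0), WindowStats(80, 75.0), 900.0),
    "games|B": (WindowStats(1, 0.8), WindowStats(9, 7.5), 40.0),
}
aggregate = build_aggregate(subsets, is_sufficient=lambda spend: spend > 500.0)
```

Passing the sufficiency test as a predicate keeps the window selection separate from how the threshold is derived.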
  4. The method according to claim 3, wherein the determining a first unit conversion number and a first unit joint factor corresponding to the historical business resource subset Nj comprises:
    determining each historical business resource in the historical business resource subset Nj as a historical business resource to be counted, and acquiring log information of the historical business resources to be counted;
    obtaining, from the log information, the conversion number, estimated conversion rate, and industry factor of each historical business resource to be counted within the first unit duration;
    multiplying the estimated conversion rate of each historical business resource to be counted within the first unit duration by its industry factor to obtain the joint factor of that historical business resource within the first unit duration, and summing the joint factors of the historical business resources to be counted within the first unit duration to obtain the first unit joint factor corresponding to the historical business resource subset Nj; and
    summing the conversion numbers of the historical business resources to be counted within the first unit duration to obtain the first unit conversion number corresponding to the historical business resource subset Nj.
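As a worked illustration of the multiply-then-sum step in this claim, the sketch below computes both first-unit quantities from per-resource log rows. The row keys (`conversions`, `pcvr`, `industry_factor`) and the sample values are assumptions introduced for the example.

```python
def first_unit_stats(log_rows):
    """Return (first unit conversion number, first unit joint factor).

    Each row is one historical resource's log summary for the first unit
    duration, with hypothetical keys: conversions, pcvr (estimated
    conversion rate), and industry_factor.
    """
    # Conversion number: plain sum over the resources in the subset.
    conversions = sum(row["conversions"] for row in log_rows)
    # Joint factor: per-resource product, then summed over the subset.
    joint_factor = sum(row["pcvr"] * row["industry_factor"] for row in log_rows)
    return conversions, joint_factor


rows = [
    {"conversions": 3, "pcvr": 0.02, "industry_factor": 1.5},
    {"conversions": 1, "pcvr": 0.01, "industry_factor": 2.0},
]
unit_conversions, unit_joint_factor = first_unit_stats(rows)
```

With these sample rows the conversion number is 4 and the joint factor is approximately 0.05 (0.03 + 0.02).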
  5. The method according to claim 3, wherein the determining a second unit conversion number and a second unit joint factor corresponding to the historical business resource subset Nj comprises:
    determining each historical business resource in the historical business resource subset Nj as a historical business resource to be counted, and acquiring log information of the historical business resources to be counted;
    determining the second unit duration;
    dividing the second unit duration based on the first unit duration to obtain at least two statistical periods, the duration of each statistical period being less than or equal to the first unit duration;
    obtaining, from the log information, the conversion number, estimated conversion rate, and industry factor of each historical business resource to be counted within each statistical period;
    for each statistical period, multiplying the estimated conversion rate of each historical business resource to be counted within the statistical period by its industry factor to generate the joint factor of that historical business resource within the statistical period, and summing the joint factors of the historical business resources to be counted within the statistical period to obtain the statistical-period joint factor of the historical business resource subset Nj within the statistical period;
    for each statistical period, summing the conversion numbers of the historical business resources to be counted within the statistical period to obtain the statistical-period conversion number of the historical business resource subset Nj within the statistical period; and
    processing, according to a time decay strategy, the statistical-period conversion number and the statistical-period joint factor of the historical business resource subset Nj in each statistical period, to obtain the second unit conversion number and the second unit joint factor corresponding to the historical business resource subset Nj.
  6. The method according to claim 5, wherein the at least two statistical periods include a statistical period Lk, k being a positive integer less than or equal to the total number of the at least two statistical periods, and the start time of the statistical period Lk is earlier than that of a statistical period Lk+1;
    the processing, according to a time decay strategy, the statistical-period conversion number and the statistical-period joint factor of the historical business resource subset Nj in each statistical period, to obtain the second unit conversion number and the second unit joint factor corresponding to the historical business resource subset Nj comprises:
    attenuating the statistical-period conversion number and the statistical-period joint factor within the statistical period Lk according to a time decay factor, the total number of the at least two statistical periods, and the forward rank of the start time of the statistical period Lk among the at least two statistical periods, to obtain a decayed conversion number and a decayed joint factor;
    summing the decayed conversion numbers of all statistical periods to obtain the second unit conversion number corresponding to the historical business resource subset Nj; and
    summing the decayed joint factors of all statistical periods to obtain the second unit joint factor corresponding to the historical business resource subset Nj.
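The claim names the inputs to the attenuation (a time decay factor, the total number of periods, and the forward rank k of each period) but does not give the formula. The sketch below assumes an exponential weight, decay ** (total - k), so that the most recent period keeps full weight; that weighting and the function name `decayed_totals` are assumptions, not something stated in the claims.

```python
def decayed_totals(periods, decay):
    """Combine per-period statistics into second-unit totals.

    periods: list of (conversions, joint_factor), ordered oldest first.
    decay:   time decay factor in (0, 1].
    Assumed weighting: decay ** (total - k), with k the 1-based forward rank.
    """
    total = len(periods)
    conversions = 0.0
    joint_factor = 0.0
    for k, (period_conv, period_joint) in enumerate(periods, start=1):
        weight = decay ** (total - k)  # older periods get smaller weights
        conversions += weight * period_conv
        joint_factor += weight * period_joint
    return conversions, joint_factor


# Two periods, oldest first, with a decay factor of 0.5.
second_conversions, second_joint_factor = decayed_totals(
    [(10, 0.4), (20, 0.5)], decay=0.5
)
```

Applying the same weight to both quantities keeps the ratio of conversions to joint factor comparable across periods, which seems to be the point of attenuating them together.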
  7. The method according to claim 3, wherein the first unit consumption data is the sum of the consumption data of the historical business resources in the historical business resource subset Nj within the first unit duration;
    the method further comprises:
    acquiring conversion transaction value data, and determining a sufficient data threshold according to the conversion transaction value data;
    if the first unit consumption data is greater than the sufficient data threshold, determining that the first unit consumption data is sufficient consumption data; and
    if the first unit consumption data is less than or equal to the sufficient data threshold, determining that the first unit consumption data is insufficient consumption data.
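The sufficiency test in this claim is a strict comparison against a threshold derived from the conversion transaction value. How the threshold is derived is not specified; the sketch below assumes it is a fixed multiple of the transaction value (the `multiple` parameter is hypothetical).

```python
def sufficient_threshold(conversion_value, multiple=5.0):
    # Assumed rule: the threshold is a fixed multiple of the value of one
    # converted transaction. The claims only say it is derived from that value.
    return multiple * conversion_value


def is_sufficient(first_unit_spend, conversion_value, multiple=5.0):
    # Strictly greater than the threshold counts as sufficient;
    # equal to or below the threshold counts as insufficient.
    return first_unit_spend > sufficient_threshold(conversion_value, multiple)


above = is_sufficient(600.0, conversion_value=100.0)
at_threshold = is_sufficient(500.0, conversion_value=100.0)
```

Note the boundary case: spend exactly equal to the threshold is treated as insufficient, matching the "less than or equal to" wording of the claim.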
  8. The method according to claim 1, wherein the obtaining, according to the resource attribute information, an effective conversion number and an effective joint factor for the target business resource from the aggregated data set comprises:
    extracting S resource attribute combinations from the resource attribute information based on the S combination types;
    looking up, in the aggregated data set, the conversion number and joint factor corresponding to each of the S resource attribute combinations; and
    determining, among the conversion numbers and joint factors corresponding to the S resource attribute combinations, the effective conversion number and the effective joint factor for the target business resource.
  9. The method according to claim 8, wherein the S resource attribute combinations include a resource attribute combination Za, a being a positive integer less than or equal to S;
    the looking up, in the aggregated data set, the conversion number and joint factor corresponding to each of the S resource attribute combinations comprises:
    searching, among the historical resource attribute combinations corresponding to the H historical business resource subsets, for the historical business resource subset whose historical resource attribute combination is identical to the resource attribute combination Za, as a matching subset; and
    obtaining, from the aggregated data set, the conversion number and joint factor corresponding to the matching subset, as the conversion number and joint factor corresponding to the resource attribute combination Za.
  10. The method according to claim 9, wherein the determining, among the conversion numbers and joint factors corresponding to the S resource attribute combinations, the effective conversion number and the effective joint factor for the target business resource comprises:
    determining, according to the priorities of the S combination types, the priorities of the conversion numbers and joint factors corresponding to the S resource attribute combinations;
    determining, according to the consumption data corresponding to each of the S resource attribute combinations, the validity of the conversion number and joint factor corresponding to each of the S resource attribute combinations; and
    determining the valid conversion numbers and joint factors among those corresponding to the S resource attribute combinations as candidate conversion numbers and candidate joint factors, and using the candidate conversion number and candidate joint factor with the highest priority as the effective conversion number and the effective joint factor for the target business resource.
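The filter-then-rank selection in this claim can be sketched as below. Two points are assumptions made for the example: validity is modelled as the combination's spend exceeding a minimum, and a larger `priority` number means higher priority. The dictionary keys and the function name `pick_effective` are likewise hypothetical.

```python
def pick_effective(candidates, min_spend):
    """Return (conversions, joint_factor) of the highest-priority valid entry.

    Each candidate is a dict with hypothetical keys: priority (larger means
    higher priority), spend, conversions, joint_factor.
    """
    # Validity check: keep only combinations with enough consumption data.
    valid = [c for c in candidates if c["spend"] > min_spend]
    if not valid:
        return None  # nothing trustworthy to calibrate against
    best = max(valid, key=lambda c: c["priority"])
    return best["conversions"], best["joint_factor"]


candidates = [
    {"priority": 3, "spend": 40.0, "conversions": 2, "joint_factor": 1.1},
    {"priority": 2, "spend": 900.0, "conversions": 35, "joint_factor": 30.0},
    {"priority": 1, "spend": 5000.0, "conversions": 210, "joint_factor": 190.0},
]
effective = pick_effective(candidates, min_spend=500.0)
```

In this sample the finest-grained combination (priority 3) is skipped because its spend is too low, so the priority-2 statistics are used.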
  11. The method according to claim 1, further comprising:
    receiving a calibration request for the target business resource;
    determining the promotion stage of the target business resource; and
    if the promotion stage of the target business resource is an initial promotion stage, performing, in response to the calibration request, the step of acquiring resource attribute information of the target business resource under N resource attribute types.
  12. The method according to claim 1, further comprising:
    in the historical business resource set, taking, according to a historical business resource division granularity, historical business resources whose expected consumption data is less than their actual consumption data as resources to be processed, and adding the resources to be processed to a to-be-processed resource set, wherein the to-be-processed resource set includes a resource to be processed Sr, r being a positive integer less than or equal to the total number of resources to be processed in the to-be-processed resource set;
    determining a regulation value, a conversion rate ratio, a billing ratio factor ratio, a click-through rate ratio, and an industry factor of the resource to be processed Sr according to its price adjustment factor, risk control factor, estimated conversion rate, actual conversion rate, estimated billing ratio factor, actual billing ratio factor, estimated click-through rate, actual click-through rate, and industry factor;
    acquiring a value interval;
    acquiring, from the to-be-processed resource set, resources to be processed whose conversion number is greater than or equal to a first conversion threshold and less than a second conversion threshold, as first conversion resources, the second conversion threshold being greater than the first conversion threshold;
    determining a first regulation analysis ratio, a first conversion rate analysis ratio, a first billing ratio factor analysis ratio, a first click-through rate analysis ratio, and a first industry factor analysis ratio according to the value interval and the regulation values, conversion rate ratios, billing ratio factor ratios, click-through rate ratios, and industry factors corresponding to the first conversion resources;
    acquiring, from the to-be-processed resource set, resources to be processed whose conversion number is greater than or equal to the second conversion threshold, as second conversion resources;
    determining a second regulation analysis ratio, a second conversion rate analysis ratio, a second billing ratio factor analysis ratio, a second click-through rate analysis ratio, and a second industry factor analysis ratio according to the value interval and the regulation values, conversion rate ratios, billing ratio factor ratios, click-through rate ratios, and industry factors corresponding to the second conversion resources; and
    analyzing the first regulation analysis ratio, the first conversion rate analysis ratio, the first billing ratio factor analysis ratio, the first click-through rate analysis ratio, the first industry factor analysis ratio, the second regulation analysis ratio, the second conversion rate analysis ratio, the second billing ratio factor analysis ratio, the second click-through rate analysis ratio, and the second industry factor analysis ratio to determine influencing factors for adjusting the expected display revenue, wherein the influencing factors include the estimated conversion rate and the industry factor, which are jointly used to generate the estimated joint factor.
  13. A data calibration apparatus, comprising:
    an acquisition module, configured to acquire resource attribute information of a target business resource under N resource attribute types, N being a positive integer;
    a division module, configured to cluster the historical business resources in a historical business resource set based on each of S combination types to obtain H historical business resource subsets, wherein the resource attribute types corresponding to each combination type all belong to the N resource attribute types; the historical business resources in one historical business resource subset share the same historical resource attribute combination, and one historical resource attribute combination is associated with the resource attribute types corresponding to one combination type; S is a positive integer less than or equal to N; and H is a positive integer;
    an aggregation statistics module, configured to perform aggregate statistical processing based on each of the H historical business resource subsets to obtain an aggregated data set, wherein the aggregated data set includes the conversion numbers and joint factors respectively corresponding to the H historical business resource subsets; each conversion number is determined according to the conversion numbers of the historical business resources in the corresponding historical business resource subset, and each joint factor is determined according to the estimated conversion rates and industry factors of the historical business resources in the corresponding historical business resource subset;
    an effective data determination module, configured to obtain, according to the resource attribute information, an effective conversion number and an effective joint factor for the target business resource from the aggregated data set; and
    a calibration module, configured to determine a calibration coefficient according to the effective conversion number and the effective joint factor, and to calibrate an estimated joint factor of the target business resource according to the calibration coefficient, the estimated joint factor being determined according to the estimated conversion rate and industry factor of the target business resource.
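The calibration module's arithmetic is not spelled out in the claims. One plausible reading, shown here purely as an assumption, is that the calibration coefficient is the ratio of observed conversions to the summed joint factor, and that calibration multiplies the target resource's estimated joint factor by that coefficient. The function names and the fallback to a neutral coefficient of 1.0 are hypothetical.

```python
def calibration_coefficient(effective_conversions, effective_joint_factor):
    # Assumed form: observed conversions divided by predicted conversions.
    # A value above 1 would mean the model under-predicted historically.
    if effective_joint_factor <= 0:
        return 1.0  # hypothetical neutral fallback when no prediction mass exists
    return effective_conversions / effective_joint_factor


def calibrate(pcvr, industry_factor, coefficient):
    # Estimated joint factor = estimated conversion rate * industry factor,
    # as stated in the claim; the multiplicative correction is the assumption.
    return pcvr * industry_factor * coefficient


coefficient = calibration_coefficient(30, 20.0)
calibrated = calibrate(pcvr=0.02, industry_factor=2.0, coefficient=coefficient)
```

With these sample numbers the coefficient is 1.5, so an estimated joint factor of 0.04 is corrected to about 0.06.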
  14. A computer device, comprising: a processor, a memory, and a network interface;
    the processor is connected to the memory and the network interface, wherein the network interface is configured to provide a network communication function, the memory is configured to store program code, and the processor is configured to call the program code to perform the method according to any one of claims 1-12.
  15. A computer-readable storage medium, storing a computer program that is adapted to be loaded by a processor to perform the method according to any one of claims 1-12.
  16. A computer program product, comprising instructions that, when run on a computer, cause the computer to implement the method according to any one of claims 1-12.
PCT/CN2022/087839 2021-05-11 2022-04-20 Data calibration method and apparatus, and computer device and readable storage medium WO2022237477A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110513300.X 2021-05-11
CN202110513300.XA CN115330428A (en) 2021-05-11 2021-05-11 Data calibration method and device, computer equipment and readable storage medium

Publications (1)

Publication Number Publication Date
WO2022237477A1 true WO2022237477A1 (en) 2022-11-17

Family

ID=83912194

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/087839 WO2022237477A1 (en) 2021-05-11 2022-04-20 Data calibration method and apparatus, and computer device and readable storage medium

Country Status (2)

Country Link
CN (1) CN115330428A (en)
WO (1) WO2022237477A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116402571A (en) * 2023-03-14 2023-07-07 上海峰沄网络科技有限公司 Budget data processing method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140006144A1 (en) * 2012-06-29 2014-01-02 Yahoo Inc. Method of calculating a reserve price for an auction and apparatus conducting the same
CN110288379A (en) * 2019-05-28 2019-09-27 北京深演智能科技股份有限公司 Conversion price expectation method, apparatus, storage medium and the computer equipment of advertisement
CN110599295A (en) * 2019-08-22 2019-12-20 阿里巴巴集团控股有限公司 Method, device and equipment for pushing articles
CN112348593A (en) * 2020-11-25 2021-02-09 深圳市欢太科技有限公司 Information delivery control method, device and storage medium
CN112598136A (en) * 2020-12-25 2021-04-02 上海连尚网络科技有限公司 Data calibration method and device
US20210110417A1 (en) * 2019-10-11 2021-04-15 Live Nation Entertainment, Inc. Dynamic bidding determination using machine-learning models

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116402571A (en) * 2023-03-14 2023-07-07 上海峰沄网络科技有限公司 Budget data processing method, device, equipment and storage medium
CN116402571B (en) * 2023-03-14 2024-04-26 上海峰沄网络科技有限公司 Budget data processing method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115330428A (en) 2022-11-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22806447

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE