CN110322039B - Click rate estimation method, server and computer readable storage medium - Google Patents

Info

Publication number
CN110322039B
Authority
CN
China
Prior art keywords
model
data
click rate
target
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810275032.0A
Other languages
Chinese (zh)
Other versions
CN110322039A (en
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201810275032.0A priority Critical patent/CN110322039B/en
Publication of CN110322039A publication Critical patent/CN110322039A/en
Application granted granted Critical
Publication of CN110322039B publication Critical patent/CN110322039B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0242Determining effectiveness of advertisements
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/147Network analysis or design for predicting network behaviour
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/535Tracking the activity of the user
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/55Push-based network services
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14Network analysis or design
    • H04L41/145Network analysis or design involving simulating, designing, planning or modelling of a network

Abstract

The embodiment of the invention discloses a click rate estimation method, a server and a computer readable storage medium, which are used for obtaining a more accurate comprehensive estimation model, improving the performance of the comprehensive estimation model and increasing the accuracy of click rate estimation. The method provided by the embodiment of the invention comprises the following steps: acquiring sample data; converting the sample data into feature data; acquiring a comprehensive estimation model according to the characteristic data, wherein the comprehensive estimation model is obtained by a centralized model and a localized model, the centralized model is used for calculating the click rate of the characteristic data, and the localized model is used for calculating the click rate according to the category to which the characteristic data belongs; and carrying out click rate estimation on the characteristic data according to the comprehensive estimation model.

Description

Click rate estimation method, server and computer readable storage medium
Technical Field
The present invention relates to the field of computers, and in particular, to a click rate estimation method, a server, and a computer-readable storage medium.
Background
With the rapid development of the internet, the data scale is larger and larger, the data types are richer and richer, and how to acquire effective data information from massive data becomes more and more important.
The data content provider needs to provide targeted data for the user, for example, provide the user with the required advertisement or news, and the click rate is an important criterion for measuring whether the data provided by the data content provider is targeted. A data content provider needs to estimate the content to be provided and push news or advertisements with a high click rate to the user. When a user uses a recommendation platform to search and inquire, if a certain target content meeting the search requirement of the user is triggered, the target content appears on a search result page, namely the target content is displayed once, the number of times of displaying the target content in a period of time is called as the display number, and the number of times of clicking the target content in a period of time is called as the click number. The click rate refers to the ratio of the number of clicks to the number of displays of a certain content (news or advertisement) on the recommendation platform, i.e. the probability that a certain content is clicked by a user.
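The click rate definition above can be written as a single ratio. The symbols below are illustrative and do not appear in the original text: the two counts are the number of clicks and the number of displays of a given content item over the same period.

```latex
\mathrm{CTR} = \frac{N_{\mathrm{click}}}{N_{\mathrm{display}}}
```

For instance, content that is displayed 1000 times and clicked 25 times in the period has a click rate of 25 / 1000 = 0.025.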
In the existing scheme, a logistic regression (LR) model is used as the click rate estimation model in order to push content with a high click rate to users. The LR algorithm has only one centralized set of model parameters, which is trained on every data sample without considering how the data samples are distributed. In an actual product, however, the distribution of data samples is not uniform. For example, users may be divided into groups with different characteristics, and groups with different characteristics and activity levels tend to behave differently, which degrades the performance of the click rate estimation model.
Disclosure of Invention
The embodiment of the invention provides a click rate estimation method, a server and a computer readable storage medium, which are used for obtaining a more accurate comprehensive estimation model, improving the performance of the comprehensive estimation model and increasing the accuracy of click rate estimation.
The first aspect of the present invention provides a click rate estimation method, including:
acquiring sample data;
converting the sample data into feature data;
acquiring a comprehensive estimation model according to the characteristic data, wherein the comprehensive estimation model is obtained by a centralized model and a localized model, the centralized model is used for calculating the click rate of the characteristic data, and the localized model is used for calculating the click rate according to the category to which the characteristic data belongs;
and carrying out click rate estimation on the characteristic data according to the comprehensive estimation model.
The second aspect of the present invention provides a click rate estimation method, including:
sending preset target data to a recommendation platform, wherein the recommendation platform is used for sending push data to user equipment according to the preset target data;
receiving the pushed data sent by the recommendation platform;
and displaying the push data.
A third aspect of the present invention provides a server, comprising:
a first obtaining unit configured to obtain sample data;
a conversion unit for converting the sample data into feature data;
the second obtaining unit is used for obtaining a comprehensive estimation model according to the characteristic data, the comprehensive estimation model is obtained by a centralized model and a localized model, the centralized model is used for calculating the click rate of the characteristic data, and the localized model is used for calculating the click rate according to the category to which the characteristic data belongs;
and the estimation unit is used for estimating the click rate of the characteristic data according to the comprehensive estimation model.
A fourth aspect of the present invention provides a user equipment, comprising:
a sending unit, configured to send preset target data to a recommendation platform, wherein the recommendation platform is configured to send push data to the user equipment according to the preset target data;
the receiving unit is used for receiving the push data sent by the recommendation platform;
and the display unit is used for displaying the push data.
A fifth aspect of the present invention provides a server, comprising:
a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory, and includes the steps of:
acquiring sample data;
converting the sample data into feature data;
acquiring a comprehensive model according to the characteristic data, wherein the comprehensive model is obtained by a centralized model and a localized model, the centralized model is used for calculating the click rate of the characteristic data, and the localized model is used for calculating the click rate according to the category to which the characteristic data belongs;
estimating the click rate of the characteristic data according to the comprehensive model;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
A sixth aspect of the present invention provides a computer storage medium that includes instructions, which, when executed on a computer, cause the computer to perform the operations of the above-described aspects.
According to the technical scheme, the embodiment of the invention has the following advantages:
The embodiment of the invention provides a click rate estimation method, which comprises the following steps: acquiring sample data; converting the sample data into feature data; acquiring a comprehensive estimation model according to the characteristic data, wherein the comprehensive estimation model is obtained by a centralized model and a localized model, the centralized model is used for calculating the click rate of the characteristic data, and the localized model is used for calculating the click rate according to the category to which the characteristic data belongs; and carrying out click rate estimation on the characteristic data according to the comprehensive estimation model. In the embodiment of the invention, a more accurate comprehensive estimation model is obtained by acquiring the localized model corresponding to the category of the feature data and combining it with the centralized model, so that the performance of the comprehensive estimation model and the accuracy of click rate estimation are both improved.
Drawings
FIG. 1A is a schematic diagram of a network architecture according to an embodiment of the present invention;
FIG. 1B is a schematic structural diagram of a model training platform according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an application scenario of the click rate estimation method in the embodiment of the present invention;
FIG. 3 is a diagram of an embodiment of a click rate estimation method according to an embodiment of the invention;
FIG. 4A is a schematic diagram illustrating a comparison of click rates calculated by different estimation models according to an embodiment of the present invention;
FIG. 4B is a diagram illustrating an example of news content categories according to an embodiment of the present invention;
FIG. 4C is a schematic diagram of an embodiment of a correspondence relationship between news categories and cluster centers in the embodiment of the present invention;
FIG. 5 is a diagram of one embodiment of a server in an embodiment of the invention;
FIG. 6 is a diagram of another embodiment of a server in an embodiment of the invention;
FIG. 7 is a diagram of another embodiment of a server in an embodiment of the invention;
FIG. 8 is a schematic diagram of another embodiment of a server in an embodiment of the present invention;
FIG. 9 is a schematic diagram of an embodiment of a user equipment in the embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a click rate estimation method, a server and a computer readable storage medium, which are used for obtaining a more accurate comprehensive estimation model, improving the performance of the comprehensive estimation model and increasing the accuracy of click rate estimation.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to FIG. 1A, FIG. 1A is a schematic diagram of a network architecture applied in an embodiment of the present invention. As shown in FIG. 1A, the network architecture includes a user device and a server, where the server includes a recommendation platform, a content platform, a feature extraction module, a distributed file system, and a model training platform. The feature extraction module performs feature extraction on sample data in the content platform according to target data sent by the user equipment to obtain feature data, and the feature data is sent to the model training platform through the distributed file system. The model training platform obtains a comprehensive estimation model for the sample data and sends the comprehensive estimation model to the recommendation platform. The recommendation platform calculates the estimated click rate of the sample data according to the comprehensive estimation model and pushes content with a high estimated click rate to the user equipment. As shown in FIG. 1B, the model training platform may be composed of a plurality of computing nodes; sample data is distributed appropriately across the computing nodes, and the model parameters and clustering centers obtained through computation are stored in the parameter server.
It should be understood that the embodiment of the present invention may be applied to a content push scenario, for example to a news display product that interacts with users. In a scenario where advertisements or news are pushed to a user device, the probability that an advertisement or news item to be pushed will be clicked is calculated; that is, according to the user's potential demand for advertisements or news, targeted advertisements or news that interest the user (i.e., advertisements or news with a high click rate) are pushed to the user, so as to increase the traffic of the platform that pushes the advertisements or news. Referring to FIG. 2, FIG. 2 is a schematic diagram of an application scenario of the click rate estimation method in an embodiment of the present invention. As shown in the figure, the application scenario includes a server and a plurality of user devices: user device 1, user device 2, user device 3, user device 4, and user device 5. The server obtains target data from different user devices each time, determines target content with a high estimated click rate according to the target data, the sample data, and the comprehensive estimation model, and pushes the target content to the user device that sent the target data. For example, the server pushes news with a high estimated click rate to user device 1, where the estimated click rate is the click rate of user device 1 on the pushed news as estimated from the historical click information of user device 1. The push content received by different user devices may be the same or different; for example, the push content received by user device 1 and user device 2 may be the same while the push content received by user device 1 and user device 3 is different, which is not limited herein.
For convenience of description, a specific flow of the embodiment of the present invention is described below. Referring to FIG. 3, when the click rate estimation method provided by the present invention is applied to "fast report everyday", an embodiment of the click rate estimation method in the embodiment of the present invention includes:
301. The user equipment sends preset target data to the server.
Specifically, the user equipment analyzes and processes data stored in the user equipment to obtain preset target data, where the preset target data is used to indicate a click rate of a user on a specific content through the user equipment.
It can be understood that the preset target data obtained by different user equipment is not identical, so when content is pushed to a given user equipment, it needs to be pushed in combination with the preset target data corresponding to that user equipment.
302. The server obtains sample data.
In this embodiment, the server obtains sample data from the content platform, where the sample data may be a set of advertisement or news data. In this embodiment and the following embodiments, news content is taken as sample data for explanation, for example, if the sampling range of the sample data is 10, the server randomly selects 10 news contents as sample data from a content platform storing the news contents.
It is understood that the news content may be selected at random, or may be selected in a preset order according to actual needs, for example by giving priority to news content that was generated more recently.
303. The server converts the sample data into feature data.
The server acquires target data of the user equipment, such as historical click records, news access records of the user, the frequency with which the user obtains news, and the categories of news the user has clicked. The sample data is converted into the required feature data according to the target data. The feature data quantifies the interaction information between the user and the news content and is represented by a multi-dimensional vector, each feature being one component of the vector, such as the age of the user or the category of the news.
304. The server acquires a comprehensive estimation model according to the characteristic data, wherein the comprehensive estimation model is obtained by a centralized model and a localized model.
The server obtains a comprehensive estimation model according to the characteristic data, the comprehensive estimation model is obtained by a centralized model and a localized model, the centralized model is used for calculating the click rate of the characteristic data, and the localized model is used for calculating the click rate according to the category to which the characteristic data belongs.
For example, in the network architecture shown in FIG. 1A, the server processes the feature data through a preset algorithm and the model training platform to obtain a comprehensive estimation model. The comprehensive estimation model is obtained from a centralized model and a localized model, for example by summing the centralized model and the localized model, or by combining them in another manner, which is not limited herein. For example, if 5 localized models trained from historical data are stored in the parameter server and the 5 localized models correspond to 5 different categories, the server determines which target category of the 5 categories the sample data belongs to and selects the localized model corresponding to that target category. The centralized model is the centralized model in a logistic regression (LR) model.
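As a minimal sketch of the "summing" combination mentioned above, the example below assumes that both models are LR weight vectors of the same length and that the comprehensive model scores a feature vector with the sum of the centralized weights and the weights of the localized model for the target category. The function names, weights, and feature values are hypothetical illustrations, not taken from the patent.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_ctr(x, w_central, w_local_by_category, category):
    """Estimate the click rate of feature vector x.

    Assumption: the comprehensive model is the element-wise sum of the
    centralized weights and the localized weights of the target category.
    """
    w_local = w_local_by_category[category]
    z = sum((wc + wl) * xi for wc, wl, xi in zip(w_central, w_local, x))
    return sigmoid(z)


# Hypothetical parameters: one centralized model, two localized models.
w_central = [0.2, -0.1, 0.05]
w_local_by_category = {0: [0.1, 0.0, 0.0], 1: [-0.3, 0.2, 0.1]}

x = [1.0, 2.0, 0.5]
p0 = predict_ctr(x, w_central, w_local_by_category, 0)
p1 = predict_ctr(x, w_central, w_local_by_category, 1)
print(p0, p1)
```

The same feature vector receives a different estimate depending on which category, and hence which localized model, it is assigned to. This is the behaviour that a single centralized LR model cannot express.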
It should be noted that the centralized model and the localized model are obtained by training according to historical information, and the centralized model and the localized model are updated each time new data is obtained.
305. And the server carries out click rate estimation on the characteristic data according to the comprehensive estimation model.
In this embodiment, the click rate of the feature data is estimated through the obtained comprehensive estimation model, so as to obtain the estimated click rate of the feature data.
For example, in the network architecture shown in FIG. 1A, the server scores each piece of feature data through the recommendation platform and determines the estimated click rate of each news item in the sample data. It can be appreciated that in an actual product the sample data is not uniformly distributed. For example, users may be divided into groups with different characteristics, and groups with different characteristics and activity levels tend to behave differently. Therefore, the more localized models there are, that is, the more finely the sample data is divided, the more accurate the resulting comprehensive estimation model and the more accurate the estimated click rate.
It should be noted that the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC) are often used as indices of the performance of a click rate estimation model. The AUC takes a value between 0 and 1, and the closer it is to 1, the better the model performs.
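To make the AUC index concrete, the sketch below computes it through its pairwise interpretation: the fraction of (clicked, not clicked) sample pairs in which the clicked sample receives the higher score, with ties counted as one half. The function name and the sample labels and scores are hypothetical; the quadratic loop is intended for illustration, not for large test sets.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for clicked samples, 0 for samples that were not clicked.
    scores: estimated click rates in the same order.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # clicked sample ranked above non-clicked one
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


result = auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
print(result)  # 0.75
```

Three of the four (clicked, not clicked) pairs are ordered correctly here, so the AUC is 0.75; a model that ranks every clicked sample above every non-clicked one would reach 1.0.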
Specifically, taking experimental results from the "fast report everyday" software as an example, 7 days of sampled data are used to train a plurality of localized models, each of which corresponds to one clustering center. The more localized models there are, that is, the more clustering centers there are, the more accurate the model's results. With the most recent hour of news data used as test data, the results are shown in FIG. 4A: when the test model is the standard LR model, the AUC is 0.748266; when the test model includes 4 localized models, the AUC is 0.749511; and when the test model includes 8 localized models, the AUC is 0.749811.
306. And the server determines the push data in the sample data according to a preset rule.
And the server determines the pushed data in the sample data according to a preset rule, wherein the preset rule is used for determining the data meeting the click rate requirement from the sample data.
Specifically, the push data to be pushed is determined according to the actual content of the sample data, the parameters of the user equipment and the like, and the push data is data with a high estimated click rate in the sample data. It can be understood that the preset rule may be that the sample data are arranged in order according to the corresponding estimated click rate, and the preset number of data ranked in the top is determined as the push data.
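The ranking rule described above can be sketched as follows. This is only an illustration of one possible preset rule (sort by estimated click rate and keep a preset number of top items); the function name, item identifiers, and click rate values are hypothetical.

```python
def select_push_data(items, estimated_ctr, top_n):
    """Return the top_n (item, estimated click rate) pairs, highest rate first."""
    ranked = sorted(zip(items, estimated_ctr),
                    key=lambda pair: pair[1],
                    reverse=True)
    return ranked[:top_n]


# Hypothetical sample data with estimated click rates.
news = ["news_a", "news_b", "news_c", "news_d"]
ctr = [0.02, 0.10, 0.05, 0.01]

push_data = select_push_data(news, ctr, top_n=2)
print(push_data)  # [('news_b', 0.1), ('news_c', 0.05)]
```

Because the returned list is already ordered by estimated click rate, the same order can be reused when the push data is displayed on the user equipment.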
307. And the server sends the push data to the user equipment.
The server sends the push data to the user equipment so that the user equipment can display the push data.
308. The user equipment displays the pushed data.
The received push data is displayed on the user equipment. Specifically, the user equipment may sort the received push data by estimated click rate, display the item with the highest estimated click rate at a first position on the display interface, display the item with the second highest estimated click rate at a second position that follows the first position, and so on until all the push data is displayed.
In this embodiment, the server first obtains sample data, converts the sample data into feature data, and obtains a comprehensive estimation model according to the feature data, where the comprehensive estimation model is obtained from the centralized model and the localized model. The server then performs click rate estimation on the feature data according to the comprehensive estimation model and finally sends push data with a high estimated click rate to the user equipment, which displays it. By obtaining a click rate estimation model with better performance, the embodiment improves the accuracy of click rate estimation.
Optionally, on the basis of the embodiment corresponding to fig. 3, in an optional embodiment of the click rate estimation method provided in the embodiment of the present invention, the converting the sample data into the feature data includes:
extracting the characteristics of the sample data according to preset target data;
and acquiring the characteristic data of the sample data.
In this embodiment, the server performs targeted feature extraction on the content in the sample data by combining the target data, for example, feature extraction may be performed on news content by combining the age of the user, so that the effectiveness of the obtained feature data is improved.
Optionally, on the basis of the embodiment corresponding to fig. 3, in an optional embodiment of the click rate estimation method provided in the embodiment of the present invention, the obtaining a comprehensive estimation model according to the feature data includes:
acquiring a plurality of preset clustering centers, wherein each clustering center corresponds to one category;
clustering the characteristic data according to a plurality of preset clustering centers, and determining a target clustering center closest to the characteristic data;
determining the target category of the characteristic data according to the target clustering center;
acquiring a centralized model and a localized model, wherein the localized model corresponds to a target class;
and acquiring a comprehensive pre-estimation model.
Specifically, a plurality of preset clustering centers exist in the server, and each clustering center corresponds to one category. The clustering centers are obtained by training on historical data; the specific training process is not repeated in this embodiment. After new feature data is obtained, the feature data is first clustered. A K-means algorithm may be used, which partitions n pieces of feature data into K clusters such that the distance from each piece of feature data to the center of its own cluster is smaller than its distance to any other cluster center; in this embodiment, n is 10 and K is 5. The category to which the feature data belongs is thereby determined. For example, news content may be divided by category into sports news, entertainment news, financial news, scientific news, military news, and so on, as shown in FIG. 4B. In this embodiment, these 5 news categories are taken as an example: it is determined which of sports news, entertainment news, financial news, scientific news, and military news the feature data belongs to; the localized model for that news category and the centralized model for the feature data are then determined, and the comprehensive estimation model is finally obtained. The centralized model is similar to the centralized model in the LR model and is not described again here. It should be noted that there are various clustering algorithms; besides the K-means algorithm adopted in this embodiment, a variational auto-encoder (VAE) algorithm or another clustering algorithm may also be adopted, which is not limited herein. The embodiment of the present invention may also use a classification method to determine the category of the feature data, in which case the categories are defined in advance.
For example, news content is divided into 4 categories, namely sports news, entertainment news, financial news and military news, each category corresponds to a cluster center, 4 cluster centers are obtained through massive data training, for example, a cluster center 1 corresponding to sports news, a cluster center 2 corresponding to entertainment news, a cluster center 3 corresponding to financial news and a cluster center 4 corresponding to military news are obtained, as shown in fig. 4C, after new feature data is obtained, the feature data is classified, and if the distance from the feature data to the cluster center 1 is smaller than the distance from the feature data to other cluster center points (the cluster center 2, the cluster center 3 and the cluster center 4), the feature data is determined to be the sports news; and then obtaining a localized model corresponding to the sports news and a centralized model of the characteristic data, and finally obtaining a comprehensive estimation model, wherein the centralized model is similar to the centralized model in the LR model, and the detailed description is omitted here.
In this embodiment, the localized model for the feature data is obtained through the clustering algorithm and is then combined with the centralized model to obtain the comprehensive estimation model. This provides a specific implementation and improves the realizability and operability of the embodiment.
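The step of determining the target clustering center can be sketched as a nearest-center lookup. The example assumes squared Euclidean distance, which is the usual choice for K-means but is not stated in the text; the center coordinates, category order, and function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_center(x, centers):
    """Index of the preset clustering center closest to feature vector x."""
    return min(range(len(centers)),
               key=lambda k: squared_distance(x, centers[k]))


# Hypothetical preset centers, one per news category.
centers = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]]
categories = ["sports", "entertainment", "financial", "military"]

x = [0.5, 0.2]                      # new feature data
k = nearest_center(x, centers)      # target clustering center
target_category = categories[k]     # selects which localized model to use
print(k, target_category)           # 0 sports
```

The returned index identifies both the target category and the localized model that is combined with the centralized model.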
Optionally, on the basis of the embodiment corresponding to fig. 3, in an optional embodiment of the click rate estimating method provided in the embodiment of the present invention, the method further includes:
updating a target clustering center according to the characteristic data;
and updating the centralized model and the localized model according to the characteristic data.
In this embodiment, a process of updating the clustering center, the centralized model, and the localized model is added, which improves the performance of the resulting comprehensive estimation model and therefore the performance of click rate estimation.
Optionally, on the basis of the embodiment corresponding to fig. 3, in an optional embodiment of the click rate estimation method provided in the embodiment of the present invention, the updating the target clustering center according to the feature data includes:
and updating the target clustering center according to a preset K-means algorithm and the characteristic data.
It should be noted that each time a feature data is clustered, the clustering center computed over the feature data and the historical data changes, so the clustering center needs to be determined again in order to make the judgment on the next feature data more accurate. The target clustering center may also be updated using a preset VAE algorithm together with the feature data.
In the embodiment, a specific K-means algorithm is provided to update the target clustering center, so that the clustering center is more accurate and reliable.
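One common way to update a K-means center as each new feature data arrives is the sequential (running-mean) update sketched below. This particular update rule and the name `update_center` are assumptions for illustration; the embodiment only states that a preset K-means algorithm is used.

```python
def update_center(center, count, feature):
    """Move `center` toward `feature` so that it stays the mean of its cluster.

    `count` is the number of feature data already assigned to this center.
    Returns the new center and the new count.
    """
    new_count = count + 1
    new_center = [c + (x - c) / new_count for c, x in zip(center, feature)]
    return new_center, new_count


# A center built from one point at the origin, updated with a new point.
center, count = update_center([0.0, 0.0], 1, [2.0, 4.0])
```

Because each update touches only the target clustering center, the cost per new feature data does not depend on the amount of historical data.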
Optionally, on the basis of the embodiment corresponding to fig. 3, in an optional embodiment of the click rate estimation method provided in the embodiment of the present invention, the updating the centralized model and the localized model according to the feature data includes:
updating the centralized model and the localized model according to a preset online gradient descent algorithm;
or updating the centralized model and the localized model according to a preset online learning algorithm.
It should be noted that the online gradient descent (OGD) algorithm updates the parameters promptly, one training sample at a time. When new data arrives, only one data item is used for each update, which significantly reduces the amount of computation; when applied to a large-scale training set, the algorithm readily converges to a local optimal solution.
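A single online gradient descent step may be sketched as follows, assuming a logistic regression model trained with log loss and a fixed learning rate. The function name `ogd_step` and the value of `lr` are hypothetical.

```python
import math


def ogd_step(w, x, y, lr=0.1):
    """Update weights `w` using one sample (`x`, `y`), with y in {0, 1}.

    For log loss, the gradient with respect to w[i] is (p - y) * x[i],
    where p is the predicted click probability.
    """
    p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
    return [wi - lr * (p - y) * xi for wi, xi in zip(w, x)]


# One update from zero weights with a clicked sample.
w = ogd_step([0.0, 0.0], [1.0, 2.0], 1)
```

The same step could be applied separately to the centralized weights and to the localized weights of the target category.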
The online learning algorithm here is Follow-the-Regularized-Leader (FTRL). For each training sample, it iterates the model once using the loss function and gradient generated by that sample, training the model sample by sample, so it can handle both large-volume training and online training. FTRL trains and updates each dimension separately and uses a different learning rate for each dimension. Compared with using a uniform learning rate for all feature dimensions, this accounts for the uneven distribution of training samples across features: if few training samples contain a feature of a certain dimension, so that each such sample is valuable, the learning rate of that feature dimension can independently stay large, and each sample containing the feature can take a large step along its gradient without being forced to match the step size of other feature dimensions.
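The per-dimension behavior described above may be illustrated with the widely used FTRL-Proximal variant for logistic regression. The choice of this variant, the hyperparameter values, and the class name `FTRLProximal` are assumptions; the embodiment does not state which form of FTRL is preset.

```python
import math


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression (a sketch)."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=0.0, l2=0.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * dim  # per-dimension accumulated adjusted gradients
        self.n = [0.0] * dim  # per-dimension sum of squared gradients

    def learning_rate(self, i):
        # Dimensions that have seen few gradients keep a larger rate.
        return self.alpha / (self.beta + math.sqrt(self.n[i]))

    def weight(self, i):
        z = self.z[i]
        if abs(z) <= self.l1:
            return 0.0  # L1 regularization keeps this weight at zero
        sign = 1.0 if z > 0 else -1.0
        return -(z - sign * self.l1) / (1.0 / self.learning_rate(i) + self.l2)

    def predict(self, x):
        score = sum(self.weight(i) * xi for i, xi in enumerate(x))
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, x, y):
        p = self.predict(x)
        for i, xi in enumerate(x):
            if xi == 0:
                continue  # dimensions absent from the sample are untouched
            g = (p - y) * xi
            sigma = (math.sqrt(self.n[i] + g * g) - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * self.weight(i)
            self.n[i] += g * g


model = FTRLProximal(dim=2)
for _ in range(5):
    model.update([1.0, 0.0], 1)  # only dimension 0 appears in the samples
```

After these updates, `model.learning_rate(1)` is still at its initial value while `model.learning_rate(0)` has decayed, which is the per-dimension behavior the paragraph above describes.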
In this embodiment, each time a feature data is clustered, the clustering center computed over the feature data and the historical data changes, so the centralized model and the localized model need to be determined again, which makes the judgment on the next feature data more accurate.
Optionally, on the basis of the embodiment corresponding to fig. 3, in an optional embodiment of the click through rate estimating method provided in the embodiment of the present invention, the method further includes:
determining pushed data in the sample data according to a preset rule, wherein the preset rule is used for determining data meeting the click rate requirement from the sample data;
and sending the push data to user equipment so that the user equipment displays the push data.
In this embodiment, the determined push content is sent to the user equipment for display, which gives the user a visible result, provides targeted push content to the user equipment, and completes the steps of the embodiment of the present invention.
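As one hypothetical form of the preset rule, the sketch below keeps the items whose estimated click rate reaches a threshold and returns the highest-rated ones first. The threshold `min_click_rate`, the cap `limit`, and the name `select_push_data` are assumptions; the embodiment only requires that the rule identify data meeting the click rate requirement.

```python
def select_push_data(items, min_click_rate=0.05, limit=3):
    """Select push data from `items`, a list of (item_id, estimated_rate).

    Items below `min_click_rate` are dropped; the rest are ordered by
    estimated click rate, highest first, and at most `limit` are returned.
    """
    qualified = [item for item in items if item[1] >= min_click_rate]
    qualified.sort(key=lambda item: item[1], reverse=True)
    return [item_id for item_id, _ in qualified[:limit]]


# Hypothetical estimates for three news items.
push = select_push_data([("a", 0.02), ("b", 0.10), ("c", 0.07)])
```

The resulting list is what would be sent to the user equipment for display.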
A click rate estimation method according to the present invention is described above, and a server for executing the click rate estimation method is described below.
Referring to fig. 5, an embodiment of a server according to the embodiment of the present invention includes:
a first obtaining unit 501, configured to obtain sample data;
a converting unit 502, configured to convert the sample data into feature data;
a second obtaining unit 503, configured to obtain a comprehensive estimation model according to the feature data, where the comprehensive estimation model is obtained by a centralized model and a localized model, the centralized model is used to calculate a click rate of the feature data, and the localized model is used to calculate a click rate according to a category to which the feature data belongs;
and the estimation unit 504 is configured to estimate the click rate of the feature data according to the comprehensive estimation model.
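The cooperation of units 501 to 504 may be sketched as a single class whose methods mirror the four units. The class name `ClickRateServer`, the feature field names, the squared-distance measure, and the additive combination of centralized and localized weights are all assumptions made for illustration.

```python
import math


class ClickRateServer:
    """Hypothetical single-class arrangement of units 501 to 504."""

    def __init__(self, fields, centers, central_w, local_ws):
        self.fields = fields        # assumed names of the extracted features
        self.centers = centers      # category -> preset clustering center
        self.central_w = central_w  # weights of the centralized model
        self.local_ws = local_ws    # category -> weights of a localized model

    def obtain_sample(self, raw):
        # First obtaining unit 501: obtain sample data.
        return dict(raw)

    def convert(self, sample):
        # Converting unit 502: turn sample data into a feature vector.
        return [float(sample.get(name, 0.0)) for name in self.fields]

    def obtain_model(self, x):
        # Second obtaining unit 503: pick the nearest clustering center, then
        # combine centralized and localized weights (assumed to be additive).
        category = min(
            self.centers,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centers[c])),
        )
        return [cw + lw for cw, lw in zip(self.central_w, self.local_ws[category])]

    def estimate(self, x, w):
        # Estimation unit 504: logistic output of the combined model.
        score = sum(wi * xi for wi, xi in zip(w, x))
        return 1.0 / (1.0 + math.exp(-score))

    def run(self, raw):
        x = self.convert(self.obtain_sample(raw))
        return self.estimate(x, self.obtain_model(x))


server = ClickRateServer(
    fields=["f1", "f2"],
    centers={"sports": [1.0, 0.0], "financial": [0.0, 1.0]},
    central_w=[0.0, 0.0],
    local_ws={"sports": [1.0, 0.0], "financial": [0.0, 0.0]},
)
rate = server.run({"f1": 1.0, "f2": 0.0})
```

In an actual server each unit may be a separate module; the single class is used here only to keep the example short.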
Referring to fig. 6, another embodiment of the server according to the embodiment of the present invention includes:
a first obtaining unit 601, configured to obtain sample data;
a conversion unit 602, configured to convert the sample data into feature data;
a second obtaining unit 603, configured to obtain a comprehensive estimation model according to the feature data, where the comprehensive estimation model is obtained by a centralized model and a localized model, the centralized model is used to calculate a click rate of the feature data, and the localized model is used to calculate a click rate according to a category to which the feature data belongs;
and the estimation unit 604 is configured to estimate the click rate of the feature data according to the comprehensive estimation model.
Optionally, in a possible implementation, the converting unit 602 includes:
an extraction module 6021, configured to perform feature extraction on the sample data according to preset target data;
a first obtaining module 6022, configured to obtain feature data of the sample data.
Optionally, in a possible implementation, the second obtaining unit 603 includes:
a second obtaining module 6031, configured to obtain multiple preset clustering centers, where each clustering center corresponds to a category;
a clustering module 6032, configured to cluster the feature data according to the preset multiple clustering centers, and determine a target clustering center closest to the feature data;
a determining module 6033, configured to determine a target category of the feature data according to the target clustering center;
a third obtaining module 6034, configured to obtain the centralized model and the localized model, where the localized model corresponds to the target category;
a fourth obtaining module 6035, configured to obtain the comprehensive estimation model.
Optionally, in a possible implementation, the second obtaining unit 603 further includes:
an updating module 6036, configured to update the target clustering center according to the feature data;
the updating module 6036 is further configured to update the centralized model and the localized model according to the feature data.
Optionally, in a possible implementation, the updating module 6036 is specifically configured to:
and updating the target clustering center according to a preset K-means algorithm and the characteristic data.
Optionally, in a possible implementation, the updating module 6036 is further configured to:
updating the centralized model and the localized model according to a preset online gradient descent algorithm;
or updating the centralized model and the localized model according to a preset online learning algorithm.
Optionally, in a possible implementation, the server is further configured to perform the following steps:
determining pushed data in the sample data according to a preset rule, wherein the preset rule is used for determining data meeting the click rate requirement from the sample data;
and sending the push data to user equipment so as to enable the user equipment to display the push data.
The server in the embodiment of the present invention is described above from the perspective of the modular functional entity, and the server in the embodiment of the present invention is described below from the perspective of hardware processing. Referring to fig. 7, another embodiment of the server according to the embodiment of the present invention includes:
FIG. 7 is a schematic structural diagram of a server 700 according to an embodiment of the present invention. The server 700 may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 701 (e.g., one or more processors), a memory 702, and one or more storage media 703 (e.g., one or more mass storage devices) storing applications 704 or data 705. The memory 702 and the storage medium 703 may be transient storage or persistent storage. The program stored on the storage medium 703 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Further, the central processing unit 701 may be configured to communicate with the storage medium 703 and to execute, on the server 700, the series of instruction operations in the storage medium 703.
The server 700 may also include one or more power supplies 706, one or more wired or wireless network interfaces 707, one or more input-output interfaces 708, and/or one or more operating systems 709, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
The steps performed by the server in the above embodiment may be based on the structure of the server shown in fig. 7.
For example, the central processing unit 701 may call instructions stored in the storage medium 703 to perform the following operations:
acquiring sample data;
converting the sample data into feature data;
acquiring a comprehensive estimation model according to the characteristic data, wherein the comprehensive estimation model is obtained by a centralized model and a localized model, the centralized model is used for calculating the click rate of the characteristic data, and the localized model is used for calculating the click rate according to the category to which the characteristic data belongs;
and carrying out click rate estimation on the characteristic data according to the comprehensive estimation model.
Optionally, the central processing unit 701 is further configured to perform the following steps:
extracting the characteristics of the sample data according to preset target data;
and acquiring the characteristic data of the sample data.
Optionally, the central processing unit 701 is further configured to perform the following steps:
acquiring a plurality of preset clustering centers, wherein each clustering center corresponds to one category;
clustering the characteristic data according to the preset clustering centers, and determining a target clustering center closest to the characteristic data;
determining the target category of the characteristic data according to the target clustering center;
acquiring the centralized model and the localized model, wherein the localized model corresponds to the target class;
and acquiring the comprehensive estimation model.
Optionally, the central processing unit 701 is further configured to perform the following steps:
updating the target clustering center according to the characteristic data;
updating the centralized model and the localized model according to the feature data.
Optionally, the central processing unit 701 is further configured to perform the following steps:
and updating the target clustering center according to a preset K-means algorithm and the characteristic data.
Optionally, the central processing unit 701 is further configured to perform the following steps:
updating the centralized model and the localized model according to a preset online gradient descent algorithm;
or updating the centralized model and the localized model according to a preset online learning algorithm.
Referring to fig. 8, another embodiment of the server according to the embodiment of the present invention includes:
a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is configured to execute the program in the memory, including the following steps:
acquiring sample data;
converting the sample data into feature data;
acquiring a comprehensive estimation model according to the characteristic data, wherein the comprehensive estimation model is obtained by a centralized model and a localized model, the centralized model is used for calculating the click rate of the characteristic data, and the localized model is used for calculating the click rate according to the category to which the characteristic data belongs;
carrying out click rate estimation on the characteristic data according to the comprehensive estimation model;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
As shown in FIG. 8, the server 800 includes a memory 801, a processor 802, and a transceiver 803. Optionally, the server 800 may also include a bus 804. The transceiver 803, the processor 802, and the memory 801 may be connected to each other through the bus 804; the bus 804 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 804 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 8, but this does not mean that there is only one bus or one type of bus.
As shown in FIG. 9, for convenience of description, only the portion related to the embodiment of the present invention is shown; for specific technical details that are not disclosed, refer to the method portion of the embodiments of the present invention. The user equipment may be any terminal device, including a mobile phone, a tablet computer, a personal digital assistant (PDA), a vehicle-mounted computer, and the like; the following takes a mobile phone as an example of the user equipment.
FIG. 9 is a block diagram illustrating a partial structure of a mobile phone serving as the user equipment provided in an embodiment of the present invention. Referring to FIG. 9, the mobile phone includes: a radio frequency (RF) circuit 910, a memory 920, an input unit 930, a display unit 940, a sensor 950, an audio circuit 960, a wireless fidelity (WiFi) module 970, a processor 980, and a power supply 990. Those skilled in the art will appreciate that the mobile phone structure shown in FIG. 9 is not limiting: the mobile phone may include more or fewer components than those shown, may combine some components, or may arrange the components differently.
The following describes each component of the mobile phone in detail with reference to fig. 9:
The RF circuit 910 may be used to receive and transmit target data and push data; in particular, it receives push data sent by the server and passes the push data to the processor 980 for processing. In general, the RF circuit 910 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 910 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Message Service (SMS), etc.
The memory 920 may be used to store software programs and modules, and the processor 980 executes various functional applications and data processing of the mobile phone by running the software programs and modules stored in the memory 920. The memory 920 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data (such as news data) created according to the use of the mobile phone, and the like. Further, the memory 920 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The display unit 940 may be used to display push data, such as push news, sent by the server. The display unit 940 may include a display panel 941, and optionally, the display panel 941 may be configured in the form of a Liquid Crystal Display (LCD), an organic light-emitting diode (OLED), or the like. Further, when processor 980 receives push data sent by the server, processor 980 provides a corresponding visual output on display panel 941. Although in fig. 9, the touch panel 931 and the display panel 941 are two independent components to implement the input and output functions of the mobile phone, in some embodiments, the touch panel 931 and the display panel 941 may be integrated to implement the input and output functions of the mobile phone.
The processor 980 is a control center of the mobile phone, connects various parts of the entire mobile phone using various interfaces and lines, and performs various functions of the mobile phone and processes data by operating or executing software programs and/or modules stored in the memory 920 and calling data stored in the memory 920. Alternatively, processor 980 may include one or more processing units; preferably, the processor 980 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 980.
The handset also includes a power supply 990 (e.g., a battery) for supplying power to the various components, which may preferably be logically connected to the processor 980 via a power management system, thereby providing management of charging, discharging, and power consumption via the power management system.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the system, the apparatus and the module described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one position, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may be stored in a computer-readable storage medium.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The technical solutions provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and the implementation of the present invention, and the descriptions of the above embodiments are only intended to help understand the method and the core ideas of the present invention. Meanwhile, a person skilled in the art may, according to the idea of the present invention, make changes to the specific embodiments and the application range. In summary, the content of this specification should not be construed as a limitation on the present invention.

Claims (15)

1. A click through rate estimation method is characterized by comprising the following steps:
acquiring sample data;
converting the sample data into feature data;
determining a target clustering center closest to the characteristic data from a plurality of preset clustering centers, wherein each clustering center corresponds to one category, and a plurality of pre-stored localized models respectively correspond to a plurality of different categories;
determining a target class of the feature data according to the target clustering center, and selecting a localization model corresponding to the target class from the plurality of localization models; the localization model is used for calculating click rate according to the category to which the feature data belongs;
obtaining a centralization model in a logistic regression model, wherein the centralization model is used for calculating the click rate of the feature data;
combining the localized model and the centralized model to obtain a comprehensive estimation model;
and carrying out click rate estimation on the characteristic data according to the comprehensive estimation model.
2. The method of claim 1, wherein said converting said sample data into feature data comprises:
extracting the characteristics of the sample data according to preset target data;
and acquiring the characteristic data of the sample data.
3. The method of claim 1, wherein the determining a target cluster center closest to the feature data from among a preset plurality of cluster centers comprises:
acquiring a plurality of preset clustering centers;
and clustering the characteristic data according to the preset clustering centers, and determining a target clustering center closest to the characteristic data.
4. The method of claim 3, further comprising:
updating the target clustering center according to the characteristic data;
updating the centralized model and the localized model according to the feature data.
5. The method of claim 4, wherein the updating the target cluster center according to the feature data comprises:
and updating the target clustering center according to a preset K-means algorithm and the characteristic data.
6. The method of claim 4, wherein said updating the centralized model and the localized model based on the feature data comprises:
updating the centralized model and the localized model according to a preset online gradient descent algorithm;
or updating the centralized model and the localized model according to a preset online learning algorithm.
7. The method according to any one of claims 1-6, further comprising:
determining pushed data in the sample data according to a preset rule, wherein the preset rule is used for determining data meeting the click rate requirement from the sample data;
and sending the push data to user equipment so that the user equipment displays the push data.
8. A server, comprising:
a first obtaining unit configured to obtain sample data;
a conversion unit for converting the sample data into feature data;
the second acquisition unit is used for determining a target clustering center closest to the characteristic data from a plurality of preset clustering centers, wherein each clustering center corresponds to one category, and a plurality of pre-stored localized models respectively correspond to a plurality of different categories; determining a target class of the feature data according to the target clustering center, and selecting a localized model corresponding to the target class from the plurality of localized models; the localization model is used for calculating click rate according to the category to which the feature data belongs; acquiring a centralized model in a logistic regression model, wherein the centralized model is used for calculating the click rate of the feature data; combining the localized model and the centralized model to obtain a comprehensive estimation model;
and the estimation unit is used for estimating the click rate of the characteristic data according to the comprehensive estimation model.
9. The server according to claim 8, wherein the conversion unit includes:
the extraction module is used for extracting the characteristics of the sample data according to preset target data;
and the first acquisition module is used for acquiring the characteristic data of the sample data.
10. The server according to claim 8, wherein the second obtaining unit includes:
the second acquisition module is used for acquiring a plurality of preset clustering centers;
and the clustering module is used for clustering the characteristic data according to the preset multiple clustering centers and determining a target clustering center closest to the characteristic data.
11. The server according to claim 10, wherein the second obtaining unit further includes:
the updating module is used for updating the target clustering center according to the characteristic data;
the updating module is further used for updating the centralized model and the localized model according to the characteristic data.
12. The server according to claim 11, wherein the update module is specifically configured to:
and updating the target clustering center according to a preset K-means algorithm and the characteristic data.
13. The server according to claim 11, wherein the update module is further specifically configured to:
updating the centralized model and the localized model according to a preset online gradient descent algorithm;
or updating the centralized model and the localized model according to a preset online learning algorithm.
14. A server, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is used for executing the program in the memory and comprises the following steps:
acquiring sample data;
converting the sample data into feature data;
determining a target clustering center closest to the characteristic data from a plurality of preset clustering centers, wherein each clustering center corresponds to one category, and a plurality of pre-stored localized models respectively correspond to a plurality of different categories;
determining a target class of the feature data according to the target clustering center, and selecting a localization model corresponding to the target class from the plurality of localization models; the localization model is used for calculating click rate according to the category to which the feature data belongs;
acquiring a centralized model in a logistic regression model, wherein the centralized model is used for calculating the click rate of the feature data;
combining the localized model and the centralized model to obtain a comprehensive estimation model;
carrying out click rate estimation on the characteristic data according to the comprehensive estimation model;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
15. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 7.
CN201810275032.0A 2018-03-29 2018-03-29 Click rate estimation method, server and computer readable storage medium Active CN110322039B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810275032.0A CN110322039B (en) 2018-03-29 2018-03-29 Click rate estimation method, server and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110322039A CN110322039A (en) 2019-10-11
CN110322039B true CN110322039B (en) 2022-12-02

Family

ID=68111391

Country Status (1)

Country Link
CN (1) CN110322039B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113129046A (en) * 2019-12-31 2021-07-16 上海哔哩哔哩科技有限公司 Click rate prediction method and device and computer equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102346899A (en) * 2011-10-08 2012-02-08 亿赞普(北京)科技有限公司 Method and device for predicting advertisement click rate based on user behaviors
US8117197B1 (en) * 2008-06-10 2012-02-14 Surf Canyon, Inc. Adaptive user interface for real-time search relevance feedback
CN103345512A (en) * 2013-07-06 2013-10-09 北京品友互动信息技术有限公司 Online advertising click-through rate forecasting method and device based on user attribute
CN103745225A (en) * 2013-12-27 2014-04-23 北京集奥聚合网络技术有限公司 Method and system for training distributed CTR (Click To Rate) prediction model
CN107622086A (en) * 2017-08-16 2018-01-23 北京京东尚科信息技术有限公司 A kind of clicking rate predictor method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Model Ensemble for Click Prediction in Bing Search Ads; Xiaoliang Ling et al.; Proceedings of the 26th International Conference on World Wide Web Companion; 20170403; pp. 689-698 *
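Research on Online Advertising Click-Through Rate Prediction Based on Multi-Category Features: A Case Study of Tencent Soso; Liu Tang; China Master's Theses Full-text Database, Information Science and Technology Series; 20131115 (No. 11); pp. I138-999 *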

Also Published As

Publication number Publication date
CN110322039A (en) 2019-10-11

Similar Documents

Publication Publication Date Title
CN109377329B (en) House resource recommendation method and device, storage medium and electronic equipment
CN108280115B (en) Method and device for identifying user relationship
EP2633487B1 (en) Method and system to recommend applications from an application market place to a new device
US20170310533A1 (en) Time-distributed and real-time processing in information recommendation system, method and apparatus
CN110069715B (en) Information recommendation model training method, information recommendation method and device
CN111079022A (en) Personalized recommendation method, device, equipment and medium based on federal learning
WO2020019563A1 (en) Search sequencing method and apparatus, electronic device, and storage medium
CN110019825B (en) Method and device for analyzing data semantics
CN110362750B (en) Target user determination method, device, electronic equipment and computer readable medium
CN111125523B (en) Searching method, searching device, terminal equipment and storage medium
CN110956505B (en) Advertisement inventory estimation method and related device
WO2018223772A1 (en) Content recommendation method and system
CN113392150A (en) Data table display method, device, equipment and medium based on service domain
CN111800513A (en) Method and device for pushing information and computer readable medium of electronic equipment
CN113537685A (en) Data processing method and device
CN106294087B (en) Statistical method and device for operation frequency of business execution operation
CN110322039B (en) Click rate estimation method, server and computer readable storage medium
CN114022196A (en) Advertisement putting method, device, electronic device and storage medium
US20210241171A1 (en) Machine learning feature engineering
CN110309357A (en) Using the method for data recommendation, the method, apparatus of model training and storage medium
CN108491502A (en) A kind of method, terminal, server and the storage medium of news tracking
CN109544241B (en) Click rate estimation model construction method, click rate estimation method and device
CN114430504B (en) Recommendation method and related device for media content
CN108632054B (en) Information transmission quantity prediction method and device
CN107256244B (en) Data processing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant