CN113487351A - Privacy protection advertisement click rate prediction method, device, server and storage medium - Google Patents
- Publication number
- CN113487351A (application CN202110755722.8A)
- Authority
- CN
- China
- Prior art keywords
- client
- clients
- cluster
- model
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0242—Determining effectiveness of advertisements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
- G06F21/6245—Protecting personal data, e.g. for financial or medical purposes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0241—Advertisements
- G06Q30/0251—Targeted advertisements
- G06Q30/0269—Targeted advertisements based on user profile or attribute
- G06Q30/0271—Personalized advertisement
Abstract
The invention discloses a privacy-preserving advertisement click-through rate prediction method, device, server and storage medium. The method comprises the following steps: issuing the global model to each client so that each client trains a local model and obtains a weight update vector by calculating the gradients of a factorization machine component and a deep learning component respectively; calculating the similarity between the clients; clustering all clients with a clustered federated learning algorithm so that a global model is generated for each cluster; in each cluster, issuing the global model to all clients in the cluster so that they update their local models, until the global model converges or the maximum round is reached; and receiving a request sent by a user's client and issuing the global model of the corresponding cluster to that client so as to calculate the advertisement click-through rate of the user's candidate advertisements. The invention protects the privacy and security of client data while maintaining the usability of the federated learning model.
Description
Technical Field
The invention relates to a method and a device for predicting the click rate of an advertisement with privacy protection, a server and a storage medium, and belongs to the technical field of user privacy protection.
Background
The efficient prediction of the advertisement click rate plays a crucial role in improving the efficiency of advertisement putting. In order to provide personalized click-through rate prediction for users and capture interaction relations among different features so as to estimate the correlation between the users and advertisements, deep learning is introduced into the field by the industry and academia. Google corporation proposed a Wide & Deep model that performs feature learning in parallel through a linear algorithm with cross product and a Deep neural network layer, thereby capturing feature interaction relationships for advertisement recommendation.
To better capture high-order interactions between features, the DeepFM model builds on Wide & Deep by combining a factorization machine (FM) with a deep neural network to model feature interactions. Compared with other advertisement click-through rate prediction strategies, DeepFM not only retains the capability of FM to learn feature interactions in sparse data, but also uses deep learning to construct a neural network for feature learning.
Traditional advertisement click rate prediction directly uploads user data to a central server for centralized training during model training. User data contains a lot of privacy sensitive information, and the original data is uploaded to a server without protection, so that privacy disclosure is caused.
Disclosure of Invention
In view of the above, the present invention provides a privacy-preserving advertisement click-through rate prediction method, apparatus, server and storage medium, which can balance the accuracy and privacy of the advertisement click-through rate prediction algorithm in scenarios where the data of different clients are not independent and identically distributed (Non-IID), that is, protect the privacy and security of client data while maintaining the usability of the federated learning model.
A first object of the present invention is to provide a privacy-preserving advertisement click-through rate prediction method.
A second object of the present invention is to provide a privacy-preserving advertisement click-through rate prediction apparatus.
A third object of the present invention is to provide a server.
It is a fourth object of the present invention to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a privacy protection advertisement click rate prediction method is applied to a server and comprises the following steps:
issuing the global model to each client so that each client trains a local model according to local user data, obtaining a weight updating vector through calculating gradients of a factorization machine component and a deep learning component respectively, and uploading the weight updating vector to a server;
receiving weight updating vectors uploaded by each client, and calculating the similarity between the clients according to the weight updating vectors uploaded by each client;
clustering all the clients with a clustered federated learning algorithm according to the similarity among the clients, so that a global model is generated for each cluster;
in each cluster, issuing the global model to all clients in the cluster so that all clients in the cluster update the local model until the global model converges or the maximum round is reached;
and receiving a request sent by a client of a certain user, and issuing the global model to the client of the user in the corresponding cluster so that the client of the user calculates the advertisement click rate of the candidate advertisement of the user through the local model.
Further, clustering all the clients with a clustered federated learning algorithm so that a global model is generated for each cluster specifically includes:
clustering all clients with the clustered federated learning algorithm, and judging whether a split occurs;
if a split occurs, dividing all the clients into two clusters and generating a global model for each cluster;
if no split occurs, judging whether the global model has converged;
and if the global model has not converged and the maximum round has not been reached, treating all the clients as one cluster and generating a global model for that cluster.
Further, a split of the cluster occurs when the federated learning objective function of the current cluster is near a stationary point while at least one client in the cluster has not reached a stationary point of its local loss function.
Further, the method further comprises:
and sending the selected partial advertisement list to the client of the user according to the advertisement click rate of the candidate advertisement of the user, so as to realize personalized advertisement recommendation to the user.
Further, the gradient of the factorization machine component is calculated as follows:
wherein, the parameters of the k-th client model are represented, x represents the user features, of which each user has n, and θ is the collective name for the model parameters.
Further, the gradient of the deep learning component is calculated as follows:
where t represents the iteration round, D_k represents the user data of the k-th client, and SGD() represents the stochastic gradient descent method.
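Further, the similarity between the clients is the cosine similarity of their weight update vectors, calculated as follows:
$$\alpha_{i,j} = \frac{\langle \Delta\theta_i, \Delta\theta_j \rangle}{\lVert \Delta\theta_i \rVert \, \lVert \Delta\theta_j \rVert}$$
wherein α_{i,j} represents the cosine similarity between the i-th client and the j-th client, Δθ_i represents the weight update vector of the i-th client, and Δθ_j represents the weight update vector of the j-th client.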
The second purpose of the invention can be achieved by adopting the following technical scheme:
a privacy-preserving advertisement click-through rate prediction device applied to a server comprises:
the model training module is used for issuing the global model to each client so that each client trains the local model according to local user data, obtains a weight update vector by calculating the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
the similarity calculation module is used for receiving the weight update vectors uploaded by the clients and calculating the similarity between the clients according to the weight update vectors uploaded by the clients;
the clustering module is used for clustering all the clients with a clustered federated learning algorithm according to the similarity among the clients, so that a global model is generated for each cluster;
the model updating module is used for issuing the global model to all the clients in each cluster so that all the clients in the cluster update their local models until the global model converges or the maximum round is reached;
and the advertisement click rate prediction module is used for receiving a request sent by a client of a certain user and issuing the global model to the client of the user in the corresponding cluster so as to enable the client of the user to calculate the advertisement click rate of the candidate advertisement of the user through the local model.
The third purpose of the invention can be achieved by adopting the following technical scheme:
a server comprises a processor and a memory for storing a program executable by the processor, and when the processor executes the program stored by the memory, the method for predicting the click rate of the privacy protection advertisement is realized.
The fourth purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium stores a program which, when executed by a processor, implements the privacy preserving advertisement click-through rate prediction method described above.
Compared with the prior art, the invention has the following beneficial effects:
1. The method is based on a federated factorization machine and can balance the accuracy and privacy of the advertisement click-through rate prediction algorithm in scenarios where the data of different clients are Non-IID, that is, it protects the privacy and security of client data while maintaining the usability of the federated learning model. The existing centralized factorization machine is improved by introducing a federated factorization machine: with the distributed training of federated learning, a client does not upload raw user data directly to the server, and only the client's gradient information is used to update the model. Introducing the idea of clustered federated learning into the factorization machine yields advertisement click-through rate prediction that protects user privacy and addresses the loss of model quality that linear aggregation suffers under heterogeneous user data.
2. The method maintains high model accuracy: on the Tencent2019 training set, the accuracy of the final global model is 8% higher than that of the traditional federated matrix factorization model and 2.5% higher than that of federated learning with a single global model.
3. The invention enhances the privacy of the advertisement click-through rate prediction algorithm and provides a good solution for privacy protection in federated advertisement recommendation under heterogeneous data.
4. The invention designs a user-level distributed factorization machine that can be applied within a federated learning framework and ensures that a user's raw data does not leave the local device during model training, thereby reducing the risk of user privacy disclosure.
5. The invention adopts a clustered federated learning mechanism, which improves the accuracy of the advertisement click-through rate prediction algorithm when client data are heterogeneous.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
Fig. 1 is a schematic diagram of a privacy preserving advertisement click-through rate prediction framework according to embodiment 1 of the present invention.
Fig. 2 is a flowchart of a privacy-preserving advertisement click-through rate prediction method according to embodiment 1 of the present invention.
Fig. 3 is a flowchart of clustering all clients according to embodiment 1 of the present invention.
Fig. 4 is a block diagram of an advertisement click-through rate prediction apparatus according to embodiment 2 of the present invention.
Fig. 5 is a block diagram of a server according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts based on the embodiments of the present invention belong to the protection scope of the present invention.
Example 1:
To protect user privacy, a federated learning framework is introduced into advertisement click-through rate prediction: the secure aggregation strategy of federated learning learns the model parameters of the clients while guaranteeing that the raw data never leaves the local device. However, because user data are heterogeneous, a simple model aggregation strategy (such as FedSGD or FedAvg) can degrade performance and can even cause the model to diverge when the user data are extremely Non-IID.
Faced with the data heterogeneity challenge in the federated scenario, approaches such as a secure aggregation strategy in which clients share a subset of data, or adding a proximal term to the objective function, can help. However, the practical utility of these techniques is limited by their large time overhead. The clustering loss term proposed by Sattler et al. uses cosine similarity to overcome model divergence when clients have different data distributions. On this basis, the present embodiment provides an advertisement click-through rate prediction framework based on a federated factorization machine, which reduces the influence of data heterogeneity on model training through multi-center federated learning.
As shown in fig. 1, the privacy-preserving advertisement click-through rate prediction framework of the present embodiment includes two parts. The first part is the advertisement platform, which is implemented by a server and maintains a plurality of clusters; the client data within a cluster follow the same distribution, and the click probability of a user is predicted by the global model of the cluster. The second part is the user's client, which mainly collects and analyzes data locally and uploads model gradients to the corresponding cluster of the advertisement platform.
In the advertisement platform, assume there is a set of candidate advertisements denoted D = [d1, d2, ..., dm]. The platform contains multiple advertisement clusters, each composed of similar users, and the advertisement model can be learned from features such as the ID and title of each advertisement. When the client of user u sends a request to the advertisement platform, the platform calculates, in the corresponding cluster, the click-through rates of the candidate advertisements for user u, denoted y1, y2, ..., ym respectively, and sends the selected partial advertisement list to the user, thereby realizing personalized advertisement recommendation for user u. In the client, the user features used for local model training contain the user's personal information and advertisement click behavior data; the local model is trained locally on the user data, and the gradient of the local model is sent to the advertisement platform to update the global model.
As shown in fig. 1 and fig. 2, the present embodiment provides a method for predicting a click-through rate of a privacy-preserving advertisement, which is implemented on the basis of an advertisement platform (i.e., a server) of the above-mentioned frame for predicting a click-through rate of a privacy-preserving advertisement, and includes the following steps:
s201, issuing the global model to each client so that each client trains the local model according to local user data, obtaining a weight updating vector through calculating gradients of the factorization machine component and the deep learning component respectively, and uploading the weight updating vector to a server.
In this embodiment, the global model is a model to be trained, and because the model training requires multiple rounds, in the first round, the model to be trained is an initial global model, and in each subsequent round, the model to be trained is a global model obtained in the previous round.
After the server issues the global model, each client receives it, trains the local model on local user data to obtain a weight update vector, and uploads the weight update vector to the server, so that the server can compute the global model for the next round. The user features in the local user data comprise the user's personal information and advertisement click behavior data. Because the raw user features are very sparse, low-order and high-order feature interactions can only be learned after an embedding layer converts the raw user data into dense vectors; new continuous vectors are then obtained according to the uniform mapping sent by the server. Compared with the centralized deep factorization machine (DeepFM), the model estimation formula in the distributed scenario is more complex: the weight update vector of each client is obtained by applying the chain rule to calculate the gradients of the factorization machine (FM) component and the deep learning component respectively, where the deep learning component is a deep neural network (DNN) component.
The gradient of the factorization machine component is calculated as follows:
wherein, the parameters of the k-th client model are represented, x represents the user features, of which each user has n, and θ is the collective name for the model parameters.
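The patent's own gradient formula is not reproduced in this text. As a rough illustration only, the sketch below uses the standard second-order factorization machine, where the score is w0 + Σ w_i x_i + Σ_{i<j} ⟨v_i, v_j⟩ x_i x_j, and computes its gradients with respect to each parameter group. The names `fm_predict`, `fm_gradients`, `w0`, `w` and `V` are hypothetical and do not come from the patent, which may use a different parameterization.

```python
def fm_predict(x, w0, w, V):
    """Score of a standard second-order factorization machine.

    x: feature vector of length n; w0: bias; w: linear weights (length n);
    V: n x f list of latent factor rows (all names are illustrative).
    """
    n, f = len(x), len(V[0])
    linear = w0 + sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for d in range(f):
        s = sum(V[i][d] * x[i] for i in range(n))
        s_sq = sum((V[i][d] * x[i]) ** 2 for i in range(n))
        # 0.5 * ((sum)^2 - sum of squares) equals the sum over pairs i < j.
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def fm_gradients(x, V):
    """Gradients of the FM score with respect to w0, w and V."""
    n, f = len(x), len(V[0])
    factor_sums = [sum(V[j][d] * x[j] for j in range(n)) for d in range(f)]
    grad_w0 = 1.0
    grad_w = list(x)
    grad_V = [
        [x[i] * factor_sums[d] - V[i][d] * x[i] ** 2 for d in range(f)]
        for i in range(n)
    ]
    return grad_w0, grad_w, grad_V
```

In a federated setting each client would evaluate such gradients on its own features only, so the raw feature vector x stays on the device.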
The gradient of the deep learning component is calculated as follows. Each client performs multiple stochastic gradient descent steps to iteratively update the local model, and the gradient of the k-th client in the t-th round is:
where t represents the iteration round, D_k represents the user data of the k-th client, and SGD() represents the stochastic gradient descent method.
The gradients of the factorization machine component and the deep learning component obtained through calculation constitute the weight update vector.
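A minimal sketch of how a client might produce its weight update vector is given below. It assumes, as is common in the clustered federated learning literature but not confirmed by this text, that the update is the difference between the parameters after several local SGD steps and the global parameters received from the server. The function names, the learning rate and the squared-loss stand-in model are all hypothetical.

```python
import random


def local_weight_update(theta_global, data, grad_fn, lr=0.1, steps=50, seed=0):
    """Run local SGD from the global parameters and return the weight update.

    Assumption: the update is (locally trained parameters) - (global parameters).
    """
    rng = random.Random(seed)
    theta = list(theta_global)  # copy so the global parameters stay untouched
    for _ in range(steps):
        sample = rng.choice(data)  # one stochastic sample per step
        grad = grad_fn(theta, sample)
        theta = [p - lr * g for p, g in zip(theta, grad)]
    return [p - p0 for p, p0 in zip(theta, theta_global)]


def squared_loss_grad(theta, sample):
    """Gradient of (theta . x - y)^2, a stand-in for the real local model."""
    x, y = sample
    err = sum(p * xi for p, xi in zip(theta, x)) - y
    return [2.0 * err * xi for xi in x]
```

Only the returned difference vector would be uploaded; the samples in `data` are never sent to the server.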
S202, receiving the weight updating vectors uploaded by the clients, and calculating the similarity between the clients according to the weight updating vectors uploaded by the clients.
In this embodiment, the similarity between the clients is the cosine similarity of their weight update vectors, as follows:
$$\alpha_{i,j} = \frac{\langle \Delta\theta_i, \Delta\theta_j \rangle}{\lVert \Delta\theta_i \rVert \, \lVert \Delta\theta_j \rVert}$$
wherein α_{i,j} represents the cosine similarity between the i-th client and the j-th client, Δθ_i represents the weight update vector of the i-th client, and Δθ_j represents the weight update vector of the j-th client.
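The pairwise cosine similarity of the uploaded weight update vectors can be sketched as follows. The function names are illustrative, and treating a zero-length update as having similarity 0 is an assumption made here to avoid division by zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two weight update vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero update is treated as dissimilar
    return dot / (norm_u * norm_v)


def similarity_matrix(updates):
    """M x M matrix whose (i, j) entry is the similarity of clients i and j."""
    m = len(updates)
    return [[cosine_similarity(updates[i], updates[j]) for j in range(m)]
            for i in range(m)]
```

Clients whose updates point in the same direction score close to 1, and clients whose updates point in opposite directions score close to -1.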
S203, clustering all the clients with a clustered federated learning algorithm according to the similarity among the clients, so that a global model is generated for each cluster.
Further, as shown in fig. 3, the step S203 specifically includes:
S2031, clustering all the clients with the clustered federated learning algorithm and judging whether a split occurs; if so, entering step S2032, and if not, entering step S2033.
In this embodiment, the gradient of each client at the stationary point θ* is observed. When the data distribution within a cluster is inconsistent, a stationary solution of the cluster's federated learning objective function cannot be stationary for every individual client; conversely, if the data distribution is consistent, optimizing the objective function of the cluster reaches the optimal solution of the local risk functions of all clients, so that the norm of each client's gradient approaches zero as the objective function approaches the stationary point. The occurrence of a split is therefore determined by the following two conditions:
(1) The federated learning objective function of the current cluster is near a stationary point θ*, as given by the following formula:
wherein D_i represents the user data of the i-th client, ε_1 represents a hyperparameter whose value is determined experimentally, and g_k() represents the objective function of the k-th client.
(2) At least one client in the cluster has not reached a stationary point of its local loss function, as follows:
$$\max_{k=1,\dots,M} \lVert g_k(\theta^*) \rVert > \varepsilon_2 > 0 \tag{5}$$
wherein ε_2 represents a hyperparameter whose value is determined experimentally, and g_k() represents the objective function of the k-th client.
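The two split conditions can be sketched as a single check. Because the formula for condition (1) is not reproduced in this text, the sketch assumes it takes the form used in the clustered federated learning literature: the norm of the data-size-weighted average of the client gradient vectors is below ε_1. Condition (2) follows formula (5). The function and argument names are hypothetical.

```python
import math


def l2_norm(v):
    return math.sqrt(sum(a * a for a in v))


def should_split(client_grads, data_sizes, eps1, eps2):
    """Return True when both split conditions hold for one cluster.

    client_grads: one gradient (or update) vector per client.
    data_sizes: number of local samples per client (assumed weighting).
    """
    total = sum(data_sizes)
    dim = len(client_grads[0])
    mean_grad = [
        sum(n * g[d] for g, n in zip(client_grads, data_sizes)) / total
        for d in range(dim)
    ]
    # Condition (1): the cluster objective is near a stationary point.
    near_stationary = l2_norm(mean_grad) < eps1
    # Condition (2): some client is still far from its own stationary point.
    client_unsettled = max(l2_norm(g) for g in client_grads) > eps2
    return near_stationary and client_unsettled
```

Two clients pulling in opposite directions cancel in the average yet each keeps a large gradient, which is exactly the situation that triggers a split.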
S2032, dividing all clients into two clusters, and enabling each cluster to generate a global model.
In this embodiment, clustered federated learning recursively divides the clients into two clusters from top to bottom, in a way that minimizes the maximum similarity between clients of different clusters.
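One way to make the bipartition criterion concrete is an exhaustive search over all two-way splits, shown below. This is an illustrative brute-force sketch with exponential cost, suitable only for a handful of clients; the text does not state which search procedure is actually used, and the function name is hypothetical. It expects at least two clients.

```python
from itertools import combinations


def best_bipartition(sim):
    """Split clients 0..m-1 into two groups minimizing the maximum
    similarity between clients placed in different groups.

    sim: m x m similarity matrix with m >= 2.
    Returns (left, right, max_cross_similarity).
    """
    m = len(sim)
    best = None
    for size in range(1, m // 2 + 1):
        for group in combinations(range(m), size):
            left = set(group)
            right = [j for j in range(m) if j not in left]
            cross = max(sim[i][j] for i in left for j in right)
            if best is None or cross < best[0]:
                best = (cross, sorted(left), right)
    return best[1], best[2], best[0]
```

Applying the same function again to each resulting group gives the recursive, top-down division described above.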
S2033, judging whether the global model is converged, if the global model is not converged and does not reach the maximum round, entering step S2034, if the global model is converged or reaches the maximum round, ending the model training, and entering step S205 after the model training is ended.
S2034, regarding all the clients as a cluster, generating a global model for the cluster, and then proceeding to step S204.
And S204, in each cluster, issuing the global model to all the clients in the cluster so that all the clients in the cluster update the local model, uploading the weight update vector to the server, and returning to the step S202 until the global model converges or reaches the maximum round.
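The text does not specify how the server combines the uploaded weight update vectors into each cluster's global model. The sketch below assumes a FedAvg-style rule, adding the data-size-weighted mean of the members' updates to the cluster's current parameters. The function names and the dictionary layout are hypothetical.

```python
def aggregate_cluster(theta, updates, data_sizes):
    """New cluster parameters = old parameters + weighted mean of updates.

    Assumption: weights are proportional to each client's local data size.
    """
    total = sum(data_sizes)
    return [
        p + sum(n * u[d] for u, n in zip(updates, data_sizes)) / total
        for d, p in enumerate(theta)
    ]


def update_cluster_models(cluster_models, clusters, updates, data_sizes):
    """Update every cluster's global model from its own members only.

    cluster_models: {cluster_id: parameter list}
    clusters: {cluster_id: list of client ids}
    updates, data_sizes: {client_id: update vector / sample count}
    """
    new_models = {}
    for cid, members in clusters.items():
        new_models[cid] = aggregate_cluster(
            cluster_models[cid],
            [updates[k] for k in members],
            [data_sizes[k] for k in members],
        )
    return new_models
```

Because each cluster aggregates only its own members, clients with conflicting update directions no longer dilute one another's model.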
S205, receiving a request sent by a client of a certain user, and issuing the global model to the client of the user in a corresponding cluster, so that the client of the user can calculate the advertisement click rate of the candidate advertisement of the user through the local model.
In this embodiment, a client of a certain user sends a request to a server, the server issues a global model to the client of the user in a corresponding cluster, and after receiving the global model, the client of the user calculates an advertisement click rate of a candidate advertisement of the user through a local model by using locally stored original user data (user personal information and click data), thereby realizing personalized click rate prediction.
S206, according to the advertisement click rate of the candidate advertisement of the user, sending the selected partial advertisement list to the client of the user, and realizing the personalized advertisement recommendation of the user.
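The text does not say how the partial advertisement list is selected. A simple possibility, assumed here purely for illustration, is to keep the k candidates with the highest predicted click-through rate; `select_top_ads` and its arguments are hypothetical names.

```python
def select_top_ads(ad_ids, ctr_scores, k):
    """Return the ids of the k candidate ads with the highest predicted CTR."""
    ranked = sorted(zip(ad_ids, ctr_scores), key=lambda pair: pair[1],
                    reverse=True)
    return [ad for ad, _ in ranked[:k]]


# Example with four candidate ads d1..d4 and their predicted click rates.
recommended = select_top_ads(["d1", "d2", "d3", "d4"],
                             [0.02, 0.31, 0.15, 0.07], k=2)
```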
Those skilled in the art will appreciate that all or part of the steps in the method for implementing the above embodiments may be implemented by a program to instruct associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
It should be noted that although the method operations of the above-described embodiments are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Example 2:
as shown in fig. 4, the present embodiment provides a privacy-preserving advertisement click-through rate prediction apparatus, which is applied to a server, and includes a model training module 401, a similarity calculation module 402, a clustering module 403, a model updating module 404, an advertisement click-through rate prediction module 405, and an advertisement recommendation module 406, where specific functions of each module are as follows:
The model training module 401 is configured to issue the global model to each client, so that each client trains the local model according to local user data, obtains a weight update vector by calculating the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
A similarity calculation module 402, configured to receive the weight update vector uploaded by each client, and calculate a similarity between the clients according to the weight update vector uploaded by each client;
the clustering module 403 is configured to cluster all the clients with a clustered federated learning algorithm according to the similarity between the clients, so that a global model is generated for each cluster;
the model update module 404 is configured to issue the global model to all clients in each cluster, so that all clients in the cluster update their local models until the global model converges or the maximum round is reached;
the advertisement click rate prediction module 405 is configured to receive a request sent by a client of a certain user, and issue the global model to the client of the user in a corresponding cluster, so that the client of the user calculates the advertisement click rate of the candidate advertisement of the user through the local model.
And the advertisement recommending module 406 is configured to send the selected partial advertisement list to the client of the user according to the advertisement click rate of the candidate advertisement of the user, so as to implement personalized advertisement recommendation for the user.
The specific implementation of each module in this embodiment may refer to embodiment 1, which is not described herein any more; it should be noted that, the apparatus provided in this embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be distributed by different functional modules according to needs, that is, the internal structure is divided into different functional modules to complete all or part of the functions described above.
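The similarity calculation performed by module 402 can be illustrated with a minimal sketch. It assumes that each weight update vector arrives as a flat list of floats and that the similarity is the cosine similarity named in claim 7; the function names and the handling of an all-zero update are illustrative assumptions, not part of the described apparatus.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two weight update vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero update is treated as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(updates: List[Sequence[float]]) -> List[List[float]]:
    """Pairwise similarity between all clients' weight update vectors."""
    n = len(updates)
    return [[cosine_similarity(updates[i], updates[j]) for j in range(n)]
            for i in range(n)]


# Three hypothetical clients: the first two point in the same direction,
# the third is orthogonal to both.
updates = [[1.0, 0.0], [2.0, 0.0], [0.0, -3.0]]
sim = similarity_matrix(updates)
```

Because cosine similarity ignores vector length, clients 0 and 1 are treated as identical in direction even though one update is twice as large as the other.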
Example 3:
As shown in Fig. 5, this embodiment provides a server, which includes a processor 502, a memory, and a network interface 503 connected by a system bus 501. The processor provides computing and control capabilities. The memory includes a nonvolatile storage medium 504 and an internal memory 505; the nonvolatile storage medium 504 stores an operating system, a computer program, and a database, and the internal memory 505 provides a running environment for the operating system and the computer program in the nonvolatile storage medium. When the processor 502 executes the computer program stored in the memory, the privacy-preserving advertisement click-through rate prediction method of embodiment 1 above is implemented, as follows:
issuing the global model to each client, so that each client trains a local model on its local user data, obtains a weight update vector by computing the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
receiving the weight update vectors uploaded by the clients, and calculating the similarity between the clients from these weight update vectors;
clustering all the clients with a clustered federated learning algorithm according to the similarity between the clients, so that each cluster generates a global model;
in each cluster, issuing the global model to all the clients in the cluster, so that all the clients in the cluster update their local models, until the global model converges or the maximum number of rounds is reached;
and receiving a request sent by the client of a user, and issuing, in the corresponding cluster, the global model to the client of the user, so that the client of the user calculates the advertisement click-through rate of the user's candidate advertisements with its local model.
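How a cluster might turn the uploaded weight update vectors into its next global model can be sketched as follows. The text does not spell out the aggregation rule, so this sketch assumes a plain unweighted mean of the updates added to the current weights; `aggregate_cluster` and its arguments are hypothetical names.

```python
from typing import List, Sequence


def aggregate_cluster(global_weights: Sequence[float],
                      client_updates: List[Sequence[float]]) -> List[float]:
    """Return new cluster weights: old weights plus the mean client update.

    Assumption: every client contributes equally. A weighting by local
    data size would be an equally plausible choice.
    """
    n = len(client_updates)
    if n == 0:
        return list(global_weights)  # nothing uploaded, keep the model as is
    mean_update = [sum(u[d] for u in client_updates) / n
                   for d in range(len(global_weights))]
    return [w + du for w, du in zip(global_weights, mean_update)]


# Two hypothetical clients in one cluster.
new_weights = aggregate_cluster([0.0, 1.0], [[0.2, -0.4], [0.4, 0.0]])
```

In a full round this function would be called once per cluster, after which the resulting weights are issued to that cluster's clients.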
Further, clustering all the clients with the clustered federated learning algorithm so that each cluster generates a global model specifically includes:
clustering all the clients with the clustered federated learning algorithm, and judging whether a split occurs;
if a split occurs, dividing all the clients into two clusters, so that each cluster generates a global model;
if no split occurs, judging whether the global model has converged;
and if the global model has not converged and the maximum number of rounds has not been reached, treating all the clients as one cluster, so that this cluster generates a global model.
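The split judgment above can be sketched under the condition stated in claim 3: the cluster's federated objective is near a stationary point while at least one client has not reached a stationary point of its own loss. The sketch assumes these two conditions are tested through the norm of the mean update and the largest individual update norm; the thresholds `eps1` and `eps2` are hypothetical parameters not given in the text.

```python
import math
from typing import List, Sequence


def l2_norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def should_split(updates: List[Sequence[float]],
                 eps1: float, eps2: float) -> bool:
    """Decide whether the current cluster should be split in two.

    eps1 bounds the norm of the averaged update (cluster near a stationary
    point); eps2 is the norm a single client's update must exceed to count
    as "not yet at a stationary point". Both are assumed thresholds.
    """
    n = len(updates)
    dim = len(updates[0])
    mean_update = [sum(u[d] for u in updates) / n for d in range(dim)]
    near_stationary = l2_norm(mean_update) < eps1
    some_client_unsettled = max(l2_norm(u) for u in updates) > eps2
    return near_stationary and some_client_unsettled


# Two clients pulling in opposite directions: their updates cancel in the
# mean, yet each client is still far from its own optimum.
conflicting = should_split([[1.0, 0.0], [-1.0, 0.0]], eps1=0.1, eps2=0.5)
# All clients have small updates: the cluster has simply converged.
converged = should_split([[0.01, 0.0], [0.01, 0.0]], eps1=0.1, eps2=0.5)
```

The first case is the situation clustering is meant to catch: averaging hides a disagreement between clients whose data distributions differ.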
Further, the method may also include:
sending a selected partial advertisement list to the client of the user according to the advertisement click-through rates of the user's candidate advertisements, thereby realizing personalized advertisement recommendation for the user.
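The selection of a partial advertisement list can be sketched as a simple ranking by predicted click-through rate. The text does not say how the list is chosen, so the top-k rule, the function name, and the advertisement identifiers below are assumptions made for illustration.

```python
from typing import Dict, List


def select_top_ads(predicted_ctr: Dict[str, float], k: int) -> List[str]:
    """Return the ids of the k candidate ads with the highest predicted CTR."""
    ranked = sorted(predicted_ctr.items(), key=lambda item: item[1], reverse=True)
    return [ad_id for ad_id, _ in ranked[:k]]


# Hypothetical candidate ads with their predicted click-through rates.
candidates = {"ad_a": 0.12, "ad_b": 0.40, "ad_c": 0.05}
recommended = select_top_ads(candidates, k=2)
```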
Example 4:
This embodiment provides a storage medium, which is a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the privacy-preserving advertisement click-through rate prediction method of embodiment 1 above is implemented, as follows:
issuing the global model to each client, so that each client trains a local model on its local user data, obtains a weight update vector by computing the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
receiving the weight update vectors uploaded by the clients, and calculating the similarity between the clients from these weight update vectors;
clustering all the clients with a clustered federated learning algorithm according to the similarity between the clients, so that each cluster generates a global model;
in each cluster, issuing the global model to all the clients in the cluster, so that all the clients in the cluster update their local models, until the global model converges or the maximum number of rounds is reached;
and receiving a request sent by the client of a user, and issuing, in the corresponding cluster, the global model to the client of the user, so that the client of the user calculates the advertisement click-through rate of the user's candidate advertisements with its local model.
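The last step, in which the client computes a click-through rate with its local model, can be sketched for a model that has a factorization machine component and a deep learning component. The sketch assumes the two components each produce a logit, and that the logits are summed and passed through a sigmoid. The single-hidden-layer network fed with raw features, and all parameter names, are simplifying assumptions, not the architecture of the described method.

```python
import math
from typing import List, Sequence, Tuple


def fm_logit(x: Sequence[float], w0: float, w: Sequence[float],
             V: List[Sequence[float]]) -> float:
    """Factorization machine: bias + linear term + pairwise interactions.

    The pairwise term uses the standard identity
    sum_{i<j} <v_i, v_j> x_i x_j
      = 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2].
    """
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    pairwise = 0.0
    for f in range(len(V[0])):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return linear + pairwise


def mlp_logit(x: Sequence[float], W1: List[Sequence[float]],
              b1: Sequence[float], w2: Sequence[float], b2: float) -> float:
    """One ReLU hidden layer followed by a linear output (assumed shape)."""
    hidden = [max(0.0, sum(wij * xj for wij, xj in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(wi * hi for wi, hi in zip(w2, hidden)) + b2


def predict_ctr(x: Sequence[float], fm_params: Tuple, mlp_params: Tuple) -> float:
    """Combine both components and squash the result to a probability."""
    z = fm_logit(x, *fm_params) + mlp_logit(x, *mlp_params)
    return 1.0 / (1.0 + math.exp(-z))


# Tiny hypothetical example: two active features, one latent factor,
# and a deep component whose weights are all zero.
x = [1.0, 1.0]
fm_params = (0.0, [0.0, 0.0], [[1.0], [2.0]])
mlp_params = ([[0.0, 0.0]], [0.0], [0.0], 0.0)
ctr = predict_ctr(x, fm_params, mlp_params)
```

With these values the only nonzero contribution is the interaction between the two features, whose latent vectors have inner product 2, so the prediction equals the sigmoid of 2.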
Further, clustering all the clients with the clustered federated learning algorithm so that each cluster generates a global model specifically includes:
clustering all the clients with the clustered federated learning algorithm, and judging whether a split occurs;
if a split occurs, dividing all the clients into two clusters, so that each cluster generates a global model;
if no split occurs, judging whether the global model has converged;
and if the global model has not converged and the maximum number of rounds has not been reached, treating all the clients as one cluster, so that this cluster generates a global model.
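Once a split has been decided, the clients have to be divided into two clusters on the basis of their pairwise similarity. The text does not give the partitioning rule, so the following is only one simple heuristic, stated as an assumption: take the least similar pair of clients as seeds and attach every other client to the seed it resembles more. It assumes at least two clients and a symmetric similarity matrix.

```python
from typing import List, Sequence, Tuple


def bipartition(sim: List[Sequence[float]]) -> Tuple[List[int], List[int]]:
    """Split client indices into two clusters using a seed-based heuristic."""
    n = len(sim)
    # Seeds: the pair of clients with the lowest mutual similarity.
    seed_a, seed_b = min(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: sim[pair[0]][pair[1]],
    )
    cluster_a, cluster_b = [seed_a], [seed_b]
    for c in range(n):
        if c in (seed_a, seed_b):
            continue
        # Attach the client to whichever seed it is more similar to.
        if sim[c][seed_a] >= sim[c][seed_b]:
            cluster_a.append(c)
        else:
            cluster_b.append(c)
    return sorted(cluster_a), sorted(cluster_b)


# Hypothetical similarity matrix: clients 0 and 1 agree with each other,
# clients 2 and 3 agree with each other, and the two groups disagree.
sim = [
    [1.0, 0.9, -0.8, -0.7],
    [0.9, 1.0, -0.6, -0.9],
    [-0.8, -0.6, 1.0, 0.95],
    [-0.7, -0.9, 0.95, 1.0],
]
first, second = bipartition(sim)
```

A hierarchical clustering with complete linkage would be another reasonable way to obtain the two groups; the seed heuristic is used here only because it needs no external library.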
Further, the method may also include:
sending a selected partial advertisement list to the client of the user according to the advertisement click-through rates of the user's candidate advertisements, thereby realizing personalized advertisement recommendation for the user.
It should be noted that the computer-readable medium of this embodiment may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In summary, the invention is implemented on the basis of a federated factorization machine and can balance the accuracy and the privacy of the advertisement click-through rate prediction algorithm in scenarios where the data of different clients are not independent and identically distributed (non-IID); that is, it protects the privacy of client data while maintaining the usability of the federated learning model. The existing centralized factorization machine is improved by introducing a federated factorization machine: with the distributed training of federated learning, a client does not upload the raw user data directly to the server, and only the gradient information of the client is used to update the model. The idea of clustered federated learning is introduced into the factorization machine, which achieves advertisement click-through rate prediction that protects user privacy and addresses the model loss that linear aggregation incurs because of the heterogeneity of user data.
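The point that only update information leaves the client, never the raw user data, can be illustrated with a minimal client-side sketch. It assumes a plain logistic regression as a stand-in for the local model and per-sample stochastic gradient descent in a fixed order; `local_update`, the learning rate, and the data layout are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def local_update(global_weights: Sequence[float],
                 data: List[Tuple[Sequence[float], int]],
                 lr: float = 0.1, epochs: int = 1) -> List[float]:
    """Train locally and return only the weight update vector.

    `data` holds (features, clicked) pairs that stay on the client.
    The return value is new_weights - global_weights, which is the only
    thing uploaded to the server in this sketch.
    """
    weights = list(global_weights)
    for _ in range(epochs):
        for features, clicked in data:
            p = sigmoid(sum(w * x for w, x in zip(weights, features)))
            # Gradient of the log loss for one sample is (p - y) * x.
            for d, x in enumerate(features):
                weights[d] -= lr * (p - clicked) * x
    return [w_new - w_old for w_new, w_old in zip(weights, global_weights)]


# One hypothetical local record: the first feature is active and the ad was clicked.
delta = local_update([0.0, 0.0], [([1.0, 0.0], 1)], lr=0.1)
```

Starting from zero weights the predicted probability is 0.5, so the single step moves the first weight by 0.1 × 0.5 = 0.05 and leaves the second weight unchanged.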
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any substitution or change that a person skilled in the art makes to the technical solution and the inventive concept of the present invention, within the scope disclosed by the present invention, falls within the protection scope of the present invention.
Claims (10)
1. A privacy-preserving advertisement click-through rate prediction method, applied to a server, characterized by comprising the following steps:
issuing the global model to each client, so that each client trains a local model on its local user data, obtains a weight update vector by computing the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
receiving the weight update vectors uploaded by the clients, and calculating the similarity between the clients from these weight update vectors;
clustering all the clients with a clustered federated learning algorithm according to the similarity between the clients, so that each cluster generates a global model;
in each cluster, issuing the global model to all the clients in the cluster, so that all the clients in the cluster update their local models, until the global model converges or the maximum number of rounds is reached;
and receiving a request sent by the client of a user, and issuing, in the corresponding cluster, the global model to the client of the user, so that the client of the user calculates the advertisement click-through rate of the user's candidate advertisements with its local model.
2. The privacy-preserving advertisement click-through rate prediction method according to claim 1, wherein clustering all the clients with the clustered federated learning algorithm so that each cluster generates a global model specifically comprises:
clustering all the clients with the clustered federated learning algorithm, and judging whether a split occurs;
if a split occurs, dividing all the clients into two clusters, so that each cluster generates a global model;
if no split occurs, judging whether the global model has converged;
and if the global model has not converged and the maximum number of rounds has not been reached, treating all the clients as one cluster, so that this cluster generates a global model.
3. The privacy-preserving advertisement click-through rate prediction method according to claim 2, wherein the condition for a split to occur is: the federated learning objective function of the current cluster is close to a stationary point, and there is a client in the cluster that has not reached a stationary point of its local loss function.
4. The privacy-preserving advertisement click-through rate prediction method according to claim 1, further comprising:
sending a selected partial advertisement list to the client of the user according to the advertisement click-through rates of the user's candidate advertisements, thereby realizing personalized advertisement recommendation for the user.
5. The privacy-preserving advertisement click-through rate prediction method according to any one of claims 1-4, wherein the gradient of the factorization machine component is calculated as follows:
6. The privacy-preserving advertisement click-through rate prediction method according to any one of claims 1-4, wherein the gradient of the deep learning component is calculated as follows:
where $t$ denotes the iteration round, $D_k$ denotes the user data of the $k$-th client, and $\mathrm{SGD}(\cdot)$ denotes the stochastic gradient descent method.
7. The privacy-preserving advertisement click-through rate prediction method according to any one of claims 1-4, wherein the similarity between the clients is calculated as follows:
$$\alpha_{i,j} = \frac{\langle \Delta\theta_i, \Delta\theta_j \rangle}{\lVert \Delta\theta_i \rVert \, \lVert \Delta\theta_j \rVert}$$
where $\alpha_{i,j}$ denotes the cosine similarity between the $i$-th client and the $j$-th client, $\Delta\theta_i$ denotes the weight update vector of the $i$-th client, and $\Delta\theta_j$ denotes the weight update vector of the $j$-th client.
8. A privacy-preserving advertisement click-through rate prediction apparatus, applied to a server, characterized by comprising:
a model training module, configured to issue the global model to each client, so that each client trains a local model on its local user data, obtains a weight update vector by computing the gradients of the factorization machine component and the deep learning component respectively, and uploads the weight update vector to the server;
a similarity calculation module, configured to receive the weight update vectors uploaded by the clients, and to calculate the similarity between the clients from these weight update vectors;
a clustering module, configured to cluster all the clients with a clustered federated learning algorithm according to the similarity between the clients, so that each cluster generates a global model;
a model updating module, configured to issue, in each cluster, the global model to all the clients in the cluster, so that all the clients in the cluster update their local models, until the global model converges or the maximum number of rounds is reached;
and an advertisement click-through rate prediction module, configured to receive a request sent by the client of a user, and to issue, in the corresponding cluster, the global model to the client of the user, so that the client of the user calculates the advertisement click-through rate of the user's candidate advertisements with its local model.
9. A server comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the privacy preserving advertisement click-through rate prediction method of any one of claims 1-7.
10. A storage medium storing a program, wherein the program, when executed by a processor, implements the privacy-preserving advertisement click-through rate prediction method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110755722.8A CN113487351A (en) | 2021-07-05 | 2021-07-05 | Privacy protection advertisement click rate prediction method, device, server and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110755722.8A CN113487351A (en) | 2021-07-05 | 2021-07-05 | Privacy protection advertisement click rate prediction method, device, server and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113487351A true CN113487351A (en) | 2021-10-08 |
Family
ID=77939950
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110755722.8A Pending CN113487351A (en) | 2021-07-05 | 2021-07-05 | Privacy protection advertisement click rate prediction method, device, server and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113487351A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020229684A1 (en) * | 2019-05-16 | 2020-11-19 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Concepts for federated learning, client classification and training data similarity measurement |
CN111310047A (en) * | 2020-02-20 | 2020-06-19 | 深圳前海微众银行股份有限公司 | Information recommendation method, device and equipment based on FM model and storage medium |
CN111507765A (en) * | 2020-04-16 | 2020-08-07 | 厦门美图之家科技有限公司 | Advertisement click rate prediction method and device, electronic equipment and readable storage medium |
WO2021115480A1 (en) * | 2020-06-30 | 2021-06-17 | 平安科技(深圳)有限公司 | Federated learning method, device, equipment, and storage medium |
CN112396099A (en) * | 2020-11-16 | 2021-02-23 | 哈尔滨工程大学 | Click rate estimation method based on deep learning and information fusion |
CN112364943A (en) * | 2020-12-10 | 2021-02-12 | 广西师范大学 | Federal prediction method based on federal learning |
CN112508203A (en) * | 2021-02-08 | 2021-03-16 | 同盾控股有限公司 | Federated data clustering method and device, computer equipment and storage medium |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113988207A (en) * | 2021-11-09 | 2022-01-28 | 长春理工大学 | Article recommendation method and system |
CN113988314A (en) * | 2021-11-09 | 2022-01-28 | 长春理工大学 | Cluster federal learning method and system for selecting client |
CN113988314B (en) * | 2021-11-09 | 2024-05-31 | 长春理工大学 | Clustering federation learning method and system for selecting clients |
CN113988207B (en) * | 2021-11-09 | 2024-09-24 | 长春理工大学 | Article recommendation method and system |
CN114595831A (en) * | 2022-03-01 | 2022-06-07 | 北京交通大学 | Federal learning method integrating adaptive weight distribution and personalized differential privacy |
CN115081003A (en) * | 2022-06-29 | 2022-09-20 | 西安电子科技大学 | Gradient leakage attack method under sampling aggregation framework |
CN115081003B (en) * | 2022-06-29 | 2024-04-02 | 西安电子科技大学 | Gradient leakage attack method under sampling aggregation framework |
CN115311692A (en) * | 2022-10-12 | 2022-11-08 | 深圳大学 | Federal pedestrian re-identification method, system, electronic device and storage medium |
CN115311692B (en) * | 2022-10-12 | 2023-07-14 | 深圳大学 | Federal pedestrian re-identification method, federal pedestrian re-identification system, electronic device and storage medium |
CN117077817A (en) * | 2023-10-13 | 2023-11-17 | 之江实验室 | Personalized federal learning model training method and device based on label distribution |
CN117077817B (en) * | 2023-10-13 | 2024-01-30 | 之江实验室 | Personalized federal learning model training method and device based on label distribution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||