CN114564641A - Personalized multi-view federated recommendation system - Google Patents

Personalized multi-view federated recommendation system

Info

Publication number
CN114564641A
CN114564641A (application CN202210150617.6A)
Authority
CN
China
Prior art keywords
model
user
data
gradient
article
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210150617.6A
Other languages
Chinese (zh)
Inventor
张胜博
高明
束金龙
徐林昊
杜蓓
蔡文渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hipu Intelligent Information Technology Co ltd
East China Normal University
Original Assignee
Shanghai Hipu Intelligent Information Technology Co ltd
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hipu Intelligent Information Technology Co ltd and East China Normal University
Priority to CN202210150617.6A
Publication of CN114564641A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G06F16/9535 - Search customisation based on user profiles and personalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/50 - Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a personalized multi-view federated recommendation system comprising a central server and a plurality of user clients, where every user client contains a training module and a prediction module. The training module comprises a data distribution submodule, a gradient calculation submodule, a gradient aggregation submodule, a model update submodule, a model fine-tuning submodule, a user data warehouse and an item data warehouse; these cooperate to execute the training algorithm and obtain a user submodel and an item submodel. The prediction module comprises a semantic computation submodule, an interaction computation submodule, a probability aggregation submodule, a probability ranking submodule, a recommendation output submodule, a user model warehouse and an item model warehouse; these cooperate to execute the prediction algorithm and obtain the sequence of items recommended to any user client. The system offers stronger scenario adaptability, deeper feature mining in the underlying model, wider coverage of data sources in the raw input, and better localized fine-tuning of the global model.

Description

Personalized multi-view federated recommendation system
Technical Field
The invention belongs to the technical field of data science and big data, and in particular relates to a privacy-preserving personalized multi-view federated recommendation system based on multi-view learning, meta-learning and federated learning.
Background Art
With the rapid development of information and internet technology, people have moved from an era of information scarcity to an era of information overload. Taking e-commerce platforms as an example, the volume of commodity information is expanding rapidly to meet the diverse demands of users. On the one hand, users often get lost in the massive commodity information space and cannot quickly find the goods they need; on the other hand, large numbers of commodities cannot be accurately pushed to their target audiences, making it hard for merchants to acquire customers and for platforms to profit. Against this background, recommendation systems emerged. After more than twenty years of accumulation, recommendation systems are now widely used in fields such as e-commerce, social networking, online advertising and streaming media. In recent years, with the development of machine learning and deep learning, both industry and academia have shown growing research interest in recommendation systems, and it remains valuable to keep mining their latent problems and making improvements in step with the times.
A recommendation system assumes that a user's preferences are latent in data such as the user's attributes and historical interaction behavior with items, and that the audience an item targets is latent in data such as the item's attributes and descriptive text. By analyzing these data and modeling users and items, a recommendation system can predict a user's degree of interest in an item and then proactively recommend items accordingly, properly capturing the user's personality and preferences, better exploring long-tail information, and helping to extract more profit from market segments. Recommendation systems have evolved from naive content-based recommendation, through collaborative-filtering-based recommendation, to higher-order recommendation algorithms based on deep learning, making larger-scale and more accurate recommendation possible. At present, the fusion of diversified data is one of the research hotspots of recommendation systems. Many scholars have attempted to introduce "multi-view learning" into recommendation algorithms, and related experiments show that effective use of multi-source data can significantly improve model prediction accuracy.
The success of recommendation systems benefits from the extensive collection, analysis and centralized storage of massive user data. In implementing the aforementioned recommendation behavior, a recommendation system inevitably uses some of the user's sensitive information. This may include attribute data such as age, gender and address, and interaction data such as browsing records, rating records and travel records. Meanwhile, the rise of the mobile internet has freed people from fixed terminals: carrying smartphones, wearables and tablets, people can work, socialize and shop on the internet anytime and anywhere. The volume of user data has grown exponentially and its forms have become more diverse. Beyond the aforementioned attribute and interaction data, more private data such as health status and geographic location is also collected by mobile devices in real time and periodically sent to third parties for data-mining-related services. Such services can indeed enhance the user experience and bring convenience. However, this information touches the user's privacy red line; once abused or leaked, it causes great trouble and immeasurable risk to the user.
In recent years, people have paid more attention to personal privacy, academic discussion of data ethics has intensified, and large companies' awareness of data security has grown. How to better protect user privacy and strengthen data security has become a global proposition. Countries around the world are strengthening laws and regulations that safeguard data security and user privacy. However, these regulations pose a completely new challenge to data interaction in artificial intelligence: a data barrier forms between platform and user, and data sharing between platforms and third parties is strictly limited and supervised. How to legally and appropriately resolve data fragmentation and data isolation has become a major challenge for researchers and practitioners in artificial intelligence. Against this backdrop, a United States company first proposed the concept of "federated learning" in 2016 in an attempt to overcome the data-island and privacy-protection challenges. Federated learning is a special distributed machine learning framework that requires every collaborator participating in the federated process to keep its local raw data unexposed, effectively protecting user privacy and data security.
The prior art has focused either on the underlying design of federated recommendation systems or on their concrete applications. It has explored and usefully realized federated recommendation, but the invented technologies themselves still have some non-negligible limitations.
Specifically, the problems of the prior art can be summarized in the following four aspects:
(1) The federated recommendation methods proposed by some technologies are oriented to specific scenarios and applications and fail to provide a universal technical framework. For example, publication CN113158241A describes a post recommendation algorithm based on horizontal federated learning. It jointly trains the resume features of multiple users and the features of browsed target posts, finally matching posts while protecting resume privacy, but it is only suitable for human resource management scenarios.
(2) The federated recommendation proposed by some technologies is based on traditional models and algorithms and fails to incorporate leading-edge neural networks. For example, publication CN112287244A proposes a product recommendation method based on federated learning. It uses an early collaborative filtering model as the underlying algorithm to train user-product and product-product similarities and clusters and sorts products accordingly, but does not mine user features more fully.
(3) The federated recommendation methods proposed by some technologies rely on a single kind of interaction data and fail to address the user cold-start problem. For example, publication CN111339412A shows a recommendation recall method based on vertical federated learning. It uses only a user behavior data matrix as the training data source to generate item recalls for the user data to be predicted; however, new users often have no historical data, leading to the user cold-start problem.
(4) The federated recommendation methods proposed by some technologies train a shared global model but fail to adapt to client variability. For example, publication CN113626687A implements an online course recommendation system based on federated learning. It computes gradients from local data and sends them to a central server; the server completes gradient aggregation and returns the result to the local devices for updating, yet users with large differences still share one set of model parameters for recommendation.
Disclosure of Invention
The invention addresses the following four problems of the prior art:
(1) Universality of the algorithm framework: the frameworks designed by the prior art are difficult to apply to most application scenarios and fields;
(2) Depth of the underlying model: the underlying models used by the prior art do not properly introduce machine learning or deep learning models;
(3) Multi-source raw data: the raw inputs adopted by the prior art cannot cover data of different sources and different forms;
(4) Differences in data distribution: the global models generated by the prior art never complete personalized fine-tuning on the federated clients.
The specific technical scheme for realizing the purpose of the invention is as follows:
a personalized multi-view federal recommendation system is characterized by comprising a central server and a plurality of user clients, wherein the internal structures of any one user client are the same, and a training module and a prediction module are contained in any one user client; data stream transmission is carried out inside the central server, between the central server and any one user client and inside any one user client; meanwhile, the transmission of the data stream adopts a synchronous transmission mode, namely, the data exchange among the modules is non-asynchronous and is allocated by a unified clock signal; the training module and the prediction module respectively comprise a plurality of sub-modules which are respectively used for completing a training task and a prediction task;
the central server comprises an updating coordination module and a data calculation module;
the data calculation module performs an aggregation operation on the item gradient data and the user gradient data from the plurality of user clients respectively; the aggregation spans the central server and any user client;
the update coordination module coordinates, between the training module of any user client and the update coordination module of the central server, the transmission of individual gradient data from any user client and of aggregated gradient data from the data calculation module; this coordination is completed in the central server, and a secure aggregation protocol ensures that individual gradient data entering the data calculation module during transmission is remotely and securely aggregated; remote secure aggregation means that user gradient data or item gradient data from the plurality of user clients is encrypted under the control of the secure aggregation protocol and uploaded to the central server, which decrypts the gradient data and then aggregates it;
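As an illustration only, the remote secure aggregation described above can be sketched with pairwise additive masking, one common realization of a secure aggregation protocol; the patent does not fix a concrete scheme, so all names and the shared-seed setup below are assumptions. Each client adds random masks that cancel in the server-side sum, so the server recovers the aggregate without seeing any single client's gradient in the clear.

```python
import numpy as np

def pairwise_masks(num_clients, dim, seed=0):
    # One shared random mask per client pair; in the sum, client i's +m and
    # client j's -m cancel, so the server only ever sees the aggregate.
    rng = np.random.default_rng(seed)
    return {(i, j): rng.normal(size=dim)
            for i in range(num_clients) for j in range(i + 1, num_clients)}

def mask_gradient(grad, client_id, num_clients, masks):
    # "Encrypt" the local gradient by adding all pairwise masks.
    masked = grad.astype(float).copy()
    for other in range(num_clients):
        if other == client_id:
            continue
        pair = (min(client_id, other), max(client_id, other))
        masked += masks[pair] if client_id == pair[0] else -masks[pair]
    return masked

def server_aggregate(masked_grads):
    # Masks cancel pairwise, so the mean of the masked gradients
    # equals the mean of the true gradients.
    return np.mean(masked_grads, axis=0)
```

In a real deployment the pairwise masks would be derived from a key exchange rather than a shared seed; this sketch only shows why the server-side aggregate is recoverable while each individual upload is obscured.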
the training module in any user client comprises a data distribution submodule, a gradient calculation submodule, a gradient aggregation submodule, a model update submodule, a model fine-tuning submodule, a user data warehouse and an item data warehouse; the submodules and data warehouses in the training module cooperate to execute the training algorithm;
the user data warehouse and the item data warehouse store user data and item data, respectively, on the local device of any user client; user data refers to the historical interaction behavior datasets generated by the user in each application view on the client; item data refers to the dataset of items to be recommended, distributed to the client by the recommendation service provider through the central server;
the data distribution submodule interacts with the update coordination module in the central server and the model update submodule in the training module, acting as a data hub between the two: on the one hand, it uploads the locally securely aggregated gradient data from the model update submodule to the central server, and receives the item dataset and the remotely securely aggregated gradient data from the central server; on the other hand, it passes the remotely securely aggregated gradient data from the central server to the model update submodule; local secure aggregation means that gradient data generated inside any user client undergoes random sampling, gradient clipping and Gaussian noise addition before being aggregated;
the gradient calculation submodule calculates the gradient descent results of the item submodel and the user submodel after each iterative fit against the objective function in the training algorithm, and caches the locally aggregated gradient descent results from the gradient aggregation submodule;
the gradient aggregation submodule aggregates the gradient descent results generated by the gradient calculation submodule, applying random sampling, gradient clipping and Gaussian noise to them, thereby realizing local secure aggregation of the gradient descent results;
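A minimal sketch of the local secure aggregation performed by the gradient aggregation submodule, assuming NumPy; the patent names the three operations (random sampling, gradient clipping, Gaussian noise) but not their parameters, so the hyperparameter values here are illustrative:

```python
import numpy as np

def local_secure_aggregate(per_example_grads, sample_rate=0.5, clip_norm=1.0,
                           noise_sigma=0.1, rng=None):
    """Locally aggregate per-example gradients with random sampling,
    per-example L2 clipping, and Gaussian noise addition."""
    rng = rng or np.random.default_rng(0)
    grads = np.asarray(per_example_grads, dtype=float)
    # 1) Random sampling of a subset of gradients.
    keep = rng.random(len(grads)) < sample_rate
    if not keep.any():
        keep[rng.integers(len(grads))] = True   # keep at least one gradient
    grads = grads[keep]
    # 2) Clip each gradient to a maximum L2 norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # 3) Average, then add Gaussian noise to the aggregate.
    agg = grads.mean(axis=0)
    return agg + rng.normal(scale=noise_sigma, size=agg.shape)
```

This is the sampling-clipping-noising recipe familiar from differentially private SGD; whether the patent intends formal differential privacy guarantees is not stated.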
the model update submodule updates the current round of model training: the data distribution submodule obtains from the central server the remotely securely aggregated gradients of the item submodel and of the user submodel, which are used to perform gradient descent on the item submodel and the user submodel, respectively; once the current number of training rounds reaches a preset iteration ceiling or the global model converges, the model update submodule sends the global model to the model fine-tuning submodule; the global model consists of the user submodel and the item submodel obtained after the model update submodule performs gradient descent on them with the remotely aggregated gradients;
the model fine-tuning submodule calls local user data and item data and performs a limited number of local training iterations on the global user submodel and the global item submodel respectively, so that the global model better matches the distribution of the client's local data, completing the personalized fine-tuning of the global model on any user client;
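A hedged sketch of this limited local fine-tuning in the style of the Reptile meta-learning update, which the patent's fine-tuning stage resembles: run a few inner SGD steps on local data, then move the client's copy of the global parameters toward the adapted weights. `grad_fn`, the learning rates and the step counts are illustrative assumptions.

```python
import numpy as np

def reptile_finetune(global_params, local_batches, grad_fn,
                     inner_lr=0.01, outer_lr=0.5, steps=3):
    # grad_fn(params, batch) returns the gradient of the local loss.
    params = np.array(global_params, dtype=float)
    for _ in range(steps):
        fast = params.copy()
        for batch in local_batches:           # a few inner SGD steps on local data
            fast -= inner_lr * grad_fn(fast, batch)
        params += outer_lr * (fast - params)  # move toward the adapted weights
    return params
```

With `outer_lr=1.0` this reduces to plain local SGD from the global starting point; smaller values keep the personalized model closer to the global one.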
the model parameters of the personalized, fine-tuned global model are stored in the user data warehouse and the item data warehouse respectively, and are further transmitted, through the data pipeline between the training module and the prediction module, to the user model warehouse and the item model warehouse in the adjacent prediction module;
the prediction module in any user client comprises a semantic computation submodule, an interaction computation submodule, a probability aggregation submodule, a probability ranking submodule, a recommendation output submodule, a user model warehouse and an item model warehouse; the submodules and model warehouses in the prediction module cooperate to execute the prediction algorithm;
the user model warehouse and the item model warehouse store the user model and the item model, respectively, on the local device of any client;
the user model refers to the group of neural network parameters, relating to user data, of the deep semantic matching model obtained after the user model is trained by the training algorithm on the client's local user data;
the item model refers to the group of neural network parameters, relating to item data, of the deep semantic matching model obtained after the item model is trained by the training algorithm on the client's local item data;
the semantic computation submodule uses the user model and the item model, through the forward propagation of the deep semantic matching network, to obtain the user semantic vectors corresponding to the user model and the item semantic vectors corresponding to the item model;
the interaction computation submodule calculates the posterior probability value of a potential interaction between any user semantic vector and any item semantic vector;
the probability aggregation submodule aggregates the several posterior probability values output by the interaction computation submodule to obtain the posterior probability that any item to be recommended is interacted with on the current user client;
the probability ranking submodule sorts, in descending or ascending order, the posterior probability values, output by the probability aggregation submodule, of the several items to be recommended being interacted with on the current user client;
and the recommendation output submodule outputs the items to be recommended corresponding to the probabilities in the probability sequence, obtaining the recommended item sequence and completing the personalized multi-view federated recommendation.
The training algorithm specifically comprises:
a) Data distribution stage
The central server S distributes the dataset I of items to be recommended, provided by a background system of the application to be recommended, to each user client;
b) Gradient calculation stage
In any user client view i, the gradients of the user submodel and the item submodel are calculated from the private user data of the i-th view and the locally shared item dataset I;
c) Gradient aggregation stage
The gradients of the user submodels and item submodels are locally aggregated, and the locally aggregated gradients are encrypted and transmitted to the central server S to complete global aggregation;
d) Model update stage
The central server S transmits the globally aggregated global user submodel gradient and global item submodel gradient to each user client for updating the user submodels and item submodels;
e) Model fine-tuning stage
After the global model training converges or reaches the set maximum number of iterations, the submodels on each user client randomly sample their own private data and perform a limited number of further training batches locally, finally obtaining a recommendation model that has undergone multi-party, multi-view federated training and personalized adaptive fine-tuning.
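Stages a) to d) above can be sketched as one synchronous federated round over toy clients. The scalar submodels, the squared-error loss and the learning rate are illustrative stand-ins for the patent's deep semantic matching models, and the class and method names are assumptions:

```python
import numpy as np

class ToyClient:
    """Toy client with scalar user/item submodels; loss = (u*i - y)^2."""
    def __init__(self, u, i, y):
        self.u, self.i, self.y = u, i, y
        self.items = None

    def receive_items(self, item_data):            # a) data distribution
        self.items = item_data                     # (unused by the toy loss)

    def local_gradients(self):                     # b) gradient calculation
        err = self.u * self.i - self.y
        return 2 * err * self.i, 2 * err * self.u  # dloss/du, dloss/di

    def apply_update(self, gu, gi, lr):            # d) model update
        self.u -= lr * gu
        self.i -= lr * gi

def federated_round(clients, item_data, lr=0.05):
    for c in clients:
        c.receive_items(item_data)
    grads = [c.local_gradients() for c in clients]
    gu = np.mean([g[0] for g in grads])            # c) global aggregation
    gi = np.mean([g[1] for g in grads])
    for c in clients:
        c.apply_update(gu, gi, lr)
    return gu, gi
```

Repeating `federated_round` until convergence and then running stage e) locally per client reproduces the overall training flow; the encryption and secure-aggregation steps are omitted here for brevity.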
The prediction algorithm specifically comprises:
a) Semantic computation stage
The user client pre-computes the semantic vectors of all E items to be recommended, using the parameters provided by the item submodel, through the forward propagation of the deep semantic matching model;
b) Interaction computation stage
The user client sequentially computes the semantic vector of each user view, using the parameters provided by the user submodel, through the forward propagation of the deep semantic matching model; it then computes the posterior probability value ŷ(i, e) of a potential interaction between the semantic vector of user view i and the semantic vector of item e to be recommended (the original renders these formulas as images; ŷ(i, e) is used here as the notation);
c) Probability aggregation stage
The posterior probability values ŷ(i, e) of the several potential interactions are locally and securely aggregated to obtain the posterior probability value ŷ(e) that item e to be recommended is interacted with on the user client;
d) Probability ranking stage
The several posterior probability values ŷ(e) are arranged in descending or ascending order;
e) Recommendation output stage
The sequence of items to be recommended corresponding to the top K probability values is taken out; this is the item sequence recommended to the user client.
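Stages a) to e) can be sketched end to end as follows, assuming (as is common for deep semantic matching models, though not fixed by the patent) that the per-view posterior is a softmax over scaled cosine similarities and that aggregation across views is a simple mean; the temperature `gamma` is an assumption:

```python
import numpy as np

def recommend_top_k(view_vecs, item_vecs, k=3, gamma=10.0):
    """view_vecs: (num_views, d) user-view semantic vectors;
    item_vecs: (num_items, d) item semantic vectors."""
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    sims = unit(view_vecs) @ unit(item_vecs).T        # cosine, (views, items)
    exp = np.exp(gamma * sims)
    post = exp / exp.sum(axis=1, keepdims=True)       # b) per-view posterior
    agg = post.mean(axis=0)                           # c) probability aggregation
    order = np.argsort(-agg)                          # d) descending ranking
    return order[:k], agg                             # e) top-K recommendation
```

The semantic vectors themselves would come from the forward pass of the user and item submodels (stage a); here they are taken as given inputs.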
Compared with the prior art, the invention mainly has the following beneficial effects:
(1) The algorithm framework of the invention applies to most federated recommendation application scenarios. Industry mostly uses content-based recommendation algorithms as its most basic algorithms because of their better interpretability. Specifically, the essence of a recommendation system is similarity computation: a content-based recommendation algorithm first constructs profiles of the commodities and the user, then ranks by commodity-user similarity to generate recommendations. The invention is designed and realized on the basis of content-based recommendation, so any content-based recommendation system can implement recommendation with the system and algorithm framework provided by the invention while effectively protecting user privacy.
(2) The underlying model of the invention introduces deep neural networks to handle massive features. With the explosive growth of data volume on both the user side and the commodity side, industry already has successful cases of integrating deep neural network models into recommendation services. Deep neural networks can process massive data and, through successive transformations, fit the complex relationships within it, i.e. deeply mine the features of commodities and users. The invention deploys several two-tower models based on the deep semantic matching model, which can quickly process the large volumes of data on both the user and commodity sides and, through successive transformations, fit the complex relationships in the data on both sides.
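A minimal sketch of one such two-tower arrangement: two independent dense towers map user-side and item-side features into a shared semantic space, and relevance is the cosine of the two semantic vectors. The layer sizes and tanh activation follow the original DSSM design and are illustrative here, not the patent's exact architecture:

```python
import numpy as np

def mlp_tower(x, weights):
    # One tower: stacked dense layers with tanh activations.
    h = x
    for W in weights:
        h = np.tanh(h @ W)
    return h

def two_tower_score(user_x, item_x, user_weights, item_weights):
    u = mlp_tower(user_x, user_weights)   # user semantic vector
    v = mlp_tower(item_x, item_weights)   # item semantic vector
    # Cosine relevance of the two semantic vectors.
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
```

Because the two towers share no weights, the user side and item side can accept inputs of different dimensionality and be trained and evaluated independently, which is what makes the two-tower layout attractive for federated settings.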
(3) The raw input of the invention covers data from different sources and in different forms. Taking the user side as an example, portable devices are becoming more intelligent and interconnected; through these devices and the applications hosted on them, users store data of several different views locally, such as personal attributes, rating information, browsing records and health status. If a recommendation system can use this multi-source view data safely, effectively and comprehensively, it is bound to gain considerable recommendation accuracy. The invention safely, effectively and comprehensively utilizes the multiple view data generated within multiple applications on a mobile portable device. In particular, the multi-view setting of the invention alleviates the recommendation cold-start problem to some extent: even if a user is completely new in one view (i.e. has no historical interaction data there), the user still benefits from combining the other views with the global recommendation model. In the invention's preliminary experiments with MovieLens-100K (movie recommendation) as the training and test dataset, the best global federated recommendation precision using only multi-user, single-view data was 0.8445, while the best achieved by the invention with multi-user, multi-view data was 0.8986, an improvement of about 5 percentage points and a clear gain.
(4) The global model of the invention completes personalized fine-tuning on the federated clients. The combination of federated learning and recommendation systems is still in an exploratory phase, and the prior art mostly uses the classical federated averaging algorithm to complete the federated learning process. Federated averaging integrates the weights trained by each user in a simple, feasible way to obtain a common fused model. However, when the data on the clients is not independently and identically distributed, the global model often cannot satisfy every client, i.e. it should be personalized per client. The invention transplants the classical Reptile algorithm from meta-learning into the training stage of multi-view federated recommendation, realizing iterative fine-tuning of the federated global model on the user client in a simple, feasible way. In the preliminary experiments, taking the user client that produced the global optimum of 0.8986 as the observation object, after the post-federated-training fine-tuning step the best recommendation precision was 0.9107, an improvement of 1.2 percentage points and close to the 0.9202 of centralized training; the effect is clear.
Drawings
FIG. 1 is a diagram of a multi-user and multi-view application scenario;
FIG. 2 is a schematic structural view of the present invention;
FIG. 3 is a schematic diagram of a bottom model of the present invention;
FIG. 4 is a schematic diagram of a training algorithm of the present invention;
FIG. 5 is a schematic diagram of the prediction algorithm of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
Referring to fig. 1, there are several user clients that are geographically dispersed and communicate via the internet. Within any user client there are one or more views, and the data generated within the views is stored on the client's local device. A common real-world application scenario is the following: the user client is a user's portable mobile device (e.g., a smartphone, tablet computer or smart watch), the applications running on the device are the views, and the data generated by the user's interactions in those applications (e.g., attribute fields, browsing records, ratings) is the data generated in the views. In theory, the data of multiple views reflects the user's interests and preferences more comprehensively, and the complementarity among the views compensates for the user's information loss in the current view, effectively alleviating the "cold start" problem commonly faced by recommendation systems. Therefore, when recommending items to massive numbers of users, the personalized multi-view federated recommendation system comprehensively uses three computer technologies, multi-view learning, federated learning and meta-learning, to achieve cooperation between multiple users and multiple views in an effective and safe way, thereby achieving more accurate item recommendation.
Referring to FIG. 2, a block diagram of the system of the present invention illustrates its module components and data flow. The system of the invention is composed of a central server and several user clients (for simplicity, only one user client is shown in fig. 2). Any user client comprises a training module and a prediction module, and data streams are transmitted inside the central server, between the central server and any user client, and inside any user client. The data streams are transmitted synchronously, that is, data exchange among the modules is not asynchronous but is regulated by a unified clock signal. In addition, the training module and the prediction module each comprise several sub-modules for completing the training task and the prediction task respectively, as well as data or model warehouses.
The central server is composed of an update coordination module and a data calculation module. The data calculation module performs aggregation operations on the item gradient data and the user gradient data from the several user clients; this aggregation involves the central server and every participating client. The update coordination module coordinates the transmission of the individual gradient data from any user client, and of the aggregated gradient data from the data calculation module, between the training module in any user client and the update coordination module in the central server. This coordination is completed inside the central server, and a secure aggregation protocol ensures the remote secure aggregation of the individual gradient data entering the data calculation module during data transmission and collection. Remote secure aggregation means that, under the control of the secure aggregation protocol, the user gradient data or item gradient data from the several user clients are encrypted and uploaded to the central server, and the central server decrypts the gradient data before aggregating them.
The internal structure of any user client is the same, and each client can be divided into two main functional modules, a training module and a prediction module, each of which comprises several sub-modules for completing the training task and the prediction task respectively. The training module is composed of five sub-modules, namely a data distribution sub-module, a gradient calculation sub-module, a gradient aggregation sub-module, a model updating sub-module and a model fine-tuning sub-module, corresponding to the five processing stages of the subsequent training algorithm. The training module also comprises two data warehouses, used to store user data and item data on the local device of any client respectively; the user data refers to the historical interaction behavior data set generated by the user in each application view on any user client; the item data refers to the data set of items to be recommended distributed to any user client by the recommendation service provider through the central server. The prediction module is composed of five sub-modules, namely a semantic calculation sub-module, an interaction calculation sub-module, a probability aggregation sub-module, a probability sorting sub-module and a recommendation output sub-module, corresponding to the five processing stages of the subsequent prediction algorithm.
The prediction module comprises two model warehouses, used to store the user models and the item model on the local device of any client respectively. A user model refers to the group of neural network parameters of the deep semantic matching model related to user data, obtained after the training algorithm trains the user model with the local user data of any user client; the item model refers to the group of neural network parameters of the deep semantic matching model related to item data, obtained after the training algorithm trains the item model with the local item data of any user client.
More specifically, in the training module of any client, the data distribution sub-module interacts with both the update coordination module in the central server and the model updating sub-module in the training module, playing the role of a data hub: on the one hand, it uploads the locally secure-aggregated gradient data from the model updating sub-module to the central server, and receives the item data set and the remotely secure-aggregated gradient data from the central server; on the other hand, it transmits the remotely secure-aggregated gradient data from the central server to the model updating sub-module. Local secure aggregation means that the gradient data generated inside any user client is aggregated after random sampling, gradient clipping and Gaussian noise addition. The gradient calculation sub-module calculates the gradient descent results of the item sub-model and the user sub-models after iterative fitting against the objective function in the training algorithm, and caches the local gradient descent aggregation result from the gradient aggregation sub-module. The gradient aggregation sub-module aggregates the gradient descent results generated in the gradient calculation sub-module, applying random sampling, gradient clipping and Gaussian noise addition to them so as to realize local secure aggregation of the gradient descent results. The model updating sub-module updates the current model training, that is, the data distribution sub-module obtains the remotely secure-aggregated item sub-model gradient and user sub-model gradients from the central server, and gradient descent is performed on the item sub-model and the user sub-models with these gradients respectively.
Once the current number of training rounds reaches the preset iteration upper limit or the global model converges, the model updating sub-module sends the global user sub-models and the global item sub-model to the model fine-tuning sub-module. The global model is the user sub-models and item sub-model obtained after gradient descent is performed on them with the remotely aggregated gradients; "global" means that the new model is generated by combining gradient information from several user clients. The model fine-tuning sub-module calls the local user data and item data again and carries out a limited number of rounds of local training iterations on the global user sub-models and item sub-model respectively, so that the global model better fits the distribution of the local data of any user, completing the personalized fine-tuning of the global model on any user client. The fine-tuned model parameters are stored in the user data warehouse and item data warehouse in the training module respectively, and are further transmitted to the user model warehouse and item model warehouse in the prediction module through the data pipeline between the training module and the prediction module. The semantic calculation sub-module inside the prediction module uses the user models and the item model respectively to calculate, through the forward propagation process of the deep semantic matching network, the user semantic vectors corresponding to the user models and the item semantic vectors corresponding to the item model. The two kinds of semantic vectors enter the interaction calculation sub-module together to obtain the posterior probability value of potential interaction between any user semantic vector and any item semantic vector.
Further, the probability values are aggregated in a probability aggregation submodule to obtain a posterior probability value of interaction of any item to be recommended on the current user client. And then, sorting the probability of interaction of the plurality of items to be recommended on the current user client according to a descending order or an ascending order. And finally, outputting the object to be recommended corresponding to any probability in the probability sequence, obtaining the recommended object sequence on the current user client, and finishing personalized multi-view federal recommendation.
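As a concrete illustration of this prediction flow, the following minimal Python sketch (function and variable names are our own, and the probabilities are made up for the example) aggregates per-view interaction probabilities, sorts them in descending order, and outputs the recommended item sequence:

```python
import numpy as np

def recommend(view_probs, item_ids, top_k=3):
    """Prediction sketch: aggregate per-view interaction probabilities for each item
    (mean here), sort in descending order, output the recommended item sequence."""
    agg = np.mean(view_probs, axis=0)    # probability aggregation over the views
    order = np.argsort(-agg)             # descending sort by aggregated probability
    return [item_ids[i] for i in order[:top_k]]

# Two views scoring four candidate items (probabilities are illustrative).
probs = np.array([[0.9, 0.2, 0.6, 0.1],
                  [0.7, 0.3, 0.8, 0.2]])
items = ["A", "B", "C", "D"]
ranked = recommend(probs, items, top_k=2)
```

The mean is one plausible aggregation; the patent only requires that per-view probabilities be combined before sorting.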
The system mainly comprises the sequential execution of two tasks: one is a training task, namely: the system completes the generation of the user model and the article model in the recommendation system in a mode of combining a plurality of user clients and a plurality of application views in the clients on the premise of practically protecting the privacy of the user; secondly, a prediction task is that: the system of the invention completes the generation of a recommendation list aiming at a specific user according to the user characteristics, the item characteristics, the corresponding user model and the item model on the premise of giving the item set to be recommended.
Next, three important components in the technical solution of the present invention will be described with emphasis. The first is the basic model and the basic method adopted by the invention, which mainly introduces: the existing depth model and parameter aggregation methods used in the present invention; the second is a training algorithm designed and implemented by the invention, which mainly describes: how to combine a plurality of clients and a plurality of views in the clients to complete the generation of a user model and an article model in a recommendation system in a safe and effective mode; the third is the prediction algorithm designed and implemented by the invention, which mainly explains: how to sort the item list to be recommended for a certain user in the recommendation system according to the user characteristics and the item characteristics and the user model and the item model.
First, basic model and basic method
(1) Deep semantic matching model
The Deep Semantic matching model (DSSM) was originally designed for search engines: it extracts semantic vectors from a user's query terms and from candidate documents through a multi-layer neural network, and then measures the relevance of the query and the documents in the same semantic space by cosine similarity. In the technical scheme of the invention, DSSM is adopted as the underlying basic model and is extended to a multi-view DSSM under the federated scenario. In short, the DSSM model is an implicit semantic model with a multi-layer neural network structure: it maps search keywords and documents into a low-dimensional space, calculates the similarity of the two, and is trained by maximizing the conditional probability of the documents clicked under the given search keywords in the training data. For the original DSSM paper, see: https://dl.acm.org/doi/abs/10.1145/2505515.2505665.
Referring to fig. 3, in the design of the recommendation system of the present invention, the DSSM model commonly used by search engines is transplanted into the recommendation algorithm and extended to the multi-view DSSM under the federated recommendation scenario, thereby serving as the underlying model of the system of the present invention. As shown, the DSSM model can be viewed as a "double tower" structure, where the left tower represents the user's query and the right tower represents the document to be matched. The invention reinterprets the user query and the candidate document, namely: the user query of the DSSM corresponds to the i-th view U_i in the user client of the invention, and the candidate document corresponds to the item set I (item data) to be recommended. The essence of the DSSM model is a multi-layer neural network with two-way input and one-way output, which can convert any query or document corpus into a corresponding semantic vector, and then judge whether the query semantic vector and the document semantic vector are correlated by calculating the cosine similarity between them. This coincides with the goal of a recommendation system, which is to measure the degree of correlation between a user and an item, so as to form preferences and give recommendations. Referring to fig. 4, there are several DSSM models in any user client of the system of the invention, whose double towers correspond to a certain user view and the fixed item data, and the training algorithm of the system of the invention aims to maximize the cosine similarity output at their tower tips.
More specifically, let x be the original feature vector of a query term or candidate document, y its semantic vector, l_i (i = 2, 3, …, N−1) the hidden layers in the middle of the DSSM model, W_i the i-th weight matrix, b_i the i-th bias term, and f the mapping function of the DSSM model; note that when the DSSM model is set to N layers, the 1st layer is the input layer, the 2nd to (N−1)-th layers are hidden layers, and the N-th layer is the output layer. Then the forward propagation procedure of DSSM can be defined as:

l_1 = W_1 x,
l_i = f(W_i l_{i−1} + b_i), i ∈ {2, 3, …, N−1},
y = f(W_N l_{N−1} + b_N).
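Under the layer definitions above, the forward propagation of one DSSM tower can be sketched as follows (a minimal NumPy sketch; the layer sizes, tanh activation and random weights are illustrative assumptions, not the patented configuration):

```python
import numpy as np

def dssm_forward(x, weights, biases, f=np.tanh):
    """One DSSM tower: l1 = W1 x; l_i = f(W_i l_{i-1} + b_i); y = f(W_N l_{N-1} + b_N)."""
    l = weights[0] @ x                     # layer 1: linear projection, no bias/activation
    for W, b in zip(weights[1:], biases):  # layers 2..N: affine map plus nonlinearity
        l = f(W @ l + b)
    return l                               # semantic vector y

# Toy tower: input dim 8 -> 6 -> 5 -> output dim 4 (sizes are illustrative).
rng = np.random.default_rng(0)
weights = [rng.standard_normal((6, 8)),
           rng.standard_normal((5, 6)),
           rng.standard_normal((4, 5))]
biases = [rng.standard_normal(5), rng.standard_normal(4)]
y = dssm_forward(rng.standard_normal(8), weights, biases)
```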
The semantic relevance R between a query term Q and a candidate document D can then be measured by the following formula:

R(Q, D) = cosine(y_Q, y_D) = (y_Q^T y_D) / (‖y_Q‖ ‖y_D‖),

where y_Q and y_D are the semantic vectors of the query term Q and the candidate document D respectively, cosine(y_Q, y_D) denotes the cosine similarity of y_Q and y_D, y_Q^T denotes the transpose of y_Q, and ‖y_Q‖ and ‖y_D‖ denote the module lengths (norms) of y_Q and y_D.
It is assumed that a query is positively correlated with the documents clicked after that query, and the parameters of the DSSM (i.e., the weight matrices W) are optimized on this assumption, i.e., the conditional likelihood that a document is clicked under a given query is maximized. It is therefore necessary to obtain the posterior probability that a document is clicked under a given query, which is obtained by computing the semantic relevance between the query and the document and applying the softmax function:

P(D|Q) = exp(γ R(Q, D)) / Σ_{D′∈D*} exp(γ R(Q, D′)),

where R(Q, D) is the semantic relevance between query Q and document D, exp(γ R(Q, D′)) is γ R(Q, D′) exponentiated with the natural constant e as base, γ is a smoothing coefficient, Q is a query vector, D is a document vector, D* is the set of all candidate documents (including the clicked positive examples and the un-clicked negative examples, collectively denoted D′), and P(D|Q) is the conditional probability that document D matches query Q on the premise that Q appears.
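The relevance measure and the softmax posterior can be sketched together (illustrative Python; the smoothing coefficient γ = 10 and the toy vectors are assumptions):

```python
import numpy as np

def relevance(y_q, y_d):
    """R(Q, D): cosine similarity of the two semantic vectors."""
    return float(y_q @ y_d / (np.linalg.norm(y_q) * np.linalg.norm(y_d)))

def posterior(y_q, candidates, gamma=10.0):
    """P(D|Q): softmax over the smoothed relevances of all candidate documents D*."""
    scores = np.array([gamma * relevance(y_q, y_d) for y_d in candidates])
    exp = np.exp(scores - scores.max())   # subtract the max for numerical stability
    return exp / exp.sum()

y_q = np.array([1.0, 0.0, 1.0])
docs = [np.array([1.0, 0.1, 0.9]),        # close to the query, so high probability
        np.array([-1.0, 0.5, -0.8])]      # far from the query, so low probability
p = posterior(y_q, docs)
```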
In the multi-view federated recommendation scenario to which the present invention applies, the multi-view data set on each distributed node (referred to as a "federated client") can be represented as D_n = {(U_1, I), …, (U_i, I), …, (U_n, I)}, where the full user view data set U = {U_1, …, U_n} is generated by n different views U_i (i = 1, 2, …, n), and the data set I of items to be recommended is downloaded from the server side, for example from the back-end service platform of a mobile application company providing the recommendation service to its users. The present invention uses a depth model such as DSSM to extract the corresponding semantic vectors from the user data set U_i of each application view and from the item data set I. The objective of the training algorithm provided by the technical scheme of the invention is to find a non-linear mapping f(·) for each user view, so that the summed similarity of the mappings between all user view data sets U and the item data set I is maximized on each client in the same semantic space.
Specifically, the objective (loss) function of federated recommendation training on any federated client is defined as follows:

L(Λ) = argmax_Λ Σ_{j=1}^{|S|} exp(γ R(y_I, y_{i,j})) / Σ_{X′} exp(γ R(y_I, f_i(X′, W_i))),

where R(y_I, y_{i,j}) is the semantic relevance between the item semantic vector y_I and the semantic vector y_{i,j} of view U_i in sample j; exp(γ R(y_I, y_{i,j})) is γ R(y_I, y_{i,j}) exponentiated with the natural constant e as base; γ is the smoothing coefficient; f_i(X′, W_i) maps the vector X′ through the weight matrices W_i of the i-th DSSM tower, W_i being the i-th weight matrix in the DSSM propagation process; S is the set of positive (user, item) sample pairs and |S| is the number of positive samples (a positive sample means the user interacted implicitly or explicitly with the item, e.g., an implicit "click", or an explicit "score" or "comment"); Λ represents the set of parameters of the trained neural network; i is the subscript of the view U_i in sample j; I is a sample of the item data set to be recommended; X′ is a sample of the user data set; y is the projection result of the non-linear mapping f; and argmax refers to finding the variable values that maximize the function value.
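A per-sample version of this objective can be sketched as a negative log-likelihood with sampled negative items (hedged Python sketch; the names, γ and the toy vectors are illustrative, and minimizing this quantity corresponds to maximizing the objective above):

```python
import numpy as np

def view_loss(y_item_pos, view_vec, negatives, gamma=10.0):
    """Negative log-likelihood of the positive (user-view, item) pair against
    sampled negatives, i.e. -log softmax(gamma * cos(...))."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    scores = np.array([gamma * cos(view_vec, y_item_pos)] +
                      [gamma * cos(view_vec, n) for n in negatives])
    scores -= scores.max()                      # numerical stability
    return float(-scores[0] + np.log(np.exp(scores).sum()))

view_vec = np.array([0.8, 0.1])                 # semantic vector of one user view
pos = np.array([0.9, 0.2])                      # well-aligned positive item vector
negs = [np.array([-0.7, 0.5]), np.array([0.0, -1.0])]
loss = view_loss(pos, view_vec, negs)
```

When the positive item is well aligned with the view vector, the loss is close to zero, matching the intent of maximizing the softmax probability of positives.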
(2) Local and remote security aggregation
Under the technical framework of federated learning, the invention can unite several participants to jointly train a globally shared federated model without exposing local raw data. Specifically, during federated learning training, the iterative updating of the sub-models local to each participant, and the aggregated updating of those sub-models into the global model on the central server, are carried out by relying on intermediate result parameters such as gradients. However, simply transmitting such intermediate results carries potential risks. On the one hand, in a conventional federated learning setting, a federated client or the central server may deviate from the preset federated learning protocol, for example by sending an error message to an honest user or by sharing view information with other users; on the other hand, in the federated multi-view learning setting of the present invention, some user views may be dishonest or outright malicious. For example, a malicious view, acting as an application program, may monitor the network traffic or sub-model changes of other friendly views, make null updates to its own local item model to deduce the data updates of the other friendly views, and even reconstruct their original data. Therefore, the invention provides the following two secure aggregation ideas and methods to better protect user privacy and guarantee data security:
① Local secure aggregation method
The method is mainly used on the user clients (i.e., the users' mobile devices) participating in the federated recommendation training, and completes two tasks: first, securely aggregating the several item sub-model gradients or user sub-model gradients of the several user views; second, securely aggregating several posterior probabilities, each probability representing the extent to which an item matches the user's interest according to user view U_i. Specifically, before aggregating numerical data such as gradients or posterior probabilities, a differential privacy protection technique is used to add Gaussian noise to them one by one, thereby protecting the original gradient and probability values. Taking the item sub-model gradients as an example, the main steps of local secure aggregation are:

Step 1: random sub-sampling. For the N views on each user client, randomly sample a view subset B (|B| < N) in each round of federated training, where |B| refers to the size of subset B.

Step 2: gradient clipping. Clip each gradient according to its L2 norm. For example, the item sub-model gradient g_{I_i} is transformed into

ḡ_{I_i} = g_{I_i} / max(1, ‖g_{I_i}‖_2 / C),

where C is the clipping threshold and max(1, ‖g_{I_i}‖_2 / C) selects the larger of 1 and ‖g_{I_i}‖_2 / C.

Step 3: Gaussian noise addition. Using the Gaussian mechanism, add random Gaussian noise to the sum of the clipped gradients ḡ_{I_i} to obtain the noised sub-model gradient g̃_I, with the formula:

g̃_I = (1/|B|) ( Σ_{i∈B} ḡ_{I_i} + N(0, σ²C²) ),

where σ is the noise scale associated with the clipping threshold of step 2, N(0, σ²C²) represents randomly generated Gaussian noise with mean 0 and variance σ²C², and |B| refers to the size of view subset B.
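The three steps can be sketched end to end (illustrative Python; the subset size, clipping threshold C and noise scale σ are arbitrary demo values, and this sketch omits the per-view bookkeeping of the full system):

```python
import numpy as np

def local_secure_aggregate(grads, clip_c=1.0, sigma=0.5, sample_size=None, seed=0):
    """Local secure aggregation sketch: random view sub-sampling, per-gradient
    L2 clipping, Gaussian noise on the sum, averaged over the sampled subset B."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    b = sample_size or max(1, n // 2)                      # default |B| = N/2 < N
    subset = rng.choice(n, size=b, replace=False)          # step 1: sub-sampling
    clipped = [grads[i] / max(1.0, np.linalg.norm(grads[i]) / clip_c)
               for i in subset]                            # step 2: L2 clipping
    noise = rng.normal(0.0, sigma * clip_c, size=grads[0].shape)  # step 3: N(0, σ²C²)
    return (np.sum(clipped, axis=0) + noise) / b

grads = [np.array([3.0, 4.0]), np.array([0.1, 0.2]), np.array([-1.0, 1.0])]
agg = local_secure_aggregate(grads, clip_c=1.0, sigma=0.1)
```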
② Remote secure aggregation method

The method is mainly used on the central server participating in the federated recommendation training, and completes two tasks: first, securely aggregating the several user sub-model gradients g_{U_i} uploaded from the several user clients; second, securely aggregating the several item sub-model gradients g_I uploaded from the several user clients.
The parameter aggregation method on the central server side has been extensively studied and implemented in the traditional, generalized federated learning framework. The most classical of these is the Secure Aggregation Protocol (SAP) proposed by Bonawitz et al. at the CCS 2017 conference, and the present invention also uses SAP to implement remote secure aggregation. The protocol aims to ensure that the aggregation server can only see the aggregated gradient and cannot learn the true gradient value private to any user. The protocol is suitable for situations in which large-scale mobile terminals (such as mobile phones) jointly compute the sum of their respective inputs through a central server without leaking the input of any specific terminal to the server or to any other terminal, and thus also fits the application scenario required by the invention. The protocol mainly utilizes four cryptographic methods: secret sharing, key agreement, authenticated encryption, and digital signatures. Since the steps and derivations of the protocol are involved, only a brief description is given here; for details see the original paper: https://doi.org/10.1145/3133956.3133982.
Second, the training algorithm of the invention
Note that even if some users are new users in some views, i.e., have no interaction data there, they can still benefit from joining the other views, and collaboration and isolation between all N views in any user client have been achieved. The data set I of items to be recommended of a given recommendation system has been distributed to each user client by a reliable central server S. In any view i of a user client, the gradients of the user sub-model and the item sub-model are computed from the private user data of the i-th view and the locally shared item data set I; after local aggregation, the local gradients are encrypted and transmitted to the central server S to complete global aggregation, and the aggregated global gradients are then transmitted back to each user client to update the models. After the global model training converges or the set maximum number of iterations is reached, the sub-models on the user client randomly sample the local private data and perform a limited number of batches of local training, finally obtaining a recommendation model that has undergone multi-party, multi-view federated training and personalized fine-tuning.
Specifically, the steps of the training algorithm of the present invention are as follows:
[User client of the present invention]

Step 1: system input: the total number of views N in the client; the user data set U_i of the i-th view; the interaction data y of the user and the items; the item data set I to be recommended; the local data set D = {(X_i, y), i ∈ {1, 2, …, N}} (where X_i = (U_i, I)); the total number of federated training rounds T; the federated training learning rate η; the number of meta-learning fine-tuning iterations P; the total number H of randomly sampled subsets within the view data; and the meta-learning fine-tuning learning rate ε.

Step 2: initialize the N user sub-models W_{U_1}^0, …, W_{U_N}^0 and initialize the item sub-model W_I^0, where W represents sub-model parameters.

Step 3: judge whether to execute the k-th round of federated training (k is initially set to 1); if k > T or the model has converged, jump to step 19, otherwise execute step 4.

Step 4: judge whether to enter the i-th view (i is initially set to 1); if i > N, jump to step 9, otherwise execute step 5.

Step 5: compute the user sub-model gradient of the i-th view after the k-th round of training, g_{U_i}^k = ∂L/∂W_{U_i}^k, where L refers to the loss function in the "basic model" section.

Step 6: compute the item sub-model gradient of the i-th view after the k-th round of training, g_{I_i}^k = ∂L/∂W_I^k.

Step 7: store the two gradients g_{U_i}^k and g_{I_i}^k obtained for the i-th view in steps 5-6.

Step 8: execute i = i + 1 and repeat step 4.
Step 9: locally aggregate the several item sub-model gradients g_{I_1}^k, …, g_{I_N}^k obtained by repeatedly executing steps 4-8 to obtain g_I^k, using the local secure aggregation of the previous section: g_I^k = LocalSecAgg(g_{I_1}^k, …, g_{I_N}^k).

Step 10: remotely aggregate the own item sub-model gradient g_I^k with those from the other federated clients to obtain the global item gradient ĝ_I^k: ĝ_I^k = RemoteSecAgg(g_I^k(1), …, g_I^k(M)).

Step 11: judge whether to enter the i-th view (i is initially set to 1); if i > N, jump to step 14, otherwise execute step 12.

Step 12: remotely aggregate the user sub-model gradient g_{U_i}^k of the i-th view with those from the other federated clients to obtain the global user gradient ĝ_{U_i}^k: ĝ_{U_i}^k = RemoteSecAgg(g_{U_i}^k(1), …, g_{U_i}^k(M)).

Step 13: execute i = i + 1 and repeat step 11.
Step 14: update the item sub-model to obtain the new item sub-model parameters of the (k+1)-th round: W_I^{k+1} = W_I^k − η · ĝ_I^k.

Step 15: judge whether to enter the i-th view (i is initially set to 1); if i > N, jump to step 18, otherwise execute step 16.

Step 16: update the user sub-model of the i-th view to obtain the new user sub-model parameters of the i-th view in the (k+1)-th round: W_{U_i}^{k+1} = W_{U_i}^k − η · ĝ_{U_i}^k.

Step 17: execute i = i + 1 and repeat step 15.

Step 18: execute k = k + 1 and repeat step 3.
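Steps 4-18 amount to one federated round per client; the following compressed sketch illustrates the data flow (plain averaging stands in for both the local and the remote secure aggregation, and all names are illustrative):

```python
import numpy as np

def federated_round(item_model, user_models, per_view_item_grads,
                    per_view_user_grads, other_clients_item_grads, lr=0.1):
    """One round of steps 4-18: local aggregation of per-view item gradients,
    a stand-in for remote aggregation with other clients, then gradient descent.
    (Per-view user gradients are assumed already remotely aggregated here.)"""
    g_item_local = np.mean(per_view_item_grads, axis=0)       # step 9: local agg
    all_item = [g_item_local] + list(other_clients_item_grads)
    g_item_global = np.mean(all_item, axis=0)                 # step 10: remote agg
    new_item = item_model - lr * g_item_global                # step 14: item update
    new_users = [w - lr * g                                   # step 16: user updates
                 for w, g in zip(user_models, per_view_user_grads)]
    return new_item, new_users

item_w = np.zeros(3)
user_ws = [np.ones(3), 2 * np.ones(3)]
new_item, new_users = federated_round(
    item_w, user_ws,
    per_view_item_grads=[np.ones(3), np.ones(3)],
    per_view_user_grads=[np.ones(3), np.ones(3)],
    other_clients_item_grads=[np.ones(3)])
```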
Step 19: judge whether to execute the p-th round of meta-learning fine-tuning iteration (p is initially set to 1); if p > P, jump to step 35, otherwise execute step 20. When p = 1, the fine-tuning parameters are initialized from the federated training results, i.e., W_I^{T+1} = W_I^T and W_{U_j}^{T+1} = W_{U_j}^T.

Step 20: judge whether to enter the j-th view (j is initially set to 1); if j > N, jump to step 32, otherwise execute step 21.

Step 21: randomly sample the data in the j-th view to obtain H subsets {S_1, S_2, …, S_H}, each subset containing several random samples from the j-th view.

Step 22: judge whether to execute the h-th round of meta-learning iterative updating (h is initially set to 1); if h > H, jump to step 27, otherwise execute step 23.

Step 23: compute the user sub-model gradient of the j-th view after the h-th iteration round, g_{U_j}^h = ∂L/∂W_{U_j}^h, where, when the p-th meta iteration is currently being executed, W_I^h is initialized to W_I^{T+p}, W_{U_j}^h is initialized to W_{U_j}^{T+p}, X_j = (S_h, I), and y is the interaction data in S_h.

Step 24: compute the item sub-model gradient of the j-th view after the h-th iteration round, g_{I_j}^h = ∂L/∂W_I^h.

Step 25: store the two gradients g_{U_j}^h and g_{I_j}^h obtained for the j-th view in steps 23-24.

Step 26: execute h = h + 1 and repeat step 22.
Figure BDA0003510287370000168
Step 27: locally polymerizing a plurality of article sub-model gradients obtained after the step 22-26 is executed for a plurality of times
Figure BDA0003510287370000169
Obtaining (g)I)jThe calculation formula used is: (g)I)jLocal secure syndication
Figure BDA00035102873700001610
Figure BDA00035102873700001611
Step 28: likewise, a number of user sub-model gradients are locally aggregated
Figure BDA00035102873700001612
To obtain
Figure BDA00035102873700001613
The calculation formula used is:
Figure BDA00035102873700001614
Figure BDA00035102873700001615
step 29: updating the article sub-model of the jth view after the iteration h round to obtain the new article sub-model parameter of the jth view in the pth round
Figure BDA00035102873700001616
The calculation formula used is:
Figure BDA00035102873700001617
Figure BDA00035102873700001618
step 30: similarly, updating the user sub-model of the jth view after the iteration h round to obtain the jth viewNew user sub-model parameters mapped in the p-th round
Figure BDA00035102873700001619
The calculation formula used is:
Figure BDA00035102873700001620
Figure BDA00035102873700001621
Figure BDA00035102873700001622
step 31: j +1 is executed and step 20 is repeated.
Figure BDA00035102873700001623
Step 32: updating the sub-model of the article to obtain the new sub-model parameter W of the article after the p-th round of local fine tuning iteration is executedI T+p+1The calculation formula used is:
Figure BDA00035102873700001624
Figure BDA00035102873700001625
Step 33: update the user sub-model of the jth view to obtain the new user sub-model parameters of the jth view after executing the pth round of local fine-tuning iterations: (W_U)_j^{T+p+1} = (W_U)_j^{T+p} − η·(g_U)_j.
Step 34: execute p = p + 1 and repeat step 19.
Step 35: system output: the N user sub-models (W_U)_1^{T+P}, …, (W_U)_N^{T+P} and the item sub-model W_I^{T+P}.
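The update formulas in steps 29, 30, 32, and 33 all take the standard gradient-descent form W ← W − η·g; a minimal sketch, where the learning rate η is an assumption not fixed by the text above:

```python
import numpy as np

def sgd_update(params, grad, lr=0.01):
    """One gradient-descent step: W <- W - lr * g."""
    return params - lr * grad

# Illustrative update of an item sub-model parameter vector:
W_I = np.array([0.5, -0.2, 0.1])
g_I = np.array([0.1, 0.1, -0.1])
W_I_next = sgd_update(W_I, g_I, lr=0.1)  # -> [0.49, -0.21, 0.11]
```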
[ Central Server of the present invention ]
Step 1: system input: the total number M of user clients participating in the federated recommendation process, the total number T of federated training rounds, and the model gradients uploaded by each client after the kth round of training.
Step 2: judge whether the model gradients uploaded by each client after the kth round of training have been processed (k is initially set to 1); if k > T, jump to step 6, otherwise execute step 3.
Step 3: globally aggregate the item or user sub-model gradients uploaded by the several clients to obtain the securely aggregated global gradient; the calculation used is the remote secure aggregation of the M clients' gradients after decryption.
Step 4: execute k = k + 1, and transmit the securely aggregated global model gradient back to each user client.
Step 5: judge whether the model has converged; the model is considered converged when its loss function no longer decreases and becomes stable. If the model has not converged, repeat step 2; if it has converged, send a convergence signal to each user client and execute step 6.
Step 6: the maximum number of training rounds has been exceeded or the model has converged; terminate the system program.
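Steps 2-5 of the central server form a round-based aggregate-and-return loop. A sketch, assuming plain gradient averaging as the global aggregation and a loss-plateau test for convergence (decryption under the secure aggregation protocol is omitted):

```python
import numpy as np

def global_aggregate(client_grads):
    """Step 3: aggregate the (decrypted) gradients uploaded by the M clients."""
    return np.mean(client_grads, axis=0)

def has_converged(losses, tol=1e-4, window=3):
    """Step 5: the model is considered converged when the loss no longer
    decreases and stays stable over the last few rounds."""
    if len(losses) < window + 1:
        return False
    recent = losses[-(window + 1):]
    return all(abs(recent[i] - recent[i + 1]) < tol for i in range(window))
```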
Thirdly, the prediction algorithm of the invention
Referring to fig. 5, assume that the training phase of the personalized multi-view federated recommendation has ended successfully and each user client has obtained the locally deployed recommendation model. The recommendation model consists of an item sub-model and several user sub-models, which participate together in the local prediction phase. First, the user client computes the semantic vectors of all E items to be recommended through the forward propagation process of the DSSM model; then, the user client successively computes the semantic vector of each view and the posterior probability value that it interacts with each item to be recommended; next, these probability values are locally and securely aggregated to obtain, for each item to be recommended, the posterior probability value that it interacts on the local user client; finally, these probability values are arranged in descending order (from large to small), and the items corresponding to the first K probability values in the ranking are output, yielding the list of items to be recommended by the user client.
Specifically, the steps of the prediction algorithm of the present invention are as follows:
[ user client of the present invention ]
Step 1: system input: the total number N of views in the client, the total number E of items to be recommended, the number K of items to recommend, the set of user sub-models (W_U)_1^{T+P}, …, (W_U)_N^{T+P}, the item sub-model W_I^{T+P}, the user features of the ith view, and the item features of the jth item, where (W_U)_i^{T+P} is the user sub-model of the ith view.
Step 2: judge whether the jth item to be recommended has been processed (j is initially set to 1); if j > E, jump to step 5, otherwise execute step 3.
Step 3: calculate the semantic vector y_j^I of the jth item to be recommended. The forward propagation computation used is: l_1 = W_1 x, l_i = f(W_i l_{i-1} + b_i) (i ∈ {2, 3, …, N-1}), y = f(W_N l_{N-1} + b_N); the detailed explanation is given in the base-model section.
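The recurrence in step 3 translates directly into code. A sketch, where the activation f (tanh, as conventional for DSSM) and the layer shapes are illustrative assumptions:

```python
import numpy as np

def dssm_forward(x, weights, biases, f=np.tanh):
    """Forward pass: l1 = W1 x; l_i = f(W_i l_{i-1} + b_i); y = f(W_N l_{N-1} + b_N)."""
    l = weights[0] @ x                        # first layer: no bias, no activation
    for W, b in zip(weights[1:], biases[1:]):
        l = f(W @ l + b)
    return l                                  # the semantic vector y

# Toy 3-layer network mapping an 8-dim feature vector to a 4-dim semantic vector:
rng = np.random.default_rng(0)
x = rng.normal(size=8)
weights = [rng.normal(size=(16, 8)), rng.normal(size=(8, 16)), rng.normal(size=(4, 8))]
biases = [None, np.zeros(8), np.zeros(4)]
y = dssm_forward(x, weights, biases)
```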
Step 4: execute j = j + 1 and repeat step 2.
Step 5: judge whether the jth item to be recommended has been processed (j is initially set to 1); if j > E, jump to step 13, otherwise execute step 6.
Step 6: judge whether the ith view has been processed (i is initially set to 1); if i > N, jump to step 10, otherwise execute step 7.
Step 7: calculate the semantic vector y_i^U of the ith user view; the forward propagation computation used is as described in step 3.
Step 8: calculate the posterior probability value ŷ_{i,j} that, given the user features of the ith view, the item features of the jth item interact with them; the value is computed from the semantic vectors y_i^U and y_j^I through the interaction function of the DSSM model.
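The posterior in step 8 is not spelled out in the text above; in the standard DSSM it is a softmax over smoothed cosine similarities between the user semantic vector and the candidate item semantic vectors. A sketch under that assumption, with the smoothing factor gamma illustrative:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def posterior(user_vec, item_vecs, j, gamma=10.0):
    """P(item_j | user): softmax over gamma-smoothed cosine similarities."""
    sims = np.array([gamma * cosine(user_vec, v) for v in item_vecs])
    sims -= sims.max()                        # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return float(probs[j])
```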
Step 9: execute i = i + 1 and repeat step 6.
Step 10: locally aggregate the several posterior probability values ŷ_{i,j} obtained by executing steps 6-9 repeatedly, obtaining ŷ_j; the calculation used is the local aggregation of ŷ_{1,j}, …, ŷ_{N,j}.
Step 11: store the posterior probability value ŷ_j that the jth item to be recommended interacts on the client U.
Step 12: execute j = j + 1 and repeat step 5.
Step 13: arrange the several posterior probability values ŷ_j obtained by executing steps 5-12 repeatedly in descending order (from large to small).
Step 15: system output: the sequence of items to be recommended corresponding to the first K probability values in the ranking, giving the list of items recommended to the user client.
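Steps 10-15 taken together amount to averaging the per-view probabilities for each item and returning the K items with the largest aggregated probability. A sketch, where the mean as the local aggregation operator is an assumption:

```python
import numpy as np

def recommend_top_k(per_view_probs, k):
    """per_view_probs: array of shape (N_views, E_items).
    Returns the indices of the K items with the largest aggregated probability."""
    agg = np.mean(per_view_probs, axis=0)     # aggregated probability per item
    order = np.argsort(-agg)                  # descending order
    return order[:k].tolist()

probs = np.array([[0.2, 0.9, 0.5],
                  [0.4, 0.7, 0.6]])
top = recommend_top_k(probs, 2)               # -> [1, 2]
```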
Examples
Suppose an application A providing streaming media, an application B providing movie and book reviews, and an application C providing social interaction are installed on the smartphones of N users and have generated historical data of interactions with those users. Application A seeks to further improve the accuracy and intelligence of its existing movie recommendation algorithm, and therefore establishes view cooperation with application B and application C; on the basis of the technical scheme provided by the invention, a personalized federated movie recommendation system combining the three parties' views is built. The data each of the three federated-training participants can provide is as follows: the application A view can provide records of the movies to be recommended and of the movies clicked and viewed by the user, the application B view can provide the user's ratings and reviews of certain movies, and application C can provide a series of personal information such as the user's age, gender, location, occupation, hobbies, and education.
The specific implementation of the invention mainly comprises the following three steps. First, application A, application B, and application C each preprocess the training labels and features within their respective views, where the features include user features and item features. For the recommendation service provider in this embodiment, application A needs to preprocess the training labels, for example: mark any <user, movie> record pair with an explicit or implicit interaction as 1, and mark record pairs without interaction as 0. Meanwhile, application A needs to preprocess the user features, for example: perform singular value decomposition on the click matrix or rating matrix between users and movies to obtain user feature vectors. In addition, application A also needs to preprocess the item features, for example: encode the title and genre of each movie to be recommended in the database into fixed-length item feature vectors using an N-Gram model. For the other participants in the federated recommendation training in this embodiment, such as application C, the user features need to be preprocessed, for example: normalize the age feature into the interval [0, 1], set the gender feature to 0 or 1, and construct the occupation feature as a one-hot encoded vector. After preprocessing, the labels and features are stored in each application's view, isolated from the other applications. When the subsequent training and prediction stages need to enter a view and request data, the aforementioned trusted execution environment can be constructed in advance and the operations run inside it.
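The label and feature preprocessing described in the first step can be sketched as follows; the value ranges, the occupation vocabulary, and the character n-gram scheme are illustrative assumptions:

```python
import numpy as np

def preprocess_label(interacted):
    """Mark a <user, movie> record pair: 1 if any explicit or implicit
    interaction took place, 0 otherwise."""
    return 1 if interacted else 0

def preprocess_age(age, lo=0, hi=100):
    """Normalise the age feature into the [0, 1] interval."""
    return (min(max(age, lo), hi) - lo) / (hi - lo)

def preprocess_occupation(occupation, vocabulary):
    """One-hot encode the occupation feature over a fixed vocabulary."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(occupation)] = 1.0
    return vec

def char_ngrams(title, n=3):
    """Letter n-grams of a movie title, as an N-Gram/word-hashing encoder uses."""
    padded = f"#{title}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```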
Second, a central server (which can be operated by a government agency or a reliable third party) is started and distributes the data set I of movies to be recommended to the smartphones of the N users; each user device takes the local view data of application A, application B, and application C as user data and starts the federated movie recommendation training using the training algorithm of the personalized multi-view federated recommendation system. Third, after the training completes and the central server is shut down, the movie recommendation model I* obtained in the preceding training is deployed on each user device; taking the local view data of application A, application B, and application C as base data, the prediction algorithm of the personalized multi-view federated recommendation system outputs the first K movies to be recommended that the user is most likely to be interested in, which are displayed on a page of the application A software, completing the recommendation.
It is further noted that, on any user device, the movie recommendation model I* obtained by the invention has undergone T rounds of global federated training and P rounds of local fine-tuning iteration. On the premise that user data never leaves the local device, the method jointly exploits the rich data of the 3 views (application A, application B, and application C) of the N users, and is therefore applicable to any of the three parties; that is, the model can be shared and used within the views of application A, application B, and application C. In addition, only the example of application A building the federated movie recommendation system is given here; in practice, application B or application C can likewise build the recommendation system it needs. For example, application B can provide book item data to be recommended in advance, distribute a book item sub-model to each user device participating in the federated training, and obtain a federated book recommendation system by following the same training and prediction algorithms. This is feasible because users' tastes in movies and books share some similarity, and this similarity can be measured by similarity computation between the semantic vectors output by the movie features and the book features via the DSSM model.
In particular, when implementing the technical scheme of the invention, attention should be paid to controlling the quantity and quality of the views participating in federated recommendation training within a single user client, because related research shows that too many views increase communication overhead without improving recommendation accuracy. In engineering practice, the semantic vector computation for the items to be recommended can also be performed offline in advance in the recommendation service provider's backend, with the results distributed directly to each client by the central server once computed; this greatly reduces the computational cost on the user's terminal device and better meets the recommendation service provider's need to update the data set of items to be recommended in real time or intermittently. Furthermore, to ensure that the technical scheme of the invention produces a beneficial effect when implemented, the several views participating in federated recommendation training should contribute positively to the final training objective and complete the training and sharing of the global model in mutual cooperation, achieving mutual benefit; in other words, the view data participating in a given federated recommendation training should not contain too much redundant information, erroneous information, or noise, because related research indicates that too much invalid data tends to affect model training negatively.
Finally, to further guard against user views that are dishonest or even outright malicious, when implementing the technical scheme of the invention, mutual isolation at the view level should be considered: a trusted execution environment is established before any user client runs the system of the invention, opening a secure area within the device's operating system, so that a dishonest or malicious view is prevented from maliciously accessing the private data of other, well-behaved views.

Claims (3)

1. A personalized multi-view federated recommendation system, characterized by comprising a central server and several user clients, wherein the internal structures of all user clients are identical and each user client contains a training module and a prediction module; data streams are transmitted inside the central server, between the central server and any user client, and inside any user client; the transmission of the data streams adopts a synchronous transmission mode, i.e., data exchange between modules is non-asynchronous and is scheduled by a unified clock signal; the training module and the prediction module each comprise several sub-modules, which respectively complete the training task and the prediction task;
the central server comprises an updating coordination module and a data calculation module;
the data calculation module is used for respectively executing aggregation operation on the article gradient data and the user gradient data from a plurality of user clients, and the aggregation operation is carried out between the central server and any one user client;
the updating coordination module coordinates the transmission of single gradient data from any user client and of the aggregated gradient data from the data calculation module, between the training module in any user client and the updating coordination module in the central server; this coordination is completed in the central server, and a secure aggregation protocol is used to ensure that single gradient data entering the data calculation module is remotely and securely aggregated during data transmission; remote secure aggregation means that user gradient data or article gradient data from several user clients are encrypted under the control of the secure aggregation protocol and uploaded to the central server, and the central server decrypts the gradient data and then aggregates them;
the training module in any user client comprises a data distribution sub-module, a gradient calculation sub-module, a gradient aggregation sub-module, a model updating sub-module, a model fine-tuning sub-module, a user data warehouse and an article data warehouse; the sub-modules in the training module and the data warehouse cooperate with each other to complete the execution of the training algorithm;
the user data warehouse and the article data warehouse are used for storing user data and article data in local equipment of any user client side respectively; the user data refers to a historical interaction behavior data set generated by a user in each application view on any user client; the item data refers to an item data set to be recommended, which is distributed to any user client by a recommendation service provider through a central server;
the data distribution submodule interacts with the updating coordination module in the central server and the model updating submodule in the training module, acting as a data hub between the upper and lower layers; on one hand, it uploads the locally securely aggregated gradient data from the model updating submodule to the central server, and receives the article data set and the remotely securely aggregated gradient data from the central server; on the other hand, it transmits the remotely securely aggregated gradient data from the central server to the model updating submodule; local secure aggregation means performing random sampling, gradient clipping, and Gaussian noise addition on the gradient data generated inside any user client, and then aggregating them;
the gradient calculation submodule calculates a gradient descending result after the article submodel and the user submodel in the training algorithm are subjected to iterative fitting according to a target function, and caches a local gradient descending aggregation result from the gradient aggregation submodule;
the gradient aggregation sub-module aggregates the gradient descent results generated in the gradient calculation sub-module, applying random sampling, gradient clipping, and Gaussian noise addition to the gradient descent results, thereby realizing local secure aggregation of the gradient descent results;
the model updating submodule updates the current model training, namely the data distribution submodule respectively acquires the gradient of the article sub-model and the gradient of the user sub-model after the remote safe aggregation from the central server, and respectively performs gradient descent on the article sub-model and the user sub-model by using the gradient of the article sub-model and the gradient of the user sub-model; once the current training times reach a preset iteration upper limit value or the global model is converged, the model updating sub-module sends the global model to the model fine-tuning sub-module; the global model is a user submodel and an article submodel which are obtained after the model updating submodule performs gradient descent on the user submodel and the article submodel by using the remote aggregation gradient;
the model fine-tuning sub-module calls local user data and article data, and performs local training iteration of limited rounds on the global user sub-model and the global article sub-model respectively, so that the global model is more consistent with the data distribution of local data of any user, and the personalized fine tuning of the global model on any user client is completed;
the model parameters of the global model after personalized fine tuning are respectively stored in a user data warehouse and an article data warehouse, and are further transmitted to the user model warehouse and the article model warehouse in a prediction module adjacent to the training module through a data pipeline between the training module and the prediction module;
the prediction module in any user client comprises a semantic computation submodule, an interactive computation submodule, a probability aggregation submodule, a probability sequencing submodule, a recommendation output submodule, a user model warehouse and an article model warehouse; the sub-modules in the prediction module and the model warehouse cooperate with each other to complete the execution of the prediction algorithm;
the user model warehouse and the article model warehouse are used for storing the user model and the article model in local equipment of any client respectively;
the user model refers to a group of neural network parameters of a deep semantic matching model related to user data, which are obtained after a user model is trained by a training algorithm by using local user data of any user client;
the article model refers to a group of neural network parameters of a deep semantic matching model related to article data, which are obtained after article model training is carried out on any user client side by using local article data through a training algorithm;
the semantic computation submodule obtains a user semantic vector corresponding to the user model and an article semantic vector corresponding to the article model through a forward propagation process of a deep semantic matching network by respectively utilizing the user model and the article model;
the interaction calculation submodule calculates the posterior probability value of potential interaction between any user semantic vector and the item semantic vector;
the probability aggregation submodule carries out aggregation on the posterior probability values output by the interactive computation submodule to obtain the posterior probability value of any item to be recommended, which is interacted on the current user client;
the probability sorting submodule sorts the posterior probability values of interaction of the plurality of items to be recommended on the current user client, which are output by the probability aggregation submodule, according to a descending order or an ascending order;
and the recommendation output sub-module outputs the to-be-recommended articles corresponding to any probability in the probability sequence to obtain a recommended article sequence, and the personalized multi-view federal recommendation is completed.
2. The personalized multi-view federated recommendation system of claim 1, wherein the training algorithm specifically comprises:
data distribution phase
the data set I of items to be recommended, provided by the recommendation service provider's background system, is distributed by the central server to each user client;
gradient calculation phase
in the ith view of any user client, the gradients of the user sub-model and the article sub-model are calculated from the private user data of the ith view and the locally shared item data set I;
gradient polymerisation stage
the gradients of the user sub-model and the article sub-model are each aggregated locally, and the locally aggregated local user sub-model gradient and local article sub-model gradient are respectively encrypted and transmitted to the central server, completing the global aggregation;
model update phase
the central server returns the globally aggregated global user sub-model gradient and global article sub-model gradient to each user client to update the user sub-models and the article sub-model;
stage of model fine tuning
after the global model training converges or reaches the set maximum number of iterations, the sub-models on each user client randomly sample their private data and perform a limited number of batch-training rounds again locally, finally obtaining a recommendation model that has undergone multi-party, multi-view federated training and personalized adaptive fine-tuning.
3. The personalized multi-view federal recommendation system of claim 1, wherein the predictive algorithm specifically comprises:
semantic computation phase
the user client pre-computes, through the forward propagation process of the deep semantic matching model using the parameters provided by the article sub-model, the semantic vectors of all E items to be recommended;
interactive computing phase
the user client successively computes the semantic vector of each user view through the forward propagation process of the deep semantic matching model using the parameters provided by the user sub-models; then, the posterior probability value ŷ of potential interaction between the semantic vector of any user view and the semantic vector of any item to be recommended is calculated;
Probabilistic aggregation stage
the posterior probability values ŷ of the several potential interactions are locally and securely aggregated, obtaining the posterior probability value ŷ_j that any item to be recommended interacts on the user client;
Probability ordering stage
the several posterior probability values ŷ_j are arranged in descending or ascending order;
recommendation output phase
the sequence of items to be recommended corresponding to the first K probability values is taken out; this sequence is the item sequence recommended by the user client.
CN202210150617.6A 2022-02-18 2022-02-18 Personalized multi-view federal recommendation system Pending CN114564641A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210150617.6A CN114564641A (en) 2022-02-18 2022-02-18 Personalized multi-view federal recommendation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210150617.6A CN114564641A (en) 2022-02-18 2022-02-18 Personalized multi-view federal recommendation system

Publications (1)

Publication Number Publication Date
CN114564641A true CN114564641A (en) 2022-05-31

Family

ID=81714225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210150617.6A Pending CN114564641A (en) 2022-02-18 2022-02-18 Personalized multi-view federal recommendation system

Country Status (1)

Country Link
CN (1) CN114564641A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114741611A (en) * 2022-06-08 2022-07-12 杭州金智塔科技有限公司 Federal recommendation model training method and system
CN114741611B (en) * 2022-06-08 2022-10-14 杭州金智塔科技有限公司 Federal recommendation model training method and system
CN116246749A (en) * 2023-05-11 2023-06-09 西南医科大学附属医院 Endocrine patient personalized health management system integrating electronic medical records
CN116246749B (en) * 2023-05-11 2023-07-21 西南医科大学附属医院 Endocrine patient personalized health management system integrating electronic medical records
CN117454185A (en) * 2023-12-22 2024-01-26 深圳市移卡科技有限公司 Federal model training method, federal model training device, federal model training computer device, and federal model training storage medium
CN117454185B (en) * 2023-12-22 2024-03-12 深圳市移卡科技有限公司 Federal model training method, federal model training device, federal model training computer device, and federal model training storage medium

Similar Documents

Publication Publication Date Title
Zhu et al. Federated learning on non-IID data: A survey
Duan et al. JointRec: A deep-learning-based joint cloud video recommendation framework for mobile IoT
CN114564641A (en) Personalized multi-view federal recommendation system
Khrouf et al. Hybrid event recommendation using linked data and user diversity
Gao et al. A survey on heterogeneous federated learning
CN108446964B (en) User recommendation method based on mobile traffic DPI data
CN112836130A (en) Context-aware recommendation system and method based on federated learning
CN111553744A (en) Federal product recommendation method, device, equipment and computer storage medium
Yan et al. FedCDR: Privacy-preserving federated cross-domain recommendation
Zhang et al. Data quality in big data processing: Issues, solutions and open problems
Jagtap et al. Homogenizing social networking with smart education by means of machine learning and Hadoop: A case study
Zhang et al. Field-aware matrix factorization for recommender systems
Wu et al. A hybrid approach to service recommendation based on network representation learning
Wang et al. MuKGB-CRS: guarantee privacy and authenticity of cross-domain recommendation via multi-feature knowledge graph integrated blockchain
Anande et al. Generative adversarial networks for network traffic feature generation
CN114004363A (en) Method, device and system for jointly updating model
Peng et al. Tdsrc: A task-distributing system of crowdsourcing based on social relation cognition
Li et al. Federated low-rank tensor projections for sequential recommendation
Liu et al. A review of federated meta-learning and its application in cyberspace security
Lv et al. Dsmn: An improved recommendation model for capturing the multiplicity and dynamics of consumer interests
CN116467415A (en) Bidirectional cross-domain session recommendation method based on GCNsformer hybrid network and multi-channel semantics
Sah et al. Aggregation techniques in federated learning: Comprehensive survey, challenges and opportunities
Shen et al. Artificial Intelligence for Web 3.0: A Comprehensive Survey
Yan et al. Personalized POI recommendation based on subway network features and users’ historical behaviors
Xing et al. Distributed Model Interpretation for Vertical Federated Learning with Feature Discrepancy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination