CN114564641A - Personalized multi-view federated recommendation system - Google Patents

Personalized multi-view federated recommendation system

Info

Publication number
CN114564641A
CN114564641A (application CN202210150617.6A)
Authority
CN
China
Prior art keywords
model
user
data
gradient
article
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210150617.6A
Other languages
Chinese (zh)
Inventor
张胜博
高明
束金龙
徐林昊
杜蓓
蔡文渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hipu Intelligent Information Technology Co ltd
East China Normal University
Original Assignee
Shanghai Hipu Intelligent Information Technology Co ltd
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hipu Intelligent Information Technology Co ltd and East China Normal University
Priority to CN202210150617.6A
Publication of CN114564641A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G06F16/9535 - Search customisation based on user profiles and personalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 - Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/50 - Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a personalized multi-view federated recommendation system comprising a central server and a plurality of user clients, where every user client contains a training module and a prediction module. The training module comprises a data distribution submodule, a gradient calculation submodule, a gradient aggregation submodule, a model update submodule, a model fine-tuning submodule, a user data warehouse and an item data warehouse; these cooperate to execute the training algorithm and obtain a user submodel and an item submodel. The prediction module comprises a semantic computation submodule, an interaction computation submodule, a probability aggregation submodule, a probability ranking submodule, a recommendation output submodule, a user model warehouse and an item model warehouse; these cooperate to execute the prediction algorithm and obtain the sequence of items recommended to any user client. The system offers stronger scenario adaptability, deeper feature mining in the underlying model, wider coverage of data sources in the raw input, and better localized fine-tuning of the global model.

Description

Personalized multi-view federated recommendation system
Technical Field
The invention belongs to the technical field of data science and big data, and in particular relates to a privacy-preserving personalized multi-view federated recommendation system based on multi-view learning, meta-learning and federated learning.
Background Art
With the rapid development of information and internet technology, people have moved from an era of information scarcity to an era of information overload. Taking e-commerce platforms as an example, the volume of commodity information is expanding rapidly to meet the diverse demands of users. On the one hand, users often get lost in the massive commodity information space and cannot quickly find the goods they need; on the other hand, large numbers of commodities cannot be accurately pushed to their target audiences, making it hard for merchants to acquire customers and for platforms to profit. Against this background, recommendation systems emerged. After more than twenty years of accumulation, recommendation systems are now widely used in fields such as e-commerce, social networking, online advertising and streaming media. In recent years, with the development of machine learning and deep learning, both industry and academia have shown growing research interest in recommendation systems, and it remains valuable to keep mining their latent problems and making improvements in step with the times.
A recommendation system assumes that a user's preferences are latent in data such as the user's attributes and historical interaction behavior with items, and that the audience an item targets is latent in data such as the item's attributes and descriptive text. By analyzing these data and modeling users and items, a recommendation system can predict a user's degree of interest in an item and then proactively recommend items accordingly, properly capturing the user's personality and preferences, better exploring long-tail information, and helping to extract more profit from market segments. Recommendation systems have evolved from naive content-based recommendation, through collaborative-filtering-based recommendation, to higher-order recommendation algorithms based on deep learning, making larger-scale and more accurate recommendation possible. At present, the fusion of diversified data is one of the research hotspots of recommendation systems. Many scholars have attempted to introduce "multi-view learning" into recommendation algorithms, and related experiments show that effective use of multi-source data can significantly improve model prediction accuracy.
The success of recommendation systems benefits from the extensive collection, analysis and centralized storage of massive user data. In implementing the aforementioned recommendation behavior, a recommendation system inevitably uses some of the user's sensitive information. This may include attribute data such as age, gender and address, and interaction data such as browsing records, rating records and travel records. Meanwhile, the rise of the mobile internet has freed people from fixed terminals: carrying smartphones, wearables and tablets, people can work, socialize and shop on the internet anytime and anywhere. The volume of user data has grown exponentially and its forms have become more diverse. Beyond the aforementioned attribute and interaction data, more private data such as health status and geographic location is also collected by mobile devices in real time and periodically sent to third parties for data-mining-related services. Such services can indeed enhance the user experience and bring convenience. However, this information touches the user's privacy red line; once abused or leaked, it causes great trouble and immeasurable risk to the user.
In recent years, people have paid more attention to personal privacy, academic discussion of data ethics has intensified, and large companies' awareness of data security has grown. How to better protect user privacy and strengthen data security has become a global proposition. Countries around the world are strengthening laws and regulations that safeguard data security and user privacy. However, these regulations pose a completely new challenge to data interaction in artificial intelligence: a data barrier forms between platform and user, and data sharing between platforms and third parties is strictly limited and supervised. How to legally and appropriately resolve data fragmentation and data isolation has become a major challenge for researchers and practitioners in artificial intelligence. Against this backdrop, a United States company first proposed the concept of "federated learning" in 2016 in an attempt to overcome the data-island and privacy-protection challenges. Federated learning is a special distributed machine learning framework that requires every collaborator participating in the federated process to keep its local raw data unexposed, effectively protecting user privacy and data security.
The prior art has focused either on the underlying design of federated recommendation systems or on their concrete applications. It has explored and usefully realized federated recommendation, but the invented technologies themselves still have some non-negligible limitations.
Specifically, the problems of the prior art can be summarized in the following four aspects:
(1) The federated recommendation methods proposed by some technologies are oriented to specific scenarios and applications and fail to provide a universal technical framework. For example, publication CN113158241A describes a post recommendation algorithm based on horizontal federated learning. It jointly trains the resume features of multiple users and the features of browsed target posts, finally matching posts while protecting resume privacy, but it is only suitable for human resource management scenarios.
(2) The federated recommendation proposed by some technologies is based on traditional models and algorithms and fails to incorporate leading-edge neural networks. For example, publication CN112287244A proposes a product recommendation method based on federated learning. It uses an early collaborative filtering model as the underlying algorithm to train user-product and product-product similarities and clusters and sorts products accordingly, but does not mine user features more fully.
(3) The federated recommendation methods proposed by some technologies rely on a single kind of interaction data and fail to address the user cold-start problem. For example, publication CN111339412A shows a recommendation recall method based on vertical federated learning. It uses only a user behavior data matrix as the training data source to generate item recalls for the user data to be predicted; however, new users often have no historical data, leading to the user cold-start problem.
(4) The federated recommendation methods proposed by some technologies train a shared global model but fail to adapt to client variability. For example, publication CN113626687A implements an online course recommendation system based on federated learning. It computes gradients from local data and sends them to a central server; the server completes gradient aggregation and returns the result to the local devices for updating, yet users with large differences still share one set of model parameters for recommendation.
Disclosure of Invention
The invention addresses the following four problems of the prior art:
(1) Universality of the algorithm framework: the frameworks designed by the prior art are difficult to apply to most application scenarios and fields;
(2) Depth of the underlying model: the underlying models used by the prior art do not properly introduce machine learning or deep learning models;
(3) Multi-source raw data: the raw inputs adopted by the prior art cannot cover data of different sources and different forms;
(4) Differences in data distribution: the global models generated by the prior art never complete personalized fine-tuning on the federated clients.
The specific technical scheme for realizing the purpose of the invention is as follows:
a personalized multi-view federal recommendation system is characterized by comprising a central server and a plurality of user clients, wherein the internal structures of any one user client are the same, and a training module and a prediction module are contained in any one user client; data stream transmission is carried out inside the central server, between the central server and any one user client and inside any one user client; meanwhile, the transmission of the data stream adopts a synchronous transmission mode, namely, the data exchange among the modules is non-asynchronous and is allocated by a unified clock signal; the training module and the prediction module respectively comprise a plurality of sub-modules which are respectively used for completing a training task and a prediction task;
the central server comprises an updating coordination module and a data calculation module;
the data calculation module performs an aggregation operation on the item gradient data and the user gradient data from the plurality of user clients respectively; the aggregation spans the central server and any user client;
the update coordination module coordinates, between the training module of any user client and the update coordination module of the central server, the transmission of individual gradient data from any user client and of aggregated gradient data from the data calculation module; this coordination is completed in the central server, and a secure aggregation protocol ensures that individual gradient data entering the data calculation module during transmission is remotely and securely aggregated; remote secure aggregation means that user gradient data or item gradient data from the plurality of user clients is encrypted under the control of the secure aggregation protocol and uploaded to the central server, which decrypts the gradient data and then aggregates it;
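As an illustration only, the remote secure aggregation described above can be sketched with pairwise additive masking, one common realization of a secure aggregation protocol; the patent does not fix a concrete scheme, so all names and the shared-seed setup below are assumptions. Each client adds random masks that cancel in the server-side sum, so the server recovers the aggregate without seeing any single client's gradient in the clear.

```python
import numpy as np

def pairwise_masks(num_clients, dim, seed=0):
    # One shared random mask per client pair; in the sum, client i's +m and
    # client j's -m cancel, so the server only ever sees the aggregate.
    rng = np.random.default_rng(seed)
    return {(i, j): rng.normal(size=dim)
            for i in range(num_clients) for j in range(i + 1, num_clients)}

def mask_gradient(grad, client_id, num_clients, masks):
    # "Encrypt" the local gradient by adding all pairwise masks.
    masked = grad.astype(float).copy()
    for other in range(num_clients):
        if other == client_id:
            continue
        pair = (min(client_id, other), max(client_id, other))
        masked += masks[pair] if client_id == pair[0] else -masks[pair]
    return masked

def server_aggregate(masked_grads):
    # Masks cancel pairwise, so the mean of the masked gradients
    # equals the mean of the true gradients.
    return np.mean(masked_grads, axis=0)
```

In a real deployment the pairwise masks would be derived from a key exchange rather than a shared seed; this sketch only shows why the server-side aggregate is recoverable while each individual upload is obscured.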
the training module in any user client comprises a data distribution submodule, a gradient calculation submodule, a gradient aggregation submodule, a model update submodule, a model fine-tuning submodule, a user data warehouse and an item data warehouse; the submodules and data warehouses in the training module cooperate to execute the training algorithm;
the user data warehouse and the item data warehouse store user data and item data, respectively, on the local device of any user client; user data refers to the historical interaction behavior datasets generated by the user in each application view on the client; item data refers to the dataset of items to be recommended, distributed to the client by the recommendation service provider through the central server;
the data distribution submodule interacts with the update coordination module in the central server and the model update submodule in the training module, acting as a data hub between the two: on the one hand, it uploads the locally securely aggregated gradient data from the model update submodule to the central server, and receives the item dataset and the remotely securely aggregated gradient data from the central server; on the other hand, it passes the remotely securely aggregated gradient data from the central server to the model update submodule; local secure aggregation means that gradient data generated inside any user client undergoes random sampling, gradient clipping and Gaussian noise addition before being aggregated;
the gradient calculation submodule calculates the gradient descent results of the item submodel and the user submodel after each iterative fit against the objective function in the training algorithm, and caches the locally aggregated gradient descent results from the gradient aggregation submodule;
the gradient aggregation submodule aggregates the gradient descent results generated by the gradient calculation submodule, applying random sampling, gradient clipping and Gaussian noise to them, thereby realizing local secure aggregation of the gradient descent results;
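A minimal sketch of the local secure aggregation performed by the gradient aggregation submodule, assuming NumPy; the patent names the three operations (random sampling, gradient clipping, Gaussian noise) but not their parameters, so the hyperparameter values here are illustrative:

```python
import numpy as np

def local_secure_aggregate(per_example_grads, sample_rate=0.5, clip_norm=1.0,
                           noise_sigma=0.1, rng=None):
    """Locally aggregate per-example gradients with random sampling,
    per-example L2 clipping, and Gaussian noise addition."""
    rng = rng or np.random.default_rng(0)
    grads = np.asarray(per_example_grads, dtype=float)
    # 1) Random sampling of a subset of gradients.
    keep = rng.random(len(grads)) < sample_rate
    if not keep.any():
        keep[rng.integers(len(grads))] = True   # keep at least one gradient
    grads = grads[keep]
    # 2) Clip each gradient to a maximum L2 norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # 3) Average, then add Gaussian noise to the aggregate.
    agg = grads.mean(axis=0)
    return agg + rng.normal(scale=noise_sigma, size=agg.shape)
```

This is the sampling-clipping-noising recipe familiar from differentially private SGD; whether the patent intends formal differential privacy guarantees is not stated.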
the model update submodule updates the current round of model training: the data distribution submodule obtains from the central server the remotely securely aggregated gradients of the item submodel and of the user submodel, which are used to perform gradient descent on the item submodel and the user submodel, respectively; once the current number of training rounds reaches a preset iteration ceiling or the global model converges, the model update submodule sends the global model to the model fine-tuning submodule; the global model consists of the user submodel and the item submodel obtained after the model update submodule performs gradient descent on them with the remotely aggregated gradients;
the model fine-tuning submodule calls local user data and item data and performs a limited number of local training iterations on the global user submodel and the global item submodel respectively, so that the global model better matches the distribution of the client's local data, completing the personalized fine-tuning of the global model on any user client;
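A hedged sketch of this limited local fine-tuning in the style of the Reptile meta-learning update, which the patent's fine-tuning stage resembles: run a few inner SGD steps on local data, then move the client's copy of the global parameters toward the adapted weights. `grad_fn`, the learning rates and the step counts are illustrative assumptions.

```python
import numpy as np

def reptile_finetune(global_params, local_batches, grad_fn,
                     inner_lr=0.01, outer_lr=0.5, steps=3):
    # grad_fn(params, batch) returns the gradient of the local loss.
    params = np.array(global_params, dtype=float)
    for _ in range(steps):
        fast = params.copy()
        for batch in local_batches:           # a few inner SGD steps on local data
            fast -= inner_lr * grad_fn(fast, batch)
        params += outer_lr * (fast - params)  # move toward the adapted weights
    return params
```

With `outer_lr=1.0` this reduces to plain local SGD from the global starting point; smaller values keep the personalized model closer to the global one.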
the model parameters of the personalized, fine-tuned global model are stored in the user data warehouse and the item data warehouse respectively, and are further transmitted, through the data pipeline between the training module and the prediction module, to the user model warehouse and the item model warehouse in the adjacent prediction module;
the prediction module in any user client comprises a semantic computation submodule, an interaction computation submodule, a probability aggregation submodule, a probability ranking submodule, a recommendation output submodule, a user model warehouse and an item model warehouse; the submodules and model warehouses in the prediction module cooperate to execute the prediction algorithm;
the user model warehouse and the item model warehouse store the user model and the item model, respectively, on the local device of any client;
the user model refers to the group of neural network parameters, relating to user data, of the deep semantic matching model obtained after the user model is trained by the training algorithm on the client's local user data;
the item model refers to the group of neural network parameters, relating to item data, of the deep semantic matching model obtained after the item model is trained by the training algorithm on the client's local item data;
the semantic computation submodule uses the user model and the item model, through the forward propagation of the deep semantic matching network, to obtain the user semantic vectors corresponding to the user model and the item semantic vectors corresponding to the item model;
the interaction computation submodule calculates the posterior probability value of a potential interaction between any user semantic vector and any item semantic vector;
the probability aggregation submodule aggregates the several posterior probability values output by the interaction computation submodule to obtain the posterior probability that any item to be recommended is interacted with on the current user client;
the probability ranking submodule sorts, in descending or ascending order, the posterior probability values, output by the probability aggregation submodule, of the several items to be recommended being interacted with on the current user client;
and the recommendation output submodule outputs the items to be recommended corresponding to the probabilities in the probability sequence, obtaining the recommended item sequence and completing the personalized multi-view federated recommendation.
The training algorithm specifically comprises:
a) Data distribution stage
The central server S distributes the dataset I of items to be recommended, provided by a background system of the application to be recommended, to each user client;
b) Gradient calculation stage
In any user client view i, the gradients of the user submodel and the item submodel are calculated from the private user data of the i-th view and the locally shared item dataset I;
c) Gradient aggregation stage
The gradients of the user submodels and item submodels are locally aggregated, and the locally aggregated gradients are encrypted and transmitted to the central server S to complete global aggregation;
d) Model update stage
The central server S transmits the globally aggregated global user submodel gradient and global item submodel gradient to each user client for updating the user submodels and item submodels;
e) Model fine-tuning stage
After the global model training converges or reaches the set maximum number of iterations, the submodels on each user client randomly sample their own private data and perform a limited number of further training batches locally, finally obtaining a recommendation model that has undergone multi-party, multi-view federated training and personalized adaptive fine-tuning.
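Stages a) to d) above can be sketched as one synchronous federated round over toy clients. The scalar submodels, the squared-error loss and the learning rate are illustrative stand-ins for the patent's deep semantic matching models, and the class and method names are assumptions:

```python
import numpy as np

class ToyClient:
    """Toy client with scalar user/item submodels; loss = (u*i - y)^2."""
    def __init__(self, u, i, y):
        self.u, self.i, self.y = u, i, y
        self.items = None

    def receive_items(self, item_data):            # a) data distribution
        self.items = item_data                     # (unused by the toy loss)

    def local_gradients(self):                     # b) gradient calculation
        err = self.u * self.i - self.y
        return 2 * err * self.i, 2 * err * self.u  # dloss/du, dloss/di

    def apply_update(self, gu, gi, lr):            # d) model update
        self.u -= lr * gu
        self.i -= lr * gi

def federated_round(clients, item_data, lr=0.05):
    for c in clients:
        c.receive_items(item_data)
    grads = [c.local_gradients() for c in clients]
    gu = np.mean([g[0] for g in grads])            # c) global aggregation
    gi = np.mean([g[1] for g in grads])
    for c in clients:
        c.apply_update(gu, gi, lr)
    return gu, gi
```

Repeating `federated_round` until convergence and then running stage e) locally per client reproduces the overall training flow; the encryption and secure-aggregation steps are omitted here for brevity.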
The prediction algorithm specifically comprises:
a) Semantic computation stage
The user client pre-computes the semantic vectors of all E items to be recommended, using the parameters provided by the item submodel, through the forward propagation of the deep semantic matching model;
b) Interaction computation stage
The user client sequentially computes the semantic vector of each user view, using the parameters provided by the user submodel, through the forward propagation of the deep semantic matching model; it then computes the posterior probability value ŷ(i, e) of a potential interaction between the semantic vector of user view i and the semantic vector of item e to be recommended (the original renders these formulas as images; ŷ(i, e) is used here as the notation);
c) Probability aggregation stage
The posterior probability values ŷ(i, e) of the several potential interactions are locally and securely aggregated to obtain the posterior probability value ŷ(e) that item e to be recommended is interacted with on the user client;
d) Probability ranking stage
The several posterior probability values ŷ(e) are arranged in descending or ascending order;
e) Recommendation output stage
The sequence of items to be recommended corresponding to the top K probability values is taken out; this is the item sequence recommended to the user client.
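Stages a) to e) can be sketched end to end as follows, assuming (as is common for deep semantic matching models, though not fixed by the patent) that the per-view posterior is a softmax over scaled cosine similarities and that aggregation across views is a simple mean; the temperature `gamma` is an assumption:

```python
import numpy as np

def recommend_top_k(view_vecs, item_vecs, k=3, gamma=10.0):
    """view_vecs: (num_views, d) user-view semantic vectors;
    item_vecs: (num_items, d) item semantic vectors."""
    def unit(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    sims = unit(view_vecs) @ unit(item_vecs).T        # cosine, (views, items)
    exp = np.exp(gamma * sims)
    post = exp / exp.sum(axis=1, keepdims=True)       # b) per-view posterior
    agg = post.mean(axis=0)                           # c) probability aggregation
    order = np.argsort(-agg)                          # d) descending ranking
    return order[:k], agg                             # e) top-K recommendation
```

The semantic vectors themselves would come from the forward pass of the user and item submodels (stage a); here they are taken as given inputs.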
Compared with the prior art, the invention mainly has the following beneficial effects:
(1) The algorithm framework of the invention applies to most federated recommendation application scenarios. Industry mostly uses content-based recommendation algorithms as its most basic algorithms because of their better interpretability. Specifically, the essence of a recommendation system is similarity computation: a content-based recommendation algorithm first constructs profiles of the commodities and the user, then ranks by commodity-user similarity to generate recommendations. The invention is designed and realized on the basis of content-based recommendation, so any content-based recommendation system can implement recommendation with the system and algorithm framework provided by the invention while effectively protecting user privacy.
(2) The underlying model of the invention introduces deep neural networks to handle massive features. With the explosive growth of data volume on both the user side and the commodity side, industry already has successful cases of integrating deep neural network models into recommendation services. Deep neural networks can process massive data and, through successive transformations, fit the complex relationships within it, i.e. deeply mine the features of commodities and users. The invention deploys several two-tower models based on the deep semantic matching model, which can quickly process the large volumes of data on both the user and commodity sides and, through successive transformations, fit the complex relationships in the data on both sides.
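A minimal sketch of one such two-tower arrangement: two independent dense towers map user-side and item-side features into a shared semantic space, and relevance is the cosine of the two semantic vectors. The layer sizes and tanh activation follow the original DSSM design and are illustrative here, not the patent's exact architecture:

```python
import numpy as np

def mlp_tower(x, weights):
    # One tower: stacked dense layers with tanh activations.
    h = x
    for W in weights:
        h = np.tanh(h @ W)
    return h

def two_tower_score(user_x, item_x, user_weights, item_weights):
    u = mlp_tower(user_x, user_weights)   # user semantic vector
    v = mlp_tower(item_x, item_weights)   # item semantic vector
    # Cosine relevance of the two semantic vectors.
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
```

Because the two towers share no weights, the user side and item side can accept inputs of different dimensionality and be trained and evaluated independently, which is what makes the two-tower layout attractive for federated settings.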
(3) The raw input of the invention covers data from different sources and in different forms. Taking the user side as an example, portable devices are becoming more intelligent and interconnected; through these devices and the applications hosted on them, users store data of several different views locally, such as personal attributes, rating information, browsing records and health status. If a recommendation system can use this multi-source view data safely, effectively and comprehensively, it is bound to gain considerable recommendation accuracy. The invention safely, effectively and comprehensively utilizes the multiple view data generated within multiple applications on a mobile portable device. In particular, the multi-view setting of the invention alleviates the recommendation cold-start problem to some extent: even if a user is completely new in one view (i.e. has no historical interaction data there), the user still benefits from combining the other views with the global recommendation model. In the invention's preliminary experiments with MovieLens-100K (movie recommendation) as the training and test dataset, the best global federated recommendation precision using only multi-user, single-view data was 0.8445, while the best achieved by the invention with multi-user, multi-view data was 0.8986, an improvement of about 5 percentage points and a clear gain.
(4) The global model of the invention completes personalized fine-tuning on the federated clients. The combination of federated learning and recommendation systems is still in an exploratory phase, and the prior art mostly uses the classical federated averaging algorithm to complete the federated learning process. Federated averaging integrates the weights trained by each user in a simple, feasible way to obtain a common fused model. However, when the data on the clients is not independently and identically distributed, the global model often cannot satisfy every client, i.e. it should be personalized per client. The invention transplants the classical Reptile algorithm from meta-learning into the training stage of multi-view federated recommendation, realizing iterative fine-tuning of the federated global model on the user client in a simple, feasible way. In the preliminary experiments, taking the user client that produced the global optimum of 0.8986 as the observation object, after the post-federated-training fine-tuning step the best recommendation precision was 0.9107, an improvement of 1.2 percentage points and close to the 0.9202 of centralized training; the effect is clear.
Drawings
FIG. 1 is a diagram of a multi-user and multi-view application scenario;
FIG. 2 is a schematic structural view of the present invention;
FIG. 3 is a schematic diagram of a bottom model of the present invention;
FIG. 4 is a schematic diagram of a training algorithm of the present invention;
FIG. 5 is a schematic diagram of the prediction algorithm of the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
Referring to fig. 1, there are several user clients that are geographically dispersed and communicate via the internet. Within any user client there are one or more views, and the data generated within the views is stored on the client's local device. A common real-world application scenario is the following: the user client is a user's portable mobile device (e.g., a smartphone, tablet computer or smart watch), the applications running on the device are the views, and the data generated by the user's interactions in those applications (e.g., attribute fields, browsing records, ratings) is the data generated in the views. In theory, the data of multiple views reflects the user's interests and preferences more comprehensively, and the complementarity among the views compensates for the user's information loss in the current view, effectively alleviating the "cold start" problem commonly faced by recommendation systems. Therefore, when recommending items to massive numbers of users, the personalized multi-view federated recommendation system comprehensively uses three computer technologies, multi-view learning, federated learning and meta-learning, to achieve cooperation between multiple users and multiple views in an effective and safe way, thereby achieving more accurate item recommendation.
Referring to FIG. 2, a block diagram of the system of the present invention illustrates its module components and data flow. The system of the invention is composed of a central server and several user clients (for simplicity, only one user client is shown in fig. 2). Any user client comprises a training module and a prediction module, and data streams are transmitted inside the central server, between the central server and any user client, and inside any user client. The data streams are transmitted synchronously, that is, data exchange among the modules is not asynchronous but is regulated by a unified clock signal. In addition, the training module and the prediction module each comprise several sub-modules for completing the training task and the prediction task respectively, as well as data or model warehouses.
The central server is composed of an update coordination module and a data calculation module. The data calculation module performs aggregation operations on the item gradient data and the user gradient data from the several user clients; this aggregation involves the central server and every participating client. The update coordination module coordinates the transmission of the individual gradient data from any user client, and of the aggregated gradient data from the data calculation module, between the training module in any user client and the update coordination module in the central server. This coordination is completed inside the central server, and a secure aggregation protocol ensures the remote secure aggregation of the individual gradient data entering the data calculation module during data transmission and collection. Remote secure aggregation means that, under the control of the secure aggregation protocol, the user gradient data or item gradient data from the several user clients are encrypted and uploaded to the central server, and the central server decrypts the gradient data before aggregating them.
The internal structure of any user client is the same, and each client can be divided into two main functional modules, a training module and a prediction module, each of which comprises several sub-modules for completing the training task and the prediction task respectively. The training module is composed of five sub-modules, namely a data distribution sub-module, a gradient calculation sub-module, a gradient aggregation sub-module, a model updating sub-module and a model fine-tuning sub-module, corresponding to the five processing stages of the subsequent training algorithm. The training module also comprises two data warehouses, used to store user data and item data on the local device of any client respectively; the user data refers to the historical interaction behavior data set generated by the user in each application view on any user client; the item data refers to the data set of items to be recommended distributed to any user client by the recommendation service provider through the central server. The prediction module is composed of five sub-modules, namely a semantic calculation sub-module, an interaction calculation sub-module, a probability aggregation sub-module, a probability sorting sub-module and a recommendation output sub-module, corresponding to the five processing stages of the subsequent prediction algorithm.
The prediction module comprises two model warehouses, used to store the user models and the item model on the local device of any client respectively. A user model refers to the group of neural network parameters of the deep semantic matching model related to user data, obtained after the training algorithm trains the user model with the local user data of any user client; the item model refers to the group of neural network parameters of the deep semantic matching model related to item data, obtained after the training algorithm trains the item model with the local item data of any user client.
More specifically, in the training module of any client, the data distribution sub-module interacts with both the update coordination module in the central server and the model updating sub-module in the training module, playing the role of a data hub: on the one hand, it uploads the locally secure-aggregated gradient data from the model updating sub-module to the central server, and receives the item data set and the remotely secure-aggregated gradient data from the central server; on the other hand, it transmits the remotely secure-aggregated gradient data from the central server to the model updating sub-module. Local secure aggregation means that the gradient data generated inside any user client is aggregated after random sampling, gradient clipping and Gaussian noise addition. The gradient calculation sub-module calculates the gradient descent results of the item sub-model and the user sub-models after iterative fitting against the objective function in the training algorithm, and caches the local gradient descent aggregation result from the gradient aggregation sub-module. The gradient aggregation sub-module aggregates the gradient descent results generated in the gradient calculation sub-module, applying random sampling, gradient clipping and Gaussian noise addition to them so as to realize local secure aggregation of the gradient descent results. The model updating sub-module updates the current model training, that is, the data distribution sub-module obtains the remotely secure-aggregated item sub-model gradient and user sub-model gradients from the central server, and gradient descent is performed on the item sub-model and the user sub-models with these gradients respectively.
Once the current number of training rounds reaches the preset iteration upper limit or the global model converges, the model updating sub-module sends the global user sub-models and the global item sub-model to the model fine-tuning sub-module. The global model is the user sub-models and item sub-model obtained after gradient descent is performed on them with the remotely aggregated gradients; "global" means that the new model is generated by combining gradient information from several user clients. The model fine-tuning sub-module calls the local user data and item data again and carries out a limited number of rounds of local training iterations on the global user sub-models and item sub-model respectively, so that the global model better fits the distribution of the local data of any user, completing the personalized fine-tuning of the global model on any user client. The fine-tuned model parameters are stored in the user data warehouse and item data warehouse in the training module respectively, and are further transmitted to the user model warehouse and item model warehouse in the prediction module through the data pipeline between the training module and the prediction module. The semantic calculation sub-module inside the prediction module uses the user models and the item model respectively to calculate, through the forward propagation process of the deep semantic matching network, the user semantic vectors corresponding to the user models and the item semantic vectors corresponding to the item model. The two kinds of semantic vectors enter the interaction calculation sub-module together to obtain the posterior probability value of potential interaction between any user semantic vector and any item semantic vector.
Further, the probability values are aggregated in a probability aggregation submodule to obtain a posterior probability value of interaction of any item to be recommended on the current user client. And then, sorting the probability of interaction of the plurality of items to be recommended on the current user client according to a descending order or an ascending order. And finally, outputting the object to be recommended corresponding to any probability in the probability sequence, obtaining the recommended object sequence on the current user client, and finishing personalized multi-view federal recommendation.
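As a concrete illustration of this prediction flow, the following minimal Python sketch (function and variable names are our own, and the probabilities are made up for the example) aggregates per-view interaction probabilities, sorts them in descending order, and outputs the recommended item sequence:

```python
import numpy as np

def recommend(view_probs, item_ids, top_k=3):
    """Prediction sketch: aggregate per-view interaction probabilities for each item
    (mean here), sort in descending order, output the recommended item sequence."""
    agg = np.mean(view_probs, axis=0)    # probability aggregation over the views
    order = np.argsort(-agg)             # descending sort by aggregated probability
    return [item_ids[i] for i in order[:top_k]]

# Two views scoring four candidate items (probabilities are illustrative).
probs = np.array([[0.9, 0.2, 0.6, 0.1],
                  [0.7, 0.3, 0.8, 0.2]])
items = ["A", "B", "C", "D"]
ranked = recommend(probs, items, top_k=2)
```

The mean is one plausible aggregation; the patent only requires that per-view probabilities be combined before sorting.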
The system mainly comprises the sequential execution of two tasks: one is a training task, namely: the system completes the generation of the user model and the article model in the recommendation system in a mode of combining a plurality of user clients and a plurality of application views in the clients on the premise of practically protecting the privacy of the user; secondly, a prediction task is that: the system of the invention completes the generation of a recommendation list aiming at a specific user according to the user characteristics, the item characteristics, the corresponding user model and the item model on the premise of giving the item set to be recommended.
Next, three important components in the technical solution of the present invention will be described with emphasis. The first is the basic model and the basic method adopted by the invention, which mainly introduces: the existing depth model and parameter aggregation methods used in the present invention; the second is a training algorithm designed and implemented by the invention, which mainly describes: how to combine a plurality of clients and a plurality of views in the clients to complete the generation of a user model and an article model in a recommendation system in a safe and effective mode; the third is the prediction algorithm designed and implemented by the invention, which mainly explains: how to sort the item list to be recommended for a certain user in the recommendation system according to the user characteristics and the item characteristics and the user model and the item model.
First, basic model and basic method
(1) Deep semantic matching model
The Deep Semantic matching model (DSSM) was originally designed for search engines: it extracts semantic vectors from a user's query terms and from candidate documents through a multi-layer neural network, and then measures the relevance of the query and the documents in the same semantic space by cosine similarity. In the technical scheme of the invention, DSSM is adopted as the underlying basic model and is extended to a multi-view DSSM under the federated scenario. In short, the DSSM model is an implicit semantic model with a multi-layer neural network structure: it maps search keywords and documents into a low-dimensional space, calculates the similarity of the two, and is trained by maximizing the conditional probability of the documents clicked under the given search keywords in the training data. For the original DSSM paper, see: https://dl.acm.org/doi/abs/10.1145/2505515.2505665.
Referring to fig. 3, in the design of the recommendation system of the present invention, the DSSM model commonly used by search engines is transplanted into the recommendation algorithm and extended to the multi-view DSSM under the federated recommendation scenario, thereby serving as the underlying model of the system of the present invention. As shown, the DSSM model can be viewed as a "double tower" structure, where the left tower represents the user's query and the right tower represents the document to be matched. The invention reinterprets the user query and the candidate document, namely: the user query of the DSSM corresponds to the i-th view U_i in the user client of the invention, and the candidate document corresponds to the item set I (item data) to be recommended. The essence of the DSSM model is a multi-layer neural network with two-way input and one-way output, which can convert any query or document corpus into a corresponding semantic vector, and then judge whether the query semantic vector and the document semantic vector are correlated by calculating the cosine similarity between them. This coincides with the goal of a recommendation system, which is to measure the degree of correlation between a user and an item, so as to form preferences and give recommendations. Referring to fig. 4, there are several DSSM models in any user client of the system of the invention, whose double towers correspond to a certain user view and the fixed item data, and the training algorithm of the system of the invention aims to maximize the cosine similarity output at their tower tips.
More specifically, let x be the original feature vector of a query term or candidate document, y its semantic vector, l_i (i = 2, 3, …, N−1) the hidden layers in the middle of the DSSM model, W_i the i-th weight matrix, b_i the i-th bias term, and f the mapping function of the DSSM model; note that when the DSSM model is set to N layers, the 1st layer is the input layer, the 2nd to (N−1)-th layers are hidden layers, and the N-th layer is the output layer. Then the forward propagation procedure of DSSM can be defined as:

l_1 = W_1 x,
l_i = f(W_i l_{i−1} + b_i), i ∈ {2, 3, …, N−1},
y = f(W_N l_{N−1} + b_N).
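Under the layer definitions above, the forward propagation of one DSSM tower can be sketched as follows (a minimal NumPy sketch; the layer sizes, tanh activation and random weights are illustrative assumptions, not the patented configuration):

```python
import numpy as np

def dssm_forward(x, weights, biases, f=np.tanh):
    """One DSSM tower: l1 = W1 x; l_i = f(W_i l_{i-1} + b_i); y = f(W_N l_{N-1} + b_N)."""
    l = weights[0] @ x                     # layer 1: linear projection, no bias/activation
    for W, b in zip(weights[1:], biases):  # layers 2..N: affine map plus nonlinearity
        l = f(W @ l + b)
    return l                               # semantic vector y

# Toy tower: input dim 8 -> 6 -> 5 -> output dim 4 (sizes are illustrative).
rng = np.random.default_rng(0)
weights = [rng.standard_normal((6, 8)),
           rng.standard_normal((5, 6)),
           rng.standard_normal((4, 5))]
biases = [rng.standard_normal(5), rng.standard_normal(4)]
y = dssm_forward(rng.standard_normal(8), weights, biases)
```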
The semantic relevance R between a query term Q and a candidate document D can then be measured by the following formula:

R(Q, D) = cosine(y_Q, y_D) = (y_Q^T y_D) / (‖y_Q‖ ‖y_D‖),

where y_Q and y_D are the semantic vectors of the query term Q and the candidate document D respectively, cosine(y_Q, y_D) denotes the cosine similarity of y_Q and y_D, y_Q^T denotes the transpose of y_Q, and ‖y_Q‖ and ‖y_D‖ denote the module lengths (norms) of y_Q and y_D.
It is assumed that a query is positively correlated with the documents clicked after that query, and the parameters of the DSSM (i.e., the weight matrices W) are optimized on this assumption, i.e., the conditional likelihood that a document is clicked under a given query is maximized. It is therefore necessary to obtain the posterior probability that a document is clicked under a given query, which is obtained by computing the semantic relevance between the query and the document and applying the softmax function:

P(D|Q) = exp(γ R(Q, D)) / Σ_{D′∈D*} exp(γ R(Q, D′)),

where R(Q, D) is the semantic relevance between query Q and document D, exp(γ R(Q, D′)) is γ R(Q, D′) exponentiated with the natural constant e as base, γ is a smoothing coefficient, Q is a query vector, D is a document vector, D* is the set of all candidate documents (including the clicked positive examples and the un-clicked negative examples, collectively denoted D′), and P(D|Q) is the conditional probability that document D matches query Q on the premise that Q appears.
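The relevance measure and the softmax posterior can be sketched together (illustrative Python; the smoothing coefficient γ = 10 and the toy vectors are assumptions):

```python
import numpy as np

def relevance(y_q, y_d):
    """R(Q, D): cosine similarity of the two semantic vectors."""
    return float(y_q @ y_d / (np.linalg.norm(y_q) * np.linalg.norm(y_d)))

def posterior(y_q, candidates, gamma=10.0):
    """P(D|Q): softmax over the smoothed relevances of all candidate documents D*."""
    scores = np.array([gamma * relevance(y_q, y_d) for y_d in candidates])
    exp = np.exp(scores - scores.max())   # subtract the max for numerical stability
    return exp / exp.sum()

y_q = np.array([1.0, 0.0, 1.0])
docs = [np.array([1.0, 0.1, 0.9]),        # close to the query, so high probability
        np.array([-1.0, 0.5, -0.8])]      # far from the query, so low probability
p = posterior(y_q, docs)
```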
In the multi-view federated recommendation scenario to which the present invention applies, the multi-view data set on each distributed node (referred to as a "federated client") can be represented as D_n = {(U_1, I), …, (U_i, I), …, (U_n, I)}, where the full user view data set U = {U_1, …, U_n} is generated by n different views U_i (i = 1, 2, …, n), and the data set I of items to be recommended is downloaded from the server side, for example from the back-end service platform of a mobile application company providing the recommendation service to its users. The present invention uses a depth model such as DSSM to extract the corresponding semantic vectors from the user data set U_i of each application view and from the item data set I. The objective of the training algorithm provided by the technical scheme of the invention is to find a non-linear mapping f(·) for each user view, so that the summed similarity of the mappings between all user view data sets U and the item data set I is maximized on each client in the same semantic space.
Specifically, the objective (loss) function of federated recommendation training on any federated client is defined as follows:

L(Λ) = argmax_Λ Σ_{j=1}^{|S|} exp(γ R(y_I, y_{i,j})) / Σ_{X′} exp(γ R(y_I, f_i(X′, W_i))),

where R(y_I, y_{i,j}) is the semantic relevance between the item semantic vector y_I and the semantic vector y_{i,j} of view U_i in sample j; exp(γ R(y_I, y_{i,j})) is γ R(y_I, y_{i,j}) exponentiated with the natural constant e as base; γ is the smoothing coefficient; f_i(X′, W_i) maps the vector X′ through the weight matrices W_i of the i-th DSSM tower, W_i being the i-th weight matrix in the DSSM propagation process; S is the set of positive (user, item) sample pairs and |S| is the number of positive samples (a positive sample means the user interacted implicitly or explicitly with the item, e.g., an implicit "click", or an explicit "score" or "comment"); Λ represents the set of parameters of the trained neural network; i is the subscript of the view U_i in sample j; I is a sample of the item data set to be recommended; X′ is a sample of the user data set; y is the projection result of the non-linear mapping f; and argmax refers to finding the variable values that maximize the function value.
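A per-sample version of this objective can be sketched as a negative log-likelihood with sampled negative items (hedged Python sketch; the names, γ and the toy vectors are illustrative, and minimizing this quantity corresponds to maximizing the objective above):

```python
import numpy as np

def view_loss(y_item_pos, view_vec, negatives, gamma=10.0):
    """Negative log-likelihood of the positive (user-view, item) pair against
    sampled negatives, i.e. -log softmax(gamma * cos(...))."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    scores = np.array([gamma * cos(view_vec, y_item_pos)] +
                      [gamma * cos(view_vec, n) for n in negatives])
    scores -= scores.max()                      # numerical stability
    return float(-scores[0] + np.log(np.exp(scores).sum()))

view_vec = np.array([0.8, 0.1])                 # semantic vector of one user view
pos = np.array([0.9, 0.2])                      # well-aligned positive item vector
negs = [np.array([-0.7, 0.5]), np.array([0.0, -1.0])]
loss = view_loss(pos, view_vec, negs)
```

When the positive item is well aligned with the view vector, the loss is close to zero, matching the intent of maximizing the softmax probability of positives.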
(2) Local and remote security aggregation
Under the technical framework of federated learning, the invention can unite several participants to jointly train a globally shared federated model without exposing local raw data. Specifically, during federated learning training, the iterative updating of the sub-models local to each participant, and the aggregated updating of those sub-models into the global model on the central server, are carried out by relying on intermediate result parameters such as gradients. However, simply transmitting such intermediate results carries potential risks. On the one hand, in a conventional federated learning setting, a federated client or the central server may deviate from the preset federated learning protocol, for example by sending an error message to an honest user or by sharing view information with other users; on the other hand, in the federated multi-view learning setting of the present invention, some user views may be dishonest or outright malicious. For example, a malicious view, acting as an application program, may monitor the network traffic or sub-model changes of other friendly views, make null updates to its own local item model to deduce the data updates of the other friendly views, and even reconstruct their original data. Therefore, the invention provides the following two secure aggregation ideas and methods to better protect user privacy and guarantee data security:
① Local secure aggregation method
The method is mainly used on the user clients (i.e., the users' mobile devices) participating in the federated recommendation training, and completes two tasks: first, securely aggregating the several item sub-model gradients or user sub-model gradients of the several user views; second, securely aggregating several posterior probabilities, each probability representing the extent to which an item matches the user's interest according to user view U_i. Specifically, before aggregating numerical data such as gradients or posterior probabilities, a differential privacy protection technique is used to add Gaussian noise to them one by one, thereby protecting the original gradient and probability values. Taking the item sub-model gradients as an example, the main steps of local secure aggregation are:

Step 1: random sub-sampling. For the N views on each user client, randomly sample a view subset B (|B| < N) in each round of federated training, where |B| refers to the size of subset B.

Step 2: gradient clipping. Clip each gradient according to its L2 norm. For example, the item sub-model gradient g_{I_i} is transformed into

ḡ_{I_i} = g_{I_i} / max(1, ‖g_{I_i}‖_2 / C),

where C is the clipping threshold and max(1, ‖g_{I_i}‖_2 / C) selects the larger of 1 and ‖g_{I_i}‖_2 / C.

Step 3: Gaussian noise addition. Using the Gaussian mechanism, add random Gaussian noise to the sum of the clipped gradients ḡ_{I_i} to obtain the noised sub-model gradient g̃_I, with the formula:

g̃_I = (1/|B|) ( Σ_{i∈B} ḡ_{I_i} + N(0, σ²C²) ),

where σ is the noise scale associated with the clipping threshold of step 2, N(0, σ²C²) represents randomly generated Gaussian noise with mean 0 and variance σ²C², and |B| refers to the size of view subset B.
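The three steps can be sketched end to end (illustrative Python; the subset size, clipping threshold C and noise scale σ are arbitrary demo values, and this sketch omits the per-view bookkeeping of the full system):

```python
import numpy as np

def local_secure_aggregate(grads, clip_c=1.0, sigma=0.5, sample_size=None, seed=0):
    """Local secure aggregation sketch: random view sub-sampling, per-gradient
    L2 clipping, Gaussian noise on the sum, averaged over the sampled subset B."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    b = sample_size or max(1, n // 2)                      # default |B| = N/2 < N
    subset = rng.choice(n, size=b, replace=False)          # step 1: sub-sampling
    clipped = [grads[i] / max(1.0, np.linalg.norm(grads[i]) / clip_c)
               for i in subset]                            # step 2: L2 clipping
    noise = rng.normal(0.0, sigma * clip_c, size=grads[0].shape)  # step 3: N(0, σ²C²)
    return (np.sum(clipped, axis=0) + noise) / b

grads = [np.array([3.0, 4.0]), np.array([0.1, 0.2]), np.array([-1.0, 1.0])]
agg = local_secure_aggregate(grads, clip_c=1.0, sigma=0.1)
```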
② Remote secure aggregation method

The method is mainly used on the central server participating in the federated recommendation training, and completes two tasks: first, securely aggregating the several user sub-model gradients g_{U_i} uploaded from the several user clients; second, securely aggregating the several item sub-model gradients g_I uploaded from the several user clients.
The parameter aggregation method on the central server side has been extensively studied and implemented in the traditional, generalized federated learning framework. The most classical of these is the Secure Aggregation Protocol (SAP) proposed by Bonawitz et al. at the CCS 2017 conference, and the present invention also uses SAP to implement remote secure aggregation. The protocol aims to ensure that the aggregation server can only see the aggregated gradient and cannot learn the true gradient value private to any user. The protocol is suitable for situations in which large-scale mobile terminals (such as mobile phones) jointly compute the sum of their respective inputs through a central server without leaking the input of any specific terminal to the server or to any other terminal, and thus also fits the application scenario required by the invention. The protocol mainly utilizes four cryptographic methods: secret sharing, key agreement, authenticated encryption, and digital signatures. Since the steps and derivations of the protocol are involved, only a brief description is given here; for details see the original paper: https://doi.org/10.1145/3133956.3133982.
Second, the training algorithm of the invention
Note that even if some users are new users in some views, i.e., have no interaction data there, they can still benefit from joining the other views, and collaboration and isolation between all N views in any user client have been achieved. The data set I of items to be recommended of a given recommendation system has been distributed to each user client by a reliable central server S. In any view i of a user client, the gradients of the user sub-model and the item sub-model are computed from the private user data of the i-th view and the locally shared item data set I; after local aggregation, the local gradients are encrypted and transmitted to the central server S to complete global aggregation, and the aggregated global gradients are then transmitted back to each user client to update the models. After the global model training converges or the set maximum number of iterations is reached, the sub-models on the user client randomly sample the local private data and perform a limited number of batches of local training, finally obtaining a recommendation model that has undergone multi-party, multi-view federated training and personalized fine-tuning.
Specifically, the steps of the training algorithm of the present invention are as follows:
[User client of the present invention]

Step 1: system input: the total number of views N in the client; the user data set U_i of the i-th view; the interaction data y of the user and the items; the item data set I to be recommended; the local data set D = {(X_i, y), i ∈ {1, 2, …, N}} (where X_i = (U_i, I)); the total number of federated training rounds T; the federated training learning rate η; the number of meta-learning fine-tuning iterations P; the total number H of randomly sampled subsets within the view data; and the meta-learning fine-tuning learning rate ε.

Step 2: initialize the N user sub-models W_{U_1}^0, …, W_{U_N}^0 and initialize the item sub-model W_I^0, where W represents sub-model parameters.

Step 3: judge whether to execute the k-th round of federated training (k is initially set to 1); if k > T or the model has converged, jump to step 19, otherwise execute step 4.

Step 4: judge whether to enter the i-th view (i is initially set to 1); if i > N, jump to step 9, otherwise execute step 5.

Step 5: compute the user sub-model gradient of the i-th view after the k-th round of training, g_{U_i}^k = ∂L/∂W_{U_i}^k, where L refers to the loss function in the "basic model" section.

Step 6: compute the item sub-model gradient of the i-th view after the k-th round of training, g_{I_i}^k = ∂L/∂W_I^k.

Step 7: store the two gradients g_{U_i}^k and g_{I_i}^k obtained for the i-th view in steps 5-6.

Step 8: execute i = i + 1 and repeat step 4.
Step 9: locally aggregate the several item sub-model gradients g_{I_1}^k, …, g_{I_N}^k obtained by repeatedly executing steps 4-8 to obtain g_I^k, using the local secure aggregation of the previous section: g_I^k = LocalSecAgg(g_{I_1}^k, …, g_{I_N}^k).

Step 10: remotely aggregate the own item sub-model gradient g_I^k with those from the other federated clients to obtain the global item gradient ĝ_I^k: ĝ_I^k = RemoteSecAgg(g_I^k(1), …, g_I^k(M)).

Step 11: judge whether to enter the i-th view (i is initially set to 1); if i > N, jump to step 14, otherwise execute step 12.

Step 12: remotely aggregate the user sub-model gradient g_{U_i}^k of the i-th view with those from the other federated clients to obtain the global user gradient ĝ_{U_i}^k: ĝ_{U_i}^k = RemoteSecAgg(g_{U_i}^k(1), …, g_{U_i}^k(M)).

Step 13: execute i = i + 1 and repeat step 11.
Step 14: update the item sub-model to obtain the new item sub-model parameters of the (k+1)-th round: W_I^{k+1} = W_I^k − η · ĝ_I^k.

Step 15: judge whether to enter the i-th view (i is initially set to 1); if i > N, jump to step 18, otherwise execute step 16.

Step 16: update the user sub-model of the i-th view to obtain the new user sub-model parameters of the i-th view in the (k+1)-th round: W_{U_i}^{k+1} = W_{U_i}^k − η · ĝ_{U_i}^k.

Step 17: execute i = i + 1 and repeat step 15.

Step 18: execute k = k + 1 and repeat step 3.
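Steps 4-18 amount to one federated round per client; the following compressed sketch illustrates the data flow (plain averaging stands in for both the local and the remote secure aggregation, and all names are illustrative):

```python
import numpy as np

def federated_round(item_model, user_models, per_view_item_grads,
                    per_view_user_grads, other_clients_item_grads, lr=0.1):
    """One round of steps 4-18: local aggregation of per-view item gradients,
    a stand-in for remote aggregation with other clients, then gradient descent.
    (Per-view user gradients are assumed already remotely aggregated here.)"""
    g_item_local = np.mean(per_view_item_grads, axis=0)       # step 9: local agg
    all_item = [g_item_local] + list(other_clients_item_grads)
    g_item_global = np.mean(all_item, axis=0)                 # step 10: remote agg
    new_item = item_model - lr * g_item_global                # step 14: item update
    new_users = [w - lr * g                                   # step 16: user updates
                 for w, g in zip(user_models, per_view_user_grads)]
    return new_item, new_users

item_w = np.zeros(3)
user_ws = [np.ones(3), 2 * np.ones(3)]
new_item, new_users = federated_round(
    item_w, user_ws,
    per_view_item_grads=[np.ones(3), np.ones(3)],
    per_view_user_grads=[np.ones(3), np.ones(3)],
    other_clients_item_grads=[np.ones(3)])
```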
Step 19: judge whether to execute the p-th round of meta-learning fine-tuning iteration (p is initially set to 1); if p > P, jump to step 35, otherwise execute step 20. When p = 1, the fine-tuning parameters are initialized from the federated training results, i.e., W_I^{T+1} = W_I^T and W_{U_j}^{T+1} = W_{U_j}^T.

Step 20: judge whether to enter the j-th view (j is initially set to 1); if j > N, jump to step 32, otherwise execute step 21.

Step 21: randomly sample the data in the j-th view to obtain H subsets {S_1, S_2, …, S_H}, each subset containing several random samples from the j-th view.

Step 22: judge whether to execute the h-th round of meta-learning iterative updating (h is initially set to 1); if h > H, jump to step 27, otherwise execute step 23.

Step 23: compute the user sub-model gradient of the j-th view after the h-th iteration round, g_{U_j}^h = ∂L/∂W_{U_j}^h, where, when the p-th meta iteration is currently being executed, W_I^h is initialized to W_I^{T+p}, W_{U_j}^h is initialized to W_{U_j}^{T+p}, X_j = (S_h, I), and y is the interaction data in S_h.

Step 24: compute the item sub-model gradient of the j-th view after the h-th iteration round, g_{I_j}^h = ∂L/∂W_I^h.

Step 25: store the two gradients g_{U_j}^h and g_{I_j}^h obtained for the j-th view in steps 23-24.

Step 26: execute h = h + 1 and repeat step 22.
Figure BDA0003510287370000168
Step 27: locally polymerizing a plurality of article sub-model gradients obtained after the step 22-26 is executed for a plurality of times
Figure BDA0003510287370000169
Obtaining (g)I)jThe calculation formula used is: (g)I)jLocal secure syndication
Figure BDA00035102873700001610
Figure BDA00035102873700001611
Step 28: likewise, a number of user sub-model gradients are locally aggregated
Figure BDA00035102873700001612
To obtain
Figure BDA00035102873700001613
The calculation formula used is:
Figure BDA00035102873700001614
Figure BDA00035102873700001615
step 29: updating the article sub-model of the jth view after the iteration h round to obtain the new article sub-model parameter of the jth view in the pth round
Figure BDA00035102873700001616
The calculation formula used is:
Figure BDA00035102873700001617
Figure BDA00035102873700001618
step 30: similarly, updating the user sub-model of the jth view after the iteration h round to obtain the jth viewNew user sub-model parameters mapped in the p-th round
Figure BDA00035102873700001619
The calculation formula used is:
Figure BDA00035102873700001620
Figure BDA00035102873700001621
Figure BDA00035102873700001622
step 31: j +1 is executed and step 20 is repeated.
Figure BDA00035102873700001623
Step 32: updating the sub-model of the article to obtain the new sub-model parameter W of the article after the p-th round of local fine tuning iteration is executedI T+p+1The calculation formula used is:
Figure BDA00035102873700001624
Figure BDA00035102873700001625
Step 33: update the user sub-model of the jth view to obtain the new user sub-model parameters of the jth view after executing the pth round of local fine-tuning iterations: (W_U)_j^{T+p+1} = (W_U)_j^{T+p} − η·(g_U)_j.
Step 34: execute p = p + 1 and repeat step 19.
Step 35: system output: the N user sub-models (W_U)_1^{T+P}, …, (W_U)_N^{T+P} and the item sub-model W_I^{T+P}.
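The update formulas in steps 29, 30, 32, and 33 all take the standard gradient-descent form W ← W − η·g; a minimal sketch, where the learning rate η is an assumption not fixed by the text above:

```python
import numpy as np

def sgd_update(params, grad, lr=0.01):
    """One gradient-descent step: W <- W - lr * g."""
    return params - lr * grad

# Illustrative update of an item sub-model parameter vector:
W_I = np.array([0.5, -0.2, 0.1])
g_I = np.array([0.1, 0.1, -0.1])
W_I_next = sgd_update(W_I, g_I, lr=0.1)  # -> [0.49, -0.21, 0.11]
```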
[ Central Server of the present invention ]
Step 1: system input: the total number M of user clients participating in the federated recommendation process, the total number T of federated training rounds, and the model gradients uploaded by each client after the kth round of training.
Step 2: judge whether the model gradients uploaded by each client after the kth round of training have been processed (k is initially set to 1); if k > T, jump to step 6, otherwise execute step 3.
Step 3: globally aggregate the item or user sub-model gradients uploaded by the several clients to obtain the securely aggregated global gradient; the calculation used is the remote secure aggregation of the M clients' gradients after decryption.
Step 4: execute k = k + 1, and transmit the securely aggregated global model gradient back to each user client.
Step 5: judge whether the model has converged; the model is considered converged when its loss function no longer decreases and becomes stable. If the model has not converged, repeat step 2; if it has converged, send a convergence signal to each user client and execute step 6.
Step 6: the maximum number of training rounds has been exceeded or the model has converged; terminate the system program.
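Steps 2-5 of the central server form a round-based aggregate-and-return loop. A sketch, assuming plain gradient averaging as the global aggregation and a loss-plateau test for convergence (decryption under the secure aggregation protocol is omitted):

```python
import numpy as np

def global_aggregate(client_grads):
    """Step 3: aggregate the (decrypted) gradients uploaded by the M clients."""
    return np.mean(client_grads, axis=0)

def has_converged(losses, tol=1e-4, window=3):
    """Step 5: the model is considered converged when the loss no longer
    decreases and stays stable over the last few rounds."""
    if len(losses) < window + 1:
        return False
    recent = losses[-(window + 1):]
    return all(abs(recent[i] - recent[i + 1]) < tol for i in range(window))
```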
Thirdly, the prediction algorithm of the invention
Referring to fig. 5, assume that the training phase of the personalized multi-view federated recommendation has ended successfully and each user client has obtained the locally deployed recommendation model. The recommendation model consists of an item sub-model and several user sub-models, which participate together in the local prediction phase. First, the user client computes the semantic vectors of all E items to be recommended through the forward propagation process of the DSSM model; then, the user client successively computes the semantic vector of each view and the posterior probability value that it interacts with each item to be recommended; next, these probability values are locally and securely aggregated to obtain, for each item to be recommended, the posterior probability value that it interacts on the local user client; finally, these probability values are arranged in descending order (from large to small), and the items corresponding to the first K probability values in the ranking are output, yielding the list of items to be recommended by the user client.
Specifically, the steps of the prediction algorithm of the present invention are as follows:
[ user client of the present invention ]
Step 1: system input: the total number N of views in the client, the total number E of items to be recommended, the number K of items to recommend, the set of user sub-models (W_U)_1^{T+P}, …, (W_U)_N^{T+P}, the item sub-model W_I^{T+P}, the user features of the ith view, and the item features of the jth item, where (W_U)_i^{T+P} is the user sub-model of the ith view.
Step 2: judge whether the jth item to be recommended has been processed (j is initially set to 1); if j > E, jump to step 5, otherwise execute step 3.
Step 3: calculate the semantic vector y_j^I of the jth item to be recommended. The forward propagation computation used is: l_1 = W_1 x, l_i = f(W_i l_{i-1} + b_i) (i ∈ {2, 3, …, N-1}), y = f(W_N l_{N-1} + b_N); the detailed explanation is given in the base-model section.
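The recurrence in step 3 translates directly into code. A sketch, where the activation f (tanh, as conventional for DSSM) and the layer shapes are illustrative assumptions:

```python
import numpy as np

def dssm_forward(x, weights, biases, f=np.tanh):
    """Forward pass: l1 = W1 x; l_i = f(W_i l_{i-1} + b_i); y = f(W_N l_{N-1} + b_N)."""
    l = weights[0] @ x                        # first layer: no bias, no activation
    for W, b in zip(weights[1:], biases[1:]):
        l = f(W @ l + b)
    return l                                  # the semantic vector y

# Toy 3-layer network mapping an 8-dim feature vector to a 4-dim semantic vector:
rng = np.random.default_rng(0)
x = rng.normal(size=8)
weights = [rng.normal(size=(16, 8)), rng.normal(size=(8, 16)), rng.normal(size=(4, 8))]
biases = [None, np.zeros(8), np.zeros(4)]
y = dssm_forward(x, weights, biases)
```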
Step 4: execute j = j + 1 and repeat step 2.
Step 5: judge whether the jth item to be recommended has been processed (j is initially set to 1); if j > E, jump to step 13, otherwise execute step 6.
Step 6: judge whether the ith view has been processed (i is initially set to 1); if i > N, jump to step 10, otherwise execute step 7.
Step 7: calculate the semantic vector y_i^U of the ith user view; the forward propagation computation used is as described in step 3.
Step 8: calculate the posterior probability value ŷ_{i,j} that, given the user features of the ith view, the item features of the jth item interact with them; the value is computed from the semantic vectors y_i^U and y_j^I through the interaction function of the DSSM model.
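The posterior in step 8 is not spelled out in the text above; in the standard DSSM it is a softmax over smoothed cosine similarities between the user semantic vector and the candidate item semantic vectors. A sketch under that assumption, with the smoothing factor gamma illustrative:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def posterior(user_vec, item_vecs, j, gamma=10.0):
    """P(item_j | user): softmax over gamma-smoothed cosine similarities."""
    sims = np.array([gamma * cosine(user_vec, v) for v in item_vecs])
    sims -= sims.max()                        # numerical stability
    probs = np.exp(sims) / np.exp(sims).sum()
    return float(probs[j])
```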
Step 9: execute i = i + 1 and repeat step 6.
Step 10: locally aggregate the several posterior probability values ŷ_{i,j} obtained by executing steps 6-9 repeatedly, obtaining ŷ_j; the calculation used is the local aggregation of ŷ_{1,j}, …, ŷ_{N,j}.
Step 11: store the posterior probability value ŷ_j that the jth item to be recommended interacts on the client U.
Step 12: execute j = j + 1 and repeat step 5.
Step 13: arrange the several posterior probability values ŷ_j obtained by executing steps 5-12 repeatedly in descending order (from large to small).
Step 15: system output: the sequence of items to be recommended corresponding to the first K probability values in the ranking, giving the list of items recommended to the user client.
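Steps 10-15 taken together amount to averaging the per-view probabilities for each item and returning the K items with the largest aggregated probability. A sketch, where the mean as the local aggregation operator is an assumption:

```python
import numpy as np

def recommend_top_k(per_view_probs, k):
    """per_view_probs: array of shape (N_views, E_items).
    Returns the indices of the K items with the largest aggregated probability."""
    agg = np.mean(per_view_probs, axis=0)     # aggregated probability per item
    order = np.argsort(-agg)                  # descending order
    return order[:k].tolist()

probs = np.array([[0.2, 0.9, 0.5],
                  [0.4, 0.7, 0.6]])
top = recommend_top_k(probs, 2)               # -> [1, 2]
```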
Examples
Suppose an application A providing streaming media, an application B providing movie and book reviews, and an application C providing social interaction are installed on the smartphones of N users and have generated historical data of interactions with those users. Application A seeks to further improve the accuracy and intelligence of its existing movie recommendation algorithm, and therefore establishes view cooperation with application B and application C; on the basis of the technical scheme provided by the invention, a personalized federated movie recommendation system combining the three parties' views is built. The data each of the three federated-training participants can provide is as follows: the application A view can provide records of the movies to be recommended and of the movies clicked and viewed by the user, the application B view can provide the user's ratings and reviews of certain movies, and application C can provide a series of personal information such as the user's age, gender, location, occupation, hobbies, and education.
The specific implementation of the invention mainly comprises the following three steps. First, application A, application B, and application C each preprocess the training labels and features within their respective views, where the features include user features and item features. For the recommendation service provider in this embodiment, application A needs to preprocess the training labels, for example: mark any <user, movie> record pair with an explicit or implicit interaction as 1, and mark record pairs without interaction as 0. Meanwhile, application A needs to preprocess the user features, for example: perform singular value decomposition on the click matrix or rating matrix between users and movies to obtain user feature vectors. In addition, application A also needs to preprocess the item features, for example: encode the title and genre of each movie to be recommended in the database into fixed-length item feature vectors using an N-Gram model. For the other participants in the federated recommendation training in this embodiment, such as application C, the user features need to be preprocessed, for example: normalize the age feature into the interval [0, 1], set the gender feature to 0 or 1, and construct the occupation feature as a one-hot encoded vector. After preprocessing, the labels and features are stored in each application's view, isolated from the other applications. When the subsequent training and prediction stages need to enter a view and request data, the aforementioned trusted execution environment can be constructed in advance and the operations run inside it.
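The label and feature preprocessing described in the first step can be sketched as follows; the value ranges, the occupation vocabulary, and the character n-gram scheme are illustrative assumptions:

```python
import numpy as np

def preprocess_label(interacted):
    """Mark a <user, movie> record pair: 1 if any explicit or implicit
    interaction took place, 0 otherwise."""
    return 1 if interacted else 0

def preprocess_age(age, lo=0, hi=100):
    """Normalise the age feature into the [0, 1] interval."""
    return (min(max(age, lo), hi) - lo) / (hi - lo)

def preprocess_occupation(occupation, vocabulary):
    """One-hot encode the occupation feature over a fixed vocabulary."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(occupation)] = 1.0
    return vec

def char_ngrams(title, n=3):
    """Letter n-grams of a movie title, as an N-Gram/word-hashing encoder uses."""
    padded = f"#{title}#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```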
Second, a central server (which can be operated by a government agency or a reliable third party) is started and distributes the data set I of movies to be recommended to the smartphones of the N users; each user device takes the local view data of application A, application B, and application C as user data and starts the federated movie recommendation training using the training algorithm of the personalized multi-view federated recommendation system. Third, after the training completes and the central server is shut down, the movie recommendation model I* obtained in the preceding training is deployed on each user device; taking the local view data of application A, application B, and application C as base data, the prediction algorithm of the personalized multi-view federated recommendation system outputs the first K movies to be recommended that the user is most likely to be interested in, which are displayed on a page of the application A software, completing the recommendation.
It is further noted that, on any user device, the movie recommendation model I* obtained by the invention has undergone T rounds of global federated training and P rounds of local fine-tuning iteration. On the premise that user data never leaves the local device, the method jointly exploits the rich data of the 3 views (application A, application B, and application C) of the N users, and is therefore applicable to any of the three parties; that is, the model can be shared and used within the views of application A, application B, and application C. In addition, only the example of application A building the federated movie recommendation system is given here; in practice, application B or application C can likewise build the recommendation system it needs. For example, application B can provide book item data to be recommended in advance, distribute a book item sub-model to each user device participating in the federated training, and obtain a federated book recommendation system by following the same training and prediction algorithms. This is feasible because users' tastes in movies and books share some similarity, and this similarity can be measured by similarity computation between the semantic vectors output by the movie features and the book features via the DSSM model.
In particular, when implementing the technical scheme of the invention, attention should be paid to controlling the quantity and quality of the views participating in federated recommendation training within a single user client, because related research shows that too many views increase communication overhead without improving recommendation accuracy. In engineering practice, the semantic vector computation for the items to be recommended can also be performed offline in advance in the recommendation service provider's backend, with the results distributed directly to each client by the central server once computed; this greatly reduces the computational cost on the user's terminal device and better meets the recommendation service provider's need to update the data set of items to be recommended in real time or intermittently. Furthermore, to ensure that the technical scheme of the invention produces a beneficial effect when implemented, the several views participating in federated recommendation training should contribute positively to the final training objective and complete the training and sharing of the global model in mutual cooperation, achieving mutual benefit; in other words, the view data participating in a given federated recommendation training should not contain too much redundant information, erroneous information, or noise, because related research indicates that too much invalid data tends to affect model training negatively.
Finally, to further guard against user views that are dishonest or even outright malicious, when implementing the technical scheme of the invention, mutual isolation at the view level should be considered: a trusted execution environment is established before any user client runs the system of the invention, opening a secure area within the device's operating system, so that a dishonest or malicious view is prevented from maliciously accessing the private data of other, well-behaved views.

Claims (3)

1. A personalized multi-view federated recommendation system, characterized by comprising a central server and several user clients, wherein the internal structures of all user clients are identical and each user client contains a training module and a prediction module; data streams are transmitted inside the central server, between the central server and any user client, and inside any user client; the transmission of the data streams adopts a synchronous transmission mode, i.e., data exchange between modules is non-asynchronous and is scheduled by a unified clock signal; the training module and the prediction module each comprise several sub-modules, which respectively complete the training task and the prediction task;
the central server comprises an updating coordination module and a data calculation module;
the data calculation module is used for respectively executing aggregation operation on the article gradient data and the user gradient data from a plurality of user clients, and the aggregation operation is carried out between the central server and any one user client;
the updating coordination module coordinates the transmission of single gradient data from any user client and of the aggregated gradient data from the data calculation module, between the training module in any user client and the updating coordination module in the central server; this coordination is completed in the central server, and a secure aggregation protocol is used to ensure that single gradient data entering the data calculation module is remotely and securely aggregated during data transmission; remote secure aggregation means that user gradient data or article gradient data from several user clients are encrypted under the control of the secure aggregation protocol and uploaded to the central server, and the central server decrypts the gradient data and then aggregates them;
the training module in any user client comprises a data distribution sub-module, a gradient calculation sub-module, a gradient aggregation sub-module, a model updating sub-module, a model fine-tuning sub-module, a user data warehouse and an article data warehouse; the sub-modules in the training module and the data warehouse cooperate with each other to complete the execution of the training algorithm;
the user data warehouse and the article data warehouse are used for storing user data and article data in local equipment of any user client side respectively; the user data refers to a historical interaction behavior data set generated by a user in each application view on any user client; the item data refers to an item data set to be recommended, which is distributed to any user client by a recommendation service provider through a central server;
the data distribution submodule interacts with the updating coordination module in the central server and the model updating submodule in the training module, acting as a data hub between the upper and lower layers; on one hand, it uploads the locally securely aggregated gradient data from the model updating submodule to the central server, and receives the article data set and the remotely securely aggregated gradient data from the central server; on the other hand, it transmits the remotely securely aggregated gradient data from the central server to the model updating submodule; local secure aggregation means performing random sampling, gradient clipping, and Gaussian noise addition on the gradient data generated inside any user client, and then aggregating them;
the gradient calculation submodule calculates a gradient descending result after the article submodel and the user submodel in the training algorithm are subjected to iterative fitting according to a target function, and caches a local gradient descending aggregation result from the gradient aggregation submodule;
the gradient aggregation sub-module aggregates the gradient descent results generated in the gradient calculation sub-module, applying random sampling, gradient clipping, and Gaussian noise addition to the gradient descent results, thereby realizing local secure aggregation of the gradient descent results;
the model updating submodule updates the current model training, namely the data distribution submodule respectively acquires the gradient of the article sub-model and the gradient of the user sub-model after the remote safe aggregation from the central server, and respectively performs gradient descent on the article sub-model and the user sub-model by using the gradient of the article sub-model and the gradient of the user sub-model; once the current training times reach a preset iteration upper limit value or the global model is converged, the model updating sub-module sends the global model to the model fine-tuning sub-module; the global model is a user submodel and an article submodel which are obtained after the model updating submodule performs gradient descent on the user submodel and the article submodel by using the remote aggregation gradient;
the model fine-tuning sub-module calls local user data and article data, and performs local training iteration of limited rounds on the global user sub-model and the global article sub-model respectively, so that the global model is more consistent with the data distribution of local data of any user, and the personalized fine tuning of the global model on any user client is completed;
the model parameters of the global model after personalized fine tuning are respectively stored in a user data warehouse and an article data warehouse, and are further transmitted to the user model warehouse and the article model warehouse in a prediction module adjacent to the training module through a data pipeline between the training module and the prediction module;
the prediction module in any user client comprises a semantic computation submodule, an interactive computation submodule, a probability aggregation submodule, a probability sequencing submodule, a recommendation output submodule, a user model warehouse and an article model warehouse; the sub-modules in the prediction module and the model warehouse cooperate with each other to complete the execution of the prediction algorithm;
the user model warehouse and the article model warehouse are used for storing the user model and the article model in local equipment of any client respectively;
the user model refers to a group of neural network parameters of a deep semantic matching model related to user data, which are obtained after a user model is trained by a training algorithm by using local user data of any user client;
the article model refers to a group of neural network parameters of a deep semantic matching model related to article data, which are obtained after article model training is carried out on any user client side by using local article data through a training algorithm;
the semantic computation submodule obtains a user semantic vector corresponding to the user model and an article semantic vector corresponding to the article model through a forward propagation process of a deep semantic matching network by respectively utilizing the user model and the article model;
the interaction calculation submodule calculates the posterior probability value of potential interaction between any user semantic vector and the item semantic vector;
the probability aggregation submodule carries out aggregation on the posterior probability values output by the interactive computation submodule to obtain the posterior probability value of any item to be recommended, which is interacted on the current user client;
the probability sorting submodule sorts the posterior probability values of interaction of the plurality of items to be recommended on the current user client, which are output by the probability aggregation submodule, according to a descending order or an ascending order;
and the recommendation output sub-module outputs the to-be-recommended articles corresponding to any probability in the probability sequence to obtain a recommended article sequence, and the personalized multi-view federal recommendation is completed.
2. The personalized multi-view federated recommendation system of claim 1, wherein the training algorithm specifically comprises:
data distribution phase
the data set I of items to be recommended, provided by the recommendation service provider's background system, is distributed by the central server to each user client;
gradient calculation phase
in the ith view of any user client, the gradients of the user sub-model and the article sub-model are calculated from the private user data of the ith view and the locally shared item data set I;
gradient polymerisation stage
the gradients of the user sub-model and the article sub-model are each aggregated locally, and the locally aggregated local user sub-model gradient and local article sub-model gradient are respectively encrypted and transmitted to the central server, completing the global aggregation;
model update phase
the central server returns the globally aggregated global user sub-model gradient and global article sub-model gradient to each user client to update the user sub-models and the article sub-model;
stage of model fine tuning
after the global model training converges or reaches the set maximum number of iterations, the sub-models on each user client randomly sample their private data and perform a limited number of batch-training rounds again locally, finally obtaining a recommendation model that has undergone multi-party, multi-view federated training and personalized adaptive fine-tuning.
3. The personalized multi-view federal recommendation system of claim 1, wherein the predictive algorithm specifically comprises:
semantic computation phase
the user client pre-computes, through the forward propagation process of the deep semantic matching model using the parameters provided by the article sub-model, the semantic vectors of all E items to be recommended;
interactive computing phase
the user client successively computes the semantic vector of each user view through the forward propagation process of the deep semantic matching model using the parameters provided by the user sub-models; then, the posterior probability value ŷ of potential interaction between the semantic vector of any user view and the semantic vector of any item to be recommended is calculated;
Probabilistic aggregation stage
the posterior probability values ŷ of the several potential interactions are locally and securely aggregated, obtaining the posterior probability value ŷ_j that any item to be recommended interacts on the user client;
Probability ordering stage
the several posterior probability values ŷ_j are arranged in descending or ascending order;
recommendation output phase
the sequence of items to be recommended corresponding to the first K probability values is taken out; this sequence is the item sequence recommended by the user client.
CN202210150617.6A 2022-02-18 2022-02-18 Personalized multi-view federal recommendation system Pending CN114564641A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210150617.6A CN114564641A (en) 2022-02-18 2022-02-18 Personalized multi-view federal recommendation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210150617.6A CN114564641A (en) 2022-02-18 2022-02-18 Personalized multi-view federal recommendation system

Publications (1)

Publication Number Publication Date
CN114564641A true CN114564641A (en) 2022-05-31

Family

ID=81714225

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210150617.6A Pending CN114564641A (en) 2022-02-18 2022-02-18 Personalized multi-view federal recommendation system

Country Status (1)

Country Link
CN (1) CN114564641A (en)


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114741611A (en) * 2022-06-08 2022-07-12 杭州金智塔科技有限公司 Federal recommendation model training method and system
CN114741611B (en) * 2022-06-08 2022-10-14 杭州金智塔科技有限公司 Federal recommendation model training method and system
CN116246749A (en) * 2023-05-11 2023-06-09 西南医科大学附属医院 Endocrine patient personalized health management system integrating electronic medical records
CN116246749B (en) * 2023-05-11 2023-07-21 西南医科大学附属医院 Endocrine patient personalized health management system integrating electronic medical records
CN117454185A (en) * 2023-12-22 2024-01-26 深圳市移卡科技有限公司 Federal model training method, federal model training device, federal model training computer device, and federal model training storage medium
CN117454185B (en) * 2023-12-22 2024-03-12 深圳市移卡科技有限公司 Federal model training method, federal model training device, federal model training computer device, and federal model training storage medium

Similar Documents

Publication Publication Date Title
Zhu et al. Federated learning on non-IID data: A survey
Duan et al. JointRec: A deep-learning-based joint cloud video recommendation framework for mobile IoT
CN114564641A (en) Personalized multi-view federal recommendation system
Khrouf et al. Hybrid event recommendation using linked data and user diversity
Gao et al. A survey on heterogeneous federated learning
CN108446964B (en) User recommendation method based on mobile traffic DPI data
CN112836130A (en) Context-aware recommendation system and method based on federated learning
CN111553744A (en) Federal product recommendation method, device, equipment and computer storage medium
Yan et al. FedCDR: Privacy-preserving federated cross-domain recommendation
Zhang et al. Data quality in big data processing: Issues, solutions and open problems
Jagtap et al. Homogenizing social networking with smart education by means of machine learning and Hadoop: A case study
Zhang et al. Field-aware matrix factorization for recommender systems
Wu et al. A hybrid approach to service recommendation based on network representation learning
Wang et al. MuKGB-CRS: guarantee privacy and authenticity of cross-domain recommendation via multi-feature knowledge graph integrated blockchain
Anande et al. Generative adversarial networks for network traffic feature generation
CN114004363A (en) Method, device and system for jointly updating model
Peng et al. Tdsrc: A task-distributing system of crowdsourcing based on social relation cognition
Li et al. Federated low-rank tensor projections for sequential recommendation
Liu et al. A review of federated meta-learning and its application in cyberspace security
Lv et al. Dsmn: An improved recommendation model for capturing the multiplicity and dynamics of consumer interests
CN116467415A (en) Bidirectional cross-domain session recommendation method based on GCNsformer hybrid network and multi-channel semantics
Sah et al. Aggregation techniques in federated learning: Comprehensive survey, challenges and opportunities
Shen et al. Artificial Intelligence for Web 3.0: A Comprehensive Survey
Yan et al. Personalized POI recommendation based on subway network features and users’ historical behaviors
Xing et al. Distributed Model Interpretation for Vertical Federated Learning with Feature Discrepancy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination