CN114723067A - Federated hybrid filtering recommendation method based on user privacy protection - Google Patents

Federated hybrid filtering recommendation method based on user privacy protection

Info

Publication number
CN114723067A
CN114723067A (application CN202210379463.8A)
Authority
CN
China
Prior art keywords
user
item
encoder
denoising
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210379463.8A
Other languages
Chinese (zh)
Other versions
CN114723067B (en)
Inventor
张幸林
卢正东
卢艺灵
卢沁旖
周志炫
谢文灏
林泽蓬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN202210379463.8A
Publication of CN114723067A
Application granted
Publication of CN114723067B
Legal status: Active
Anticipated expiration

Classifications

    • G06N 20/00 Machine learning
    • G06F 16/9535 Search customisation based on user profiles and personalisation
    • G06F 17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06F 21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06N 3/048 Activation functions
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Computational Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioethics (AREA)
  • Medical Informatics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Algebra (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a federated hybrid filtering recommendation method based on user privacy protection, comprising the following steps: 1) collecting publicly available user-item rating information, user attributes and item attributes to form a rating matrix, user side information and item side information; 2) building two denoising autoencoders and pre-training them in a self-supervised manner by stochastic gradient descent; 3) constructing an overall model, and loading the pre-trained encoder parameters of the two denoising autoencoders into the two encoders of the overall model; 4) iteratively updating the overall model parameters with the FedAvg algorithm of federated learning, and using the updated overall model to generate probability values of implicit feedback from different users to items, thereby generating a recommended item list for each user. The method can reasonably extract and exploit public feature information of users and items, and predicts the probability of a user's potential interaction behavior while protecting the user's personal privacy, so as to generate a recommendation list with a high hit rate.

Description

Federated hybrid filtering recommendation method based on user privacy protection
Technical Field
The invention relates to the technical field of federated learning and hybrid collaborative filtering, and in particular to a federated hybrid filtering recommendation method based on user privacy protection.
Background
Federated learning is a distributed machine learning framework built on ideas from deep learning; it aims to enable data use and model training while guaranteeing data privacy, security and legal compliance. As a distributed machine learning paradigm, federated learning can effectively solve the data-silo problem: participants model jointly without sharing their data, so data silos are broken while each participant's data privacy is protected at the technical level, achieving collaborative AI.
Federated learning has the following characteristics: 1. each party's data stays local, so privacy is preserved and legal compliance is maintained; 2. a virtual common model is built jointly from the data of multiple participants; 3. when the parties' data are aligned on users or on features, the modeling performance of federated learning is not weaker than that of conventional modeling on pooled large-scale data; 4. when users or features are not aligned, knowledge transfer can still be achieved by exchanging encrypted parameters between the parties.
Among common recommendation algorithms, two families of methods are widely used: 1. content-based recommendation, which computes the similarity of inherent user attributes and of product attributes in their respective dimensions and recommends preferred products to users with similar attribute values; its extensibility is weak in practical applications; 2. collaborative filtering, which predicts user preferences by analyzing user behavior information, including explicit feedback such as ratings and comments and implicit feedback such as browsing records; it is affected by data sparsity and performs poorly under cold-start conditions.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art, and provides a federated hybrid filtering recommendation method based on user privacy protection, which can reasonably extract and exploit public feature information of users and items, models with a neural network structure, and predicts the probability of a user's potential interaction behavior while protecting the user's personal privacy, thereby generating a recommendation list with a high hit rate.
To achieve this purpose, the technical solution provided by the invention is as follows: a federated hybrid filtering recommendation method based on user privacy protection, comprising the following steps:
1) collecting publicly available user-item rating information to form a rating matrix as explicit feedback, obtaining implicit feedback by binarizing the rating matrix, and collecting public user attributes as user side information and public item attributes as item side information; the explicit feedback is a signal that directly expresses the user's degree of preference for an item, i.e. the user's rating of the item, and the implicit feedback is a signal that does not directly reflect the user's preference, i.e. whether or not the user has rated the item;
2) building a denoising autoencoder for the user side information and another for the item side information, the two denoising autoencoders having the same structure and each consisting of three parts, namely noise addition, an encoder and a decoder, and pre-training the two denoising autoencoders separately in a self-supervised manner by stochastic gradient descent to obtain the parameters of the two pre-trained denoising autoencoders;
3) constructing an overall model composed of two encoders, a fully connected layer and an activation function layer, the two encoders being structurally identical to the encoder parts of the two denoising autoencoders and being defined as a user encoder and an item encoder; after the overall model is built, loading the pre-trained encoder parameters of the two denoising autoencoders into the user encoder and the item encoder of the overall model respectively for initialization;
4) iteratively updating the overall model parameters with the FedAvg algorithm of federated learning, and generating probability values of implicit feedback from different users to items with the updated overall model, thereby generating a recommended item list for each user.
Further, step 1) comprises the following steps:
1.1) collecting the publicly available user-item rating information to form a rating matrix R, which is a two-dimensional user-item matrix expressed as:

R = (r_ij) ∈ ℝ^(m×n)

where r_ij is the rating of the i-th user for the j-th item, and there are m users and n items in total;
1.2) applying the custom binarization function Bin(r_ij) to the rating matrix R to obtain the implicit feedback matrix Bin(R) ∈ {0,1}^(m×n), defined as the user-item interaction matrix; that is, each user-item pair carries an implicit feedback label of value 0 or 1, where 0 indicates that no rating occurred and 1 indicates that a rating occurred; the binarization function Bin(r_ij) is defined as:

Bin(r_ij) = 0 if r_ij = 0, and Bin(r_ij) = 1 otherwise;

1.3) collecting the public user attributes and item attributes and, using one-hot encoding, generating for each user a vector of binary user feature values and for each item a vector of binary item feature values representing those attributes, i.e. the user side information vector and the item side information vector, which in turn form the user side information matrix S_user and the item side information matrix S_item, expressed as:

S_user = (s_1^user, s_2^user, …, s_m^user)^T, with s_i^user = (b_user,1, b_user,2, …, b_user,d_user) ∈ ℝ^(d_user)
S_item = (s_1^item, s_2^item, …, s_n^item)^T, with s_j^item = (b_item,1, b_item,2, …, b_item,d_item) ∈ ℝ^(d_item)

where s_i^user is the side information vector of the i-th user, containing d_user binary user feature values, with b_user,x denoting the x-th binary user feature value, and ℝ^(d_user) denotes a d_user-dimensional vector space; s_j^item is the side information vector of the j-th item, containing d_item binary item feature values, with b_item,y denoting the y-th binary item feature value, and ℝ^(d_item) denotes a d_item-dimensional vector space.
Further, step 2) comprises the following steps:
2.1) building a denoising autoencoder for the user side information and another for the item side information, referred to as the user denoising autoencoder and the item denoising autoencoder; each denoising autoencoder has the structure:

ŝ = g_dec(f_enc(d_noi(s)))

where s is the input vector of the denoising autoencoder, i.e. a side information vector; ŝ is the reconstructed vector of the denoising autoencoder; the function d_noi is the noise addition; the function f_enc is the encoder; and the function g_dec is the decoder;
2.2) pre-training the two denoising autoencoders separately in a self-supervised manner by stochastic gradient descent to obtain the parameters of the two pre-trained denoising autoencoders, each consisting of an encoder part and a decoder part; to measure the difference between the reconstructed vector ŝ and the input vector s, the cross entropy is introduced as the loss function of the two denoising autoencoders, giving the loss function L_DAE:

L_DAE = −(1/N) Σ_{p=1..N} Σ_{k=1..d} [ s^(p,k)·log ŝ^(p,k) + (1 − s^(p,k))·log(1 − ŝ^(p,k)) ]

where N is the total number of input vectors in a batch, p indexes the p-th input vector, d is the total dimensionality of an input vector, and k indexes the k-th dimension of an input vector;

the user side information vector and the item side information vector are defined as:

s_i^user = (b_user,1, b_user,2, …, b_user,d_user), i = 1, …, m
s_j^item = (b_item,1, b_item,2, …, b_item,d_item), j = 1, …, n

where s_i^user is the side information vector of the i-th user (m users in total), containing d_user binary user feature values, with b_user,x denoting the x-th binary user feature value; s_j^item is the side information vector of the j-th item (n items in total), containing d_item binary item feature values, with b_item,y denoting the y-th binary item feature value;

taking s_i^user and s_j^item as the input vectors of the user denoising autoencoder and the item denoising autoencoder respectively, the reconstructed vectors ŝ_i^user and ŝ_j^item are obtained by forward propagation; the loss values L_user and L_item are computed with the loss function, and back-propagation yields the gradient ∇_θc L_user of the user denoising autoencoder parameters and the gradient ∇_θc L_item of the item denoising autoencoder parameters, where θ_c denotes the respective parameters of the two denoising autoencoders;

the parameters of the two denoising autoencoders are updated by stochastic gradient descent: N different user side information vectors or item side information vectors are randomly selected as a unit batch, and Q rounds of batch input selection and updating of the user denoising autoencoder parameters or the item denoising autoencoder parameters are performed in total, giving the parameters of the two denoising autoencoders used in the next stage.
Further, step 3) comprises the following steps:
3.1) constructing an overall model composed of two encoders, a fully connected layer and an activation function layer, where the two encoders are structurally identical to the encoder parts of the two denoising autoencoders and are defined as the user encoder and the item encoder; the outputs of the two encoders are fused by a Hadamard product and then passed in turn through the fully connected layer with output dimension 1 and a Sigmoid activation function layer, giving the implicit feedback probability v with values in the interval (0,1); the overall model uses binary cross entropy as its loss function L_total;
3.2) loading the pre-trained encoder parameters of the two denoising autoencoders into the user encoder and the item encoder of the overall model respectively for initialization, with the fully connected layer parameters of the overall model initialized to zero, giving the initialized overall model parameters.
Further, step 4) comprises the following steps:
4.1) taking the user side information vectors, the item side information vectors and the implicit feedback labels as the training data of the overall model, and iteratively updating the overall model parameters with the FedAvg algorithm of federated learning to obtain the trained overall model;
4.2) for each user, randomly drawing M unrated items for prediction, in the following manner: the side information vector of the user and the side information vector of any one of the drawn unrated items are taken as the input vectors of the user encoder and the item encoder of the overall model respectively, and forward propagation yields the probability value v of implicit feedback between the user and the item, so that the implicit feedback probability values of each user for the M unrated items are obtained;
4.3) for each user, the implicit feedback probability values are sorted in descending order, generating a recommended item list ranked from the highest recommendation degree to the lowest.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. Compared with centralized recommendation systems, the method decentralizes the training process by means of federated learning, keeping private data on the local device, while recommendation accuracy shows no significant loss relative to the non-federated setting.
2. Compared with collaborative filtering methods based on matrix factorization, the method additionally exploits user and item side information as part of the features, better mines user preferences and item characteristics, and alleviates the user cold-start problem.
3. Compared with other collaborative filtering methods based on neural networks, the method uses denoising autoencoders as the encoding structure of the model, so that user and item features are extracted better and recommendation accuracy is improved.
4. Compared with other conventional recommendation methods, the method supports pre-training through the user and item denoising autoencoders, which reduces model training time.
Drawings
FIG. 1 is a schematic logic flow diagram of the method of the invention.
FIG. 2 is a diagram of the overall model structure.
FIG. 3 is a table of experimental results on the MovieLens-100K data set.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
As shown in FIG. 1, this embodiment provides a federated hybrid filtering recommendation method based on user privacy protection, which collects public user attributes and item attributes as side information vectors, builds an overall model, and uses a federated learning framework to break data silos and realize collaborative recommendation, comprising the following steps:
1) collecting the publicly available user-item rating information to form a rating matrix as explicit feedback, obtaining implicit feedback by binarizing the rating matrix, and collecting public user attributes as user side information and public item attributes as item side information. Explicit feedback is behavior in which the user directly expresses a preference for an item, i.e. the user rating the item; implicit feedback is behavior that does not directly reflect the user's preference, i.e. whether or not the user has rated the item. This comprises the following steps:
1.1) collecting the publicly available user-item rating information to form a rating matrix R. The rating matrix R is a two-dimensional user-item matrix, expressed as:

R = (r_ij) ∈ ℝ^(m×n)

where r_ij is the rating of the i-th user for the j-th item, and there are m users and n items in total;
1.2) applying the custom binarization function Bin(r_ij) to the rating matrix R to obtain the implicit feedback matrix Bin(R) ∈ {0,1}^(m×n), defined as the user-item interaction matrix. That is, each user-item pair carries an implicit feedback label of value 0 or 1, where 0 indicates that no rating occurred and 1 indicates that a rating occurred. The binarization function Bin(r_ij) here is defined as:

Bin(r_ij) = 0 if r_ij = 0, and Bin(r_ij) = 1 otherwise;

1.3) collecting the public user attributes and item attributes and, using one-hot encoding, generating for each user a vector of binary user feature values and for each item a vector of binary item feature values representing those attributes, which in turn form the user side information matrix S_user and the item side information matrix S_item, expressed as:

S_user = (s_1^user, s_2^user, …, s_m^user)^T, with s_i^user = (b_user,1, b_user,2, …, b_user,d_user) ∈ ℝ^(d_user)
S_item = (s_1^item, s_2^item, …, s_n^item)^T, with s_j^item = (b_item,1, b_item,2, …, b_item,d_item) ∈ ℝ^(d_item)

where s_i^user is the side information vector of the i-th user, containing d_user binary user feature values, with b_user,x denoting the x-th binary user feature value, and ℝ^(d_user) denotes a d_user-dimensional vector space; s_j^item is the side information vector of the j-th item, containing d_item binary item feature values, with b_item,y denoting the y-th binary item feature value, and ℝ^(d_item) denotes a d_item-dimensional vector space.
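By way of illustration only (this sketch is not part of the original disclosure), the data preparation of steps 1.1)-1.3) could look as follows in Python; the toy data, the attribute choices (occupation, genre) and the helper name one_hot_side_info are assumptions made for the example.

import numpy as np

# Assumed toy data: m = 3 users, n = 4 items; 0 means "no rating yet".
ratings = np.array([[5, 0, 3, 0],
                    [0, 4, 0, 0],
                    [1, 0, 0, 2]], dtype=np.float32)

# Implicit feedback / interaction matrix: Bin(r_ij) = 1 iff a rating occurred.
interactions = (ratings != 0).astype(np.float32)

def one_hot_side_info(attr_values, vocab):
    """One-hot encode one categorical attribute into binary feature columns."""
    mat = np.zeros((len(attr_values), len(vocab)), dtype=np.float32)
    for row, value in enumerate(attr_values):
        mat[row, vocab.index(value)] = 1.0
    return mat

# Assumed attributes: user occupation and item genre.
S_user = one_hot_side_info(["student", "engineer", "student"],
                           ["student", "engineer"])            # m x d_user
S_item = one_hot_side_info(["action", "comedy", "comedy", "drama"],
                           ["action", "comedy", "drama"])      # n x d_item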
2) Building a denoising autoencoder for the user side information and another for the item side information; the two denoising autoencoders have the same structure, each consisting of noise addition, an encoder and a decoder, and are pre-trained separately in a self-supervised manner by stochastic gradient descent to obtain the parameters of the two pre-trained denoising autoencoders, comprising the following steps:
2.1) building a denoising autoencoder for the user side information and another for the item side information, referred to as the user denoising autoencoder and the item denoising autoencoder; each denoising autoencoder has the structure:

ŝ = g_dec(f_enc(d_noi(s)))

In this structural formula, s is the input vector of the denoising autoencoder, corresponding to the side information vector s_i^user of the i-th user or the side information vector s_j^item of the j-th item, where s_i^user and s_j^item are defined as:

s_i^user = (b_user,1, b_user,2, …, b_user,d_user), i = 1, …, m
s_j^item = (b_item,1, b_item,2, …, b_item,d_item), j = 1, …, n

where s_i^user is the side information vector of the i-th user (m users in total), containing d_user binary user feature values, with b_user,x denoting the x-th binary user feature value; s_j^item is the side information vector of the j-th item (n items in total), containing d_item binary item feature values, with b_item,y denoting the y-th binary item feature value. The binary user feature values and binary item feature values are produced from the collected user attributes and item attributes by one-hot encoding.
In the structural formula of the denoising autoencoder, ŝ is the reconstructed vector of the denoising autoencoder, and the function d_noi is the noise addition: additive noise is blended into the input vector s and range clipping is applied, giving the vector s̃. The additive noise is blended in as follows: to each element of the vector, the product of a noise factor t and a random number q following a Gaussian distribution is added, giving the vector z:

z^(p,k) = s^(p,k) + t·q,  q ~ N(0,1),  t ∈ [0,1]

where p indexes the different input vectors and s^(p,k) is the element of dimension k in one input vector.

The range clipping is formulated as:

s̃^(p,k) = min(max(z^(p,k), 0), 1)

where z is the vector blended with additive noise. This finally gives the vector s̃, blended with additive noise and range-clipped.
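Purely as an illustrative sketch (not part of the original disclosure), the noise addition d_noi and range clipping can be written as below; the noise factor value t = 0.2 is an assumption within the stated range t ∈ [0,1].

import numpy as np

def d_noi(s, t=0.2, rng=None):
    """Blend Gaussian additive noise into s, then clip back to [0, 1].

    Implements z = s + t*q with q ~ N(0, 1), followed by range clipping.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = s + t * rng.standard_normal(s.shape)
    return np.clip(z, 0.0, 1.0)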
In the structural formula of the denoising autoencoder, the function f_enc is the encoder, which encodes the noised vector s̃ into the hidden vector h ∈ ℝ^(d_hidden), where ℝ^(d_hidden) denotes a d_hidden-dimensional vector space. The encoder f_enc consists of three fully connected layers and is formulated as:

h = W_3·Dropout(ReLU(W_2·Tanh(W_1·s̃ + b_1) + b_2)) + b_3

where W_2 ∈ ℝ^(d_2×d_1) and W_3 ∈ ℝ^(d_hidden×d_2). The encoder parts of the two denoising autoencoders each have their own set of learnable parameters, comprising the matrices W_1, W_2, W_3 and the vectors b_1, b_2, b_3. Specifically, for the encoder in the user denoising autoencoder the learnable parameters are the matrices W_user,1, W_user,2, W_user,3 and the vectors b_user,1, b_user,2, b_user,3, where W_user,1 has dimension d_1×d_user; for the encoder in the item denoising autoencoder the learnable parameters are the matrices W_item,1, W_item,2, W_item,3 and the vectors b_item,1, b_item,2, b_item,3, where W_item,1 has dimension d_1×d_item. Tanh and ReLU are two activation functions, and Dropout is a random inactivation function;
In the structural formula of the denoising autoencoder, the function g_dec is the decoder, which decodes the hidden vector h into the reconstructed vector ŝ approximating the input vector s. Mirroring the encoder, the decoder g_dec is formulated as:

ŝ = Sigmoid(W_6·Dropout(ReLU(W_5·Tanh(W_4·h + b_4) + b_5)) + b_6)

The decoder parts of the two denoising autoencoders each have their own set of learnable parameters, comprising the matrices W_4, W_5, W_6 and the vectors b_4, b_5, b_6. Specifically, for the decoder in the user denoising autoencoder the learnable parameters are the matrices W_user,4, W_user,5, W_user,6 and the vectors b_user,4, b_user,5, b_user,6, where W_user,6 has dimension d_user×d_5 and b_user,6 has dimension d_user; for the decoder in the item denoising autoencoder the learnable parameters are the matrices W_item,4, W_item,5, W_item,6 and the vectors b_item,4, b_item,5, b_item,6, where W_item,6 has dimension d_item×d_5 and b_item,6 has dimension d_item. Sigmoid is an activation function;
2.2) pre-training the two denoising autoencoders separately in a self-supervised manner by stochastic gradient descent to obtain the parameters of the two pre-trained denoising autoencoders, in detail as follows:

to measure the difference between the reconstructed vector ŝ and the input vector s, the cross entropy is introduced as the loss function of the denoising autoencoder, giving the loss function L_DAE:

L_DAE = −(1/N) Σ_{p=1..N} Σ_{k=1..d} [ s^(p,k)·log ŝ^(p,k) + (1 − s^(p,k))·log(1 − ŝ^(p,k)) ]

where N is the total number of input vectors in a batch, p indexes the p-th input vector, d is the total dimensionality of an input vector, and k indexes the k-th dimension of an input vector;

taking s_i^user and s_j^item as the input vectors of the user denoising autoencoder and the item denoising autoencoder respectively, the reconstructed vectors ŝ_i^user and ŝ_j^item are obtained by forward propagation; the loss values L_user and L_item are computed with the loss function, and back-propagation yields the gradient ∇_θc L_user of the user denoising autoencoder parameters and the gradient ∇_θc L_item of the item denoising autoencoder parameters, where θ_c denotes the respective parameters of the two denoising autoencoders;

the parameters of the two denoising autoencoders are updated by stochastic gradient descent: N different user side information vectors or N different item side information vectors are randomly selected as a unit batch and fed into the model; from this input, the mean gradient ḡ of the corresponding denoising autoencoder's parameter gradients over the batch is computed, and the model parameters are updated with the learning rate η as:

θ_c ← θ_c − η·ḡ

Q rounds of batch input selection and model parameter updates are performed in total, giving the parameters of the two denoising autoencoders used in the next stage.
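A minimal pre-training loop matching the update rule above might be sketched as follows (illustrative only; the batch size, number of rounds and learning rate are assumed values, and DenoisingAutoencoder refers to the sketch above):

import torch
import torch.nn as nn

def pretrain_dae(dae, side_info, N=128, Q=50, lr=0.01):
    """Self-supervised DAE pre-training with SGD and the cross-entropy loss L_DAE."""
    opt = torch.optim.SGD(dae.parameters(), lr=lr)   # theta <- theta - eta * grad
    loss_fn = nn.BCELoss()                           # binary cross entropy, i.e. L_DAE
    dae.train()
    for _ in range(Q):                               # Q rounds of batch updates
        idx = torch.randint(0, side_info.shape[0], (N,))
        s = side_info[idx]                           # random unit batch of N vectors
        s_hat, _ = dae(s)                            # forward propagation
        loss = loss_fn(s_hat, s)                     # compare reconstruction with s
        opt.zero_grad()
        loss.backward()                              # back-propagate the gradients
        opt.step()
    return dae

In this sketch, pretrain_dae would be called once with the user side information matrix and once with the item side information matrix, yielding the two pre-trained encoders used in the next stage.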
3) Constructing an overall model composed of two encoders, a fully connected layer and an activation function layer, where the two encoders are structurally identical to the encoder parts of the two denoising autoencoders and are defined as the user encoder and the item encoder; after the overall model is built, the pre-trained encoder parameters of the two denoising autoencoders are loaded into the user encoder and the item encoder of the overall model respectively for initialization, comprising the following steps:

3.1) constructing the overall model composed of two encoders, a fully connected layer and an activation function layer, where the two encoders are structurally identical to the encoder parts of the two denoising autoencoders and are defined as the user encoder and the item encoder; the outputs of the two encoders are fused by a Hadamard product and then passed in turn through the fully connected layer with output dimension 1 and the Sigmoid activation function layer, giving the implicit feedback probability v with values in the interval (0,1), formulated as:

v = Sigmoid(W_7·(h_i^user ⊙ h_j^item) + b_7)

where h_i^user denotes the hidden vector of the i-th user (m users in total, one hidden vector per user) and h_j^item denotes the hidden vector of the j-th item (n items in total, one hidden vector per item); the user hidden vector is the output of the user encoder and the item hidden vector is the output of the item encoder; ⊙ denotes the Hadamard product; W_7 has dimension 1×d_hidden and b_7 has dimension 1;

to estimate the prediction error of the overall model, binary cross entropy is used as the loss function to measure the error between the predicted probability value and the implicit feedback label.

3.2) loading the pre-trained encoder parameters of the two denoising autoencoders into the user encoder and the item encoder of the overall model respectively for initialization; the fully connected layer parameters of the overall model are initialized to zero, giving the initialized overall model parameters.
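An illustrative sketch of steps 3.1) and 3.2) follows (not part of the original disclosure; it reuses the DenoisingAutoencoder sketch above, and d_hidden = 64 is an assumed size):

import torch
import torch.nn as nn

class OverallModel(nn.Module):
    """Two pre-trained encoders fused by a Hadamard product, then FC + Sigmoid."""

    def __init__(self, user_dae, item_dae, d_hidden=64):
        super().__init__()
        self.user_encoder = user_dae.encoder   # pre-trained user encoder parameters
        self.item_encoder = item_dae.encoder   # pre-trained item encoder parameters
        self.fc = nn.Linear(d_hidden, 1)       # W_7 (1 x d_hidden) and b_7
        nn.init.zeros_(self.fc.weight)         # FC parameters initialized to zero
        nn.init.zeros_(self.fc.bias)

    def forward(self, s_user, s_item):
        h_user = self.user_encoder(s_user)                 # user hidden vector
        h_item = self.item_encoder(s_item)                 # item hidden vector
        fused = h_user * h_item                            # Hadamard product
        return torch.sigmoid(self.fc(fused)).squeeze(-1)   # v in (0, 1)

Training would then compare v with the implicit feedback label through nn.BCELoss(), i.e. the binary cross entropy L_total.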
4) Iteratively updating the overall model parameters with the FedAvg algorithm of federated learning, generating probability values of implicit feedback from different users to items with the updated overall model, and thereby generating a recommended item list for each user, comprising the following steps:

4.1) taking the user side information vectors, the item side information vectors and the implicit feedback labels as the training data of the overall model, and iteratively updating the overall model parameters with the FedAvg algorithm of federated learning to obtain the trained overall model.

The FedAvg algorithm of federated learning is a model training algorithm that, when the data involves user privacy, updates parameters using protected user data. The algorithm proceeds as follows:

a. the central server sends the initialized or updated cloud overall model parameters and the full item data set to each of a randomly selected fraction C of the clients; each selected client splits its internal user data set into batches of size B and performs E rounds of training on a local overall model whose parameters are initialized to the cloud overall model parameters, obtaining different updated local overall model parameters; here the overall model parameters held on the central server are called the cloud overall model parameters, and the overall model parameters held on each client are called the local overall model parameters, one set per client;

b. taking the ratio of each client's internal user data volume to the total user data volume of all selected clients as its weight, the updated local overall model parameters of each selected client are uploaded to the server and aggregated by weighted summation into new cloud overall model parameters, which is recorded as one global round;

c. the algorithm terminates once T global rounds have been performed; otherwise it returns to step a.

Here, jointly considering the communication overhead, the training convergence speed and the overall model accuracy, the number of global rounds T is set to 5, the fraction C of clients selected per round is set to 0.4, the batch size B is set to 512, and the number of in-client rounds E is set to 4.
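For illustration only, a simplified, single-process, non-secure FedAvg sketch with the hyperparameters above is given below; client selection, data handling and aggregation are reduced to their essentials, the learning rate is an assumed value, and each client dataset is assumed to yield (s_user, s_item, label) tuples.

import copy
import random
import torch

def fedavg(global_model, clients, T=5, C=0.4, E=4, B=512, lr=0.01):
    """clients: list of (dataset, num_users) pairs; returns the updated global model."""
    for _ in range(T):                                    # T global rounds
        selected = random.sample(clients, max(1, int(C * len(clients))))
        total_users = sum(n for _, n in selected)
        agg_state = None
        for dataset, n in selected:
            local = copy.deepcopy(global_model)           # init from cloud parameters
            opt = torch.optim.SGD(local.parameters(), lr=lr)
            loader = torch.utils.data.DataLoader(dataset, batch_size=B, shuffle=True)
            local.train()
            for _ in range(E):                            # E in-client rounds
                for s_user, s_item, label in loader:
                    loss = torch.nn.functional.binary_cross_entropy(
                        local(s_user, s_item), label)
                    opt.zero_grad()
                    loss.backward()
                    opt.step()
            w = n / total_users                           # weight by local data share
            state = {k: w * v for k, v in local.state_dict().items()}
            agg_state = state if agg_state is None else {
                k: agg_state[k] + state[k] for k in state}
        global_model.load_state_dict(agg_state)           # weighted aggregation
    return global_model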
To evaluate the accuracy of the overall model, the MovieLens-100K data set is used, and the HitRatio@K and NDCG@K metrics are used to test the overall model (denoted Hybrid CF in the table) against the existing MLP, GMF and NeuMF models; the experimental results are shown in FIG. 3.
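As a brief illustrative aside (not from the original disclosure), the two metrics have standard definitions, sketched here for the common setting of one held-out test item per user:

import math

def hit_ratio_at_k(ranked_items, test_item, k=10):
    """HR@K: 1 if the held-out test item appears in the top-K recommendations."""
    return 1.0 if test_item in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, test_item, k=10):
    """NDCG@K with one relevant item: 1/log2(rank + 2) if it is ranked in the top K."""
    for rank, item in enumerate(ranked_items[:k]):
        if item == test_item:
            return 1.0 / math.log2(rank + 2)
    return 0.0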
4.2) for each user, M unrated items are randomly drawn for prediction. The prediction proceeds as follows: the side information vector of the user and the side information vector of any one of the drawn unrated items are taken as the input vectors of the user encoder and the item encoder of the overall model respectively, and forward propagation yields the probability value v of implicit feedback between the user and the item. Each user thus obtains M implicit feedback probability values for the M randomly drawn items;

4.3) for each user, the implicit feedback probability values are sorted in descending order; the input item behind each implicit feedback probability value in the sorted sequence is a recommended item, thereby generating a recommended item sequence ranked from the highest recommendation degree to the lowest.
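Steps 4.2) and 4.3) could be sketched as below (illustrative only; the tensor shapes, M = 100 and the candidate-drawing scheme are assumptions, and OverallModel refers to the sketch above):

import torch

def recommend_for_user(model, s_user, unrated_item_vecs, M=100):
    """Score M randomly drawn unrated items for one user and rank them."""
    model.eval()
    idx = torch.randperm(unrated_item_vecs.shape[0])[:M]    # draw M unrated items
    items = unrated_item_vecs[idx]
    with torch.no_grad():
        v = model(s_user.expand(items.shape[0], -1), items)  # implicit-feedback probs
    order = torch.argsort(v, descending=True)                # sort descending by v
    return idx[order]                                        # ranked recommendation list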
The above embodiment is a preferred embodiment of the present invention, but the embodiments of the present invention are not limited thereto; any other change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be an equivalent replacement and is included within the scope of protection of the present invention.

Claims (5)

1. A federated hybrid filtering recommendation method based on user privacy protection, characterized by comprising the following steps:
1) collecting publicly available user-item rating information to form a rating matrix as explicit feedback, obtaining implicit feedback by binarizing the rating matrix, and collecting public user attributes as user side information and public item attributes as item side information; the explicit feedback is a signal that directly expresses the user's degree of preference for an item, i.e. the user's rating of the item, and the implicit feedback is a signal that does not directly reflect the user's preference, i.e. whether or not the user has rated the item;
2) building a denoising autoencoder for the user side information and another for the item side information, the two denoising autoencoders having the same structure and each consisting of three parts, namely noise addition, an encoder and a decoder, and pre-training the two denoising autoencoders separately in a self-supervised manner by stochastic gradient descent to obtain the parameters of the two pre-trained denoising autoencoders;
3) constructing an overall model composed of two encoders, a fully connected layer and an activation function layer, the two encoders being structurally identical to the encoder parts of the two denoising autoencoders and being defined as a user encoder and an item encoder; after the overall model is built, loading the pre-trained encoder parameters of the two denoising autoencoders into the user encoder and the item encoder of the overall model respectively for initialization;
4) iteratively updating the overall model parameters with the FedAvg algorithm of federated learning, and generating probability values of implicit feedback from different users to items with the updated overall model, thereby generating a recommended item list for each user.
2. The federated hybrid filtering recommendation method based on user privacy protection according to claim 1, characterized in that step 1) comprises the following steps:
1.1) collecting the publicly available user-item rating information to form a rating matrix R, which is a two-dimensional user-item matrix expressed as:

R = (r_ij) ∈ ℝ^(m×n)

where r_ij is the rating of the i-th user for the j-th item, and there are m users and n items in total;
1.2) applying the custom binarization function Bin(r_ij) to the rating matrix R to obtain the implicit feedback matrix Bin(R) ∈ {0,1}^(m×n), defined as the user-item interaction matrix; that is, each user-item pair carries an implicit feedback label of value 0 or 1, where 0 indicates that no rating occurred and 1 indicates that a rating occurred; the binarization function Bin(r_ij) is defined as:

Bin(r_ij) = 0 if r_ij = 0, and Bin(r_ij) = 1 otherwise;

1.3) collecting the public user attributes and item attributes and, using one-hot encoding, generating for each user a vector of binary user feature values and for each item a vector of binary item feature values representing those attributes, i.e. the user side information vector and the item side information vector, which in turn form the user side information matrix S_user and the item side information matrix S_item, expressed as:

S_user = (s_1^user, s_2^user, …, s_m^user)^T, with s_i^user = (b_user,1, b_user,2, …, b_user,d_user) ∈ ℝ^(d_user)
S_item = (s_1^item, s_2^item, …, s_n^item)^T, with s_j^item = (b_item,1, b_item,2, …, b_item,d_item) ∈ ℝ^(d_item)

where s_i^user is the side information vector of the i-th user, containing d_user binary user feature values, with b_user,x denoting the x-th binary user feature value, and ℝ^(d_user) denotes a d_user-dimensional vector space; s_j^item is the side information vector of the j-th item, containing d_item binary item feature values, with b_item,y denoting the y-th binary item feature value, and ℝ^(d_item) denotes a d_item-dimensional vector space.
3. The federated hybrid filtering recommendation method based on user privacy protection according to claim 1, characterized in that step 2) comprises the following steps:
2.1) building a denoising autoencoder for the user side information and another for the item side information, referred to as the user denoising autoencoder and the item denoising autoencoder; each denoising autoencoder has the structure:

ŝ = g_dec(f_enc(d_noi(s)))

where s is the input vector of the denoising autoencoder, i.e. a side information vector; ŝ is the reconstructed vector of the denoising autoencoder; the function d_noi is the noise addition; the function f_enc is the encoder; and the function g_dec is the decoder;
2.2) pre-training the two denoising autoencoders separately in a self-supervised manner by stochastic gradient descent to obtain the parameters of the two pre-trained denoising autoencoders, each consisting of an encoder part and a decoder part; to measure the difference between the reconstructed vector ŝ and the input vector s, the cross entropy is introduced as the loss function of the two denoising autoencoders, giving the loss function L_DAE:

L_DAE = −(1/N) Σ_{p=1..N} Σ_{k=1..d} [ s^(p,k)·log ŝ^(p,k) + (1 − s^(p,k))·log(1 − ŝ^(p,k)) ]

where N is the total number of input vectors in a batch, p indexes the p-th input vector, d is the total dimensionality of an input vector, and k indexes the k-th dimension of an input vector;

the user side information vector and the item side information vector are defined as:

s_i^user = (b_user,1, b_user,2, …, b_user,d_user), i = 1, …, m
s_j^item = (b_item,1, b_item,2, …, b_item,d_item), j = 1, …, n

where s_i^user is the side information vector of the i-th user (m users in total), containing d_user binary user feature values, with b_user,x denoting the x-th binary user feature value; s_j^item is the side information vector of the j-th item (n items in total), containing d_item binary item feature values, with b_item,y denoting the y-th binary item feature value;

taking s_i^user and s_j^item as the input vectors of the user denoising autoencoder and the item denoising autoencoder respectively, the reconstructed vectors ŝ_i^user and ŝ_j^item are obtained by forward propagation; the loss values L_user and L_item are computed with the loss function, and back-propagation yields the gradient ∇_θc L_user of the user denoising autoencoder parameters and the gradient ∇_θc L_item of the item denoising autoencoder parameters, where θ_c denotes the respective parameters of the two denoising autoencoders;

the parameters of the two denoising autoencoders are updated by stochastic gradient descent: N different user side information vectors or item side information vectors are randomly selected as a unit batch, and Q rounds of batch input selection and updating of the user denoising autoencoder parameters or the item denoising autoencoder parameters are performed in total, giving the parameters of the two denoising autoencoders used in the next stage.
4. The federated hybrid filtering recommendation method based on user privacy protection according to claim 1, characterized in that step 3) comprises the following steps:
3.1) constructing an overall model composed of two encoders, a fully connected layer and an activation function layer, where the two encoders are structurally identical to the encoder parts of the two denoising autoencoders and are defined as the user encoder and the item encoder; the outputs of the two encoders are fused by a Hadamard product and then passed in turn through the fully connected layer with output dimension 1 and a Sigmoid activation function layer, giving the implicit feedback probability v with values in the interval (0,1); the overall model uses binary cross entropy as its loss function L_total;
3.2) loading the pre-trained encoder parameters of the two denoising autoencoders into the user encoder and the item encoder of the overall model respectively for initialization, with the fully connected layer parameters of the overall model initialized to zero, giving the initialized overall model parameters.
5. The federated hybrid filtering recommendation method based on user privacy protection according to claim 1, characterized in that step 4) comprises the following steps:
4.1) taking the user side information vectors, the item side information vectors and the implicit feedback labels as the training data of the overall model, and iteratively updating the overall model parameters with the FedAvg algorithm of federated learning to obtain the trained overall model;
4.2) for each user, randomly drawing M unrated items for prediction, in the following manner: the side information vector of the user and the side information vector of any one of the drawn unrated items are taken as the input vectors of the user encoder and the item encoder of the overall model respectively, and forward propagation yields the probability value v of implicit feedback between the user and the item, so that the implicit feedback probability values of each user for the M unrated items are obtained;
4.3) for each user, the implicit feedback probability values are sorted in descending order, generating a recommended item list ranked from the highest recommendation degree to the lowest.
CN202210379463.8A 2022-04-12 2022-04-12 Federated hybrid filtering recommendation method based on user privacy protection Active CN114723067B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210379463.8A CN114723067B (en) Federated hybrid filtering recommendation method based on user privacy protection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210379463.8A CN114723067B (en) Federated hybrid filtering recommendation method based on user privacy protection

Publications (2)

Publication Number Publication Date
CN114723067A 2022-07-08
CN114723067B CN114723067B (en) 2023-05-23

Family

ID=82243698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210379463.8A Active CN114723067B (en) 2022-04-12 2022-04-12 Federal mixed filtering recommendation method based on user privacy protection

Country Status (1)

Country Link
CN (1) CN114723067B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190303838A1 (en) * 2018-03-30 2019-10-03 Atlassian Pty Ltd Using a productivity index and collaboration index for validation of recommendation models in federated collaboration systems
CN109783739A * 2019-01-23 2019-05-21 北京工业大学 Collaborative filtering recommendation method based on stacked sparse denoising autoencoder enhancement
CN110297848A * 2019-07-09 2019-10-01 深圳前海微众银行股份有限公司 Recommendation model training method, terminal and storage medium based on federated learning
CN111553744A * 2020-05-08 2020-08-18 深圳前海微众银行股份有限公司 Federated product recommendation method, device, equipment and computer storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
MUHAMMAD AMMAD-UD-DIN et al.: "Federated collaborative filtering for privacy-preserving personalized recommendation system"
PASCAL VINCENT et al.: "Extracting and composing robust features with denoising autoencoders"
李康康 et al.: "Research on federated personalized learning recommendation systems" (联邦个性化学习推荐系统研究)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117454185A * 2023-12-22 2024-01-26 深圳市移卡科技有限公司 Federated model training method and device, computer device, and storage medium
CN117454185B * 2023-12-22 2024-03-12 深圳市移卡科技有限公司 Federated model training method and device, computer device, and storage medium

Also Published As

Publication number Publication date
CN114723067B (en) 2023-05-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant