CN117592042A

CN117592042A - Privacy disclosure detection method and device for federal recommendation system

Info

Publication number: CN117592042A
Application number: CN202410071311.0A
Authority: CN
Inventors: 王滨; 王伟; 管晓宏; 王星; 许向蕊; 谢瀛辉
Original assignee: Hangzhou Hikvision Digital Technology Co Ltd
Current assignee: Hangzhou Hikvision Digital Technology Co Ltd
Priority date: 2024-01-17
Filing date: 2024-01-17
Publication date: 2024-02-23
Anticipated expiration: 2044-01-17
Also published as: CN117592042B

Abstract

The embodiments of the application provide a privacy disclosure detection method and device for a federal recommendation system, relating to the technical field of data processing. The method comprises the following steps: obtaining the embedding gradient corresponding to each predetermined item; estimating two types of rating distribution for the predetermined items based on the similarity relation between the obtained embedding gradients; taking the item ratings in the two types of rating distribution as candidate rating truth values, respectively, to construct two types of shadow data; training a local recommendation model of the server side based on the two types of shadow data to obtain two types of prediction results; determining a target rating truth value and a target prediction result of each predetermined item based on the two types of prediction results; and determining a detection result of privacy data leakage of the federal recommendation system based on the matching relation between the target rating truth value and the target prediction result. The method and the device can therefore effectively detect privacy data leakage of the federal recommendation system.

Description

Privacy disclosure detection method and device for federal recommendation system

Technical Field

The application relates to the technical field of data processing, in particular to a privacy disclosure detection method and device for a federal recommendation system.

Background

With the development of the times, federal recommendation systems are widely used in various item recommendation scenarios, such as music recommendation, movie recommendation and shopping commodity recommendation. A federal recommendation system is characterized in that the recommendation model of each user side can be trained without sharing user data, and the recommendation model can be used to rate each item, that is, to analyze whether the user is interested in each item.

Because the federal recommendation system is an open system, there is a need, for the sake of data security, to detect and analyze privacy data leakage in the federal recommendation system, so as to provide a basis for further improving its security.

Therefore, a privacy disclosure detection method for a federal recommendation system is needed to effectively detect privacy data leakage of the federal recommendation system.

Disclosure of Invention

The embodiment of the application aims to provide a privacy disclosure detection method and device for a federal recommendation system, so as to realize effective privacy data disclosure detection for the federal recommendation system. The specific technical scheme is as follows:

In a first aspect, an embodiment of the present application provides a method for detecting privacy leakage for a federal recommendation system, where the federal recommendation system includes a user side and a server side, and the method includes:

obtaining the embedding gradient corresponding to each preset item; the embedded gradient corresponding to each preset item is generated when the user side trains the local recommendation model according to the training data set of each preset item about the same user;

estimating, based on the similarity relation between the obtained embedding gradients corresponding to the predetermined items, a first class rating distribution and a second class rating distribution for the predetermined items according to a preset item rating determination mode; wherein the item rating determination mode is: for any two items, the same item rating is set if the similarity relation indicates that they are similar, and different item ratings are set if the similarity relation indicates that they are dissimilar; and the same item has different item ratings in the different rating distributions;

respectively taking the item ratings in the first class rating distribution and the item ratings in the second class rating distribution as candidate rating truth values to construct first class shadow data and second class shadow data; wherein the first type shadow data and the second type shadow data are training data sets of each preset item about the same user;

training a local recommendation model of the server side based on the first-class shadow data and the second-class shadow data respectively, to obtain a first-class prediction result corresponding to the first-class shadow data and a second-class prediction result corresponding to the second-class shadow data; the first-class prediction result and the second-class prediction result each comprise candidate prediction results of the predetermined items and the corresponding embedding gradients;

determining a target rating truth value and a target prediction result of each preset item based on the first type prediction result and the second type prediction result and the obtained embedding gradient corresponding to each preset item;

and determining a detection result of the privacy data leakage of the federal recommendation system based on the matching relation between the target rating true value and the target prediction result of each preset item.

In a second aspect, an embodiment of the present application provides a privacy disclosure detection apparatus for a federal recommendation system, where the federal recommendation system includes a user side and a server side, and the apparatus includes:

the acquisition module is used for acquiring the embedded gradient corresponding to each preset item; the embedded gradient corresponding to each preset item is generated when the user side trains the local recommendation model according to the training data set of each preset item about the same user;

the estimating module is used for estimating, based on the similarity relation between the acquired embedding gradients corresponding to the predetermined items, a first class rating distribution and a second class rating distribution for the predetermined items according to a preset item rating determination mode; wherein the item rating determination mode is: for any two items, the same item rating is set if the similarity relation indicates that they are similar, and different item ratings are set if the similarity relation indicates that they are dissimilar; and the same item has different item ratings in the different rating distributions;

the construction module is used for constructing first-class shadow data and second-class shadow data by taking the item ratings in the first-class rating distribution and the item ratings in the second-class rating distribution as candidate rating true values respectively; wherein the first type shadow data and the second type shadow data are training data sets of each preset item about the same user;

the training module is used for training the local recommendation model of the server based on the first type shadow data and the second type shadow data respectively to obtain a first type prediction result corresponding to the first type shadow data and a second type prediction result corresponding to the second type shadow data; the first type of prediction results and the second type of prediction results comprise candidate prediction results of all preset items and corresponding embedding gradients;

the first determining module is used for determining a target rating truth value and a target prediction result of each predetermined item based on the first-class prediction result, the second-class prediction result and the obtained embedding gradient corresponding to each predetermined item;

and the second determining module is used for determining the detection result of the privacy data leakage of the federal recommendation system based on the matching relation between the target rating true value and the target prediction result of each preset item.

In a third aspect, an embodiment of the present application provides an electronic device, including:

a memory for storing a computer program;

and the processor is used for realizing any privacy leakage detection method facing the federal recommendation system when executing the program stored in the memory.

In a fourth aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored therein, where the computer program when executed by a processor implements any one of the above-described federal recommendation system-oriented privacy leak detection methods.

In a fifth aspect, embodiments of the present application provide a computer program, where the computer program when executed on a computer causes the computer to perform any one of the above-mentioned federal recommendation system-oriented privacy leak detection methods.

The beneficial effects of the embodiment of the application are that:

According to the privacy leakage detection method for the federal recommendation system provided by the embodiments of the application, a first class rating distribution and a second class rating distribution for the predetermined items can be estimated based on the similarity relation between the embedding gradients corresponding to the predetermined items, and first-class shadow data and second-class shadow data can be constructed accordingly. The two classes of shadow data have completely opposite item ratings, and the item ratings of one of the two classes are similar to the target rating truth values. The local recommendation model is trained on the two classes of shadow data to obtain a first-class prediction result and a second-class prediction result, the target rating truth value and the target prediction result are determined based on these prediction results, and the detection result of privacy data leakage of the federal recommendation system is then determined according to the matching relation between the target rating truth value and the target prediction result.

Moreover, the embodiments of the application abandon the approach in the related art of assuming the user's positive and negative sampling ratio. Instead, first-class shadow data and second-class shadow data with completely opposite item ratings are constructed based on the embedding gradients corresponding to the predetermined items, and the detection result of privacy data leakage of the federal recommendation system is determined based on these two classes of shadow data. Because no assumption about the user's positive and negative sampling ratio is required, the accuracy of detection for the federal recommendation system is improved.

Of course, not all of the above-described advantages need be achieved simultaneously in practicing any one of the products or methods of the present application.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art may obtain other embodiments from these drawings.

Fig. 1 is a training schematic diagram of a user side and a server side in a federal recommendation system according to an embodiment of the present application;

Fig. 2 is a flowchart of a first privacy disclosure detection method for a federal recommendation system according to an embodiment of the present application;

Fig. 3 is a flowchart of a second privacy disclosure detection method for a federal recommendation system according to an embodiment of the present application;

Fig. 4 is a flowchart of a third privacy disclosure detection method for a federal recommendation system according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a privacy disclosure detection device for a federal recommendation system according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein fall within the scope of protection of the present application.

The following first describes the terms of art referred to in the embodiments of the present application:

Federal recommendation system (federated recommendation system): a distributed recommendation system model that combines a recommendation system with a federated distributed scenario, aiming to handle distributed data sources while protecting user privacy. The federal recommendation system can be understood as comprising a user side and a server side, each provided with a local recommendation model. The system trains the local recommendation models of the user side and the server side jointly: the server side updates the model parameters of its local recommendation model based on the model parameters reported by the user side, and sends the updated model parameters to the user side so that the user side also updates its model parameters, thereby achieving joint training.

Cosine similarity (Cosine Similarity): a mathematical metric method for measuring the similarity between two non-zero vectors evaluates the degree of similarity between the two vectors by determining the angle between them in a multidimensional space. The calculation formula is as follows:

$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|}$$

where $\cos(A, B)$ denotes the cosine similarity between $A$ and $B$, $A \cdot B$ denotes their dot product, and $\|A\|$ and $\|B\|$ denote the norms of vectors $A$ and $B$ respectively, typically the Euclidean norm (L2 norm), calculated as the square root of the sum of the squares of the elements. The cosine similarity takes values between -1 and 1: values closer to 1 indicate greater similarity, values closer to -1 indicate less similarity, and values closer to 0 indicate moderate similarity or no correlation.
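As a small worked illustration of the definition above (the vectors are chosen here purely for illustration and do not come from the application), take $A = (1, 2)$:

$$B = (2, 4): \quad \cos(A, B) = \frac{1 \cdot 2 + 2 \cdot 4}{\sqrt{5} \cdot \sqrt{20}} = \frac{10}{10} = 1$$

$$B = (-1, -2): \quad \cos(A, B) = \frac{-1 - 4}{\sqrt{5} \cdot \sqrt{5}} = -1$$

$$B = (2, -1): \quad \cos(A, B) = \frac{2 - 2}{\sqrt{5} \cdot \sqrt{5}} = 0$$

The first pair points in the same direction, the second in opposite directions, and the third is orthogonal, matching the three cases described above.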

For a better understanding of the present solution, the related art is briefly described below:

In the privacy leakage detection method for a federal recommendation system in the related art, the embedding gradients corresponding to the predetermined items at any moment are obtained, and item ratings are randomly assigned to the predetermined items according to an assumed positive and negative sampling ratio so as to construct shadow data. The constructed shadow data is a training data set concerning the predetermined items, in which the training data of each predetermined item includes user data of a sample user, feature data of the predetermined item and the item rating of the predetermined item; the assumed positive and negative sampling ratio is the ratio of interested to not-interested ratings of the sample user over the predetermined items. A local recommendation model of the server side is then trained on the shadow data to obtain the shadow embedding gradients generated during training, and the Euclidean distance between the shadow embedding gradients and the obtained embedding gradients corresponding to the predetermined items is calculated, from which the detection result of privacy data leakage of the federal recommendation system is inferred.

However, this method can infer the detection result of privacy data leakage only when the shadow data is close to the real training data set of the local recommendation model. Because the shadow data is constructed from an assumed positive and negative sampling ratio, the shadow data may bear very little similarity to the real training data set. In that case privacy data leakage of the federal recommendation system cannot be detected effectively, and the detection accuracy is low.

Based on the problems existing in the related art, the embodiment of the application provides a privacy disclosure detection method and device for a federal recommendation system, so as to effectively detect privacy data disclosure of the federal recommendation system.

For a better understanding of the scheme, the training principle of the user side and the server side in a federal recommendation system is briefly described below with reference to fig. 1:

Fig. 1 is a schematic diagram of training between the user side and the server side in a federal recommendation system. The federal recommendation system includes a server side 110 and a plurality of user sides 120. After the local recommendation model of the server side 110 is trained on a training data set, the global model can be updated, that is, the model parameters can be sent to the plurality of user sides 120 so that their local recommendation models are updated. The local recommendation model of each user side 120 can also be trained locally on a training set, and the plurality of user sides 120 can upload model parameters to the server side 110 for central aggregation, thereby implementing joint training between the user sides 120 and the server side 110.

Before training the local recommendation model of the user side, the user side can initialize the private parameters, wherein the private parameters can be user embedding, the user embedding can be called user data, the server side can initialize the global parameters, the global parameters can be item embedding, and the item embedding can be called feature data of the item.

Then, the process of training for the local recommendation model in the federal recommendation system may include the following four steps:

1) When each round of training is started, the server side firstly randomly selects the user side to participate in the round of training, and sends corresponding global parameters to the selected user side;

2) After receiving the global parameters, the user side combines the global parameters with its local private parameters to form a local recommendation model, trains the local recommendation model on its local private training set, and continuously updates the model parameters of the local recommendation model using a loss function, obtaining a user embedding gradient and an item embedding gradient of the global parameters;

3) The user side keeps the user embedding gradient locally to update its own private parameters, and uploads the item embedding gradient to the server side;

4) The server side obtains the item embedding gradients uploaded by the user sides, aggregates them, and updates the global parameters.

The above four steps may be iterated.
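The four steps above can be sketched roughly as follows. This is a minimal illustration rather than the implementation of the application: the `Client` class, the `server_round` function, the dot-product model `sigmoid(u . v)`, the learning rate and the averaging rule are all assumptions introduced here for the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class Client:
    """Hypothetical user side: private user embedding and private item ratings."""

    def __init__(self, dim, ratings, rng):
        self.user_emb = [rng.gauss(0.0, 0.1) for _ in range(dim)]  # private parameter
        self.ratings = ratings  # {item_id: 0 or 1}; never leaves the client

    def local_train(self, item_emb, lr):
        """Steps 2 and 3: train locally, keep the user gradient, return item gradients."""
        item_grads = {}
        user_grad = [0.0] * len(self.user_emb)
        for j, r in self.ratings.items():
            # Assumed model: prediction = sigmoid(u . v); err is dLoss/dz for cross-entropy.
            err = sigmoid(dot(self.user_emb, item_emb[j])) - r
            item_grads[j] = [err * u for u in self.user_emb]
            user_grad = [g + err * v for g, v in zip(user_grad, item_emb[j])]
        # The user embedding gradient is applied locally and is not uploaded.
        self.user_emb = [u - lr * g for u, g in zip(self.user_emb, user_grad)]
        return item_grads

def server_round(item_emb, clients, n_selected, rng, lr=0.1):
    """Steps 1 and 4: select clients, collect item gradients, aggregate, update."""
    total = [[0.0] * len(row) for row in item_emb]
    for client in rng.sample(clients, n_selected):
        for j, grad in client.local_train(item_emb, lr).items():
            total[j] = [t + g for t, g in zip(total[j], grad)]
    return [[v - lr * t / n_selected for v, t in zip(row, tot)]
            for row, tot in zip(item_emb, total)]

rng = random.Random(0)
dim, n_items = 4, 6
item_emb = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_items)]  # global parameter
clients = [Client(dim, {j: rng.randint(0, 1) for j in range(n_items)}, rng) for _ in range(5)]
for _ in range(20):  # the four steps are iterated
    item_emb = server_round(item_emb, clients, n_selected=3, rng=rng)
```

Note that only the item embedding gradients cross the network in this sketch; these uploaded gradients are the quantity that the detection method described later takes as its input.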

The loss function may be, for example, as follows:

$$L = -\sum_{(u_i, v_j, r_{ij}) \in D} \left[ r_{ij} \log \hat{r}_{ij} + (1 - r_{ij}) \log (1 - \hat{r}_{ij}) \right]$$

where $L$ is the loss value of model training; $u_i$ is the user data of sample user $i$ in the training set, with $i$ denoting the sample user; $v_j$ is the feature data of item $j$ in the training set, with $j$ denoting the item; $r_{ij}$ is the item rating given by sample user $i$ to item $j$ in the training set; $\hat{r}_{ij}$ is the predicted value output by the local recommendation model from the user data of sample user $i$ and the feature data of item $j$; and $D$ is the training set. The training set consists of user data, feature data of items and item ratings, and can be regarded as triple data, namely $(u_i, v_j, r_{ij})$.

Here, $r_{ij} = 1$ indicates that item $j$ is a positive item of sample user $i$, that is, sample user $i$ may be considered interested in item $j$; conversely, $r_{ij} = 0$ indicates that item $j$ is a negative item of sample user $i$, that is, sample user $i$ may be considered not interested in item $j$. $V_i^{+}$ and $V_i^{-}$ denote the positive item set and the negative item set of sample user $i$ respectively, and the federal recommendation system is trained to predict the rating $\hat{r}_{ij}$ of sample user $i$ for items with which the user has not yet interacted.
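To make the role of the triple data concrete, the following sketch evaluates a loss over a list of (user, item, rating) triples. It assumes the loss is binary cross-entropy, which fits the two-valued item ratings but is an assumption of this example; the function name `bce_loss`, the `predict` callback and the toy data are hypothetical.

```python
import math

def bce_loss(triples, predict):
    """Sum binary cross-entropy over (user, item, rating) triples.

    `predict(user, item)` is a hypothetical callback returning a probability in (0, 1).
    """
    eps = 1e-12  # clamp predictions to avoid log(0)
    loss = 0.0
    for user, item, rating in triples:
        p = min(max(predict(user, item), eps), 1.0 - eps)
        loss -= rating * math.log(p) + (1 - rating) * math.log(1.0 - p)
    return loss

# Toy data: user "u1" is interested in bread (rating 1) and not in milk (rating 0).
preds = {("u1", "bread"): 0.9, ("u1", "milk"): 0.2}
train = [("u1", "bread", 1), ("u1", "milk", 0)]
loss = bce_loss(train, lambda u, v: preds[(u, v)])
```

With these toy predictions the loss equals `-(log 0.9 + log 0.8)`; it shrinks as the predictions move toward the true ratings.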

The user data may represent identity id information of the user, and it should be emphasized that the user data does not represent actual personal information such as gender, age, weight, etc. of the user, and may be represented by only one string of numbers, and when training is performed for the first time, the user data may be a string of numbers generated randomly, so that leakage of personal information of the user may be avoided.

Wherein the item ratings characterize the user's interest level in the item, i.e., whether it is of interest or not, may be represented in various forms, such as: "1" is characterized as interesting, then "0" may be characterized as not interesting, and the specific form of rating the item is not specifically limited by the embodiments of the present application. Illustratively, user A is interested in the bread, then the item rating of the item bread may be "1," characterized as being of interest.

The following first describes a privacy disclosure detection method for a federal recommendation system provided in the embodiments of the present application.

The privacy leakage detection method for the federal recommendation system, provided by the embodiment of the application, can be applied to electronic equipment. For example, in one alternative implementation, the electronic device may be a terminal device or server for monitoring the federal recommendation system; in another alternative implementation, the electronic device may also be a device running a server in a federal recommendation system. The specific device configuration of the electronic device is not limited in this application. Specifically, the execution body of the privacy disclosure detection method facing the federal recommendation system may also be a privacy disclosure detection device facing the federal recommendation system. For example, when the privacy disclosure detection method for the federal recommendation system is applied to the terminal device, the privacy disclosure detection device for the federal recommendation system may be software running in the terminal device and used for performing privacy data disclosure detection for the federal recommendation system. For example, when the federal recommendation system-oriented privacy disclosure detection method is applied to a server, the federal recommendation system-oriented privacy disclosure detection device may be a computer program running in the server, and the computer program may be used for performing privacy data disclosure detection for the federal recommendation system.

It can be understood that the federal recommendation system mentioned in the embodiments of the present application is applicable to various item recommendation scenarios, for example music recommendation, movie recommendation and shopping commodity recommendation. In a music recommendation scenario, a local recommendation model obtained through user-side training in the federal recommendation system can recommend music of interest to the user; in a movie recommendation scenario, such a local recommendation model can recommend movies of interest to the user.

In addition, for any predetermined item, the item rating of the predetermined item is used to characterize whether the user is interested in the predetermined item, i.e., the item rating of any predetermined item may include two ratings, characterizing interest and non-interest, respectively. Also, the various predetermined items mentioned in this application may be different for different scenarios. For example, for an audio recommendation scenario, each of the predetermined items belongs to an audio type, each of the predetermined items may be each of the predetermined audio, and for a merchandise recommendation scenario, each of the predetermined items belongs to a merchandise type, each of the predetermined items may be each of the predetermined merchandise.

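In the privacy leakage detection method for a federal recommendation system provided by the embodiments of the present application, the federal recommendation system includes a user side and a server side, and the method includes the steps set out in the first aspect above, from obtaining the embedding gradients corresponding to the predetermined items to determining the detection result of privacy data leakage.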

In this way, a first class rating distribution and a second class rating distribution for the predetermined items can be estimated based on the similarity relation between the embedding gradients corresponding to the predetermined items, and first-class shadow data and second-class shadow data can be constructed accordingly. The two classes of shadow data have completely opposite item ratings, and the item ratings of one of the two classes are similar to the target rating truth values. The local recommendation model is trained on the two classes of shadow data to obtain a first-class prediction result and a second-class prediction result, the target rating truth value and the target prediction result are determined based on these prediction results, and the detection result of privacy data leakage of the federal recommendation system is determined according to the matching relation between the target rating truth value and the target prediction result. The scheme can therefore effectively detect privacy data leakage of the federal recommendation system and improve its security.

The following describes a privacy leakage detection method for a federal recommendation system provided by an embodiment of the present application with reference to the accompanying drawings.

As shown in fig. 2, the first federal recommendation system-oriented privacy leakage detection method provided in the embodiment of the present application may include the following steps:

s201, obtaining embedding gradients corresponding to all preset items;

the embedded gradient corresponding to each preset item is generated when the user side trains the local recommendation model according to the training data set of each preset item about the same user;

It will be appreciated that the predetermined items are determined in advance and are for the same user. For example, in a music recommendation scenario, recommendation analysis can be performed on 1000 songs for sample user A, that is, it is identified whether sample user A is interested in each of the 1000 songs, and each song can be regarded as a predetermined item; in a commodity recommendation scenario, recommendation analysis can be performed on 500 commodities for sample user B, that is, it is identified whether sample user B is interested in each of the 500 commodities, and each commodity can be regarded as a predetermined item. In addition, the predetermined items themselves do not fall within the scope of privacy disclosure, so both the user side and the server side can acquire the specific contents of the predetermined items in advance and thereby obtain the feature data of the predetermined items used for constructing training data.

It should be emphasized that, in each round of model training, each predetermined item has a corresponding embedding gradient, which characterizes the update direction of the embedding of that predetermined item in the update process of that round of training, that is, the direction in which the item rating and the user data are updated for the predetermined item when the model is trained on the training data of the predetermined item. In this application, the embedding gradients obtained in step S201 may be the embedding gradients corresponding to the predetermined items obtained in any round of model training.

It should also be emphasized that the local recommendation model may be regarded as a binary classification model in which the item rating is obtained through an activation function: the input of the local recommendation model is the user data and the feature data of the predetermined item, and the output is an item rating characterizing interest or non-interest. In an ideal state, the item rating can be obtained by performing a specific operation on the user data and the feature data of the predetermined item, so the input of user $i$ to the activation function can be written as

$$z_i^t = h(u_i^t, V^t)$$

where $h$ can be regarded as the global model, $u_i^t$ as the user data of user $i$ in the $t$-th round of model training, $V^t$ as the feature data of the predetermined items in the $t$-th round of model training, and $z_i^t$ as the input of user $i$ to the activation function. The probability after the sigmoid activation function is

$$\hat{r}_i^t = \sigma(z_i^t) = \frac{1}{1 + e^{-z_i^t}}$$

where $\hat{r}_i^t$, whose value lies in the range (0, 1), is the probability output by the local recommendation model. The embedding gradient corresponding to the predetermined items is then calculated as:

$$\nabla V_i = \frac{\partial L}{\partial \hat{r}_i} \cdot \frac{\partial \hat{r}_i}{\partial z_i} \cdot \frac{\partial z_i}{\partial V} = (\hat{r}_i - r_i) \frac{\partial z_i}{\partial V}$$

where $i$ denotes user $i$; $V$ denotes the feature data of the predetermined items; $\nabla V_i$ denotes the embedding gradient of user $i$ with respect to the predetermined items; $L$ denotes the loss value of model training; $\partial z_i$ denotes the partial derivative of the input of user $i$ to the activation function; $\partial V$ denotes the partial derivative of the feature data of the predetermined items; $r_i$ denotes the truth value of the item rating of user $i$ in the training set; and $\partial \hat{r}_i$ denotes the partial derivative of the output of the activation function for user $i$. The second equality holds for a cross-entropy loss with a sigmoid activation. The truth value of the item rating characterizes the user's true item rating for each predetermined item.

It should be emphasized that the above formula is merely an exemplary calculation method for the calculation method of the embedding gradient, which is not specifically limited in this application.
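As a sanity check on this kind of gradient formula, the sketch below compares an analytic item gradient with a finite-difference estimate. It assumes, purely for illustration, that the model is a dot product followed by a sigmoid and that the loss is binary cross-entropy, so that the gradient with respect to the item embedding is `(r_hat - r) * u`; the function names and the toy vectors are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(u, v, r):
    """Binary cross-entropy for one (user, item, rating) sample, with z = u . v."""
    p = sigmoid(sum(a * b for a, b in zip(u, v)))
    return -(r * math.log(p) + (1 - r) * math.log(1 - p))

def item_gradient(u, v, r):
    """Analytic gradient: (r_hat - r) * dz/dv, where dz/dv = u for z = u . v."""
    p = sigmoid(sum(a * b for a, b in zip(u, v)))
    return [(p - r) * a for a in u]

def numeric_gradient(u, v, r, h=1e-6):
    """Central finite-difference estimate of dLoss/dv."""
    grad = []
    for k in range(len(v)):
        hi = v[:k] + [v[k] + h] + v[k + 1:]
        lo = v[:k] + [v[k] - h] + v[k + 1:]
        grad.append((loss(u, hi, r) - loss(u, lo, r)) / (2 * h))
    return grad

u = [0.3, -0.2, 0.5]   # toy user embedding
v = [0.1, 0.4, -0.3]   # toy item embedding
g_pos = item_gradient(u, v, 1)  # gradient if the user is interested
g_neg = item_gradient(u, v, 0)  # gradient if the user is not interested
```

Under these assumptions the factor `(r_hat - r)` is negative for a rating of 1 and positive for a rating of 0, so `g_pos` and `g_neg` point in opposite directions. This toy case shows how the sign of the rating can be reflected in the direction of the uploaded gradient.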

S202, estimating a first class rating distribution and a second class rating distribution for each preset item according to a preset item rating determination mode based on the obtained similarity relation between embedded gradients corresponding to each preset item;

The item rating determination mode is: for any two items, the same item rating is set if the similarity relation indicates that they are similar, and different item ratings are set if the similarity relation indicates that they are dissimilar; and the same item has different item ratings in the different rating distributions;

The inventors found in their research that, if the item rating truth values of two predetermined items for the same sample user are consistent, the embedding gradients obtained for the two predetermined items during training are similar, whereas if the item rating truth values are inconsistent, the embedding gradients obtained for the two predetermined items during training differ greatly. Based on this relation between embedding gradients and item ratings, in the present application, for any two predetermined items, the same item rating can be set for them if the similarity relation between their embedding gradients indicates that they are similar, and opposite item ratings can be set for them if it indicates that they are dissimilar; and the same item has different item ratings in different rating distributions, so that two rating distributions with completely opposite item ratings are obtained. It should be emphasized that one of the first class rating distribution and the second class rating distribution has item ratings similar to the target rating truth values, where the target rating truth values represent the user's true item ratings for the predetermined items and may be the item ratings of either the first class rating distribution or the second class rating distribution; this is not specifically limited in the embodiments of the present application.

It will be appreciated that, if the similarity relationship between the embedding gradients corresponding to two predetermined items indicates that they are similar, the same item rating, either interest or no interest, may be set for both predetermined items; similarly, if the similarity relationship indicates that they are dissimilar, an item rating indicating interest may be set for one of the two predetermined items and an item rating indicating no interest for the other, so that a first class rating distribution and a second class rating distribution for the predetermined items are obtained. For example, the predetermined items include commodity a, commodity b and commodity c; the similarity relationship between the embedding gradients corresponding to commodity a and commodity b indicates that they are similar, and the similarity relationship between the embedding gradients corresponding to commodity a and commodity c indicates that they are dissimilar. The item rating of interest can then be set for commodity a and commodity b and the item rating of no interest for commodity c, giving the first class rating distribution; and the item rating of no interest can be set for commodity a and commodity b and the item rating of interest for commodity c, giving the second class rating distribution. The two rating distributions thus assign completely opposite item ratings to commodity a, commodity b and commodity c.

In one implementation manner, a cosine similarity matrix of the embedded gradients can be established by using a cosine similarity calculation formula, so that the similarity relation between the embedded gradients corresponding to each predetermined item is clearly displayed. The calculation formula of the cosine similarity is as follows:

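$$\cos\left(g_{i,p}, g_{i,q}\right) = \frac{g_{i,p} \cdot g_{i,q}}{\lVert g_{i,p} \rVert \, \lVert g_{i,q} \rVert};$$

wherein p and q are any two predetermined items, $g_{i,p}$ is the embedding gradient corresponding to predetermined item p for user i, $g_{i,q}$ is the embedding gradient corresponding to predetermined item q for user i, and $\cos\left(g_{i,p}, g_{i,q}\right)$ is the cosine similarity of $g_{i,p}$ and $g_{i,q}$.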

It may be appreciated that, based on the calculated cosine similarity, the server may establish a cosine similarity matrix, and may further analyze a rating distribution of the user for each predetermined item by using signs of each cosine similarity in the cosine similarity matrix, that is, analyze which predetermined items belong to the same class of item ratings.
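The construction of the cosine similarity matrix and the reading of its signs can be sketched as follows. This is a minimal illustration in Python rather than the implementation of the application: the function names `cosine_similarity` and `similarity_matrix`, the item names, and the gradient values are hypothetical, and treating a zero-norm gradient as similarity 0 is an assumption.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero gradient carries no similarity signal
    return dot / (norm_u * norm_v)


def similarity_matrix(gradients):
    """Pairwise cosine similarities; `gradients` maps item name -> vector."""
    items = list(gradients)
    return {
        p: {q: cosine_similarity(gradients[p], gradients[q]) for q in items}
        for p in items
    }


# Hypothetical embedding gradients of three predetermined items for one user.
embedding_gradients = {
    "a": [0.4, -0.2, 0.1],
    "b": [0.3, -0.1, 0.2],
    "c": [-0.5, 0.3, -0.1],
}

sim = similarity_matrix(embedding_gradients)

# Items whose cosine similarity with item "a" is positive are read as
# belonging to the same class of item ratings as "a".
same_class_as_a = [q for q in embedding_gradients if sim["a"][q] > 0]
```

With these illustrative values, items a and b fall in one class and item c in the other; which class means "interest" is not determined at this stage.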

For a better understanding of the present solution, the following analysis is performed for the relation between the embedding gradient found by the inventor's study and the item rating:

The embedding gradients corresponding to any pair of predetermined items p and q are $g_{i,p}$ and $g_{i,q}$. By combining the formula for calculating the embedding gradient corresponding to a predetermined item, the formula for calculating the cosine similarity between the two embedding gradients can be transformed into the following formula:

$$\cos\left(g_{i,p}, g_{i,q}\right) = \frac{\left(\hat{y}_{i,p} - y_{i,p}\right)\left(\hat{y}_{i,q} - y_{i,q}\right)\left(\dfrac{\partial \hat{y}_{i,p}}{\partial e_{p}} \cdot \dfrac{\partial \hat{y}_{i,q}}{\partial e_{q}}\right)}{\lVert g_{i,p} \rVert \, \lVert g_{i,q} \rVert}$$

wherein $\hat{y}_{i,p}$ is the probability, output by the local recommendation model, of user i for predetermined item p; $y_{i,p}$ is the true item rating of user i for predetermined item p; $\hat{y}_{i,q}$ is the probability, output by the local recommendation model, of user i for predetermined item q; $y_{i,q}$ is the true item rating of user i for predetermined item q; $\partial \hat{y}_{i,p}$ is the partial derivative of the probability, output by the local recommendation model, of user i for predetermined item p; $\partial \hat{y}_{i,q}$ is the partial derivative of the probability, output by the local recommendation model, of user i for predetermined item q; $\partial e_{p}$ is the partial derivative of the feature data of predetermined item p; and $\partial e_{q}$ is the partial derivative of the feature data of predetermined item q.

Since the values of $\hat{y}_{i,p}$ and $\hat{y}_{i,q}$ may lie in the range (0, 1], when the predetermined items p and q belong to the same class, that is, when both are items of interest or both are items of no interest to user i, the value of $\left(\hat{y}_{i,p} - y_{i,p}\right)\left(\hat{y}_{i,q} - y_{i,q}\right)$ is positive; whereas when the predetermined items p and q do not belong to the same class, that is, when one of them is an item of interest to user i and the other is an item of no interest to user i, the value of $\left(\hat{y}_{i,p} - y_{i,p}\right)\left(\hat{y}_{i,q} - y_{i,q}\right)$ is negative. Throughout the calculation of the cosine similarity of the embedding gradients, for any two predetermined items, the partial-derivative terms can always form an acute angle, so that their cosine value is greater than 0, i.e. $\cos\left(\dfrac{\partial \hat{y}_{i,p}}{\partial e_{p}}, \dfrac{\partial \hat{y}_{i,q}}{\partial e_{q}}\right) > 0$. The sign of $\cos\left(g_{i,p}, g_{i,q}\right)$ therefore follows the sign of $\left(\hat{y}_{i,p} - y_{i,p}\right)\left(\hat{y}_{i,q} - y_{i,q}\right)$, and the first class rating distribution and the second class rating distribution for the predetermined items may be estimated in the predetermined item rating determination manner.

Exemplarily, the embedding gradient corresponding to the commodity bread is $g_{i,\text{bread}}$ and the embedding gradient corresponding to the commodity mobile phone is $g_{i,\text{phone}}$; the cosine similarity between the two embedding gradients can then be calculated as:

$$\cos\left(g_{i,\text{bread}}, g_{i,\text{phone}}\right) = \frac{\left(\hat{y}_{i,\text{bread}} - y_{i,\text{bread}}\right)\left(\hat{y}_{i,\text{phone}} - y_{i,\text{phone}}\right)\left(\dfrac{\partial \hat{y}_{i,\text{bread}}}{\partial e_{\text{bread}}} \cdot \dfrac{\partial \hat{y}_{i,\text{phone}}}{\partial e_{\text{phone}}}\right)}{\lVert g_{i,\text{bread}} \rVert \, \lVert g_{i,\text{phone}} \rVert}$$

wherein the symbols have the meanings given above, with predetermined item p being the commodity bread and predetermined item q being the commodity mobile phone. If the cosine similarity is calculated to be positive, the commodity bread and the commodity mobile phone belong to the same class of item ratings, and the same item rating can be set for both, that is, both are set to the item rating of interest or both are set to the item rating of no interest.
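The sign relationship analysed above can be checked numerically under a simplifying assumption. The sketch below assumes, purely for illustration, a dot-product recommendation model with a sigmoid output trained with binary cross-entropy; the application does not confirm this model form in this passage, and the function names and all numeric values are hypothetical. Under that assumption the gradient of the loss with respect to an item embedding is the prediction error multiplied by the user vector, so the cosine similarity of two item gradients takes the sign of the product of the two prediction errors.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def item_embedding_gradient(user_vec, item_vec, rating):
    """Gradient of binary cross-entropy w.r.t. the item embedding.

    Assumed model: y_hat = sigmoid(user_vec . item_vec), which gives
    d(loss)/d(item_vec) = (y_hat - rating) * user_vec.
    """
    y_hat = sigmoid(sum(u * e for u, e in zip(user_vec, item_vec)))
    return [(y_hat - rating) * u for u in user_vec]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical user vector and item embeddings.
user = [0.5, -0.3, 0.8]
bread = [0.1, 0.2, 0.3]
phone = [-0.2, 0.4, 0.1]

g_bread_liked = item_embedding_gradient(user, bread, rating=1)
g_phone_liked = item_embedding_gradient(user, phone, rating=1)
g_phone_disliked = item_embedding_gradient(user, phone, rating=0)

# Same true rating -> positive cosine; different true ratings -> negative cosine.
same_class = cosine(g_bread_liked, g_phone_liked)
different_class = cosine(g_bread_liked, g_phone_disliked)
```

In this simplified model both gradients are scalar multiples of the same user vector, so the cosine is close to +1 or -1; a richer model would give intermediate values while the sign argument stays the same as long as the partial-derivative terms form an acute angle.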

S203, respectively taking the item ratings in the first class rating distribution and the item ratings in the second class rating distribution as candidate rating truth values, and constructing first class shadow data and second class shadow data;

wherein the first type shadow data and the second type shadow data are training data sets of each preset item about the same user;

It may be appreciated that shadow data takes the form of triples, each composed of user data, feature data of a predetermined item, and an item rating. Since the user data and the predetermined items can be obtained in advance, the first type shadow data can be constructed by taking the item ratings in the first class rating distribution as candidate rating truth values, and similarly the second type shadow data can be constructed by taking the item ratings in the second class rating distribution as candidate rating truth values. It should be emphasized that, since the item ratings in one of the first class rating distribution and the second class rating distribution are similar to the target rating truth values, the item ratings in one of the two types of shadow data are likewise similar to the target rating truth values. Moreover, since the first type shadow data and the second type shadow data are training data sets of the predetermined items for the same user, one of the two types of shadow data, which may be either the first type or the second type, is similar to the user side's real training data set of the predetermined items for that user.
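The triple structure of the shadow data can be illustrated with a short sketch. The helper name `build_shadow_data`, the user identifier, the feature vectors, and the 1/0 encoding of interest and no interest are all assumptions made for this example and are not taken from the application.

```python
def build_shadow_data(user_data, item_features, rating_distribution):
    """Return one (user data, item feature data, item rating) triple per item."""
    return [
        (user_data, item_features[item], rating)
        for item, rating in rating_distribution.items()
    ]


# Hypothetical inputs: 1 encodes "interest", 0 encodes "no interest".
user_data = "user_i"
item_features = {"a": [1.0, 0.0], "b": [0.8, 0.1], "c": [0.0, 1.0]}
first_distribution = {"a": 1, "b": 1, "c": 0}
# The second distribution assigns the opposite rating to every item.
second_distribution = {item: 1 - rating for item, rating in first_distribution.items()}

first_shadow = build_shadow_data(user_data, item_features, first_distribution)
second_shadow = build_shadow_data(user_data, item_features, second_distribution)
```

The two shadow data sets share the same user data and item features and differ only in the candidate rating truth value of each triple.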

S204, training a local recommendation model of the server based on the first type shadow data and the second type shadow data respectively to obtain a first type prediction result corresponding to the first type shadow data and a second type prediction result corresponding to the second type shadow data;

the first type of prediction results and the second type of prediction results comprise candidate prediction results of all preset items and corresponding embedding gradients;

wherein the prediction result may specifically characterize the user as interested or not interested in the predetermined item.

It can be understood that, for each predetermined item in the first-type shadow data and the second-type shadow data, the local recommendation model of the server is trained, so that a candidate prediction result and a corresponding embedding gradient of the first-type shadow data for each predetermined item and a candidate prediction result and a corresponding embedding gradient of the second-type shadow data for each predetermined item can be obtained.

It can be understood that the local recommendation model of the server is essentially the same as the local recommendation model of the user side, which has been described above; therefore, the local recommendation model of the server, its training process, and the calculation of the embedding gradients are not described again herein.

S205, determining a target rating true value and a target prediction result of each preset item based on the first-type prediction result and the second-type prediction result and the obtained embedding gradient corresponding to each preset item;

It may be understood that, since the first type of prediction result and the second type of prediction result both include candidate prediction results of the predetermined items and the corresponding embedding gradients, for each predetermined item the embedding gradient included in the first type of prediction result and the embedding gradient included in the second type of prediction result can each be compared with the obtained embedding gradient corresponding to that item, and the item rating of the item in whichever of the first class rating distribution and the second class rating distribution corresponds to the higher degree of similarity can be used as the target rating truth value. For example, the first type of prediction result includes candidate prediction results and corresponding embedding gradients for commodity a, commodity b and commodity c, the candidate prediction results being that commodity a is of interest, commodity b is of interest, and commodity c is of no interest. Comparison with the obtained embedding gradients corresponding to commodity a, commodity b and commodity c may show that, for commodity a and commodity c, the embedding gradients in the first type of prediction result are more similar to the obtained embedding gradients, whereas for commodity b the embedding gradient in the second type of prediction result is more similar to the obtained embedding gradient. The item ratings of commodity a and commodity c can then be selected from the rating distribution corresponding to the first type of prediction result, and the item rating of commodity b from the rating distribution corresponding to the second type of prediction result, thereby obtaining the target rating truth values.

In addition, the target prediction result may be selected from candidate prediction results in the first type of prediction result and the second type of prediction result, so the target prediction result may be a candidate prediction result in the first type of prediction result or a candidate prediction result in the second type of prediction result, and the embodiment of the present application is not specifically limited.

For the sake of clarity of layout, other implementations of this portion of content will be described in the following embodiments, and thus will not be described in detail herein.

S206, determining detection results of privacy data leakage of the federal recommendation system based on matching relations between target rating true values and target prediction results of all the preset items.

It can be understood that when the similarity between the target rating truth value and the target prediction result of each predetermined item is higher, it can be determined that the detection result of the privacy data disclosure of the federal recommendation system is the case of privacy data disclosure, whereas when the similarity between the target rating truth value and the target prediction result of each predetermined item is lower, it can be determined that the detection result of the privacy data disclosure of the federal recommendation system is the case of no privacy data disclosure. For the sake of clarity of layout, other implementations of this portion of content will be described in the following embodiments, and thus will not be described in detail herein.

Optionally, in one implementation manner, the determining the detection result of the privacy data disclosure of the federal recommendation system based on the matching relationship between the target rating truth value and the target prediction result of each predetermined item includes steps A1-A2:

step A1, judging whether a target rating true value corresponding to each preset item is consistent with a target prediction result or not according to each preset item, and obtaining a judgment result of the preset item;

It can be understood that, for each predetermined item, whether the target rating truth value corresponding to the item is consistent with the target prediction result can be judged, with two possible judgment results: the target rating truth value is consistent with the target prediction result, or the target rating truth value is inconsistent with the target prediction result. For example, for the takeaway item hand-pulled noodles, the corresponding target rating truth value indicates interest while the target prediction result indicates no interest; the two are inconsistent, so the judgment result for the takeaway item hand-pulled noodles is that they are inconsistent.

Step A2, determining the detection result of the privacy data leakage of the federal recommendation system based on the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items.

It is understood that, once the judgment result of each predetermined item has been obtained, the number of predetermined items whose judgment result indicates consistency can be counted, and the ratio of that number to the total number of predetermined items can be calculated, so as to determine the detection result of the privacy data leakage of the federal recommendation system. For example, if the number of predetermined items whose judgment result indicates consistency is 50 and the total number of predetermined items is 100, the ratio is 0.5, and the detection result of the privacy data leakage of the federal recommendation system is determined based on this ratio.

In one implementation, the determining, in step A2, of the detection result of the privacy data leakage of the federal recommendation system based on the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items may include steps A21-A23:

step A21, judging whether the ratio of the number of the preset items with the judging results consistent with the characterization and the total number of the preset items is larger than a preset threshold value or not;

It is understood that the predetermined threshold is a threshold set in advance for determining the detection result of the privacy data leakage of the federal recommendation system, and may be an empirical value. Judging whether the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items is greater than the predetermined threshold yields one of two outcomes: the ratio is greater than the predetermined threshold, or the ratio is not greater than the predetermined threshold. The detection result of the privacy data leakage of the federal recommendation system may be determined based on these two cases.

Step A22, if the ratio is greater than the predetermined threshold, determining a first detection result as the detection result of the privacy data leakage of the federal recommendation system;

The first detection result indicates that the federal recommendation system has a privacy data leakage problem.

It may be appreciated that when the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items is greater than the predetermined threshold, the target rating truth values corresponding to the predetermined items can be considered highly similar to the target prediction results, and the first detection result may be determined as the detection result of the privacy data leakage of the federal recommendation system; that is, the federal recommendation system may be considered to have a privacy data leakage problem. For example, if the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items is 0.8 and the predetermined threshold is 0.5, the ratio is greater than the predetermined threshold, and the federal recommendation system may be considered to have a privacy data leakage problem.

Step A23, if the ratio is not greater than the predetermined threshold, determining a second detection result as the detection result of the privacy data leakage of the federal recommendation system;

The second detection result indicates that the federal recommendation system has no privacy data leakage problem.

It may be appreciated that when the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items is less than the predetermined threshold, the degree of similarity between the target rating truth values corresponding to the predetermined items and the target prediction results can be considered low, and the second detection result may be determined as the detection result of the privacy data leakage of the federal recommendation system; that is, the federal recommendation system may be considered to have no privacy data leakage problem. For example, if the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items is 0.2 and the predetermined threshold is 0.5, the ratio is less than the predetermined threshold, and the federal recommendation system may be considered to have no privacy data leakage problem.

When the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items is equal to the predetermined threshold, the second detection result is likewise determined as the detection result of the privacy data leakage of the federal recommendation system.
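Steps A1, A2 and A21-A23 can be summarised in a short sketch. The function name `detect_privacy_leakage`, the item names, and the 1/0 encoding of interest and no interest are hypothetical; the default threshold of 0.5 simply reuses the value from the examples above and is not a recommended setting.

```python
def detect_privacy_leakage(target_truth, target_prediction, threshold=0.5):
    """Return (leak_detected, ratio) for one user's predetermined items.

    target_truth and target_prediction map item name -> rating (1 or 0).
    """
    # Step A1: per-item judgment of whether truth and prediction are consistent.
    judgments = {
        item: target_truth[item] == target_prediction[item]
        for item in target_truth
    }
    # Step A2: ratio of consistent items to all predetermined items.
    ratio = sum(judgments.values()) / len(judgments)
    # Steps A21-A23: only a ratio strictly greater than the threshold counts
    # as leakage; a ratio equal to the threshold gives the second result.
    return ratio > threshold, ratio


# Hypothetical example: 4 of 5 items are consistent, so the ratio is 0.8.
truth = {"noodles": 1, "rice": 0, "soup": 1, "tea": 0, "cake": 1}
prediction = {"noodles": 0, "rice": 0, "soup": 1, "tea": 0, "cake": 1}
leaked, ratio = detect_privacy_leakage(truth, prediction)

# Boundary case: 2 of 4 items are consistent, so the ratio equals the threshold.
boundary_leaked, boundary_ratio = detect_privacy_leakage(
    {"a": 1, "b": 1, "c": 0, "d": 0},
    {"a": 1, "b": 0, "c": 0, "d": 1},
)
```

The strict comparison mirrors the text: a ratio equal to the predetermined threshold is treated as no privacy data leakage.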

Of course, the above manner of characterizing the detection result of the disclosure of the private data of the federal recommendation system is only one feasible implementation manner, and the manner of characterizing the detection result of the disclosure of the private data of the federal recommendation system may be multiple, and correspondingly, the manner of determining the detection result of the disclosure of the private data of the federal recommendation system may also be multiple.

Therefore, according to the embodiment of the application, whether the proportion of the number of the preset items with the judging result consistent with the characterization to the total number of the preset items is larger than the preset threshold value or not can be determined, so that the detection result of the private data leakage of the federal recommendation system can be determined, and different detection results can be determined based on different conditions.

Optionally, in an implementation manner, the obtained similarity relationship between the embedded gradients corresponding to the respective predetermined items includes:

the similarity relationship between the embedding gradients corresponding to the reference items and the embedding gradients corresponding to other preset items respectively; wherein the reference item is one item in each preset item, and any other preset item is one item except the reference item in each preset item;

Correspondingly, based on the obtained similarity relation between the embedded gradients corresponding to each predetermined item, the first class rating distribution and the second class rating distribution for each predetermined item are estimated according to a predetermined item rating determination mode, and the method comprises the following steps of:

Step B1, setting a first item rating for the reference item; for each other predetermined item, setting the first item rating for that item if the similarity relationship between the embedding gradient corresponding to that item and the embedding gradient corresponding to the reference item indicates that they are similar, and setting a second item rating for that item if the similarity relationship indicates that they are dissimilar, so as to obtain a first class rating distribution; wherein the first item rating and the second item rating are different item ratings;

It may be understood that, when obtaining the similarity relationship between the embedding gradients corresponding to the predetermined items, a reference item may be determined, and the similarity relationship between the embedding gradient corresponding to the reference item and the embedding gradient corresponding to each other predetermined item may be calculated, where the reference item may be any one of the predetermined items and may be selected randomly or designated. For example, the predetermined items include commodity a, commodity b and commodity c; when the similarity relationship between the embedding gradients corresponding to the predetermined items is calculated, commodity a may be used as the reference item, so as to determine the similarity relationship between the embedding gradients corresponding to commodity a and commodity b, and the similarity relationship between the embedding gradients corresponding to commodity a and commodity c.

It may be appreciated that the first item rating may be set for the reference item, where the first item rating may be a rating indicating an item of interest or may be a rating indicating an item of no interest, which is not limited in the embodiment of the present application; however, the first item rating and the second item rating are required to be completely opposite item ratings, and then after the first item rating is set for the reference item, the first item rating can be set for other predetermined items if the similarity relation between the embedding gradient corresponding to the other predetermined items and the embedding gradient corresponding to the reference item is similar, otherwise, the second item rating can be set for other predetermined items if the similarity relation between the embedding gradient corresponding to the other predetermined items and the embedding gradient corresponding to the reference item is dissimilar, so as to obtain the first class rating distribution. For example, each predetermined item includes song 1, song 2 and song 3, song 1 may be used as a reference item, the first item rating is used for representing the interesting item rating, then the second item rating is used for representing the non-interesting item rating, the similarity relationship between the embedding gradient corresponding to song 1 and the embedding gradient corresponding to song 2 is similar, so that the first item rating may be set for song 2, that is, representing the interesting item rating, the similarity relationship between the embedding gradient corresponding to song 1 and the embedding gradient corresponding to song 3 is dissimilar, so that the second item rating may be set for song 3, that is, representing the non-interesting item rating, and thus the first class rating distribution may be obtained.

Step B2, setting the second item rating for the reference item; for each other predetermined item, setting the second item rating for that item if the similarity relationship between the embedding gradient corresponding to that item and the embedding gradient corresponding to the reference item indicates that they are similar, and setting the first item rating for that item if the similarity relationship indicates that they are dissimilar, so as to obtain a second class rating distribution.

It may be appreciated that a second item rating may be set for the reference item, where the second item rating may be a rating indicating an item of interest or a rating indicating an item of no interest, which is not limited in this embodiment of the present application; however, the second item rating is required to be completely opposite to the first item rating. After the second item rating is set for the reference item, the second item rating can be set for any other predetermined item whose embedding gradient is indicated to be similar to the embedding gradient corresponding to the reference item, and the first item rating can be set for any other predetermined item whose embedding gradient is indicated to be dissimilar to the embedding gradient corresponding to the reference item, so as to obtain the second class rating distribution. By way of example, the predetermined items include a commodity apple, a commodity orange and a commodity banana, the commodity apple is used as the reference item, the second item rating is the item rating indicating no interest, and the first item rating is therefore the item rating indicating interest. The similarity relationship between the embedding gradient corresponding to the commodity apple and the embedding gradient corresponding to the commodity orange indicates that they are similar, so the second item rating, namely the item rating indicating no interest, can be set for the commodity orange; the similarity relationship between the embedding gradient corresponding to the commodity apple and the embedding gradient corresponding to the commodity banana indicates that they are dissimilar, so the first item rating, namely the item rating indicating interest, can be set for the commodity banana. The second class rating distribution is thereby obtained.

Of course, other manners may be adopted to obtain the first class rating distribution and the second class rating distribution, and the foregoing implementation manner is merely an example, and in calculating the similarity, a manner of selecting a reference item may not be adopted, and the similarity between every two predetermined items may be calculated, or the predetermined items may be sorted, and the similarity with the previous predetermined item may be calculated, which is not limited in this embodiment of the present application.

Therefore, according to the embodiment of the application, one reference item can be selected, the first item rating or the second item rating can be set for the reference item, and item ratings can be set for the other predetermined items based on the similarity relationship between the embedding gradient corresponding to the reference item and the embedding gradients corresponding to the other predetermined items, so that two completely opposite rating distributions are obtained.
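Steps B1 and B2 can be sketched together as follows. The function names, the song gradients, and the 1/0 encoding of the first and second item ratings are hypothetical, and reading a positive cosine similarity as "similar" is an assumption based on the sign-based analysis given earlier in the description.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def estimate_rating_distributions(gradients, reference, first_rating=1, second_rating=0):
    """Build the two opposite rating distributions from one reference item."""
    first = {reference: first_rating}    # step B1: reference gets the first rating
    second = {reference: second_rating}  # step B2: reference gets the second rating
    for item, grad in gradients.items():
        if item == reference:
            continue
        # Assumption: positive cosine similarity is read as "similar".
        similar = cosine(gradients[reference], grad) > 0
        first[item] = first_rating if similar else second_rating
        second[item] = second_rating if similar else first_rating
    return first, second


# Hypothetical embedding gradients of three songs for one user.
song_gradients = {
    "song1": [0.2, 0.1],
    "song2": [0.4, 0.3],
    "song3": [-0.3, -0.2],
}

first_distribution, second_distribution = estimate_rating_distributions(
    song_gradients, reference="song1"
)
```

Here song 2 follows the reference item in both distributions and song 3 always receives the opposite rating, so every item has opposite ratings in the two distributions.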

Optionally, in an implementation manner, the embodiment of the present application further provides a second federal recommendation system-oriented privacy disclosure detection method, as shown in fig. 3, where the method may include:

S301, obtaining embedding gradients corresponding to all preset items;

S302, estimating a first class rating distribution and a second class rating distribution for each preset item according to a preset item rating determination mode based on the acquired similarity relation between embedded gradients corresponding to each preset item;

S303, respectively taking the item ratings in the first class rating distribution and the item ratings in the second class rating distribution as candidate rating truth values, and constructing first class shadow data and second class shadow data;

S304, training a local recommendation model of the server based on the first type shadow data and the second type shadow data respectively to obtain a first type prediction result corresponding to the first type shadow data and a second type prediction result corresponding to the second type shadow data;

It is understood that steps S301-S304 and S307 are identical to steps S201-S204 and S206, respectively, and will not be described again herein.

S305, respectively determining a similarity relation between an embedding gradient corresponding to the preset item in a first type of prediction result and the obtained embedding gradient corresponding to the preset item and a similarity relation between the embedding gradient corresponding to the preset item in a second type of prediction result and the obtained embedding gradient corresponding to the preset item according to each preset item, and selecting a target rating true value corresponding to the preset item from the item ratings of the preset item contained in the first type of rating distribution and the item ratings of the preset item contained in the second type of rating distribution based on the determined similarity relation;

The obtained embedding gradient corresponding to a predetermined item is the embedding gradient, uploaded by the user side, that is generated when the local recommendation model is trained on the training data set of the predetermined items for the same user; it can be understood as the embedding gradient of the real training data set of the predetermined items for that user. Accordingly, the item rating of the predetermined item in whichever of the first class rating distribution and the second class rating distribution yields an embedding gradient similar to the obtained embedding gradient can be regarded as the target rating truth value corresponding to the predetermined item.

It may be appreciated that, for each predetermined item, if the embedding gradient corresponding to the predetermined item in the first type of prediction result is similar to the obtained embedding gradient corresponding to the predetermined item, the item rating of the predetermined item included in the first type of rating distribution may be used as the target rating truth value corresponding to the predetermined item; similarly, if the embedding gradient corresponding to the predetermined item in the second class prediction result is similar to the obtained embedding gradient corresponding to the predetermined item, the item rating of the predetermined item included in the second class rating distribution may be used as the target rating truth value corresponding to the predetermined item.

Thus, across the predetermined items, the target rating truth values may be item ratings from the first-class rating distribution, from the second-class rating distribution, or from both; in a special case, the target rating truth values of all predetermined items come from a single one of the two rating distributions. For example, suppose there are 100 commodities. If, for 60 of them, the embedding gradient in the first-type prediction result is similar to the obtained embedding gradient of the corresponding commodity, and for the other 40 the embedding gradient in the second-type prediction result is similar to the obtained embedding gradient, then the item ratings of those 60 commodities in the first-class rating distribution, together with the item ratings of those 40 commodities in the second-class rating distribution, may be taken as the target rating truth values of the 100 commodities.

In one implementation manner, the selecting, based on the determined similarity relationship, a target rating truth value corresponding to the predetermined item from the item ratings of the predetermined item included in the first class rating distribution and the item ratings of the predetermined item included in the second class rating distribution may include steps C1-C3:

Step C1, based on the determined similarity relationships, selecting, from the embedding gradient corresponding to the predetermined item contained in the first-type prediction result and the embedding gradient corresponding to the predetermined item contained in the second-type prediction result, the one with the higher degree of similarity to the obtained embedding gradient corresponding to the predetermined item, to obtain an embedding gradient to be utilized;

It can be understood that, for each predetermined item, if the embedding gradient corresponding to the predetermined item contained in the first-type prediction result is more similar to the obtained embedding gradient corresponding to the predetermined item than the one contained in the second-type prediction result, the embedding gradient contained in the first-type prediction result can be selected as the embedding gradient to be utilized; similarly, if the embedding gradient contained in the second-type prediction result is the more similar one, it may be selected as the embedding gradient to be utilized. The embedding gradient to be utilized may therefore be the embedding gradient corresponding to the predetermined item contained in either the first-type prediction result or the second-type prediction result, which is not specifically limited in the embodiments of the present application. For example, if the embedding gradient corresponding to commodity A contained in the first-type prediction result is more similar to the obtained embedding gradient corresponding to commodity A than the one contained in the second-type prediction result, it may be used as the embedding gradient to be utilized.

Step C2, determining, from the first-class rating distribution and the second-class rating distribution, the rating distribution on which the shadow data corresponding to the embedding gradient to be utilized was based when that shadow data was constructed; wherein the shadow data corresponding to the embedding gradient to be utilized is the shadow data whose training yielded the embedding gradient to be utilized;

It can be understood that the shadow data corresponding to the embedding gradient to be utilized may be the first-type shadow data or the second-type shadow data, and the rating distribution on which that shadow data was based can be determined from the embedding gradient to be utilized. If the embedding gradient to be utilized is the embedding gradient corresponding to the predetermined item contained in the first-type prediction result, the corresponding shadow data is the first-type shadow data, and the rating distribution on which it was based when constructed is the first-class rating distribution; similarly, if the embedding gradient to be utilized is the one contained in the second-type prediction result, the corresponding shadow data is the second-type shadow data, and the rating distribution on which it was based when constructed is the second-class rating distribution.

Step C3, determining the item rating of the predetermined item in the determined rating distribution as the target rating truth value corresponding to the predetermined item.

It can be understood that the determined rating distribution may be the first-class rating distribution or the second-class rating distribution, and the item rating of the predetermined item in the determined rating distribution is used as the target rating truth value corresponding to the predetermined item. For example, if the determined rating distribution is the first-class rating distribution and the item rating of the commodity "apple" in it indicates "interested", that item rating may be determined as the target rating truth value corresponding to the commodity "apple".

Therefore, the embedding gradient to be utilized can be selected based on the determined similarity relationships between embedding gradients; based on it, the rating distribution on which the corresponding shadow data was based when constructed is determined from the first-class rating distribution and the second-class rating distribution; and the item rating of the predetermined item in that rating distribution is determined as the target rating truth value corresponding to the predetermined item.
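Steps C1-C3 can be illustrated with a minimal sketch. It assumes that similarity is measured by cosine similarity, that embedding gradients are plain lists of floats keyed by item, and that ties are resolved in favor of the first class; the function and parameter names are hypothetical and are not taken from the application.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_target_ratings(observed, shadow_grads_1, shadow_grads_2,
                          ratings_1, ratings_2):
    """Return (target rating per item, source class per item).

    observed       -- obtained embedding gradients uploaded by the user side
    shadow_grads_k -- embedding gradients in the k-th type prediction result
    ratings_k      -- item ratings in the k-th class rating distribution
    """
    targets, sources = {}, {}
    for item, g_obs in observed.items():
        sim_1 = cosine(shadow_grads_1[item], g_obs)
        sim_2 = cosine(shadow_grads_2[item], g_obs)
        # C1: the gradient to be utilized is the one closer to the observed one.
        # Breaking ties toward class 1 is an assumption of this sketch.
        source = 1 if sim_1 >= sim_2 else 2
        # C2: the rating distribution is the one its shadow data was built from.
        distribution = ratings_1 if source == 1 else ratings_2
        # C3: the item's rating in that distribution is the target truth value.
        targets[item] = distribution[item]
        sources[item] = source
    return targets, sources
```

The second return value records, per item, which class of shadow data produced the selected gradient, so that a later step can look up the matching prediction result without recomputing similarities.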

S306, determining a prediction result corresponding to target shadow data in the first type of prediction result and the second type of prediction result for each preset item, and determining a candidate prediction result of the preset item from the determined prediction results as a target prediction result of the preset item;

The target shadow data is shadow data constructed based on a target rating true value of the predetermined item.

It can be understood that the first-type prediction result and the second-type prediction result each contain a prediction result for every predetermined item. For each predetermined item, the prediction result corresponding to the target shadow data can therefore be determined from the first-type prediction result and the second-type prediction result. Because the target shadow data is the shadow data constructed from the target rating truth value of the predetermined item, it may be the first-type shadow data or the second-type shadow data: when it is the first-type shadow data, the corresponding prediction result is the first-type prediction result; similarly, when it is the second-type shadow data, the corresponding prediction result is the second-type prediction result. From the prediction result so determined, the candidate prediction result of the predetermined item is taken as the target prediction result of the predetermined item.
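As a minimal sketch of S306, assume that the class (1 or 2) of shadow data behind each item's target rating truth value has already been recorded, and that each prediction result is a mapping from item to candidate prediction. All names here are hypothetical.

```python
def select_target_predictions(sources, predictions_1, predictions_2):
    """Pick, per item, the candidate prediction trained on its target shadow data.

    sources       -- item -> 1 or 2, the class of shadow data built from the
                     item's target rating truth value
    predictions_k -- item -> candidate prediction in the k-th type result
    """
    by_class = {1: predictions_1, 2: predictions_2}
    return {item: by_class[source][item] for item, source in sources.items()}
```

An item whose target rating truth value came from the second-class rating distribution therefore receives its target prediction result from the second-type prediction result, and likewise for the first class.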

S307, determining the detection result of the privacy data leakage of the federal recommendation system based on the matching relation between the target rating true value and the target prediction result of each preset item.

As can be seen, in the embodiment of the present application, for each predetermined item, a target rating truth value corresponding to the predetermined item is selected based on the similarity relationships between the embedding gradients corresponding to the predetermined item in the first-type prediction result and the second-type prediction result and the obtained embedding gradient corresponding to the predetermined item; and a target prediction result of the predetermined item is determined from the first-type prediction result and the second-type prediction result based on that target rating truth value. The detection result of privacy data leakage of the federal recommendation system can then be determined based on the matching relationship between the target rating truth value and the target prediction result of each predetermined item. The embodiment of the present application can therefore perform effective privacy data leakage detection on the federal recommendation system, thereby improving the security of the federal recommendation system.

Optionally, in an implementation manner, the embodiment of the present application further provides a third federal recommendation system-oriented privacy disclosure detection method, as shown in fig. 4, where the method may include:

S401, obtaining the embedding gradients corresponding to each predetermined item;

It can be understood that the server side can obtain the embedding gradient corresponding to each predetermined item, produced when the user side trains its local recommendation model.

S402, calculating cosine similarity between embedded gradients corresponding to each preset item to obtain a cosine similarity matrix;

It can be understood that, for the embedding gradients corresponding to any two predetermined items, their cosine similarity can be calculated, and a cosine similarity matrix is constructed from these values; the specific calculation formula has been described above and is not repeated here.
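A minimal sketch of S402, assuming the standard cosine similarity (dot product divided by the product of the Euclidean norms) and embedding gradients given as equal-length lists of numbers. The function name is hypothetical, and a zero-norm gradient is assigned similarity 0.0 as a convention of this sketch.

```python
import math


def cosine_similarity_matrix(gradients):
    """Return the n x n matrix of pairwise cosine similarities.

    gradients -- list of n embedding gradients, one per predetermined item
    """
    n = len(gradients)
    norms = [math.sqrt(sum(x * x for x in g)) for g in gradients]
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(gradients[i], gradients[j]))
            denom = norms[i] * norms[j]
            # Zero-norm gradients are treated as similarity 0.0 (assumption).
            matrix[i][j] = dot / denom if denom else 0.0
    return matrix
```

Entry (i, j) of the result is the similarity between the embedding gradients of items i and j, so the matrix is symmetric and a non-zero gradient has similarity 1 with itself.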

S403, analyzing the item ratings of each preset item based on the cosine similarity matrix to obtain two types of rating distribution;

wherein the two types of rating distributions may include a first type of rating distribution and a second type of rating distribution.

It may be appreciated that, based on the cosine similarity matrix, the item ratings of each predetermined item may be analyzed in a predetermined item rating determination manner, and a first type rating distribution and a second type rating distribution may be obtained.
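A minimal sketch of S403 follows, using the reference-item rule that this application describes for the estimating module: the reference item receives one rating, items similar to it receive the same rating, dissimilar items receive the other rating, and the second distribution swaps the two ratings. Treating a cosine similarity above a threshold of 0.0 as "similar", encoding the ratings as 1 and 0, and choosing item 0 as the reference item are all assumptions of this sketch.

```python
def estimate_rating_distributions(similarity_matrix, reference=0, threshold=0.0,
                                  first_rating=1, second_rating=0):
    """Return (first-class distribution, second-class distribution) as lists.

    similarity_matrix -- pairwise cosine similarities between item gradients
    reference         -- index of the reference item (assumed choice)
    threshold         -- similarity above this counts as similar (assumption)
    """
    row = similarity_matrix[reference]
    first = []
    for index, similarity in enumerate(row):
        similar = index == reference or similarity > threshold
        first.append(first_rating if similar else second_rating)
    # The second distribution gives every item the opposite rating.
    second = [second_rating if r == first_rating else first_rating for r in first]
    return first, second
```

Because the two distributions are complementary, exactly one of them assigns the reference item its real rating, which is why both are carried forward as candidate rating truth values.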

S404, establishing two types of corresponding shadow data based on the two types of rating distribution;

the two types of corresponding shadow data may include a first type of shadow data and a second type of shadow data.

It should be understood that the process of creating the shadow data is also described in the above embodiments, and thus will not be described herein.

S405, training a local recommendation model of the server based on the two types of corresponding shadow data to obtain two types of prediction results;

the two types of prediction results may include a first type of prediction result and a second type of prediction result, where the first type of prediction result and the second type of prediction result each include candidate prediction results of each predetermined item and a corresponding embedded gradient.

S406, determining a detection result of privacy data leakage of the federal recommendation system based on the two types of prediction results.

It can be appreciated that, based on the first type of prediction result and the second type of prediction result, and the obtained embedded gradients corresponding to each predetermined item, a target rating truth value and a target prediction result of each predetermined item can be determined, and a detection result of the privacy data leakage of the federal recommendation system is determined based on a matching relationship between the target rating truth value and the target prediction result of each predetermined item.

Based on the above method embodiment, as shown in fig. 5, the embodiment of the application provides a privacy disclosure detection device for a federal recommendation system, where the federal recommendation system includes a user side and a server side, and the device includes:

an obtaining module 510, configured to obtain an embedding gradient corresponding to each predetermined item; the embedded gradient corresponding to each preset item is generated when the user side trains the local recommendation model according to the training data set of each preset item about the same user;

an estimating module 520, configured to estimate a first-class rating distribution and a second-class rating distribution for the predetermined items according to a predetermined item rating determination manner, based on the similarity relationships between the obtained embedding gradients corresponding to the predetermined items; wherein the item rating determination manner comprises: for any two items, setting the same item rating if their similarity relationship indicates that they are similar, and setting different item ratings if their similarity relationship indicates that they are dissimilar; and the same item is given different item ratings in the different rating distributions;

a construction module 530, configured to construct first-type shadow data and second-type shadow data by using the item ratings in the first-type rating distribution and the item ratings in the second-type rating distribution as candidate rating truth values, respectively; wherein the first type shadow data and the second type shadow data are training data sets of each preset item about the same user;

The training module 540 is configured to train the local recommendation model of the server based on the first type shadow data and the second type shadow data, respectively, to obtain a first type prediction result corresponding to the first type shadow data and a second type prediction result corresponding to the second type shadow data; the first type of prediction results and the second type of prediction results comprise candidate prediction results of all preset items and corresponding embedding gradients;

a first determining module 550, configured to determine a target rating truth value and a target prediction result of each predetermined item based on the first type of prediction result and the second type of prediction result, and the obtained embedding gradients corresponding to each predetermined item;

and a second determining module 560, configured to determine a detection result of the privacy data disclosure of the federal recommendation system based on a matching relationship between the target rating truth value and the target prediction result of each predetermined item.

Optionally, the first determining module includes:

a selecting sub-module, configured to determine, for each predetermined item, a similarity between an embedding gradient corresponding to the predetermined item in the first type of prediction result and the obtained embedding gradient corresponding to the predetermined item, and a similarity between an embedding gradient corresponding to the predetermined item in the second type of prediction result and the obtained embedding gradient corresponding to the predetermined item, and select, based on the determined similarity, a target rating truth value corresponding to the predetermined item from an item rating of the predetermined item included in the first type of rating distribution and an item rating of the predetermined item included in the second type of rating distribution;

A first determining submodule, configured to determine, for each predetermined item, a prediction result corresponding to target shadow data in the first-class prediction result and the second-class prediction result, and determine, from the determined prediction results, a candidate prediction result of the predetermined item as a target prediction result of the predetermined item; the target shadow data is shadow data constructed based on a target rating true value of the predetermined item.

Optionally, the selecting submodule is specifically configured to:

based on the determined similarity relationships, selecting, from the embedding gradient corresponding to the predetermined item contained in the first-type prediction result and the embedding gradient corresponding to the predetermined item contained in the second-type prediction result, the one with the higher degree of similarity to the obtained embedding gradient corresponding to the predetermined item, to obtain an embedding gradient to be utilized;

determining, from the first-class rating distribution and the second-class rating distribution, the rating distribution on which the shadow data corresponding to the embedding gradient to be utilized was based when that shadow data was constructed; wherein the shadow data corresponding to the embedding gradient to be utilized is the shadow data whose training yielded the embedding gradient to be utilized;

determining the item rating of the predetermined item in the determined rating distribution as the target rating truth value corresponding to the predetermined item.

Optionally, the second determining module includes:

a judging sub-module, configured to judge, for each predetermined item, whether the target rating truth value corresponding to the predetermined item is consistent with its target prediction result, to obtain a judgment result of the predetermined item;

a second determining sub-module, configured to determine the detection result of privacy data leakage of the federal recommendation system based on the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items.

Optionally, the judging submodule is specifically configured to:

judging whether the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items is greater than a preset threshold;

if the ratio is greater than the preset threshold, determining a first detection result as the detection result of privacy data leakage of the federal recommendation system;

if the ratio is not greater than the preset threshold, determining a second detection result as the detection result of privacy data leakage of the federal recommendation system;

wherein the first detection result indicates that the federal recommendation system has a privacy data leakage problem, and the second detection result indicates that the federal recommendation system does not have a privacy data leakage problem.
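The threshold judgment above can be sketched as follows. The sketch assumes that target prediction results have already been mapped into the same label space as the ratings so that they can be compared for equality, and the default threshold of 0.5 is purely illustrative; the application does not state a value. The function name is hypothetical.

```python
def detect_leakage(target_ratings, target_predictions, threshold=0.5):
    """Return (leak_detected, ratio of consistent items).

    target_ratings     -- item -> target rating truth value
    target_predictions -- item -> target prediction result (same label space)
    threshold          -- preset threshold; 0.5 is an illustrative assumption
    """
    items = list(target_ratings)
    if not items:
        raise ValueError("at least one predetermined item is required")
    consistent = sum(
        1 for item in items if target_ratings[item] == target_predictions[item]
    )
    ratio = consistent / len(items)
    # First detection result (leak) only when the ratio exceeds the threshold.
    return ratio > threshold, ratio
```

A ratio exactly equal to the threshold yields the second detection result, matching the "not greater than" branch of the judgment.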

Optionally, the obtained similarity relationship between the embedded gradients corresponding to the respective predetermined items includes:

the estimating module comprises:

a first setting sub-module, configured to set a first item rating for the reference item and, for each other predetermined item, set the first item rating for that item if the similarity relationship between its embedding gradient and the embedding gradient corresponding to the reference item indicates that they are similar, and set a second item rating for that item if the similarity relationship indicates that they are dissimilar, to obtain the first-class rating distribution; wherein the first item rating and the second item rating are different item ratings;

a second setting sub-module, configured to set the second item rating for the reference item and, for each other predetermined item, set the second item rating for that item if the similarity relationship between its embedding gradient and the embedding gradient corresponding to the reference item indicates that they are similar, and set the first item rating for that item if the similarity relationship indicates that they are dissimilar, to obtain the second-class rating distribution.

In the technical scheme of the application, the related operations of acquiring, storing, using, processing, transmitting, providing, disclosing and the like of the personal information of the user are all performed under the condition that the authorization of the user is obtained.

The user data in this embodiment may not reflect personal information of a specific user.

The embodiment of the application also provides an electronic device, as shown in fig. 6, including:

a memory 601 for storing a computer program;

the processor 602 is configured to implement the above-mentioned federal recommendation system-oriented privacy disclosure detection method when executing the program stored in the memory 601.

The electronic device may further comprise a communication bus and/or a communication interface; the processor 602, the communication interface, and the memory 601 communicate with one another through the communication bus.

The communication bus mentioned above for the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented by only one bold line in the figure, but this does not mean that there is only one bus or only one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The memory may include Random Access Memory (RAM) or Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment provided herein, a computer-readable storage medium is provided, in which a computer program is stored; when executed by a processor, the computer program implements any one of the above federal recommendation system-oriented privacy disclosure detection methods.

In yet another embodiment provided herein, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform any one of the federal recommendation system-oriented privacy disclosure detection methods of the above embodiments.

In yet another embodiment provided herein, there is further provided a computer program that, when run on a computer, causes the computer to perform any one of the federal recommendation system-oriented privacy disclosure detection methods of the above embodiments.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wired means (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless means (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a DVD), or a Solid State Disk (SSD), etc.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

In this specification, the embodiments are described in a related manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.

The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modifications, equivalent substitutions, improvements, etc. that are within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims

1. The privacy leakage detection method for the federal recommendation system is characterized in that the federal recommendation system comprises a user side and a server side, and the method comprises the following steps:

2. The method of claim 1, wherein determining the target rating truth value and the target prediction result for each predetermined item based on the first type of prediction result and the second type of prediction result and the obtained embedded gradient corresponding to each predetermined item comprises:

for each preset item, respectively determining a similarity relation between an embedding gradient corresponding to the preset item in a first type of prediction result and the obtained embedding gradient corresponding to the preset item and a similarity relation between the embedding gradient corresponding to the preset item in a second type of prediction result and the obtained embedding gradient corresponding to the preset item, and selecting a target rating true value corresponding to the preset item from the item ratings of the preset item contained in the first type of rating distribution and the item ratings of the preset item contained in the second type of rating distribution based on the determined similarity relation;

for each preset item, determining a prediction result corresponding to target shadow data in the first-type prediction result and the second-type prediction result, and determining a candidate prediction result of the preset item from the determined prediction results as a target prediction result of the preset item; the target shadow data is shadow data constructed based on a target rating true value of the predetermined item.

3. The method of claim 2, wherein selecting the target rating truth value corresponding to the predetermined item from the item ratings of the predetermined item included in the first type of rating distribution and the item ratings of the predetermined item included in the second type of rating distribution based on the determined similarity relationship comprises:

4. The method according to claim 1, wherein the determining the detection result of the privacy data disclosure of the federal recommendation system based on the matching relationship between the target rating truth value and the target prediction result of each predetermined item comprises:

for each predetermined item, judging whether the target rating truth value corresponding to the predetermined item is consistent with its target prediction result, to obtain a judgment result of the predetermined item;

and determining the detection result of privacy data leakage of the federal recommendation system based on the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items.

5. The method of claim 4, wherein determining the detection result of privacy data leakage of the federal recommendation system based on the ratio of the number of predetermined items whose judgment result indicates consistency to the total number of predetermined items comprises:

6. The method according to any one of claims 1-5, wherein the acquired similarity relationship between the embedding gradients corresponding to the respective predetermined items comprises:

based on the obtained similarity relation between the embedded gradients corresponding to each predetermined item, estimating a first class rating distribution and a second class rating distribution for each predetermined item according to a predetermined item rating determination mode, wherein the method comprises the following steps:

setting a first item rating for the reference item and, for each other predetermined item, setting the first item rating for that item if the similarity relationship between its embedding gradient and the embedding gradient corresponding to the reference item indicates that they are similar, and setting a second item rating for that item if the similarity relationship indicates that they are dissimilar, to obtain the first-class rating distribution; wherein the first item rating and the second item rating are different item ratings;

setting the second item rating for the reference item and, for each other predetermined item, setting the second item rating for that item if the similarity relationship between its embedding gradient and the embedding gradient corresponding to the reference item indicates that they are similar, and setting the first item rating for that item if the similarity relationship indicates that they are dissimilar, to obtain the second-class rating distribution.

7. A privacy leakage detection device for a federal recommendation system, characterized in that the federal recommendation system comprises a user side and a server side, and the device comprises:

8. An electronic device, comprising:

a memory for storing a computer program;

a processor for implementing the method of any of claims 1-6 when executing a program stored on a memory.

9. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when executed by a processor, implements the method of any of claims 1-6.

10. A computer program, characterized in that the computer program, when run on a computer, causes the computer to perform the method of any of claims 1-6.