CN116501978A - Recommendation model generation method and device based on privacy protection machine forgetting algorithm


Info

Publication number
CN116501978A
Authority
CN
China
Prior art keywords: model, sample set, recommendation model, data, recommendation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310774448.8A
Other languages
Chinese (zh)
Inventor
郑小林
陈超超
李宇渊
刘俊麟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Jinzhita Technology Co ltd
Original Assignee
Hangzhou Jinzhita Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Jinzhita Technology Co ltd filed Critical Hangzhou Jinzhita Technology Co ltd
Priority to CN202310774448.8A
Publication of CN116501978A
Legal status: Pending

Classifications

    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23 Clustering techniques
    • G06F18/24 Classification techniques
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes


Abstract

The application provides a recommendation model generation method and device based on a privacy protection machine forgetting algorithm. The recommendation model generation method based on the privacy protection machine forgetting algorithm comprises the following steps: acquiring a plurality of models, wherein the models are obtained by training an initial recommendation model on a plurality of sample sets, and the plurality of sample sets are obtained by dividing a sample set to be trained; determining, among the plurality of sample sets, a sample set to be updated based on data information of data to be forgotten, deleting the data to be forgotten from the sample set to be updated, and determining a first sample set; training the initial recommendation model on the first sample set to obtain a first recommendation model; aggregating model parameters of the first recommendation model and model parameters of a second recommendation model, wherein the second recommendation model is a model trained on the sample sets other than the sample set to be updated; and generating a target recommendation model based on the aggregated model parameters. In this way, the computing resources required by the model are saved and the usage performance of the model is improved.

Description

Recommendation model generation method and device based on privacy protection machine forgetting algorithm
Technical Field
The application relates to the technical field of computers, in particular to a recommendation model generation method based on a privacy protection machine forgetting algorithm. The application also relates to an item recommendation method, a recommendation model generating device based on a privacy protection machine forgetting algorithm, a computing device and a computer readable storage medium.
Background
Recommendation models are widely used in different scenarios such as online shopping, music recommendation and movie recommendation. To make accurate and efficient recommendations for users, a recommendation model needs to be trained on users' real data to achieve good usability; in this mode, however, the users' real data is kept on the server side, which increases the risk of leaking the users' private data.
To delete a user's real data, or to filter out noisy data that harms training, the model needs to perform a forgetting operation. However, the structure of current recommendation models does not support machine forgetting of partial data; forgetting can only be achieved by completely retraining the model, which requires a large amount of computation and also results in poor performance of the recommendation model.
Disclosure of Invention
In view of this, the embodiments of the application provide a recommendation model generation method based on a privacy preserving machine forgetting algorithm. The application also relates to an item recommendation method, a recommendation model generating device based on a privacy protection machine forgetting algorithm, a computing device and a computer readable storage medium, so as to solve the prior-art problems of the large amount of computation required to retrain a model and the resulting low model performance.
According to a first aspect of an embodiment of the present application, there is provided a recommendation model generating method based on a privacy preserving machine forgetting algorithm, including:
obtaining a plurality of models, wherein the models are obtained by training an initial recommendation model based on a plurality of sample sets, and the plurality of sample sets are obtained by dividing a sample set to be trained;
determining a sample set to be updated in the plurality of sample sets based on data information of the data to be forgotten, deleting the data to be forgotten in the sample set to be updated, and determining a first sample set;
training the initial recommendation model based on the first sample set to obtain a first recommendation model;
aggregating model parameters of the first recommendation model and model parameters of a second recommendation model, wherein the second recommendation model is a model trained by other sample sets except the sample set to be updated;
and generating a target recommendation model based on the aggregated model parameters.
According to a second aspect of the embodiments of the present application, there is provided an item recommendation method, including:
acquiring user information of a target user;
inputting the user information into a target recommendation model to obtain project recommendation information for the target user, wherein the target recommendation model is obtained by using the recommendation model generation method based on the privacy protection machine forgetting algorithm.
According to a third aspect of the embodiments of the present application, there is provided a recommendation model generating device based on a privacy preserving machine forgetting algorithm, including:
a model acquisition module configured to acquire a plurality of models, wherein the plurality of models are obtained by training an initial recommendation model based on a plurality of sample sets obtained by dividing a sample set to be trained;
the sample set updating module is configured to determine a sample set to be updated in the plurality of sample sets based on data information of the data to be forgotten, delete the data to be forgotten in the sample set to be updated and determine a first sample set;
the model training module is configured to train the initial recommendation model based on the first sample set to obtain a first recommendation model;
a parameter aggregation module configured to aggregate model parameters of the first recommendation model and model parameters of a second recommendation model, wherein the second recommendation model is a model trained by other sample sets except the sample set to be updated;
the model generation module is configured to generate a target recommendation model based on the aggregated model parameters.
According to a fourth aspect of embodiments of the present application, there is provided a computing device including a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor implements the steps of the recommendation model generation method based on a privacy preserving machine forgetting algorithm when executing the computer instructions.
According to a fifth aspect of embodiments of the present application, there is provided a computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the recommendation model generation method based on a privacy preserving machine forgetting algorithm.
According to the recommendation model generation method based on the privacy protection machine forgetting algorithm, a plurality of models are obtained, wherein the models are obtained by training an initial recommendation model based on a plurality of sample sets, and the plurality of sample sets are obtained by dividing a sample set to be trained; a sample set to be updated is determined among the plurality of sample sets based on data information of the data to be forgotten, the data to be forgotten is deleted from the sample set to be updated, and a first sample set is determined; the initial recommendation model is trained based on the first sample set to obtain a first recommendation model; model parameters of the first recommendation model and model parameters of a second recommendation model are aggregated, wherein the second recommendation model is a model trained on the sample sets other than the sample set to be updated; and a target recommendation model is generated based on the aggregated model parameters.
In the embodiment of the application, a plurality of models are obtained by training the initial recommendation model separately on a plurality of sample sets. After the data information of the data to be forgotten is received, the data to be forgotten can be deleted from the sample set in which it is located to obtain the first sample set, and only the model corresponding to the first sample set, from which the data to be forgotten has been deleted, needs to be retrained. This avoids having to retrain the entire initial recommendation model whenever the sample set to be trained is updated, which would waste a great deal of computing power. In addition, by aggregating the model parameters of the retrained recommendation model with the model parameters of the models that were not retrained, a new target recommendation model after the machine forgetting processing can be constructed, which saves the computing resources of the model and improves the generation efficiency and usage performance of the recommendation model.
Drawings
Fig. 1 is a schematic architecture diagram of a recommendation model generating method based on a privacy preserving machine forgetting algorithm according to an embodiment of the present application;
FIG. 2 is a flowchart of a recommendation model generation method based on a privacy preserving machine forgetting algorithm according to an embodiment of the present application;
FIG. 3 is a schematic model generation diagram of a recommendation model generation method based on a privacy preserving machine forgetting algorithm according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for recommending items according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a recommendation model generating device based on a privacy preserving machine forgetting algorithm according to an embodiment of the present application;
FIG. 6 is a block diagram of a computing device according to one embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. This application is, however, susceptible of embodiment in many other ways than those herein described and similar generalizations can be made by those skilled in the art without departing from the spirit of the application and the application is therefore not limited to the specific embodiments disclosed below.
The terminology used in one or more embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of one or more embodiments of the application. As used in this application in one or more embodiments and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present application refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of one or more embodiments of the present application, a first may also be referred to as a second, and similarly, a second may also be referred to as a first. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
First, terms related to one or more embodiments of the present application will be explained.
Machine forgetting: user data is deleted, noise is deleted, or corrupted training data is deleted in machine learning.
Privacy protection: it means that the information which is unwilling to be known by outsiders, such as individuals or groups, is protected. The privacy is widely contained, and one important type of privacy is personal identity information, namely, the information can be directly or indirectly used for tracing to a person through connection inquiry; privacy, for the purposes of the collective, generally refers to sensitive information representing various actions of a community.
With the popularity of big data systems, data-driven recommendation systems have brought tremendous commercial value, and at the same time people pay increasing attention to privacy protection issues. To protect the user's "right to be forgotten", the system needs to delete the corresponding user's data from the recommendation model. However, as the number of model parameters increases, directly deleting the training data and retraining the entire model results in a great amount of wasted computation.
To solve this problem, the embodiment of the present specification proposes a machine forgetting algorithm for recommendation learning based on deep clustering and an attention mechanism. To delete user data from the recommendation model, the embodiment takes the route of deleting part of the user data from the data set and retraining, so as to reduce the retraining cost.
In implementation, because the collaborative information among users is key information in a recommendation system, the embodiment classifies users by deep clustering so as to preserve this collaboration: different users are divided into a plurality of categories, and a corresponding sub-model is trained for each category. When a user makes a data deletion request, the system quickly locates the category to which the user belongs through cached information; the user's information only needs to be deleted within that category, and the remaining user data is used for retraining to obtain the post-deletion sub-model. When the system performs recommendation prediction, a neural network trained through the attention mechanism aggregates the prediction results of all sub-models to obtain the final result.
In the present application, a recommendation model generating method based on a privacy preserving machine forgetting algorithm is provided, and the present application relates to an item recommendation method, a recommendation model generating device based on a privacy preserving machine forgetting algorithm, a computing device, and a computer-readable storage medium, which are described in detail in the following embodiments one by one.
Fig. 1 shows a schematic architecture diagram of a recommendation model generation method based on a privacy preserving machine forgetting algorithm according to an embodiment of the present application.
It should be noted that the recommendation model provided in this embodiment may be applied to various application scenarios, including but not limited to item recommendation on e-commerce platforms and movie, book and other recommendation scenarios. In order to protect user privacy, a machine forgetting operation needs to be performed on the trained recommendation model, which on the one hand improves the application performance of the recommendation model and on the other hand protects the user's private data.
In practical application, the initial recommendation model can be trained on the sample set to be trained, but as the amount of training data in that sample set grows, the cost of training the initial recommendation model gradually increases. Therefore, in order to forget some users' private data in the initial recommendation model while reducing the cost of retraining the whole model, the sample set to be trained is divided into a plurality of sample sets by user, as shown in FIG. 1, including sample set 1 and sample set 2. Further, after the data information of the data to be forgotten is received, the sample set in which the data to be forgotten is located is determined, for example sample set 1, which is the sample set to be updated; the data to be forgotten is deleted from sample set 1 to obtain the first sample set, and the initial recommendation model is retrained with the first sample set to obtain the first recommendation model. Finally, the model parameters of the first recommendation model and the model parameters of model 2 to model n, which did not participate in retraining, are obtained, the aggregation of the model parameters is completed, and the target recommendation model is obtained. In the target recommendation model, the data to be forgotten has been forgotten, while the retraining cost involves only the model corresponding to sample set 1 rather than the training models corresponding to all training samples, which greatly reduces the model training cost and improves the usage performance of the recommendation model.
In summary, in the recommendation model generation method based on the privacy protection machine forgetting algorithm provided in the embodiment of this specification, by locating the sample set to which the data to be forgotten belongs and updating it, only the corresponding model is retrained on the updated sample set, so that the model parameters of the retrained model are obtained.
Fig. 2 shows a flowchart of a recommendation model generation method based on a privacy preserving machine forgetting algorithm according to an embodiment of the present application, which specifically includes the following steps:
it should be noted that, the recommendation model generating method based on the privacy protection machine forgetting algorithm provided in the embodiment can be applied to an end side device and a cloud side device, and the embodiment does not limit an execution subject; the specific application scene can be applied to recommendation systems for recommending articles, services, movies, books and the like.
Step 202: a plurality of models are obtained, wherein the models are obtained by training an initial recommended model based on a plurality of sample sets, and the plurality of sample sets are obtained by dividing sample sets to be trained.
The multiple models may be understood as models generated by training the initial recommendation model according to multiple sample sets, and the models may be used for recommendation tasks of various application scenarios, which is not limited in this embodiment.
In practical application, the execution subject may divide the sample set to be trained to obtain a plurality of sample sets, and train the initial recommendation model based on the plurality of sample sets to obtain a plurality of models, where the plurality of models may be understood as recommendation models each trained on a small-scale subset of the training data.
Further, since the collaborative information among users is key information in the recommendation model, the plurality of models trained in this embodiment can be obtained by training on a plurality of sample sets divided according to the user information in the sample set to be trained; specifically, before the obtaining of the plurality of models, the method further includes:
acquiring a sample set to be trained;
dividing the sample set to be trained based on the user information in the sample set to be trained to obtain a plurality of sample sets;
training the initial recommendation model based on the plurality of sample sets respectively to obtain a plurality of models.
The sample set to be trained can be understood as a sample set for training a recommendation model, and the sample set can represent user information, project information and information for recommending projects for users (namely, association information between users and projects).
In practical application, the execution subject can acquire a sample set to be trained, determine the user information in the sample set to be trained, and divide the sample set to be trained by classifying the user information to obtain a plurality of sample sets; the initial recommendation model is then trained with the plurality of sample sets to obtain a plurality of models. It should be noted that, because the user information in the sample set to be trained exhibits collaborative relationships across the item information, and the recommendation model recommends corresponding items for users, the sample set to be trained can be divided using the user information as the division criterion, so as to obtain a plurality of sample sets.
In addition, in some optional embodiments, the recommended project information may instead be used as the division criterion for dividing the sample set to be trained. In this way, the large-scale data set, namely the sample set to be trained, is divided into a plurality of small-scale data sets, and the data in each small-scale data set shares a corresponding commonality, so the approach can be applied to various application scenarios; this embodiment does not limit it.
Furthermore, in this embodiment, a deep clustering manner may be used to classify users and divide the plurality of sample sets; specifically, dividing the sample set to be trained based on the user information in the sample set to be trained to obtain a plurality of sample sets includes:
Performing format conversion on the data content in the sample set to be trained, and determining an interaction matrix vector;
based on the user information in the sample set to be trained, clustering the interaction matrix vectors to obtain a plurality of sample sets;
wherein the sample set includes a user information vector, a project information vector, and an association information vector between the user information vector and the project information vector.
The interaction matrix vector can be understood as being in the form of an interaction matrix, where the row vectors of the matrix can represent user information and the column vectors can represent project information.
In practical application, the execution body may perform format conversion on the data content in the sample set to be trained, and process the data content into an interaction matrix form, namely an interaction matrix vector, and further perform deep clustering on the interaction matrix vector according to user information in the sample set to be trained to obtain a plurality of sample sets, where the sample sets may include a user information vector, a project information vector, and an associated information vector between the user information vector and the project information vector.
It should be noted that, because the data scale of the sample set to be trained is larger, the information represented in the corresponding interaction matrix vector is denser, and the information represented in the sample set after the sample set to be trained is divided is relatively sparser.
In this embodiment, after the sample set to be trained is divided, the large-scale training data is divided into a plurality of pieces of small-scale training data, which improves the efficiency of subsequently retraining a model on the small-scale training data.
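As an illustration of this partition step, the following sketch builds the interaction matrix from (user, item) pairs and groups users into sub-sample-sets. The embodiment describes deep clustering (e.g. a deep autoencoder); plain KMeans over the user rows is used here only as a simplified stand-in, and the function and variable names (build_interaction_matrix, partition_by_user, n_shards) are illustrative assumptions rather than the patent's implementation.

```python
# Simplified sketch of dividing the sample set to be trained by user.
# Assumption: implicit-feedback data given as (user_id, item_id) pairs; KMeans
# stands in for the deep clustering described in the embodiment.
import numpy as np
from sklearn.cluster import KMeans

def build_interaction_matrix(pairs, n_users, n_items):
    """Convert (user_id, item_id) pairs into a dense 0/1 user-item interaction matrix."""
    mat = np.zeros((n_users, n_items), dtype=np.float32)
    for u, i in pairs:
        mat[u, i] = 1.0
    return mat

def partition_by_user(pairs, n_users, n_items, n_shards=2, seed=0):
    """Cluster users on their interaction rows and split the pairs into sub-sample-sets."""
    mat = build_interaction_matrix(pairs, n_users, n_items)
    labels = KMeans(n_clusters=n_shards, random_state=seed, n_init=10).fit_predict(mat)
    shards = [[] for _ in range(n_shards)]
    for u, i in pairs:
        shards[labels[u]].append((u, i))
    # labels is cached so that a later forgetting request can be routed to its shard
    return shards, labels

pairs = [(0, 1), (0, 2), (1, 1), (2, 3), (3, 0), (3, 3), (4, 2), (5, 0)]
shards, user_to_shard = partition_by_user(pairs, n_users=6, n_items=4, n_shards=2)
```

The returned user-to-shard labels play the role of the cached routing information mentioned earlier: they let a later forgetting request be mapped directly to the sub-sample-set it affects.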
Step 204: and determining a sample set to be updated in the plurality of sample sets based on the data information of the data to be forgotten, deleting the data to be forgotten in the sample set to be updated, and determining a first sample set.
The data to be forgotten can be understood as the data content that needs to be forgotten by the machine in the model network, where the data content is data that was input to the model, such as the personal information of a user, item information the user is interested in, and the like.
In practical application, the execution body may obtain the data information corresponding to the data to be forgotten, where the data information may represent a data type, a data identifier, the time at which the data was input into the model, or the like; this embodiment does not limit it. It can be understood that the sample set to which the data to be forgotten belongs can be located according to this data information. Further, according to the data information, the sample set to which the data to be forgotten belongs is located among the plurality of sample sets to determine the sample set to be updated; the data to be forgotten is then deleted from the sample set to be updated to determine the first sample set, where the first sample set can be understood as the sample set from which the data to be forgotten has been deleted.
It should be noted that the number of sample sets to which the data to be forgotten belongs is not limited in this embodiment; accordingly, after deleting the data to be forgotten from each such sample set, a plurality of sample sets to be updated and a plurality of first sample sets may be obtained, depending on the size and content of the data to be forgotten.
Further, the data to be forgotten may be acquired according to a forgetting request of the user, or noise data may be automatically identified by the model and submitted for forgetting, which is not limited in this embodiment; specifically, the determining of the sample set to be updated among the plurality of sample sets based on the data information of the data to be forgotten includes:
responding to a data forgetting instruction, and acquiring data information of data to be forgotten;
and determining the sample set in which the data to be forgotten is located as the sample set to be updated based on the data information in the plurality of sample sets.
In practical application, the execution main body can respond to a data forgetting instruction, wherein the data forgetting instruction can be understood as a request forgetting instruction aiming at data to be forgotten, and it is to be noted that the data forgetting instruction can be triggered based on a forgetting request of a user or based on a filtering requirement of noise data in a model, and the embodiment is not limited to the data forgetting instruction; further, data information of data to be forgotten in the data forgetting instruction is obtained, a sample set in which the data to be forgotten is located is determined in a plurality of sample sets according to the data information, and the sample set is determined as a sample set to be updated.
Finally, the data to be forgotten is deleted from the sample set to be updated to obtain the first sample set, which is the sample set after data filtering has been completed, so that the model subsequently retrained on the first sample set no longer learns the content to be forgotten.
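A minimal sketch of this locate-and-delete step is shown below, continuing the shards and user_to_shard variables from the partition sketch above; the forget_user name and the assumption that a forgetting request is keyed by a user id are illustrative, not taken from the patent.

```python
# Simplified sketch: locate the sample set to be updated via the cached routing
# information and delete the user's records to obtain the first sample set.
def forget_user(shards, user_to_shard, forget_user_id):
    shard_idx = int(user_to_shard[forget_user_id])      # index of the sample set to be updated
    sample_set_to_update = shards[shard_idx]
    first_sample_set = [(u, i) for (u, i) in sample_set_to_update if u != forget_user_id]
    return shard_idx, first_sample_set

shard_idx, first_sample_set = forget_user(shards, user_to_shard, forget_user_id=0)
```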
Step 206: and training the initial recommendation model based on the first sample set to obtain a first recommendation model.
In practical application, after determining the first sample set, the executing body can retrain the initial recommendation model on the first sample set to obtain the first recommendation model. It should be noted that, before the data to be forgotten was deleted, the recommendation model was trained on the sample set to be updated and had already learned the data content to be forgotten; therefore, in order for the model to implement machine forgetting, the initial recommendation model can be retrained on the first sample set, from which the data to be forgotten has been deleted, to obtain the first recommendation model. At this point the first recommendation model has completed the machine forgetting process.
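A toy retraining step is sketched below: only the sub-model for the affected shard is retrained on the first sample set, while the other sub-models are left untouched. The dot-product matrix-factorization model, the negative-sampling scheme and every hyperparameter are assumptions made for illustration; the patent does not prescribe this architecture.

```python
# Simplified sketch: retrain only the affected sub-model on the first sample set.
# A tiny matrix-factorization recommender with logistic loss and random negative
# sampling stands in for the initial recommendation model.
import numpy as np

def train_submodel(pairs, n_users, n_items, dim=16, epochs=20, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, dim))    # user embeddings
    Q = 0.1 * rng.standard_normal((n_items, dim))    # item embeddings
    positives = set(pairs)
    for _ in range(epochs):
        for u, i in pairs:
            j = int(rng.integers(n_items))            # sample one negative item
            while (u, j) in positives:
                j = int(rng.integers(n_items))
            for item, label in ((i, 1.0), (j, 0.0)):
                pred = 1.0 / (1.0 + np.exp(-P[u] @ Q[item]))
                grad = pred - label
                p_old = P[u].copy()
                P[u] -= lr * grad * Q[item]
                Q[item] -= lr * grad * p_old
    return P, Q

# Only the shard that lost data is retrained; untouched shards keep their old parameters.
P_new, Q_new = train_submodel(first_sample_set, n_users=6, n_items=4)
```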
Step 208: and aggregating model parameters of the first recommended model and model parameters of a second recommended model, wherein the second recommended model is a model trained by other sample sets except the sample set to be updated.
It should be noted that, the second recommended model may be understood as a model trained by a sample set other than the sample set to be updated, that is, a model trained by a sample set in which there is no data to be forgotten; meanwhile, the training process of the second recommendation model is completed before the model realizes machine forgetting, so that model parameters can be conveniently and directly extracted from the second recommendation model.
In specific implementation, this embodiment also provides a method for aggregating the model parameters, in which the attention mechanism can be used to learn the aggregation weights; specifically, the aggregating of the model parameters of the first recommendation model (the first model parameters) and the model parameters of the second recommendation model includes:
extracting model parameters in the second recommendation model to obtain second model parameters;
the first model parameters and the second model parameters are aggregated based on an attention mechanism.
In practical application, the executing body can extract the model parameters of the second recommendation model to determine the second model parameters, train the neural network through the attention mechanism, and aggregate the first model parameters and the second model parameters, so as to obtain a recommendation model with a good recommendation effect.
Furthermore, in this embodiment, in the aggregation of the model parameters, the user information vectors and the project information vectors are first aggregated separately, and then the vector weights for the user information and the project information are aggregated; specifically, the aggregating of the first model parameters and the second model parameters based on the attention mechanism includes:
extracting a first user embedded vector and a first item embedded vector in the first model parameters, and extracting a second user embedded vector and a second item embedded vector in the second model parameters;
aggregating the first user embedded vector and the second user embedded vector to determine a user embedded weight;
aggregating the first item embedding vector and the second item embedding vector to determine an item embedding weight;
based on an attention mechanism, the user embedding weights and the item embedding weights are aggregated.
It should be noted that, the first model parameters in the first recommendation model may include a first user embedded vector, a first item embedded vector, and correspondingly, the second model parameters in the second recommendation model may include a second user embedded vector and a second item embedded vector.
In practical application, the user embedding vectors and the project embedding vectors are extracted from each set of model parameters; a weighted average is taken over the user embedding vectors to obtain the user embedding weights, and a weighted average is taken over the project embedding vectors to obtain the project embedding weights; finally, the user embedding weights and the project embedding weights are aggregated through the attention mechanism.
In this embodiment, after the user embedding weight and the item embedding weight are aggregated by the attention mechanism, model parameters of the recommendation model after the forgetting request is executed can be obtained, so as to obtain a new recommendation model.
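One plausible reading of this aggregation step is sketched below: each sub-model's user (or item) embedding matrix is scored by an attention query vector and combined by a softmax-weighted sum. The single query vector, its random initialisation and the normalisation are illustrative assumptions; in the embodiment the attention network would be trained, and that training loop is omitted here.

```python
# Simplified sketch of attention-based aggregation of per-sub-model embeddings.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate_embeddings(submodel_embeddings, query):
    """Weighted sum of same-shaped embedding matrices, one weight per sub-model."""
    stacked = np.stack(submodel_embeddings)                    # (n_models, n, dim)
    scores = np.einsum('knd,d->k', stacked, query) / stacked.shape[1]
    weights = softmax(scores)
    return np.einsum('k,knd->nd', weights, stacked), weights

rng = np.random.default_rng(0)
# The retrained shard contributes P_new / Q_new; the untouched shard reuses its old
# parameters (random placeholders here, standing in for the second recommendation model).
user_embs = [P_new, 0.1 * rng.standard_normal(P_new.shape)]
item_embs = [Q_new, 0.1 * rng.standard_normal(Q_new.shape)]
query = rng.standard_normal(P_new.shape[1])                    # untrained attention query
P_target, _ = aggregate_embeddings(user_embs, query)
Q_target, _ = aggregate_embeddings(item_embs, query)
```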
Step 210: and generating a target recommendation model based on the aggregated model parameters.
Further, after obtaining the aggregated model parameters, the execution entity may generate the target recommendation model from them. It should be noted that the target recommendation model may be constructed as a learning network; in some embodiments, the learning network may include a plurality of networks, each of which may be a multi-layer neural network composed of a large number of neurons, and the aggregated model parameters may be understood as the parameters of the neurons in these networks, collectively referred to as the parameters of the recommendation model.
In summary, in the recommendation model generation method based on the privacy protection machine forgetting algorithm provided in the embodiment of this specification, in accordance with the design characteristics of the recommendation model, the collaboration among users in the recommendation model is effectively preserved by dividing the sample set to be trained into a plurality of sample sets based on the user information, so that the sample set in which the data to be forgotten is located can be conveniently updated and the corresponding recommendation model retrained; the model parameters of the newly trained recommendation model and the model parameters of the models that were not retrained are then aggregated to construct a new target recommendation model after the machine forgetting processing, which saves the computing resources of the model and improves the generation efficiency and usage performance of the recommendation model.
With reference to fig. 3, fig. 3 shows a schematic model generation diagram of a recommendation model generation method based on a privacy preserving machine forgetting algorithm according to an embodiment of the present application.
The model generation process in fig. 3 may include two phases, namely a training phase and a forgetting phase, wherein the training phase may be understood as a process of training a plurality of recommended models according to classification of a sample set to be trained, and the forgetting phase may be understood as a process of retraining a part of recommended models according to a forgetting request; for ease of understanding, in this embodiment, the recommendation model is used to recommend items to the user.
In practical application, in the training stage, the execution body may convert the sample set to be trained into the form of an interaction matrix and apply a deep clustering algorithm, such as a deep autoencoder, to cluster the interaction matrix by user, obtaining a plurality of sub-data sets, such as the three sub-data sets illustrated in FIG. 3, where each sub-data set contains the user information and item information of one slice (such as slice 1, slice 2 and slice 3 illustrated in FIG. 3). Further, each sub-data set participates in the training process of the initial recommendation model, so that a plurality of sub-models are obtained, each of which contains user embeddings and item embeddings; the weights of the user embeddings and of the item embeddings are then trained through an attention mechanism, so that the model parameters for the prediction score (as illustrated by the lower-left prediction score in FIG. 3) are obtained by aggregating the user embeddings and item embeddings.
Further, in the forgetting stage, a user makes a forgetting request, and the corresponding sub-data set can be located according to the data information of the data to be forgotten carried in the request; for example, the data information is located in the first sub-data set in FIG. 3, which means that the user and item embedding vectors corresponding to sub-data-set slice 1 need to be retrained, while sub-data-set slices 2 and 3 do not need retraining and their originally trained user embeddings and item embeddings can be obtained directly. The user embedding weights and the item embedding weights then need to be re-aggregated according to the weights so as to obtain the model parameters of the new prediction score (as shown by the lower-right prediction score in FIG. 3).
In summary, in the recommendation model generation method based on the privacy protection machine forgetting algorithm provided in this embodiment, by separating the training stage from the forgetting stage, only a small part of the training data is involved in retraining in the forgetting stage, which avoids the waste of computing power that would be caused by retraining the model on all of the training data of the training stage, and allows the recommendation model after machine forgetting to be generated quickly.
Fig. 4 shows a flowchart of a method for recommending items according to an embodiment of the present application, specifically including the following steps:
Note that, the item recommendation method provided in this embodiment may be applied to an item recommendation scene, a book recommendation scene, and the like, which is not limited in this embodiment.
Step 402: and acquiring user information of the target user.
The user information may be understood as information identifying the target user, such as ID information of the user in the platform, etc.
In practical application, the execution body can acquire the user information of the target user so as to acquire the information associated with the target user, and reasonably and efficiently recommend the project for the target user.
Step 404: inputting the user information into a target recommendation model to obtain project recommendation information for the target user, wherein the target recommendation model is obtained by the recommendation model generation method based on the privacy protection machine forgetting algorithm.
In practical application, the execution body may further input the user information into the target recommendation model, so as to obtain the item recommendation information output by the target recommendation model for the target user, and display the item recommendation information to the target user.
It should be noted that the target recommendation model may be understood as a recommendation model that protects user privacy and, at the same time, has machine forgetting capability; for the specific model generation process, reference may be made to the description of the target recommendation model generation process in the above embodiments. In addition, after the target user sends a forgetting request for his or her own user information, the training sample set of the target recommendation model to which that user information belongs can be located, and then a small part of the sub-models are retrained to obtain an updated target recommendation model that has forgotten this user information; this is not described in further detail in this embodiment.
In summary, on the basis of the target recommendation model generated as above, user information is input into the target recommendation model; a great amount of computing resources is saved in the generation process of the target recommendation model, while reasonable item recommendation results can still be ensured.
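As a usage illustration, the sketch below scores items for one user, with the aggregated P_target and Q_target matrices from the previous sketches standing in for the target recommendation model; the recommend function and the top-k dot-product scoring are illustrative assumptions rather than the patent's prediction network.

```python
# Simplified sketch of item recommendation with the aggregated (post-forgetting) parameters.
import numpy as np

def recommend(user_id, P, Q, seen_items=(), top_k=3):
    """Score every item for one user and return the top-k unseen item ids."""
    scores = Q @ P[user_id]
    for i in seen_items:
        scores[i] = -np.inf        # never re-recommend items the user already interacted with
    return np.argsort(-scores)[:top_k].tolist()

print(recommend(user_id=3, P=P_target, Q=Q_target, seen_items=[0, 3], top_k=2))
```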
Corresponding to the method embodiment, the present application further provides an embodiment of a recommendation model generating device based on a privacy protection machine forgetting algorithm, and fig. 5 shows a schematic structural diagram of the recommendation model generating device based on the privacy protection machine forgetting algorithm provided in an embodiment of the present application. As shown in fig. 5, the apparatus includes:
a model acquisition module 502 configured to acquire a plurality of models, wherein the plurality of models are obtained by training an initial recommendation model based on a plurality of sample sets obtained by dividing a sample set to be trained;
a sample set updating module 504 configured to determine a sample set to be updated in the plurality of sample sets based on data information of the data to be forgotten, and delete the data to be forgotten in the sample set to be updated, to determine a first sample set;
a model training module 506 configured to train the initial recommendation model based on the first sample set to obtain a first recommendation model;
A parameter aggregation module 508 configured to aggregate model parameters of the first recommendation model and model parameters of a second recommendation model, wherein the second recommendation model is a model trained by other sample sets than the sample set to be updated;
the model generation module 510 is configured to generate a target recommendation model based on the aggregated model parameters.
Optionally, the sample set updating module 504 is further configured to:
responding to a data forgetting instruction, and acquiring data information of data to be forgotten;
and determining the sample set in which the data to be forgotten is located as the sample set to be updated based on the data information in the plurality of sample sets.
Optionally, the apparatus further comprises:
the initial model training module is configured to acquire a sample set to be trained;
dividing the sample set to be trained based on the user information in the sample set to be trained to obtain a plurality of sample sets;
training the initial recommendation model based on the plurality of sample sets respectively to obtain a plurality of models.
Optionally, the sample set updating module 504 is further configured to:
performing format conversion on the data content in the sample set to be trained, and determining an interaction matrix vector;
Based on the user information in the sample set to be trained, clustering the interaction matrix vectors to obtain a plurality of sample sets;
wherein the sample set includes a user information vector, a project information vector, and an association information vector between the user information vector and the project information vector.
Optionally, the parameter aggregation module 508 is further configured to:
extracting model parameters in the second recommendation model to obtain second model parameters;
the first model parameters and the second model parameters are aggregated based on an attention mechanism.
Optionally, the parameter aggregation module 508 is further configured to:
extracting a first user embedded vector and a first item embedded vector in the first model parameters, and extracting a second user embedded vector and a second item embedded vector in the second model parameters;
aggregating the first user embedded vector and the second user embedded vector to determine a user embedded weight;
aggregating the first item embedding vector and the second item embedding vector to determine an item embedding weight;
based on an attention mechanism, the user embedding weights and the item embedding weights are aggregated.
In the recommendation model generating device based on the privacy protection machine forgetting algorithm, a plurality of models are obtained by training the initial recommendation model separately on a plurality of sample sets. After the data information of the data to be forgotten is received, the data to be forgotten can be deleted from the sample set in which it is located to obtain the first sample set, and only the model corresponding to the first sample set, from which the data to be forgotten has been deleted, needs to be retrained. This avoids having to retrain the entire initial recommendation model whenever the sample set to be trained is updated, which would waste a great deal of computing power. In addition, by aggregating the model parameters of the retrained recommendation model with the model parameters of the models that were not retrained, a new target recommendation model after the machine forgetting processing can be constructed, which saves the computing resources of the model and improves the generation efficiency and usage performance of the recommendation model.
The above is a schematic scheme of a recommendation model generating device based on a privacy preserving machine forgetting algorithm in this embodiment. It should be noted that, the technical solution of the recommendation model generating device based on the privacy protection machine forgetting algorithm and the technical solution of the recommendation model generating method based on the privacy protection machine forgetting algorithm belong to the same concept, and details of the technical solution of the recommendation model generating device based on the privacy protection machine forgetting algorithm, which are not described in detail, can be referred to the description of the technical solution of the recommendation model generating method based on the privacy protection machine forgetting algorithm.
Fig. 6 illustrates a block diagram of a computing device 600 provided in accordance with an embodiment of the present application. The components of computing device 600 include, but are not limited to, memory 610 and processor 620. The processor 620 is coupled to the memory 610 via a bus 630 and a database 650 is used to hold data.
Computing device 600 also includes access device 640, access device 640 enabling computing device 600 to communicate via one or more networks 660. Examples of such networks include public switched telephone networks (PSTN, public Switched Telephone Network), local area networks (LAN, local Area Network), wide area networks (WAN, wide Area Network), personal area networks (PAN, personal Area Network), or combinations of communication networks such as the internet. The access device 640 may include one or more of any type of network interface, wired or wireless, such as a network interface card (NIC, network interface controller), such as an IEEE802.11 wireless local area network (WLAN, wireless Local Area Network) wireless interface, a worldwide interoperability for microwave access (Wi-MAX, worldwide Interoperability for Microwave Access) interface, an ethernet interface, a universal serial bus (USB, universal Serial Bus) interface, a cellular network interface, a bluetooth interface, a near field communication (NFC, near Field Communication) interface, and so forth.
In one embodiment of the present application, the above-described components of computing device 600, as well as other components not shown in FIG. 6, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device illustrated in FIG. 6 is for exemplary purposes only and is not intended to limit the scope of the present application. Those skilled in the art may add or replace other components as desired.
Computing device 600 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smart phone), wearable computing device (e.g., smart watch, smart glasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or personal computer (PC, personal Computer). Computing device 600 may also be a mobile or stationary server.
Wherein the processor 620 executes the computer instructions to implement the steps of the recommendation model generation method based on the privacy preserving machine forgetting algorithm.
The foregoing is a schematic illustration of a computing device of this embodiment. It should be noted that, the technical solution of the computing device and the technical solution of the above-mentioned recommendation model generation method based on the privacy protection machine forgetting algorithm belong to the same conception, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the above-mentioned recommendation model generation method based on the privacy protection machine forgetting algorithm.
An embodiment of the present application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the steps of a recommendation model generation method based on a privacy preserving machine forgetting algorithm as described above.
The above is an exemplary version of a computer-readable storage medium of the present embodiment. It should be noted that, the technical solution of the storage medium and the technical solution of the above-mentioned recommendation model generation method based on the privacy protection machine forgetting algorithm belong to the same concept, and details of the technical solution of the storage medium which are not described in detail can be referred to the description of the technical solution of the above-mentioned recommendation model generation method based on the privacy protection machine forgetting algorithm.
The foregoing describes specific embodiments of the present application. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
The computer instructions include computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium can be appropriately adjusted according to the requirements of legislation and patent practice in the relevant jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are all expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily all necessary for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The above-disclosed preferred embodiments of the present application are provided only as an aid to the elucidation of the present application. Alternative embodiments are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the teaching of this application. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. This application is to be limited only by the claims and the full scope and equivalents thereof.

Claims (10)

1. A recommendation model generation method based on a privacy protection machine forgetting algorithm is characterized by comprising the following steps:
obtaining a plurality of models, wherein the models are obtained by training an initial recommendation model based on a plurality of sample sets, and the plurality of sample sets are obtained by dividing a sample set to be trained;
determining a sample set to be updated among the plurality of sample sets based on data information of the data to be forgotten, deleting the data to be forgotten from the sample set to be updated, and determining a first sample set;
training the initial recommendation model based on the first sample set to obtain a first recommendation model;
aggregating model parameters of the first recommendation model and model parameters of a second recommendation model, wherein the second recommendation model is a model trained on the sample sets other than the sample set to be updated;
and generating a target recommendation model based on the aggregated model parameters.
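By way of illustration only, the following Python sketch shows one way the flow of claim 1 might be realized; the helper names (train_fn, aggregate_fn), the list-of-records sample sets, and the in-memory shard bookkeeping are assumptions of this sketch and are not prescribed by the claim.

```python
# Minimal sketch of the claim 1 flow; train_fn, aggregate_fn and the
# list-based sample sets are assumptions, not part of the claim language.
from copy import deepcopy

def forget_and_regenerate(shards, shard_models, initial_model,
                          record_to_forget, train_fn, aggregate_fn):
    """Retrain only the shard that holds the data to be forgotten, then
    aggregate its parameters with the untouched shard models."""
    # Determine the sample set to be updated from the record's data information.
    target = next(i for i, shard in enumerate(shards) if record_to_forget in shard)

    # Delete the data to be forgotten, yielding the first sample set.
    first_sample_set = [r for r in shards[target] if r != record_to_forget]
    shards[target] = first_sample_set

    # Train the initial recommendation model on the first sample set.
    first_model = train_fn(deepcopy(initial_model), first_sample_set)
    shard_models[target] = first_model

    # The second recommendation models are those trained on the other sample sets.
    second_models = [m for i, m in enumerate(shard_models) if i != target]

    # Aggregate model parameters to generate the target recommendation model.
    return aggregate_fn(first_model, second_models)
```

In this reading, only the affected shard is retrained, which is what keeps the cost of forgetting well below that of retraining on the full sample set.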
2. The method of claim 1, wherein the determining a sample set to be updated among the plurality of sample sets based on the data information of the data to be forgotten comprises:
acquiring data information of the data to be forgotten in response to a data forgetting instruction;
and determining, among the plurality of sample sets and based on the data information, the sample set in which the data to be forgotten is located as the sample set to be updated.
3. The method of claim 1, wherein, prior to the obtaining of the plurality of models, the method further comprises:
acquiring a sample set to be trained;
dividing the sample set to be trained based on the user information in the sample set to be trained to obtain a plurality of sample sets;
training the initial recommendation model based on the plurality of sample sets respectively to obtain a plurality of models.
4. The method of claim 3, wherein the dividing the sample set to be trained based on the user information in the sample set to be trained to obtain a plurality of sample sets comprises:
performing format conversion on the data content in the sample set to be trained, and determining an interaction matrix vector;
based on the user information in the sample set to be trained, clustering the interaction matrix vectors to obtain a plurality of sample sets;
wherein each sample set includes a user information vector, an item information vector, and an association information vector between the user information vector and the item information vector.
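As an illustrative sketch of the partitioning described in claims 3 and 4 (not taken from the application), the interaction records could be converted into a user-item interaction matrix and the users clustered, with each cluster yielding one sample set; numpy and scikit-learn k-means are assumptions of this sketch, since the claims do not name a clustering algorithm.

```python
# Illustrative only; the claims do not prescribe numpy, scikit-learn or k-means.
import numpy as np
from sklearn.cluster import KMeans

def split_by_user(interactions, n_users, n_items, k=4):
    """Convert (user, item, rating) records into an interaction matrix and
    cluster users so that each cluster becomes one sample set."""
    matrix = np.zeros((n_users, n_items))
    for user, item, rating in interactions:
        matrix[user, item] = rating        # one interaction matrix vector per user

    labels = KMeans(n_clusters=k, n_init=10).fit_predict(matrix)

    # Each sample set keeps the user/item/association triples of one cluster.
    shards = [[] for _ in range(k)]
    for user, item, rating in interactions:
        shards[labels[user]].append((user, item, rating))
    return shards
```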
5. The method of claim 1, wherein the aggregating the model parameters of the first recommendation model and the model parameters of the second recommendation model comprises:
extracting the model parameters of the second recommendation model to obtain second model parameters;
and aggregating first model parameters, namely the model parameters of the first recommendation model, and the second model parameters based on an attention mechanism.
6. The method of claim 5, wherein the aggregating the first model parameters and the second model parameters based on an attention mechanism comprises:
extracting a first user embedding vector and a first item embedding vector from the first model parameters, and extracting a second user embedding vector and a second item embedding vector from the second model parameters;
aggregating the first user embedding vector and the second user embedding vector to determine a user embedding weight;
aggregating the first item embedding vector and the second item embedding vector to determine an item embedding weight;
and aggregating the user embedding weight and the item embedding weight based on an attention mechanism.
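A minimal sketch of attention-based aggregation of the embedding tables referred to in claims 5 and 6; PyTorch, the mean-table anchor, and the dot-product scoring are assumptions of this sketch rather than details disclosed in the claims.

```python
# Sketch only: torch and the similarity-to-mean scoring are assumptions.
import torch

def attention_aggregate(embedding_tables):
    """Aggregate per-shard embedding tables with attention weights derived
    from each table's similarity to the mean table."""
    stacked = torch.stack(embedding_tables)       # (num_shards, num_rows, dim)
    anchor = stacked.mean(dim=0)                  # reference table
    scores = (stacked * anchor).sum(dim=(1, 2))   # one attention score per shard
    weights = torch.softmax(scores, dim=0)        # attention weights
    return (weights.view(-1, 1, 1) * stacked).sum(dim=0)

# User and item embeddings would be aggregated separately, e.g.:
# user_table = attention_aggregate([m.user_embedding.weight for m in shard_models])
# item_table = attention_aggregate([m.item_embedding.weight for m in shard_models])
```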
7. A method of recommending items, comprising:
acquiring user information of a target user;
inputting the user information into a target recommendation model to obtain item recommendation information for the target user, wherein the target recommendation model is obtained by using the recommendation model generation method based on the privacy protection machine forgetting algorithm according to any one of claims 1-6.
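For context, a hypothetical serving-time use of the target recommendation model of claim 7; the score_items interface below is an assumption of this sketch, not an interface disclosed by the application.

```python
# Hypothetical serving-time usage; score_items is an assumed interface.
def recommend(target_model, user_info, top_k=10):
    """Score all items for the target user and return the top-k item ids
    as the item recommendation information."""
    scores = target_model.score_items(user_info)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]
```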
8. A recommendation model generation device based on a privacy protection machine forgetting algorithm, comprising:
a model acquisition module configured to acquire a plurality of models, wherein the plurality of models are obtained by training an initial recommendation model based on a plurality of sample sets obtained by dividing a sample set to be trained;
a sample set updating module configured to determine a sample set to be updated among the plurality of sample sets based on data information of the data to be forgotten, delete the data to be forgotten from the sample set to be updated, and determine a first sample set;
a model training module configured to train the initial recommendation model based on the first sample set to obtain a first recommendation model;
a parameter aggregation module configured to aggregate model parameters of the first recommendation model and model parameters of a second recommendation model, wherein the second recommendation model is a model trained on the sample sets other than the sample set to be updated;
and a model generation module configured to generate a target recommendation model based on the aggregated model parameters.
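The module decomposition of claim 8 could be sketched as follows; the class name, field names, and callable signatures are assumptions made purely for illustration.

```python
# Illustration of the module structure of claim 8; names are assumptions.
from dataclasses import dataclass
from typing import Any, Callable, Sequence

@dataclass
class RecommendationModelGenerator:
    model_acquisition: Callable[[], Sequence[Any]]        # returns the shard models
    sample_set_update: Callable[[Any], Sequence[Any]]     # removes the data to be forgotten
    model_training: Callable[[Sequence[Any]], Any]        # retrains on the first sample set
    parameter_aggregation: Callable[[Any, Sequence[Any]], Any]
    model_generation: Callable[[Any], Any]

    def generate(self, record_to_forget):
        shard_models = self.model_acquisition()
        first_sample_set = self.sample_set_update(record_to_forget)
        first_model = self.model_training(first_sample_set)
        aggregated = self.parameter_aggregation(first_model, shard_models)
        return self.model_generation(aggregated)
```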
9. A computing device comprising a memory, a processor, and computer instructions stored on the memory and executable on the processor, wherein the processor, when executing the computer instructions, performs the steps of the method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the steps of the method of any one of claims 1-7.
CN202310774448.8A 2023-06-28 2023-06-28 Recommendation model generation method and device based on privacy protection machine forgetting algorithm Pending CN116501978A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310774448.8A CN116501978A (en) 2023-06-28 2023-06-28 Recommendation model generation method and device based on privacy protection machine forgetting algorithm


Publications (1)

Publication Number Publication Date
CN116501978A (en) 2023-07-28

Family

ID=87328843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310774448.8A Pending CN116501978A (en) 2023-06-28 2023-06-28 Recommendation model generation method and device based on privacy protection machine forgetting algorithm

Country Status (1)

Country Link
CN (1) CN116501978A (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635204A (en) * 2018-12-21 2019-04-16 上海交通大学 Online recommender system based on collaborative filtering and length memory network
CN112183652A (en) * 2020-10-09 2021-01-05 浙江工业大学 Edge end bias detection method under federated machine learning environment
US20220261633A1 (en) * 2021-02-15 2022-08-18 Actimize Ltd. Training a machine learning model using incremental learning without forgetting
CN113362160A (en) * 2021-06-08 2021-09-07 南京信息工程大学 Federal learning method and device for credit card anti-fraud
CN115098771A (en) * 2022-06-09 2022-09-23 阿里巴巴(中国)有限公司 Recommendation model updating method, recommendation model training method and computing device
CN115660783A (en) * 2022-11-07 2023-01-31 中国联合网络通信集团有限公司 Model training method, commodity recommendation method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
朱妮 (ZHU Ni): "Research on a User Dynamic Recommendation Model Distinguishing Long-term and Short-term Interests" (区分长短期兴趣的用户动态推荐模型研究), 合作经济与科技, no. 10 *
王刚; 王含茹; 胡可; 贺曦冉 (WANG Gang; WANG Hanru; HU Ke; HE Xiran): "An Improved OCCF Method Considering Task Relevance and Time Factors in Task Recommendation" (任务推荐中考虑任务关联度与时间因素的改进OCCF方法), 计算机科学 (Computer Science), no. 07 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118094012A (en) * 2024-03-26 2024-05-28 佛山的度云企业管理有限公司 Information recommendation method and device based on privacy protection

Similar Documents

Publication Publication Date Title
CN110334201B (en) Intention identification method, device and system
CN111931062A (en) Training method and related device of information recommendation model
CN111931002B (en) Matching method and related equipment
CN111382868A (en) Neural network structure search method and neural network structure search device
JP6029041B2 (en) Face impression degree estimation method, apparatus, and program
CN111898703B (en) Multi-label video classification method, model training method, device and medium
US11423307B2 (en) Taxonomy construction via graph-based cross-domain knowledge transfer
CN110929836B (en) Neural network training and image processing method and device, electronic equipment and medium
CN113656699B (en) User feature vector determining method, related equipment and medium
CN116501978A (en) Recommendation model generation method and device based on privacy protection machine forgetting algorithm
CN113705598A (en) Data classification method and device and electronic equipment
CN116310318A (en) Interactive image segmentation method, device, computer equipment and storage medium
CN113128526B (en) Image recognition method and device, electronic equipment and computer-readable storage medium
CN116152938A (en) Method, device and equipment for training identity recognition model and transferring electronic resources
CN117726884A (en) Training method of object class identification model, object class identification method and device
CN116541712B (en) Federal modeling method and system based on non-independent co-distributed data
CN115223214A (en) Identification method of synthetic mouth-shaped face, model acquisition method, device and equipment
CN115098771B (en) Recommendation model updating method, recommendation model training method and computing equipment
CN116910357A (en) Data processing method and related device
CN113688421B (en) Prediction model updating method and device based on privacy protection
CN116957128A (en) Service index prediction method, device, equipment and storage medium
CN118035800A (en) Model training method, device, equipment and storage medium
CN111079013B (en) Information recommendation method and device based on recommendation model
CN111552846B (en) Method and device for identifying suspicious relationships
CN111274856A (en) Face recognition method and device, computing equipment and storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20230728)