CN115630297A - Model training method and related equipment - Google Patents

Model training method and related equipment

Info

Publication number
CN115630297A
Authority
CN
China
Prior art keywords
user
information
probability distribution
training
model
Legal status
Pending
Application number
CN202211182296.4A
Other languages
Chinese (zh)
Inventor
戴全宇
王磊
陈旭
董振华
唐睿明
Current Assignee
Huawei Technologies Co Ltd
Renmin University of China
Original Assignee
Huawei Technologies Co Ltd
Renmin University of China
Application filed by Huawei Technologies Co Ltd and Renmin University of China
Priority to CN202211182296.4A
Publication of CN115630297A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A model training method, which relates to the field of artificial intelligence, includes the following steps: acquiring a training sample and a first probability distribution, where the training sample includes a plurality of pieces of operation data and each piece of operation data is operation information of a user on a first article, and the first probability distribution includes selection probabilities corresponding to a plurality of combinations of the operation data and is used to select a target combination from the plurality of combinations; acquiring first information and second information according to the target combination, where the first information represents the influence of each piece of operation data on model precision when it is selected for training the recommendation model, and the second information represents the user's sensitivity to disclosure of the first article contained in each piece of operation data; obtaining a reward value from the first information, the second information, and the target combination; and updating the first probability distribution according to the reward value to obtain a second probability distribution, which is used for training the recommendation model. The method and the device can ensure a balance between disclosure sensitivity and model performance for the trained model.

Description

Model training method and related equipment
Technical Field
The application relates to the field of artificial intelligence, in particular to a model training method and related equipment.
Background
Artificial Intelligence (AI) is a theory, method, technique and application system that simulates, extends and expands human Intelligence, senses the environment, acquires knowledge and uses knowledge to obtain the best results through a digital computer or a machine controlled by a digital computer. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision and reasoning, human-computer interaction, recommendation and search, AI basic theory, and the like.
The recommendation system builds a model by mining content such as the user's historical behavior records, personal preferences, or statistical characteristics, captures the user's interests and needs, builds a user profile, generates a recommendation list the user may be interested in, and thereby filters a large amount of information down to content that meets the user's needs for personalized recommendation. Collaborative filtering is a recommendation algorithm commonly used by recommendation systems: by analyzing user interests, users similar to a specified user are found in the user group, and the evaluations of those similar users on certain information are aggregated to form the system's prediction of the specified user's preference for that information.
The application of recommendation algorithms requires the collection of a large amount of user personal data, but from the user's perspective, it may not be desirable to publish all of their historical behavior for training the model. If an item with user privacy information is used to train the model, the recommendation list generated by the model is likely to contain items similar to the item with privacy, and a person viewing the recommendation can easily infer the user privacy information from the generated recommendation.
Therefore, a recommendation model training method capable of ensuring user privacy is needed.
Disclosure of Invention
The application provides a model training method which can ensure the balance of the trained model on the public sensitivity and the model performance.
In a first aspect, the present application provides a model training method, applied to training a recommendation model. The method includes: acquiring a training sample and a first probability distribution, where the training sample includes a plurality of pieces of operation data, each piece of operation data being operation information of a user on a first article, and the first probability distribution includes selection probabilities corresponding to a plurality of combinations of the operation data and is used to select a target combination from the plurality of combinations; acquiring first information and second information according to the target combination, where the first information represents the influence of each piece of operation data on model accuracy when it is selected for training the recommendation model, and the second information represents the user's sensitivity to disclosure of the first article contained in each piece of operation data; obtaining a reward value from the first information, the second information, and the target combination; and updating the first probability distribution according to the reward value to obtain a second probability distribution, the second probability distribution being used for training the recommendation model.
The first probability distribution indicates which operation data should be used during model training. To ensure that operation data corresponding to articles the user does not want disclosed is not used in model training, the probability of such operation data in the distribution must be made small. However, some of that operation data may have a key influence on the precision of model training. The present application therefore obtains, for each piece of operation data, its influence on model precision (for example, the first information in the embodiments of the present application) and its disclosure sensitivity (for example, the second information in the embodiments of the present application), updates the first probability distribution based on the first information and the second information, and uses the updated distribution for model training, so that a balance between disclosure sensitivity and model performance of the trained model can be ensured.
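Purely as an illustration, the Python sketch below shows one way this loop could look: sample a target combination from the current distribution, score it with the first information (accuracy contribution) and second information (disclosure sensitivity), and turn the two into a reward. All names are hypothetical, and the reward is modeled as a simple weighted difference, which is an assumption rather than the patent's definition.

```python
import numpy as np

def training_round(prob_dist, combinations, first_info, second_info, trade_off=0.5):
    # Sample a target combination according to the current (first) probability distribution.
    idx = np.random.choice(len(combinations), p=prob_dist)
    target = combinations[idx]

    # First information: contribution of the selected operation data to model accuracy.
    accuracy_gain = sum(first_info[i] for i, used in enumerate(target) if used)
    # Second information: the user's sensitivity to disclosing the corresponding items.
    sensitivity = sum(second_info[i] for i, used in enumerate(target) if used)

    # Reward balances model precision against disclosure sensitivity
    # (a simple weighted difference; the patent does not prescribe this form).
    reward = accuracy_gain - trade_off * sensitivity
    return idx, reward
```

The returned reward would then drive the update of the first probability distribution into the second probability distribution, for example with the projected-gradient step sketched further below.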
In one possible implementation, the second probability distribution is a distribution that satisfies nash equilibrium.
In one possible implementation, the obtaining of the first information according to the target combination includes: determining target anchor data from a plurality of pieces of anchor data according to first attribute information of the first articles contained in the plurality of pieces of operation data, where each piece of anchor data includes operation information of the user on a plurality of articles, the similarity between the attribute information of the articles contained in the target anchor data and the first attribute information satisfies a preset condition, and the first information is determined by training the recommendation model in advance with the target anchor data.
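A minimal sketch of this anchor selection, assuming item attributes are encoded as numeric feature vectors and that cosine similarity is used as the (unspecified) similarity measure; the function and field names are illustrative, and the "preset condition" is modeled as a similarity threshold.

```python
import numpy as np

def select_target_anchor(first_item_attributes, anchor_datasets, threshold=0.8):
    # Aggregate the attribute vectors of the first items contained in the operation data.
    query = np.mean(np.asarray(first_item_attributes, dtype=float), axis=0)
    best_idx, best_sim = None, -1.0
    for idx, anchor in enumerate(anchor_datasets):
        anchor_vec = np.mean(np.asarray(anchor["item_attributes"], dtype=float), axis=0)
        # Cosine similarity between the anchor's item attributes and the first attributes.
        sim = float(query @ anchor_vec /
                    (np.linalg.norm(query) * np.linalg.norm(anchor_vec) + 1e-12))
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    return best_idx if best_sim >= threshold else None
```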
In one possible implementation, the second information is derived from a user's sensitivity input for the first article, the sensitivity input being one of a sensitivity score and a sensitivity rating.
In one possible implementation, the second information is obtained based on an empirical value representing a sensitivity of the user to disclosure of the first article.
In this way, the user is allowed to actively express his or her willingness to disclose interaction records, a user-controllable recommendation system is achieved, the user's disclosure willingness is taken into account, and user experience is improved.
In particular, in one possible implementation, the first probability distribution may be updated according to the reward value by projected gradient descent, resulting in the second probability distribution.
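As a rough sketch, a projected gradient step could update the selection probabilities from the reward of the sampled combination and then project the result back onto the probability simplex. The score-function gradient estimate and the Euclidean simplex projection used here are assumptions; the patent only states that a projected gradient update is used.

```python
import numpy as np

def project_to_simplex(v):
    # Euclidean projection of v onto {p : p >= 0, sum(p) = 1}.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def update_distribution(prob_dist, sampled_idx, reward, lr=0.1):
    # Score-function estimate of the gradient of the expected reward with respect
    # to the probability of the combination that was actually sampled.
    grad = np.zeros_like(prob_dist)
    grad[sampled_idx] = reward / max(prob_dist[sampled_idx], 1e-8)
    # Move towards higher reward, then project back to a valid probability distribution.
    return project_to_simplex(prob_dist + lr * grad)
```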
The attribute information of the user may be at least one attribute related to the user's preferences, such as gender, age, occupation, income, hobbies, and education level, where the gender may be male or female, the age may be a number between 0 and 100, the occupation may be teacher, programmer, chef, and the like, the hobbies may be basketball, tennis, running, and the like, and the education level may be primary school, junior high school, high school, university, and the like; the present application does not limit the specific type of the attribute information of the user.
The article may be a physical article or a virtual article; for example, the article may be an application (APP), audio/video, a webpage, news information, or the like. The attribute information of the article may be at least one of the article name, developer, installation package size, category, and rating. Taking an APP as the article as an example, the category of the article may be chat, game, office, and the like, and the rating may be a score, a comment, or the like given to the article; the present application does not limit the specific type of the attribute information of the article.
In a second aspect, the present application provides a model training apparatus, applied to training a recommended model, the apparatus including:
the processing module is used for acquiring a training sample and a first probability distribution, wherein the training sample comprises a plurality of operation data, and each operation data is operation information of a user on a first article; the first probability distribution includes selection probabilities corresponding to a plurality of combinations of the operation data, the first probability distribution being used to select a target combination from the plurality of combinations;
and acquire first information and second information according to the target combination, where the first information represents the influence of each piece of operation data on model accuracy when it is selected for training the recommendation model, and the second information represents the user's sensitivity to disclosure of the first article contained in each piece of operation data; the first information, the second information, and the target combination are used to obtain a reward value;
and the updating module is used for updating the first probability distribution according to the reward value to obtain a second probability distribution, and the second probability distribution is used for training the recommendation model.
In one possible implementation, the second probability distribution is a distribution that satisfies nash equilibrium.
In a possible implementation, the processing module is specifically configured to:
determining target anchor point data from a plurality of anchor point data according to first attribute information of a first article contained in the plurality of operation data, wherein each anchor point data comprises operation information of the user on the plurality of articles, the similarity between the attribute information of the article contained in the target anchor point data and the first attribute information meets a preset condition, and the first information is determined by training the recommendation model in advance through the target anchor point data.
In one possible implementation, the second information is derived from a user's sensitivity input for the first article, the sensitivity input being one of a sensitivity score and a sensitivity rating.
In one possible implementation, the second information is obtained based on an empirical value representing a sensitivity of the user to disclosure of the first article.
In a possible implementation, the update module is specifically configured to:
and updating the first probability distribution according to the reward value through projection gradient descent to obtain a second probability distribution.
In one possible implementation, the attribute information includes a user attribute of the user, and the user attribute includes at least one of: gender, age, occupation, income, hobbies, education level.
In one possible implementation, the attribute information includes an item attribute of the item, the item attribute including at least one of: item name, developer, installation package size, category, and rating.
In a third aspect, an embodiment of the present application provides a model training apparatus, which may include a memory, a processor, and a bus system, where the memory is used to store a program, and the processor is used to execute the program in the memory to perform the method according to the first aspect and any optional method thereof.
In a fourth aspect, embodiments of the present application provide a data processing apparatus, which may include a memory, a processor, and a bus system, wherein the memory is used for storing programs, and the processor is used for executing the programs in the memory to perform the method according to the third aspect and any optional method thereof.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer program causes the computer to execute the first aspect and any optional method thereof, or the third aspect and any optional method thereof.
In a sixth aspect, embodiments of the present application provide a computer program product comprising instructions, which when run on a computer, cause the computer to perform the first aspect and any optional method thereof, or the third aspect and any optional method thereof.
In a seventh aspect, the present application provides a chip system, which includes a processor configured to support a model training apparatus in implementing some or all of the functions mentioned in the above aspects, for example, transmitting or processing the data and/or information mentioned in the above methods. In one possible design, the chip system further includes a memory, which stores program instructions and data necessary for the execution device or the training device. The chip system may consist of a chip, or may include a chip and other discrete devices.
Drawings
FIG. 1 is a schematic diagram of an application architecture;
FIG. 2 is a schematic diagram of an application architecture;
FIG. 3 is a schematic diagram of an application architecture;
FIG. 4 is a schematic of an application architecture;
FIG. 5 is a schematic diagram of an application architecture;
FIG. 6A is a schematic diagram of an embodiment of a model training method provided in the embodiments of the present application;
fig. 6B is a schematic diagram of a software architecture provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of an embodiment of a model training apparatus according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an execution device according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a server according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a chip according to an embodiment of the present disclosure.
Detailed Description
The embodiments of the present invention will be described below with reference to the drawings. The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenes, the technical solutions provided in the embodiments of the present application are also applicable to similar technical problems.
The terms "first," "second," and the like in the description and claims of this application and in the foregoing drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the manner in which objects of the same nature are distinguished in the embodiments of the application. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The general workflow of an artificial intelligence system will be described first. Referring to fig. 1, which shows a schematic structural diagram of an artificial intelligence body framework, the framework is explained below from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the series of processes from data acquisition to processing, for example the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision making, and intelligent execution and output; in this process, the data undergoes a "data-information-knowledge-wisdom" refinement process. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (provision and processing technology implementation) of artificial intelligence to the industrial ecological process of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligent system, realizes communication with the outside world, and realizes support through a foundation platform. Communicating with the outside through a sensor; the computing power is provided by intelligent chips (hardware acceleration chips such as CPU, NPU, GPU, ASIC, FPGA and the like); the basic platform comprises distributed computing framework, network and other related platform guarantees and supports, and can comprise cloud storage and computing, interconnection and intercommunication networks and the like. For example, sensors and external communications acquire data that is provided to intelligent chips in a distributed computing system provided by the base platform for computation.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
The machine learning and the deep learning can be used for performing symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Inference means a process of simulating an intelligent human inference mode in a computer or an intelligent system, using formalized information to think about and solve a problem by a machine according to an inference control strategy, and a typical function is searching and matching.
The decision-making refers to a process of making a decision after reasoning intelligent information, and generally provides functions of classification, sequencing, prediction and the like.
(4) General capabilities
After the above-mentioned data processing, further based on the result of the data processing, some general capabilities may be formed, such as algorithms or a general system, e.g. translation, analysis of text, computer vision processing, speech recognition, recognition of images, etc.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they are the encapsulation of the overall artificial intelligence solution, turning intelligent information decision-making into products and realizing practical applications. The application fields mainly include: intelligent terminals, intelligent transportation, intelligent medical care, autonomous driving, smart cities, and the like.
The embodiments of the present application can be applied to the field of information recommendation, including but not limited to scenarios involving e-commerce product recommendation, search engine result recommendation, application market recommendation, music recommendation, video recommendation, and the like. The items recommended in these various application scenarios may also be referred to as "objects" for ease of subsequent description; that is, in different recommendation scenarios, the recommended object may be an APP, a video, a piece of music, or some commodity (for example, on the presentation interface of an online shopping platform, different commodities may be displayed to different users, and this may likewise be presented based on the recommendation results of a recommendation model). These recommendation scenarios generally involve user behavior log collection, log data preprocessing (e.g., quantization, sampling, etc.), and training on a sample set to obtain a recommendation model that analyzes and processes the objects (such as APPs or music) involved in the scenario corresponding to the training samples. For example, if the samples selected in the training phase of the recommendation model come from the operation behaviors of mobile phone APP market users on recommended APPs, the recommendation model thus trained is applicable to the mobile phone APP application market, or may be used in the APP application markets of other types of terminals to recommend terminal APPs. The recommendation model finally calculates a recommendation probability or score for each object to be recommended; the recommendation system sorts the recommendation results selected according to a certain selection rule, for example by recommendation probability or score, and presents them to the user through the corresponding application or terminal device, and the user operates on the objects in the recommendation results, thereby generating a user behavior log.
Referring to fig. 4, in the recommendation process, when a user interacts with the recommendation system to trigger a recommendation request, the recommendation system inputs the request and related feature information into a deployed recommendation model, and then predicts click rates of the user on all candidate objects. And then, the candidate objects are arranged in a descending order according to the predicted click rate, and the candidate objects are displayed at different positions in the order to serve as recommendation results for the user. The user browses the displayed items and generates user actions such as browsing, clicking, downloading, etc. The user behaviors can be stored in a log as training data, parameters of the recommendation model are updated irregularly through an offline training module, and the recommendation effect of the model is improved.
For example, a user can trigger a recommendation module of an application market by opening a mobile phone application market, and the recommendation module of the application market can predict the download possibility of the user for each given candidate application according to the historical download record of the user, the click record of the user, the self characteristics of the application, the environmental characteristic information of time, place and the like. According to the predicted result, the application market is displayed in a descending order according to the possibility, and the effect of improving the application downloading probability is achieved. Specifically, applications that are more likely to be downloaded are ranked in a front position, and applications that are less likely to be downloaded are ranked in a rear position. And the user behavior is also stored in a log, and the parameters of the prediction model are trained and updated through the offline training module.
For another example, in applications related to a lifelong partner, a cognitive brain can be constructed by various models and algorithms according to a human brain mechanism based on historical data of a user in a domain such as video, music, news and the like, and a lifelong learning system framework of the user can be constructed. The lifelong partner can record events which occur in the past of the user according to system data, application data and the like, understand the current intention of the user, predict future actions or behaviors of the user and finally realize intelligent service. In the first current stage, behavior data (including information such as short messages, photos and mail events) of a user are obtained according to a music APP, a video APP, a browser APP and the like, on one hand, a user portrait system is built, on the other hand, a learning and memory module based on user information filtering, correlation analysis, cross-domain recommendation, cause and effect reasoning and the like is realized, and a user personal knowledge map is built.
Next, an application architecture of the embodiment of the present application is described.
Referring to fig. 2, an embodiment of the present invention provides a recommendation system architecture 200. The data collection device 260 is configured to collect samples. A training sample may be composed of a plurality of pieces of feature information (also described as attribute information, such as user attributes and article attributes). The feature information may be of various kinds and may specifically include user feature information, object feature information, and label features. The user feature information is used to characterize features of a user, such as gender, age, occupation, hobbies, and the like; the object feature information is used to characterize features of an object pushed to the user. Different recommendation systems correspond to different objects, and the types of features that need to be extracted differ between objects; for example, the object features extracted from training samples in the APP market may be the name (identifier), type, size, and the like of an APP, while the object features mentioned in the training samples of an e-commerce APP may be the name of a commodity, its category, its price range, and the like. The label feature is used to indicate whether the sample is a positive example or a negative example. The label feature of a sample can be obtained from the user's operation information on the recommended object: a sample in which the user performed an operation on the recommended object is a positive example, while a sample in which the user performed no operation on the recommended object, or only browsed it, is a negative example. For example, when the user clicks, downloads, or purchases the recommended object, the label feature is 1, indicating that the sample is a positive example; if the user performs no operation on the recommended object, the label feature is 0, indicating that the sample is a negative example. After being collected, the samples may be stored in the database 230, and part or all of the feature information of the samples in the database 230 may also be obtained directly from the client device 240, such as user feature information, the user's operation information on the object (used to determine a type identifier), and object feature information (such as an object identifier). The training device 220 trains a model parameter matrix based on the samples in the database 230 for generating the recommendation model 201. How the training device 220 trains to obtain the model parameter matrix used to generate the recommendation model 201 will be described in more detail below. The recommendation model 201 can be used to evaluate a large number of objects to obtain a score for each object to be recommended; further, a specified or preset number of objects can be recommended from these evaluation results, and the calculation module 211 obtains a recommendation result based on the evaluation results of the recommendation model 201 and recommends it to the client device through the I/O interface 212.
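Purely for illustration, the labeling rule described above (operated objects are positive examples, browse-only objects are negative examples) could be implemented roughly as follows; the record fields and operation names are hypothetical.

```python
def build_training_samples(behavior_log, user_features, item_features):
    positive_ops = {"click", "download", "purchase"}   # operations that mark a positive example
    samples = []
    for record in behavior_log:
        label = 1 if record["operation"] in positive_ops else 0   # browse-only -> negative example
        samples.append({
            "user": user_features[record["user_id"]],
            "item": item_features[record["item_id"]],
            "label": label,
        })
    return samples
```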
In this embodiment of the application, the training device 220 may select positive and negative samples from a sample set in the database 230 to be added to the training set, and then train the samples in the training set by using a recommendation model to obtain a trained recommendation model; the implementation details of the calculation module 211 can be described in detail with reference to the method embodiment shown in fig. 5.
The training device 220 obtains a model parameter matrix based on sample training and then is used for constructing the recommendation model 201, and then sends the recommendation model 201 to the execution device 210, or directly sends the model parameter matrix to the execution device 210, and constructs a recommendation model in the execution device 210 for recommendation of a corresponding system, for example, the recommendation model obtained based on video-related sample training may be used for recommendation of a video to a user in a video website or APP, and the recommendation model obtained based on APP-related sample training may be used for recommendation of an APP to the user in an application market.
The execution device 210 is configured with an I/O interface 212 to perform data interaction with an external device, and the execution device 210 may obtain user characteristic information, such as user identification, user identity, gender, occupation, hobbies, and the like, from the client device 240 through the I/O interface 212, and this part of information may also be obtained from a system database. The recommendation model 201 recommends a target recommendation object to the user based on the user characteristic information and the characteristic information of the object to be recommended. The execution device 210 may be disposed in the cloud server, or may be disposed in the user client.
The execution device 210 may call data, code, etc. in the data storage system 250 and may also store the output data in the data storage system 250. The data storage system 250 may be disposed in the execution device 210, may be disposed independently, or may be disposed in other network entities, and the number may be one or more.
The calculation module 211 uses the recommendation model 201 to process the user characteristic information and the characteristic information of the object to be recommended, for example, the calculation module 211 uses the recommendation model 201 to analyze and process the user characteristic information and the characteristic information of the object to be recommended, so as to obtain a score of the object to be recommended, and sorts the objects to be recommended according to the scores, wherein the object ranked earlier will be an object recommended to the client device 240.
Finally, the I/O interface 212 returns the recommendation to the client device 240 for presentation to the user.
Further, the training device 220 may generate corresponding recommendation models 201 based on different sample feature information for different targets to provide better results to the user.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present invention, and the position relationship between the devices, modules, etc. shown in the diagram does not constitute any limitation, for example, in fig. 2, the data storage system 250 is an external memory with respect to the execution device 210, and in other cases, the data storage system 250 may be disposed in the execution device 210.
In this embodiment of the application, the training device 220, the executing device 210, and the client device 240 may be three different physical devices, or the training device 220 and the executing device 210 may be on the same physical device or a cluster, or the executing device 210 and the client device 240 may be on the same physical device or a cluster.
Referring to fig. 3, a system architecture 300 according to an embodiment of the invention is shown. In this architecture, the execution device 210 is implemented by one or more servers, optionally in cooperation with other computing devices, such as: data storage, routers, load balancers, and other devices; the execution device 210 may be disposed on one physical site or distributed across multiple physical sites. The execution device 210 may use data in the data storage system 250 or call program code in the data storage system 250 to implement the object recommendation function, specifically, input information of the objects to be recommended into a recommendation model, generate pre-estimated scores for each object to be recommended by the recommendation model, then sort the objects to be recommended according to the order of the pre-estimated scores from high to low, and recommend the objects to be recommended to the user according to the sorting result. For example, the top 10 objects in the ranking result are recommended to the user.
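A minimal sketch of the ranking step just described: estimate a score for each candidate object, sort in descending order of the estimated score, and return the top-ranked objects (the top 10 in the example above). Here `recommendation_model` stands for any trained scoring function and is hypothetical.

```python
def recommend_top_k(recommendation_model, user_features, candidate_items, k=10):
    # Score every candidate, sort by estimated score (highest first), keep the top k.
    scored = [(item, recommendation_model(user_features, item)) for item in candidate_items]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in scored[:k]]
```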
The data storage system 250 is configured to receive and store parameters of the recommendation model sent by the training device, and is configured to store data of recommendation results obtained by the recommendation model, and may of course include program codes (or instructions) required by the storage system 250 to operate normally. The data storage system 250 may be a distributed storage cluster formed by one or more devices disposed outside the execution device 210, and in this case, when the execution device 210 needs to use data on the storage system 250, the storage system 250 may send the data needed by the execution device 210 to the execution device 210, and accordingly, the execution device 210 receives and stores (or caches) the data. Of course, the data storage system 250 may also be disposed in the execution device 210, and when disposed in the execution device 210, the distributed storage system may include one or more memories, and optionally, when there are multiple memories, different memories are used to store different types of data, for example, the model parameters of the recommendation model generated by the training device and the data of the recommendation result obtained by the recommendation model may be stored in two different memories respectively.
The user may operate respective user devices (e.g., local device 301 and local device 302) to interact with the execution device 210. Each local device may represent any computing device, such as a personal computer, computer workstation, smartphone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set-top box, gaming console, and so forth.
The local devices of each user may interact with the enforcement device 210 via a communication network of any communication mechanism/standard, such as a wide area network, a local area network, a peer-to-peer connection, etc., or any combination thereof.
In another implementation, the execution device 210 may be implemented by a local device, for example, the local device 301 may implement a recommendation function of the execution device 210 based on a recommendation model to obtain user characteristic information and feed back a recommendation result to a user, or provide a service for the user of the local device 302.
Since the embodiments of the present application relate to the application of a large number of neural networks, for the sake of understanding, the related terms and related concepts such as neural networks related to the embodiments of the present application will be described first.
1. Click probability (click-through rate, CTR)
The click probability, also referred to as the click-through rate, is the ratio of the number of times recommended information (e.g., recommended articles) on a website or application is clicked to the number of times the recommended articles are exposed; the click-through rate is generally an important metric for evaluating a recommendation system.
2. Personalized recommendation system
The personalized recommendation system is a system which analyzes by using a machine learning algorithm according to historical data (such as operation information in the embodiment of the application) of a user, predicts a new request according to the analysis, and gives a personalized recommendation result.
3. Offline training
The offline training refers to a module for iteratively updating recommendation model parameters according to a learning algorithm of a recommendation model in a personalized recommendation system according to historical data (such as operation information in the embodiment of the application) of a user until set requirements are met.
4. Online prediction (online inference)
Online prediction means predicting, based on an offline-trained model, the user's preference for recommended articles in the current context according to features of the user, the articles, and the context, and predicting the probability that the user will select the recommended articles.
For example, fig. 4 is a schematic diagram of a recommendation system provided in an embodiment of the present application. As shown in fig. 4, when a user enters the system, a request for recommendation is triggered, and the recommendation system inputs the request and its related information (e.g., the operation information in the embodiment of the present application) into the recommendation model, and then predicts the user's selection rate of the items in the system. Further, the items may be sorted in descending order according to the predicted selection rate or based on some function of the selection rate, i.e., the recommendation system may present the items in different positions in order as a result of the recommendation to the user. The user browses various located items and undertakes user actions such as browse, select, and download. Meanwhile, the actual behavior of the user can be stored in a log to be used as training data, and the parameters of the recommended model are continuously updated through the offline training module, so that the prediction effect of the model is improved.
For example, a user opening an application market in a smart terminal (e.g., a cell phone) may trigger a recommendation system in the application market. The recommendation system of the application market predicts the probability of downloading each recommended candidate APP by the user according to the historical behavior log of the user, for example, the historical downloading record and the user selection record of the user, and the self characteristics of the application market, such as the environmental characteristic information of time, place and the like. According to the calculated result, the recommendation system of the application market can display the candidate APPs in a descending order according to the predicted probability value, so that the downloading probability of the candidate APPs is improved.
For example, the APP with the higher predicted user selection rate may be presented at the front recommended position, and the APP with the lower predicted user selection rate may be presented at the rear recommended position.
The recommended model may be a neural network model, and the following describes terms and concepts related to a neural network that may be involved in embodiments of the present application.
(1) Neural network
The neural network may be composed of neural units. A neural unit may refer to an operation unit that takes $x_s$ (i.e., input data) and an intercept 1 as inputs, and the output of the operation unit may be:

$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_{s}$ is the weight of $x_{s}$, and $b$ is the bias of the neural unit. $f$ is the activation function of the neural unit, which introduces a nonlinear characteristic into the neural network to convert the input signal of the neural unit into an output signal. The output signal of this activation function may serve as the input of the next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining together a plurality of such single neural units, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit can be connected to the local receptive field of the previous layer to extract features of the local receptive field, and the local receptive field may be a region composed of a plurality of neural units.
(2) Deep neural network
Deep Neural Networks (DNNs), also known as multi-layer Neural networks, can be understood as Neural networks having many hidden layers, where "many" has no particular metric. From the DNN, which is divided by the positions of different layers, the neural networks inside the DNN can be divided into three categories: input layer, hidden layer, output layer. Generally, the first layer is an input layer, the last layer is an output layer, and the middle layers are hidden layers. The layers are all connected, that is, any neuron at the ith layer is necessarily connected with any neuron at the (i + 1) th layer. Although DNN appears complex, it is not really complex in terms of the work of each layer, simply the following linear relational expression:
$\vec{y} = \alpha(W\vec{x} + \vec{b})$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset (bias) vector, $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficient matrices $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$. The superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer. In summary, the coefficient from the kth neuron of the (L-1)th layer to the jth neuron of the Lth layer is defined as $W^{L}_{jk}$.
Note that the input layer is without the W parameter. In deep neural networks, more hidden layers make the network more able to depict complex situations in the real world. Theoretically, the more parameters the higher the model complexity, the larger the "capacity", which means that it can accomplish more complex learning tasks. The final goal of the process of training the deep neural network, i.e., learning the weight matrix, is to obtain the weight matrix (formed by a number of layers of vectors W) of all layers of the deep neural network that has been trained.
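A minimal sketch of the layer-wise computation $\vec{y} = \alpha(W\vec{x} + \vec{b})$ described above, using a sigmoid activation as an example (the choice of activation and the layer sizes are assumptions made only for illustration).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dnn_forward(x, weights, biases):
    # Each hidden/output layer computes y = activation(W @ x + b);
    # the input layer itself carries no W parameter.
    h = x
    for W, b in zip(weights, biases):
        h = sigmoid(W @ h + b)
    return h

# Example: input dimension 4, one hidden layer of 5 units, output dimension 2.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((5, 4)), rng.standard_normal((2, 5))]
biases = [np.zeros(5), np.zeros(2)]
print(dnn_forward(rng.standard_normal(4), weights, biases))
```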
(3) Loss function
In the process of training a deep neural network, because the output of the deep neural network is expected to be as close as possible to the value that is actually desired, the predicted value of the current network can be compared with the truly desired target value, and the weight vector of each layer of the neural network can then be updated according to the difference between the two (of course, an initialization process is usually carried out before the first update, i.e., parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted to make the prediction lower, and the adjustment continues until the deep neural network can predict the truly desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is done by loss functions or objective functions, which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible.
(4) Back propagation algorithm
The size of the parameters in the initial model can be corrected in the training process by adopting a Back Propagation (BP) algorithm, so that the error loss of the model is smaller and smaller. Specifically, the error loss is generated by passing the input signal forward until the output, and the parameters in the initial model are updated by back-propagating the error loss information, so that the error loss is converged. The back propagation algorithm is an error-loss dominated back propagation motion aimed at obtaining optimal model parameters, such as a weight matrix.
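As a toy illustration of (3) and (4) together, the following sketch performs one gradient-descent step on a squared loss for a single linear layer. A real deep network would propagate the error loss backwards through every layer, but the principle shown here (reduce the loss by moving the parameters against its gradient) is the same; the loss choice and learning rate are assumptions.

```python
import numpy as np

def train_step(W, b, x, target, lr=0.01):
    pred = W @ x + b                        # forward pass of a linear model
    error = pred - target                   # difference between prediction and target value
    loss = 0.5 * float(np.sum(error ** 2))  # squared loss measuring that difference
    W = W - lr * np.outer(error, x)         # dLoss/dW = error * x^T
    b = b - lr * error                      # dLoss/db = error
    return W, b, loss
```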
(5) Machine learning system
Based on input data and labels, the parameters of a machine learning model are trained by an optimization method such as gradient descent, and the trained model is finally used to make predictions on unknown data.
(6) Personalized recommendation system
According to the historical data of the user, analysis and modeling are performed by using a machine learning algorithm, a new user request is predicted based on this analysis and modeling, and a personalized recommendation result is provided.
(7) Nash equilibrium
Also known as non-cooperative game equilibrium, Nash equilibrium is an important concept in game theory. In a game, a strategy that is a party's best choice regardless of the strategies chosen by the other parties is called a dominant strategy. If, given the strategies of all other participants, every participant's chosen strategy is optimal for that participant, the resulting strategy combination is defined as a Nash equilibrium.
In other words, a strategy combination is a Nash equilibrium when each player's equilibrium strategy maximizes that player's expected payoff, given that all other players keep their strategies unchanged.
(8) Reward function: defines the objective of the agent's learning; at each step of the interaction, the environment sends the agent a value called a reward. The goal of the agent is to maximize the sum of the rewards received during the interaction.
(9) Strategy (decision): refers to a participant's action strategy. Each participant has its own strategy set; when making a decision in the game, a participant selects one strategy from its strategy space as its current decision, and the decisions of all participants together form a strategy combination in the game.
(10) Anchor point strategy: refers to a preset strategy combination of all users.
(11) Participants: refer to the rational decision-making subjects in the game, who can each select the optimal behavior to obtain the maximum benefit for themselves.
The recommendation system builds a model by mining content such as the user's historical behavior records, personal preferences, or statistical characteristics, captures the user's interests and needs, builds a user profile, generates a recommendation list the user may be interested in, and thereby filters a large amount of information down to content that meets the user's needs for personalized recommendation. Collaborative filtering is a recommendation algorithm commonly used by recommendation systems: by analyzing user interests, users similar to a specified user are found in the user group, and the evaluations of those similar users on certain information are aggregated to form the prediction of the specified user's preference for that information.
The application of recommendation algorithms requires the collection of a large amount of user personal data, but from the user's perspective, it may not be desirable to publish all of their historical behavior for training the model. If an item with user privacy information is used to train the model, the recommendation list generated by the model is likely to contain items similar to the item with privacy, and a person viewing the recommendation can easily infer the user privacy information from the generated recommendation.
Therefore, a recommendation model training method capable of ensuring user privacy is needed.
Fig. 5 shows a system scenario of an embodiment of the present application, which includes modules such as user data, preprocessing, a recommendation model, and a prediction list.
The basic operating logic of the recommendation system is as follows: the user performs a series of actions in the front-end display list, such as browsing, clicking, commenting, and purchasing, generating interaction record data of the user, which are stored in the database. The recommendation system preprocesses the user behavior data, article data, and the like to convert them into training data, and then performs offline model training; after training converges, a recommendation model is generated and deployed to the online environment, which produces a predicted recommendation list based on the user's request, article features, and context information; the user then generates feedback on the recommendation results, forming new user data.
In the preprocessing stage, original user behavior data are converted into training data, and the content and the structure of the data which finally participate in model training are determined. The data involved in model training both affects the quality of the model and contains a great deal of user privacy.
The embodiment of the present application provides a recommendation framework that comprehensively considers the user's disclosure willingness and the model's recommendation quality, in order to solve the problems of existing methods.
Referring to fig. 6A, fig. 6A is a schematic flow chart of a model training method provided in the embodiment of the present application, and as shown in fig. 6A, the model training method provided in the embodiment of the present application includes:
601. acquiring a training sample and a first probability distribution, wherein the training sample comprises a plurality of operation data, and each operation data is operation information of a user on a first article; the first probability distribution includes selection probabilities corresponding to a plurality of combinations of the operation data, and the first probability distribution is used to select a target combination from the plurality of combinations.
In embodiments of the present application, the subject of execution of step 601 may be a terminal device, which may be a portable mobile device, such as, but not limited to, a mobile or portable computing device (e.g., a smartphone), a personal computer, a server computer, a handheld device (e.g., a tablet) or laptop, a multiprocessor system, a gaming console or controller, a microprocessor-based system, a set top box, a programmable consumer electronics, a mobile phone, a mobile computing and/or communication device with a wearable or accessory form factor (e.g., a watch, glasses, headset or earpiece), a network PC, a minicomputer, a mainframe computer, a distributed computing environment that includes any of the above systems or devices, and so on.
In this embodiment of the application, the execution main body in step 601 may be a server on a cloud side, and the server may receive operation data of a user sent by a terminal device, and then the server may obtain the operation data of the user.
For convenience of description, the following describes the training apparatus without distinguishing the form of the execution subject.
In one possible implementation, the model training method provided in FIG. 5 may be applied to training a recommendation model.
In one possible implementation, the task implemented by the recommendation model may be any of the following: purchase behavior prediction, add-to-cart behavior prediction, sharing behavior prediction, browsing behavior prediction, play-completion-rate prediction, like prediction, favorite prediction, click prediction, and click-conversion prediction.
In a possible implementation, when a recommendation model is trained, training samples need to be obtained; the training samples may include attribute information of a user and an article, and the attribute information may be the operation data of the user.
The operation data of the user can be obtained based on an interaction record (for example, a behavior log of the user) between the user and the article, the operation data can include a real operation record of the user on each article, and the operation data can include attribute information of the user, attribute information of each article, and an operation type (for example, clicking, downloading, and the like) of the operation performed by the user on the plurality of articles.
The attribute information of the user may be at least one of attributes related to favorite features of the user, gender, age, occupation, income, hobbies and education level, wherein the gender may be male or female, the age may be a number between 0 and 100, the occupation may be a teacher, a programmer, a chef and the like, the hobbies may be basketball, tennis, running and the like, and the education level may be primary school, junior school, high school, university and the like; the application does not limit the specific type of the attribute information of the user.
The article may be an entity article or a virtual article; for example, the article may be an Application (APP), audio/video, a webpage, news information, or the like. The attribute information of the article may be at least one of an article name, a developer, an installation package size, a category, and a goodness. Taking an application as an example, the category may be a chat category, a cool game category, an office category, and the like, and the goodness may be a score, a comment, and the like for the article; the application does not limit the specific type of attribute information for the article.
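To make the structure of one piece of operation data concrete, the following is a minimal illustrative sketch; all field names and example values are assumptions for illustration and are not defined by this application.

# Illustrative sketch of one piece of operation data (one user-item interaction).
# All field names and values are assumptions for illustration only.
operation_data = {
    "user_attrs": {"gender": "female", "age": 28, "occupation": "teacher",
                   "hobby": "tennis", "education": "university"},
    "item_attrs": {"name": "chat_app", "developer": "dev_a",
                   "package_size_mb": 45, "category": "chat", "rating": 4.6},
    "operation_type": "click",   # e.g. click, download, purchase
}

# A training sample for one user is then a list of such records.
training_sample = [operation_data]
print(len(training_sample))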
In one possible implementation, a training sample and a first probability distribution can be obtained, wherein the training sample comprises a plurality of operation data, and each operation data is operation information of a user on one first article; the first probability distribution includes selection probabilities corresponding to a plurality of combinations of the operation data, the first probability distribution being used to select a target combination from the plurality of combinations.
In one possible implementation, a vector α can be maintained for each user that represents the probability distribution of the user taking different decisions, with the shape [1, 2^l], where l is the number of operation data (or interaction records) of the user. For example, suppose that the training sample of user u is S_u = {0, 1, 2}; then α_u is a probability distribution defined over {(0,0,0), (0,0,1), (0,1,0), (1,0,0), (0,1,1), (1,0,1), (1,1,0), (1,1,1)}. For example, (0,1,1) is a data disclosure strategy in which items 1 and 2 in the training set of user u are disclosed for training while item 0 is not, and α_u((0,1,1)) then denotes the probability of training the model using items 1 and 2.
In a possible implementation, the first probability distribution may indicate which operation data should be used when performing model training. In order to ensure that operation data corresponding to an article that a user does not want to disclose is not used in model training, the probability assigned by the distribution to operation data corresponding to such an article (i.e., an article whose disclosure is highly sensitive to the user) needs to be made smaller. However, some of the operation data may be key to the precision of model training (e.g., a recommendation model trained without that operation data may have poor precision). In order to balance the two aspects (i.e., the privacy of the user and the precision of model training), the influence capability of each operation data on model precision (e.g., the first information in the embodiment of the present application) and the sensitivity of its disclosure (e.g., the second information in the embodiment of the present application) may be obtained, and the first probability distribution may be updated based on the first information and the second information.
In one possible implementation, a target combination may be selected from the plurality of combinations of operation data according to the first probability distribution; the target combination may be the combination with the highest selection probability in the first probability distribution. The target combination represents the data that is most likely, under the current probability distribution, to be selected as training samples during model training, and the first probability distribution may be updated according to the target combination, the first information and the second information.
Illustratively, assume that the decision of user u is o_u = (o_u^1, ..., o_u^l) ∈ {0, 1}^l, where o_u^k = 0 means that item k does not participate in training and o_u^k = 1 means training using the item. The goal is to learn a joint mixed strategy α = (α_1, α_2, ..., α_N), where α_u is a distribution over the strategy space of user u and α_u(o_u) is the probability that the user takes decision o_u. α_{-u} denotes the joint strategy of all users except user u. The decision distribution vector α_u of user u can be traversed to find the decision o_u with the highest probability as the current decision of the user.
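A minimal sketch of this representation might look as follows; it assumes a small number l of interaction records per user so that all 2^l decisions can be enumerated explicitly, and the function and variable names are illustrative only.

import itertools
import numpy as np

def enumerate_decisions(l):
    """All 2^l disclosure decisions for a user with l interaction records."""
    return list(itertools.product([0, 1], repeat=l))

def init_alpha(l):
    """Uniform probability distribution over the 2^l decisions (shape [1, 2^l])."""
    return np.full((1, 2 ** l), 1.0 / 2 ** l)

def current_decision(alpha, decisions):
    """Traverse alpha_u and return the decision with the highest probability."""
    return decisions[int(np.argmax(alpha[0]))]

decisions = enumerate_decisions(3)            # user u with S_u = {0, 1, 2}
alpha_u = init_alpha(3)
alpha_u[0, decisions.index((0, 1, 1))] = 0.4  # e.g. favour disclosing items 1 and 2 only
alpha_u /= alpha_u.sum()
print(current_decision(alpha_u, decisions))   # -> (0, 1, 1)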
602. Acquiring first information and second information according to the target combination; the first information represents the influence capacity on model accuracy when each operation data is selected to train the recommendation model, and the second information represents the user sensitivity when the first article contained in each operation data is disclosed; the first information, the second information and the target combination are used to derive a reward value.
The first information is described next.
In one possible implementation, the first information may represent the ability of each operation data to influence model accuracy when that operation data is selected for training the recommendation model. Usually, the influence of a piece of operation data on model accuracy can only be known by actually using it in the training of the recommendation model; however, such an approach requires training the model many times, which greatly reduces the training speed.
In the embodiment of the present application, an influence function may be calculated according to an anchor point strategy (the influence function may represent an influence value corresponding to each operation data in the anchor point data, and the influence value may represent the influence capability on model precision when that operation data is used for model training). In the embodiment of the application, the error of approximately estimating the recommendation quality of the current decision is reduced by setting a plurality of anchor points and calculating the corresponding influence functions. Illustratively, z_train denotes a training sample and z_test denotes a test sample.

An anchor strategy is a disclosure configuration of all user data, represented by a matrix consisting of 0s and 1s. According to the anchor point strategy t, the data disclosed by the users under the anchor point participates in the training of the model, thereby obtaining the optimal parameters

θ̂_t = argmin_θ (1/|D_t|) Σ_{z ∈ D_t} L(z, θ)

where D_t denotes the data disclosed under anchor strategy t and L is the training loss.

After the data z_train is perturbed during training (for example, the data is deleted), the loss variation on z_test is:

I(z_train, z_test) = -∇_θ L(z_test, θ̂_t)^T H_{θ̂_t}^{-1} ∇_θ L(z_train, θ̂_t)    (1)

where H_{θ̂_t} is the Hessian of the training loss at θ̂_t. The larger the absolute value of I(z_train, z_test), the larger the influence on the accuracy of the model; a positive value indicates that the data has a positive influence on the quality of the model, while a negative value indicates that the data has a negative influence on the quality of the model.

The degree of influence of each piece of operation data on the recommendation quality of the recommendation model under the current anchor point strategy can be calculated through formula (1), and at the same time the loss value under the current anchor point strategy can be recorded, denoted as L_t.
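For illustration only, the following sketch computes an influence value of the classical form -∇L(z_test)^T H^{-1} ∇L(z_train) for a small logistic-regression model with an explicit (damped) Hessian; this toy setup is an assumption and is not the production implementation of formula (1).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def grad_loss(theta, x, y):
    """Gradient of the logistic loss at a single sample (x, y)."""
    return (sigmoid(x @ theta) - y) * x

def hessian(theta, X):
    """Hessian of the average logistic loss over the training data X (with small damping)."""
    p = sigmoid(X @ theta)
    W = np.diag(p * (1 - p))
    return X.T @ W @ X / len(X) + 1e-3 * np.eye(X.shape[1])

def influence(theta, X_train, z_train, z_test):
    """I(z_train, z_test) = -grad(z_test)^T H^{-1} grad(z_train)."""
    x_tr, y_tr = z_train
    x_te, y_te = z_test
    H_inv = np.linalg.inv(hessian(theta, X_train))
    return -grad_loss(theta, x_te, y_te) @ H_inv @ grad_loss(theta, x_tr, y_tr)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
theta_hat = np.zeros(3)      # stands in for the trained optimum under the anchor strategy
print(influence(theta_hat, X, (X[0], 1.0), (X[1], 0.0)))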
Through the above manner, the influence value corresponding to each anchor point data in the plurality of anchor point data can be obtained in advance.
In one possible implementation, in order to quickly and accurately know the influence value corresponding to each operation data, data similar to the operation data may be selected from the anchor data, and the influence values corresponding to the operation data may be determined by using the influence values corresponding to the anchor data similar to the operation data.
In one possible implementation, the target anchor point data may be determined from a plurality of anchor point data according to first attribute information of a first article contained in a plurality of pieces of operation data, where each anchor point data includes operation information of the user on a plurality of articles, and a similarity between attribute information of an article included in the target anchor point data and the first attribute information satisfies a preset condition (e.g., is the most similar of the plurality of anchor point data), and the first information is determined by training the recommendation model in advance through the target anchor point data.
For example, the Hamming distance between the current strategy combination o (formed by the decisions on the plurality of operation data) and each anchor point strategy combination may be calculated, and the anchor point A_t with the smallest Hamming distance may be found. Namely:

A_t = {o | D_h(o, o_t) < D_h(o, o_{t'}), t' ≠ t}    (2)

where D_h(o, o_t) denotes the Hamming distance between o and o_t.
In this way, the method and the device can quickly calculate the recommendation quality of the model under different decisions without retraining the model, which greatly improves the speed of solving the problem; the multi-anchor-point method also reduces the error between the influence-function approximation of the model recommendation quality and its true value.
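A minimal sketch of selecting the nearest anchor strategy by Hamming distance, in the spirit of formula (2), might look as follows; representing the decisions of all users as 0/1 matrices of equal shape is an implementation assumption.

import numpy as np

def hamming(a, b):
    """Hamming distance between two 0/1 decision matrices of the same shape."""
    return int(np.sum(np.asarray(a) != np.asarray(b)))

def nearest_anchor(current, anchors):
    """Return the index of the anchor strategy closest to the current decision combination."""
    dists = [hamming(current, anchor) for anchor in anchors]
    return int(np.argmin(dists))

current = [[0, 1, 1],      # disclosure decisions of user 0
           [1, 0, 1]]      # disclosure decisions of user 1
anchors = [[[1, 1, 1], [1, 1, 1]],
           [[0, 1, 1], [1, 1, 1]],
           [[0, 0, 0], [0, 0, 0]]]
print(nearest_anchor(current, anchors))  # -> 1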
The second information is described next:
In one possible implementation, the second information indicates the sensitivity of the user when the first article contained in each operation data is disclosed. For example, for each user, a vector β_u may be given to represent the second information. β_u is the user's disclosure-willingness vector, representing the user's degree of resistance to having his or her interaction items participate in training; a larger value indicates that the user does not want to disclose that piece of data.
In one possible implementation, the second information is derived from a sensitivity input of a user for the first item, the sensitivity input being in particular one of a sensitivity score, a sensitivity rating.
In one possible implementation, the second information is obtained based on an empirical value representing a sensitivity of the user to disclosure of the first article.
For example, a score between 0 and 1 may be obtained for each of the user's own historical interaction records to indicate how resistant the user is to disclosing that interaction record; the larger the score, the less the user wants the interaction record to participate in the training of the recommendation model. The manner in which this score is obtained can be implemented in a variety of product forms. Taking a browser as an example, a user configuration page may be provided in the APP, for example an option for the user to score his or her own browsing history; the scoring may be performed by inputting a value from 0 to 1, or by providing several selectable grades. Furthermore, feedback options regarding the degree of privacy may also be provided alongside the recommended content.
In this way, the user is allowed to actively express his or her disclosure willingness for the interaction records, which realizes a user-controllable recommendation system, takes the user's disclosure willingness into account, and improves the user experience.
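As an illustration only, a tiny sketch of turning such per-record scores into the willingness vector β_u might look as follows; the clamping to [0, 1] and the default value for unscored records are assumptions.

def build_willingness_vector(num_records, user_scores, default=0.0):
    """Build beta_u from the user's optional 0-1 sensitivity scores.

    user_scores maps record index -> score; unscored records get the default.
    Larger values mean the user is less willing to disclose that record.
    """
    beta = []
    for k in range(num_records):
        score = user_scores.get(k, default)
        beta.append(min(max(float(score), 0.0), 1.0))  # clamp to [0, 1]
    return beta

# The user rated record 0 as highly sensitive and left the others unrated.
print(build_willingness_vector(3, {0: 0.9}))  # -> [0.9, 0.0, 0.0]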
In one possible implementation, the reward value may be derived based on the first information and the second information. For example, the gradient of each user's reward with respect to the user's decision probability distribution may be calculated based on the first information, the second information and the target combination (i.e., the current decision).

A specific example of calculating the reward value is described below:
Suppose there is a user set U and an item set I, with |U| and |I| representing the number of users and the number of items, respectively. I_u denotes the set of items that user u has interacted with, and S_u and T_u denote the training set and the test set of user u, respectively. For each user, a vector β_u is given; β_u is the user's disclosure-willingness vector, representing the user's degree of resistance to having his or her interaction items participate in training, where a larger value indicates that the user does not want to disclose that piece of data. Suppose the decision of user u is o_u ∈ {0, 1}^{|S_u|}, where 0 means that the corresponding item does not participate in training and 1 means training using the item. The goal of this module is to learn a joint mixed strategy α = (α_1, α_2, ..., α_N), where α_u is a distribution over the strategy space of user u and α_u(o_u) is the probability that the user takes decision o_u. α_{-u} denotes the joint strategy of all users except user u.

In particular, the decision distribution vector α_u of user u may be traversed to find the decision o_u with the highest probability as the current decision of the user.

The Hamming distance between the current strategy combination o and each anchor point strategy combination is calculated, and the anchor point A_t with the smallest Hamming distance is found. Namely:

A_t = {o | D_h(o, o_t) < D_h(o, o_{t'}), t' ≠ t}    (2)

where D_h(o, o_t) denotes the Hamming distance between o and o_t.

According to formula (1), let x_u^k denote the kth item that user u interacted with and y denote the feedback of the user. For convenience of computation, let g denote the negative of the influence function value, g_{t,u} the negatives of the influence function values calculated for the data of user u on the basis of anchor point t, o_u and o_t the disclosure under the current decision of the user and under the anchor strategy, respectively, and c_t the accumulated value of the influence function calculated on the basis of anchor point t. The reward value of the user can then be calculated (formula (3)) by combining the influence-function approximation under the current anchor point, which involves terms of the form Σ_{u'≠u} (o_{u'})^T g_{t,u'} + (o_u)^T g_{t,u}, with the user's disclosure willingness β_u.
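As an illustration only, the following sketch combines the two ingredients described above, namely the anchor-based approximation of the loss under the current disclosure decisions and the willingness term (o_u)^T β_u, into a reward value; the exact weighting of formula (3) is not reproduced here, so the trade-off coefficient lam is an assumption.

import numpy as np

def approx_loss(L_t, g_t, decisions):
    """Approximate the test loss under the current decisions using anchor t.

    L_t is the loss recorded under anchor strategy t, and g_t[u] holds the
    per-record influence values of user u computed on the basis of anchor t.
    """
    return L_t + sum(np.dot(decisions[u], g_t[u]) for u in decisions)

def user_reward(u, L_t, g_t, decisions, beta, lam=1.0):
    """Reward of user u: low approximated loss and little disclosure of sensitive records."""
    quality_term = -approx_loss(L_t, g_t, decisions)
    willingness_penalty = lam * float(np.dot(decisions[u], beta[u]))
    return quality_term - willingness_penalty

decisions = {0: np.array([0, 1, 1]), 1: np.array([1, 0, 1])}
g_t = {0: np.array([0.02, -0.01, 0.03]), 1: np.array([-0.05, 0.01, 0.00])}
beta = {0: np.array([0.9, 0.1, 0.2]), 1: np.array([0.3, 0.3, 0.3])}
print(user_reward(0, L_t=0.7, g_t=g_t, decisions=decisions, beta=beta))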
in the embodiment of the application, the user publishing intention and the recommendation quality are considered. And the optimization target is comprehensively considered from the two aspects of the user open intention and the model recommendation quality, so that the user open intention is protected, and the recommendation quality of the model is ensured.
603. And updating the first probability distribution according to the reward value to obtain a second probability distribution, wherein the second probability distribution is used for training the recommendation model.
Since it is difficult to obtain α_u efficiently, the probability distribution of the user decision can be iteratively optimized using a projected gradient descent method. In particular, in one possible implementation, the first probability distribution may be updated according to the reward value by projected gradient descent, resulting in the second probability distribution.
In one possible implementation, the second probability distribution is a distribution that satisfies Nash equilibrium.
For example, after the user rewards are calculated, the probability distribution of the user decisions may be optimized according to the user rewards. In order to use the projected gradient descent method, the gradient of z_u(α_u, α_{-u}) with respect to α_u(o_u) must first be obtained. For convenience of representation, the user reward under the joint mixed strategy may be abbreviated as z_u(α_u, α_{-u}) (formula (4)).

Let S_t(o_u) = {o_{-u} | [o_u, o_{-u}] ∈ A_t}; the gradient of z_u(α_u, α_{-u}) with respect to α_u(o_u) is then given by formula (5).

Let ∇_u denote the 2^{l_u}-dimensional vector of gradients of z_u(α_u, α_{-u}) with respect to α_u, where l_u is the number of interaction records of user u. The formula for updating the user decision probability distribution at each round is:

α_u ← α_u + γ · ∇_u    (6)

where γ is the learning rate, a hyper-parameter.
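A minimal sketch of this update step might look as follows; the gradient is supplied by the caller rather than derived from formula (5), and using a softmax to pull the updated vector back onto the probability simplex follows the worked example given later.

import numpy as np

def softmax(x):
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

def update_alpha(alpha_u, grad_u, gamma=0.1):
    """One projected-gradient-style update of the decision probability distribution.

    alpha_u: current probabilities over the 2^l decisions of user u.
    grad_u:  gradient of the user's reward with respect to alpha_u.
    gamma:   learning rate (hyper-parameter).
    """
    updated = np.asarray(alpha_u, dtype=float) + gamma * np.asarray(grad_u, dtype=float)
    return softmax(updated)  # constrain the probabilities to sum to 1

alpha_u = np.array([0.15, 0.2, 0.35, 0.3])
grad_u = np.array([-0.5, 0.1, 0.8, -0.2])
print(update_alpha(alpha_u, grad_u))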
In order to take both the recommendation quality and the user's willingness to disclose personal information into account, the embodiment of the application provides a novel recommendation framework which allows each user to actively set a willingness vector indicating to what extent the user does not want the corresponding interaction records to be used for training the recommendation model. The method formulates the problem as a multi-player game. Each user is a participant, and the user's decision is a vector representing a subset of all the user's interacted items; these interaction records will be disclosed for use in training the model. After the users make their decisions, the recommendation model is trained and optimized according to the interaction records selected and disclosed by all users. The challenge in solving this game problem is how to efficiently obtain the recommendation quality obtained by a user after adopting different decisions.
To address this problem, embodiments of the present application maintain a vector α for each user that represents the probability distribution of the user taking different decisions, with the shape [1, 2^l], where l is the number of interaction records of the user.
For example, if the training set of user u is S_u = {0, 1, 2}, then α_u is defined as a probability distribution over {(0,0,0), (0,0,1), (0,1,0), (1,0,0), (0,1,1), (1,0,1), (1,1,0), (1,1,1)}. Here (0,1,1) is a data disclosure strategy in which items 1 and 2 in the training set of user u are disclosed for training while item 0 is not disclosed, and α_u((0,1,1)) represents the probability of training the model using items 1 and 2.
In order to improve accuracy, a plurality of anchor point strategies are set randomly, and a recommendation model is trained based on each anchor point strategy, so that the degree of influence of each interaction record on the recommendation quality under different anchor point strategies can be obtained. Then, for different user decisions, the method first finds the anchor point strategy with the smallest Hamming distance from the current decision and uses the influence function values corresponding to that anchor point strategy to quickly obtain the recommendation quality corresponding to the current decision, without retraining. After the recommendation quality is obtained, the current benefit of the user under the current decision can be calculated from the recommendation quality and the user's disclosure willingness, and the decision probability distribution of the user is then optimized by projected gradient descent according to the user's benefit. This optimization of the decision probability distribution is performed for every user, after which the next decision is made; the process is repeated many times to obtain the optimal decisions of all users. The method can be used as a recommendation framework in combination with different recommendation models, protecting user privacy while taking the recommendation quality into account and maximizing the user benefit.
The overall training process for the user decision optimization module can be exemplified as follows in table 1:
TABLE 1 user decision optimization Module training Process
An example process of the above algorithm is as follows:
Suppose user u has two interaction records, so that there are 2^2 = 4 possible decisions, namely {0,0}, {1,0}, {0,1} and {1,1}. The current decision probability distribution of the user is α_u = {0.15, 0.2, 0.35, 0.3}.

(1) The decision with the maximum probability, decision 3 ({0,1}), is selected, namely hiding the first interaction record and disclosing the second interaction record.

(2) Each user performs step (1), and the decisions of all users form the current decision combination, a matrix consisting of 0s and 1s representing the data disclosure of all users. Assuming there are 3 anchor point strategies in total, the 3 anchor point strategies are traversed according to formula (2) to find the anchor point strategy with the smallest Hamming distance from the combination formed by the current decisions of all users; assume this is anchor point strategy 2.

(3) Using the influence function values corresponding to the closest anchor point 2, namely the term Σ_{u'≠u} (o_{u'})^T g_{t,u'} + (o_u)^T g_{t,u} and the recorded loss value, the current strategy combination o and the user's disclosure willingness β_u are substituted into formula (5) to calculate the gradient ∇_u of the current user reward.

(4) After the gradient is obtained, formula (6) is used to update the current decision probability distribution α_u of the user, i.e., {0.15, 0.2, 0.35, 0.3} is updated; finally the result is constrained by a softmax function so that its probabilities sum to 1, completing the i-th round of updating for user u.
The above process is repeated until the model converges.
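Putting the steps above together, a highly simplified sketch of the training loop of the user decision optimization module (Table 1) could look as follows. The random anchor strategies, influence values and willingness vectors are synthetic, and the gradient is replaced by a crude surrogate that scores each candidate decision of a user with the other users fixed at their current decisions, rather than the expectation prescribed by formula (5).

import itertools
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Toy setup: 2 users, 2 interaction records each, 3 random anchor strategies.
n_users, n_records, n_anchors, gamma, n_rounds = 2, 2, 3, 0.5, 20
decisions = list(itertools.product([0, 1], repeat=n_records))        # 4 possible decisions
alpha = [np.full(len(decisions), 1.0 / len(decisions)) for _ in range(n_users)]
beta = [rng.uniform(0, 1, n_records) for _ in range(n_users)]         # disclosure willingness
anchors = [rng.integers(0, 2, (n_users, n_records)) for _ in range(n_anchors)]
g = rng.normal(0, 0.05, (n_anchors, n_users, n_records))              # influence values per anchor
L = rng.uniform(0.5, 1.0, n_anchors)                                  # loss recorded per anchor

def reward(u, current, t):
    """Quality term from the anchor-t approximation plus a willingness penalty."""
    quality = L[t] + sum(current[v] @ g[t, v] for v in range(n_users))
    return -quality - current[u] @ beta[u]

for _ in range(n_rounds):
    # Each user's current decision is the most probable one under alpha.
    current = np.array([decisions[int(np.argmax(alpha[u]))] for u in range(n_users)])
    # Nearest anchor strategy in Hamming distance, in the spirit of formula (2).
    t = int(np.argmin([np.sum(current != a) for a in anchors]))
    for u in range(n_users):
        grad = np.zeros(len(decisions))
        for j, o_u in enumerate(decisions):
            trial = current.copy()
            trial[u] = o_u
            grad[j] = reward(u, trial, t)          # surrogate for the formula (5) gradient
        alpha[u] = softmax(alpha[u] + gamma * grad)  # formula (6)-style update with projection

print([decisions[int(np.argmax(a))] for a in alpha])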
The embodiment of the application provides a model training method, which is applied to the training of a recommendation model and comprises the following steps: acquiring a training sample and a first probability distribution, wherein the training sample comprises a plurality of operation data, and each operation data is operation information of a user on a first article; the first probability distribution includes selection probabilities corresponding to a plurality of combinations of the operation data and is used to select a target combination from the plurality of combinations; acquiring first information and second information according to the training sample, wherein the first information represents the influence capacity on model accuracy when each operation data is selected to train the recommendation model, and the second information represents the user sensitivity when the first article contained in each operation data is disclosed; the first information, the second information and the target combination are used for obtaining a reward value; and updating the first probability distribution according to the reward value to obtain a second probability distribution, wherein the second probability distribution is used for training the recommendation model. The first probability distribution may indicate which operation data should be used when performing model training. In order to ensure that operation data corresponding to an article that a user does not want to disclose is not used in model training, the probability assigned to such operation data in the probability distribution needs to be made smaller; however, some of the operation data may affect the accuracy of model training. The present application therefore obtains the influence capability of each operation data on model accuracy (e.g., the first information in the embodiment of the present application) and the sensitivity of its disclosure (e.g., the second information in the embodiment of the present application), and updates the first probability distribution based on the first information and the second information. The updated first probability distribution is used for model training, so that a balance between disclosure sensitivity and model performance of the trained model can be ensured.
The following presents the beneficial effects of the embodiments of the present application in conjunction with experiments:
In one possible implementation, four data sets are used: the Diginetica data set, the Amazon Video Games data set, the Steam data set, and a generated simulation data set; the statistical information is shown in Table 2.
Table 2 data set statistics
Data set Number of users Number of articles Number of interactions Degree of sparseness
Diginetica 2852 10739 17073 99.94%
Amazon Video 2790 12435 18703 99.95%
Steam 11942 6955 86595 99.89%
Simulation 1000 1000 6148 99.39%
The experimental evaluation indexes are F1@10 and the user Reward, and five base models are used: MF, NeuMF, LightGCN, DIN and CDAE, belonging to the five categories of matrix factorization, deep neural network, graph neural network, sequence recommendation and autoencoder, respectively. The results of the experiment are shown in Table 3. As can be seen from the table, compared with the baseline models, the embodiment of the present application (IFRQE++) obtains the best results on the user Reward, showing the superiority of the invention:
TABLE 3 statistics of the results
(Table 3 reports F1@10 and the user Reward for each of the five base models with and without the IFRQE++ framework.)
The embodiment of the application is a recommendation framework that considers the user's disclosure willingness; it maximizes the user benefit by hiding part of the user interaction data, weighing both user privacy and recommendation quality. As can be seen from the table, across these different types of models, adding the recommendation framework IFRQE++ of the embodiment of the present application can significantly improve the performance of the models, which shows that the framework has good compatibility.
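For reference, the evaluation index F1@10 mentioned above can be computed as in the following small sketch; this is the standard definition of F1 at a cut-off and is given only as an illustration, not as the exact evaluation code used in the experiments.

def f1_at_k(recommended, relevant, k=10):
    """F1 between the top-k recommended items and the user's relevant items."""
    top_k = recommended[:k]
    hits = len(set(top_k) & set(relevant))
    if hits == 0:
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(relevant)
    return 2 * precision * recall / (precision + recall)

print(f1_at_k(recommended=[5, 9, 2, 7, 1], relevant=[2, 7, 11], k=10))  # -> 0.5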
Next, a software implementation framework of the embodiment of the present application is described:
referring to fig. 6B, the core device in the embodiment of the present application is mainly based on an anchor point policy influence function calculation module and a user decision optimization module. In the framework, the influence function calculation module calculates the influence function value of each piece of data according to a formula according to a preset anchor point strategy. The user decision optimization module optimizes the probability distribution of user decisions through iterative learning according to a user reward function, and finally obtains the optimal open interaction record decision of each user. The recommendation framework provided by the invention can be combined with various types of recommendation models, and has wide adaptation capability to different types of recommendation models.
As shown in fig. 6B, the user data is acquired and processed into the format required for model training before entering the core device portion of the embodiment of the present application. The user data is disclosed according to a plurality of anchor point strategies set in advance, the model is trained based on the disclosed data, and the degree of influence of each piece of interaction data on the model recommendation quality is then calculated. The user decision optimization module first obtains the current decision of each user according to the user's decision probability distribution and determines the disclosed data according to the user's decision. It then finds the anchor point strategy with the smallest Hamming distance from the current decision, quickly calculates the recommendation quality of the current recommendation system by using the influence function values corresponding to that anchor point, and then obtains the user reward of the current decision according to the reward function in combination with the user's disclosure willingness. The decision probability distribution of the user is optimized by the projected gradient descent method according to the user reward, and a new user decision is then obtained. The above steps are repeated in a loop; the optimal decision of each user is finally obtained, the interaction records of the users are disclosed based on their optimal decisions, and the training of the recommendation model is completed.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a model training apparatus provided in an embodiment of the present application, and as shown in fig. 7, the apparatus 700 includes:
a processing module 701, configured to obtain a training sample and a first probability distribution, where the training sample includes multiple operation data, and each operation data is operation information of a user on a first article; the first probability distribution includes selection probabilities corresponding to a plurality of combinations of the operation data, the first probability distribution being used to select a target combination from the plurality of combinations;
acquiring first information and second information according to the target combination; the first information represents the influence capacity on model accuracy when each operation data is selected to train the recommendation model, and the second information represents the user sensitivity when a first article contained in each operation data is disclosed; the first information, the second information and the target combination are used for obtaining a reward value;
for a detailed description of the processing module 701, reference may be made to the descriptions of step 601 and step 602 in the foregoing embodiments, and details are not described here.
An updating module 702, configured to update the first probability distribution according to the reward value to obtain a second probability distribution, where the second probability distribution is used to train the recommendation model.
For a detailed description of the updating module 702, reference may be made to the introduction of step 603 in the foregoing embodiment, and details are not described here.
In one possible implementation, the second probability distribution is a distribution that satisfies Nash equilibrium.
In a possible implementation, the processing module is specifically configured to:
determining target anchor point data from a plurality of anchor point data according to first attribute information of a first article contained in the plurality of operation data, wherein each anchor point data comprises operation information of the user on the plurality of articles, the similarity between the attribute information of the article contained in the target anchor point data and the first attribute information meets a preset condition, and the first information is determined by training the recommendation model in advance through the target anchor point data.
In one possible implementation, the second information is derived from a user's sensitivity input for the first article, the sensitivity input being one of a sensitivity score and a sensitivity rating.
In one possible implementation, the second information is derived from empirical values representing a sensitivity of the user to disclosure of the first item.
In a possible implementation, the update module is specifically configured to:
and updating the first probability distribution according to the reward value through projection gradient descent to obtain a second probability distribution.
In one possible implementation, the attribute information includes a user attribute of the user, and the user attribute includes at least one of: gender, age, occupation, income, hobbies, education level.
In one possible implementation, the attribute information includes an item attribute of the item, the item attribute including at least one of: item name, developer, installation package size, category, goodness.
Referring to fig. 8, fig. 8 is a schematic structural diagram of an execution device provided in the embodiment of the present application, and the execution device 800 may be embodied as a mobile phone, a tablet, a notebook computer, an intelligent wearable device, and the like, which is not limited herein. Specifically, the execution apparatus 800 includes: a receiver 801, a transmitter 802, a processor 803, and a memory 804 (wherein the number of processors 803 in the execution device 800 may be one or more, and one processor is taken as an example in fig. 8), wherein the processor 803 may include an application processor 8031 and a communication processor 8032. In some embodiments of the present application, the receiver 801, the transmitter 802, the processor 803, and the memory 804 may be connected by a bus or other means.
The memory 804 may include both read-only memory and random access memory and provides instructions and data to the processor 803. A portion of the memory 804 may also include non-volatile random access memory (NVRAM). The memory 804 stores the processor and operating instructions, executable modules or data structures, or a subset or an expanded set thereof, wherein the operating instructions may include various operating instructions for performing various operations.
The processor 803 controls the operation of the execution apparatus. In a particular application, the various components of the execution device are coupled together by a bus system that may include a power bus, a control bus, a status signal bus, etc., in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiments of the present application can be applied to the processor 803 or implemented by the processor 803. The processor 803 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 803. The processor 803 may be a general-purpose processor, a Digital Signal Processor (DSP), a microprocessor or a microcontroller, and may further include an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic device, or discrete hardware components. The processor 803 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM, EPROM, registers, or another storage medium well known in the art. The storage medium is located in the memory 804, and the processor 803 reads the information in the memory 804 and completes the steps of the above method in combination with its hardware.
Receiver 801 may be used to receive input numeric or character information and generate signal inputs related to performing device related settings and function control. Transmitter 802 may be used to output numeric or character information; the transmitter 802 may also be used to send instructions to the disk groups to modify the data in the disk groups.
In one embodiment of the present application, the processor 803 is configured to execute the steps of the model obtained by the model training method in the embodiment corresponding to fig. 6A.
In an embodiment of the present application, a server is further provided. Please refer to fig. 9, which is a schematic structural diagram of the server provided in the embodiment of the present application. Specifically, the server 900 is implemented by one or more servers and may vary considerably depending on configuration or performance; it may include one or more Central Processing Units (CPUs) 99 (e.g., one or more processors), a memory 932, and one or more storage media 930 (e.g., one or more mass storage devices) for storing an application 942 or data 944. The memory 932 and the storage medium 930 may be transient storage or persistent storage. The program stored on the storage medium 930 may include one or more modules (not shown), and each module may include a series of instruction operations for the server. Still further, the central processor 99 may be arranged to communicate with the storage medium 930 to execute a series of instruction operations in the storage medium 930 on the server 900.
The server 900 may also include one or more power supplies 99, one or more wired or wireless network interfaces 950, one or more input/output interfaces 958, and/or one or more operating systems 941, such as Windows Server(TM), Mac OS X(TM), Unix(TM), Linux(TM), FreeBSD(TM), etc.
In the embodiment of the present application, the central processing unit 99 is configured to execute the steps of the model training method in the embodiment corresponding to fig. 6A.
Also provided in embodiments of the present application is a computer program product comprising computer readable instructions, which when run on a computer, cause the computer to perform the steps as performed by the aforementioned execution device, or cause the computer to perform the steps as performed by the aforementioned training device.
Also provided in an embodiment of the present application is a computer-readable storage medium, in which a program for signal processing is stored, and when the program is run on a computer, the program causes the computer to execute the steps executed by the aforementioned execution device, or causes the computer to execute the steps executed by the aforementioned training device.
The execution device, the training device, or the terminal device provided in the embodiment of the present application may specifically be a chip, where the chip includes: a processing unit, which may be for example a processor, and a communication unit, which may be for example an input/output interface, a pin or a circuit, etc. The processing unit may execute the computer execution instructions stored in the storage unit to enable the chip in the execution device to execute the model training method described in the above embodiment, or to enable the chip in the training device to execute the steps related to the model training in the above embodiment. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, and the like, and the storage unit may also be a storage unit located outside the chip in the radio access device, such as a read-only memory (ROM) or another type of static storage device that may store static information and instructions, a Random Access Memory (RAM), and the like.
Specifically, referring to fig. 10, fig. 10 is a schematic structural diagram of a chip provided in the embodiment of the present application, where the chip may be represented as a neural network processor NPU 1000, and the NPU 1000 is mounted on a main CPU (Host CPU) as a coprocessor, and the Host CPU allocates tasks. The core portion of the NPU is an arithmetic circuit 1003, and the controller 1004 controls the arithmetic circuit 1003 to extract matrix data in the memory and perform multiplication.
In some implementations, the arithmetic circuit 1003 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuit 1003 is a two-dimensional systolic array. The arithmetic circuit 1003 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1003 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 1002 and buffers it in each PE in the arithmetic circuit. The arithmetic circuit takes the matrix A data from the input memory 1001, performs matrix operation with the matrix B, and stores a partial result or the final result of the obtained matrix in the accumulator 1008.
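As a purely illustrative software analogue of this operation (the hardware is of course not programmed this way), the following sketch multiplies an input matrix A by a weight matrix B while accumulating partial results, mirroring the role of the accumulator.

import numpy as np

def tiled_matmul(A, B, tile=2):
    """Compute C = A @ B tile by tile, accumulating partial results."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2
    C = np.zeros((n, m))                        # plays the role of the accumulator
    for start in range(0, k, tile):
        end = min(start + tile, k)
        C += A[:, start:end] @ B[start:end, :]  # partial result accumulated into C
    return C

A = np.arange(6).reshape(2, 3).astype(float)
B = np.arange(12).reshape(3, 4).astype(float)
print(np.allclose(tiled_matmul(A, B), A @ B))   # -> True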
The unified memory 1006 is used for storing input data and output data. The weight data is carried directly into the weight memory 1002 through a memory unit access controller (Direct Memory Access Controller, DMAC) 1005. The input data is also carried into the unified memory 1006 by the DMAC.
The BIU is a Bus Interface Unit 1010 for interaction of the AXI Bus with the DMAC and an Instruction Fetch memory (IFB) 1009.
A Bus Interface Unit 1010 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1009 to fetch instructions from the external memory, and is also used for the memory Unit access controller 1005 to fetch the original data of the input matrix a or the weight matrix B from the external memory.
The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1006 or to transfer weight data into the weight memory 1002 or to transfer input data into the input memory 1001.
The vector calculation unit 1007 includes a plurality of operation processing units, and further processes the output of the operation circuit such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like, if necessary. The method is mainly used for non-convolution/full-connection layer network calculation in the neural network, such as Batch Normalization, pixel-level summation, up-sampling of a feature plane and the like.
In some implementations, the vector calculation unit 1007 can store the processed output vector into the unified memory 1006. For example, the vector calculation unit 1007 may apply a linear function or a non-linear function to the output of the arithmetic circuit 1003, such as performing linear interpolation on the feature planes extracted by a convolution layer, or applying a non-linear function to a vector of accumulated values to generate activation values. In some implementations, the vector calculation unit 1007 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuit 1003, for example for use in subsequent layers of a neural network.
An instruction fetch buffer 1009 connected to the controller 1004, for storing instructions used by the controller 1004;
the unified memory 1006, the input memory 1001, the weight memory 1002, and the instruction fetch memory 1009 are On-Chip memories. The external memory is private to the NPU hardware architecture.
The processor mentioned in any of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the program.
It should be noted that the above-described embodiments of the apparatus are merely illustrative, where the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. In addition, in the drawings of the embodiments of the apparatus provided in the present application, the connection relationship between the modules indicates that there is a communication connection therebetween, and may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and certainly can also be implemented by special-purpose hardware including application-specific integrated circuits, special-purpose CPUs, special-purpose memories, special-purpose components and the like. Generally, functions performed by computer programs can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may be various, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, a software implementation is preferable in more cases. Based on such understanding, the technical solutions of the present application may be substantially embodied in the form of a software product, which is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to execute the methods according to the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. The procedures or functions described in accordance with the embodiments of the application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website site, computer, training device, or data center to another website site, computer, training device, or data center via wired (e.g., coaxial cable, fiber optics, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that a computer can store or a data storage device, such as a training device, data center, etc., that includes one or more available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid State Disk (SSD)), among others.

Claims (19)

1. A model training method, applied to training of a recommendation model, the method comprising:
acquiring a training sample and a first probability distribution, wherein the training sample comprises a plurality of operation data, and each operation data is operation information of a user on a first article; the first probability distribution includes selection probabilities corresponding to a plurality of combinations of the operational data; the first probability distribution is used to select a target combination from the plurality of combinations;
acquiring first information and second information according to the training sample; the first information represents the influence capacity on model accuracy when each operation data is selected to train the recommendation model, and the second information represents the user sensitivity when a first article contained in each operation data is disclosed; the first information, the second information and the target combination are used for obtaining a reward value;
and updating the first probability distribution according to the reward value to obtain a second probability distribution, wherein the second probability distribution is used for training the recommendation model.
2. The method of claim 1, wherein the second probability distribution is a distribution that satisfies Nash equilibrium.
3. The method according to claim 1 or 2, wherein the obtaining first information according to the target combination comprises:
determining target anchor point data from a plurality of anchor point data according to first attribute information of a first article contained in the plurality of operation data, wherein each anchor point data comprises operation information of the user on the plurality of articles, the similarity between the attribute information of the article contained in the target anchor point data and the first attribute information meets a preset condition, and the first information is determined by training the recommendation model in advance through the target anchor point data.
4. The method of any of claims 1 to 3, wherein the second information is derived from a user's sensitivity input for the first item, the sensitivity input being in particular one of a sensitivity score and a sensitivity rating.
5. The method according to any one of claims 1 to 3, wherein the second information is obtained from an empirical value representing a sensitivity of a user to disclosure of the first item.
6. The method of any of claims 1 to 5, wherein said updating said first probability distribution based on said reward value, resulting in a second probability distribution, comprises:
and updating the first probability distribution according to the reward value by projecting gradient descent to obtain a second probability distribution.
7. The method according to any of claims 3 to 6, wherein the attribute information comprises user attributes of the user, wherein the user attributes comprise at least one of: gender, age, occupation, income, hobbies, education level.
8. The method of any of claims 3 to 7, wherein the attribute information comprises item attributes of the item, the item attributes comprising at least one of: item name, developer, installation package size, category, goodness.
9. A model training apparatus, for use in training a recommended model, the apparatus comprising:
the processing module is used for acquiring a training sample and a first probability distribution, wherein the training sample comprises a plurality of operation data, and each operation data is operation information of a user on a first article; the first probability distribution includes selection probabilities corresponding to a plurality of combinations of the operation data, the first probability distribution being used to select a target combination from the plurality of combinations;
acquiring first information and second information according to the target combination; the first information represents the influence capacity on model accuracy when each operation data is selected to train the recommendation model, and the second information represents the user sensitivity when a first article contained in each operation data is disclosed; the first information, the second information and the target combination are used for obtaining a reward value;
and the updating module is used for updating the first probability distribution according to the reward value to obtain a second probability distribution, and the second probability distribution is used for training the recommendation model.
10. The apparatus of claim 9, wherein the second probability distribution is a distribution that satisfies Nash equilibrium.
11. The apparatus according to claim 9 or 10, wherein the processing module is specifically configured to:
determining target anchor point data from a plurality of anchor point data according to first attribute information of a first article contained in the plurality of operation data, wherein each anchor point data comprises operation information of the user on the plurality of articles, the similarity between the attribute information of the article contained in the target anchor point data and the first attribute information meets a preset condition, and the first information is determined by training the recommendation model in advance through the target anchor point data.
12. The apparatus of any of claims 9 to 11, wherein the second information is derived from a user's sensitivity input with respect to the first item, the sensitivity input being in particular one of a sensitivity score and a sensitivity rating.
13. The apparatus according to any one of claims 9 to 12, wherein the second information is obtained from an empirical value representing a sensitivity of a user to disclosure of the first article.
14. The apparatus according to any one of claims 9 to 13, wherein the update module is specifically configured to:
and updating the first probability distribution according to the reward value through projection gradient descent to obtain a second probability distribution.
15. The apparatus according to any one of claims 11 to 14, wherein the attribute information includes user attributes of the user, and the user attributes include at least one of: gender, age, occupation, income, hobbies, education level.
16. The apparatus of any one of claims 11 to 15, wherein the attribute information comprises an item attribute of the item, the item attribute comprising at least one of: item name, developer, installation package size, category, goodness.
17. A model training apparatus, the apparatus comprising a memory and a processor; the memory stores code, and the processor is configured to retrieve the code and perform the method of any of claims 1 to 8.
18. A computer readable storage medium comprising computer readable instructions which, when run on a computer device, cause the computer device to perform the method of any of claims 1 to 8.
19. A computer program product comprising computer readable instructions which, when run on a computer device, cause the computer device to perform the method of any one of claims 1 to 8.
CN202211182296.4A 2022-09-27 2022-09-27 Model training method and related equipment Pending CN115630297A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211182296.4A CN115630297A (en) 2022-09-27 2022-09-27 Model training method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211182296.4A CN115630297A (en) 2022-09-27 2022-09-27 Model training method and related equipment

Publications (1)

Publication Number Publication Date
CN115630297A true CN115630297A (en) 2023-01-20

Family

ID=84905542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211182296.4A Pending CN115630297A (en) 2022-09-27 2022-09-27 Model training method and related equipment

Country Status (1)

Country Link
CN (1) CN115630297A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination