CN118043802A

CN118043802A - Recommendation model training method and device

Info

Publication number: CN118043802A
Application number: CN202180102753.1A
Authority: CN
Inventors: 张小莲; 刘杜钢; 程朋祥
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2021-09-29
Filing date: 2021-09-29
Publication date: 2024-05-14
Also published as: WO2023050143A1

Abstract

A recommendation model training method comprises the following steps: the difference between the first recommendation result obtained by processing the plurality of first candidate objects by the first recommendation model and the second recommendation result obtained by processing the plurality of first candidate objects by the second recommendation model is taken as a part of target loss, the processing error which can characterize the second prediction model is also taken as a part of target loss, the target loss can more accurately characterize the difference between the prediction result and the accurate result of the first recommendation result, and the target loss constructed based on the results can improve the prediction performance of the first prediction model for random flow by training the first prediction model.

Description

Recommendation model training method and device

Technical Field

The application relates to the field of artificial intelligence, in particular to a recommendation model training method and device.

Background

Artificial intelligence (ARTIFICIAL INTELLIGENCE, AI) is the theory, method, technique, and application system that simulates, extends, and extends human intelligence using a digital computer or a machine controlled by a digital computer, perceives the environment, obtains knowledge, and uses the knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a similar manner to human intelligence. Artificial intelligence, i.e. research on design principles and implementation methods of various intelligent machines, enables the machines to have functions of sensing, reasoning and decision.

The selection rate prediction refers to predicting the selection probability of a certain item by a user under a specific environment. For example, in recommendation systems for applications such as application stores and online advertisements, selectivity prediction plays a key role; the method can achieve maximization of the income of enterprises and improvement of user satisfaction through the selection rate prediction, and the recommendation system needs to consider the selection rate of the user on the articles and the price of the articles at the same time, wherein the selection rate is obtained through the prediction of the recommendation system according to the historical behaviors of the user, and the price of the articles represents the income of the system after the articles are selected/downloaded. For example, a function may be constructed that computes a function value based on the predicted user selectivity and the item bids, and the recommendation system ranks the items in descending order of the function value.

The recommendation system acts as a feedback loop (feedback loop) system, and the user can create various bias problems, such as position bias, during interaction with the system. The presence of the positional bias is such that the user feedback data collected by the recommendation system does not reflect the actual preferences of the user. Most classical algorithms however by default assume that the observed user preferences are the actual preferences of the user and strive to better fit the observed feedback data distribution. This may result in the recommender system converging on a biased sub-optimal solution, reducing the recommendation performance of the recommender system.

Disclosure of Invention

The application provides a recommendation model training method and device, which can improve the prediction performance of a first prediction model aiming at random flow.

In a first aspect, the present application provides a recommendation model training method, the method comprising:

Acquiring a first recommendation model and a plurality of first candidate objects;

In one possible implementation, the first recommendation model may be an initialized model, where the initialized model may be understood as a model with parameters randomly initialized, it should be understood that the first recommendation model may also be a model with fewer training times and without higher recommendation performance, or the first recommendation model may also be a model obtained by training log data, where the recommendation result of the model with respect to the full amount of data (such as non-exposure data) is not accurate (inaccuracy may be understood as a large difference from the actual selection result of the user);

alternatively, the plurality of first candidate objects may be data that the recommendation system has not presented to the target user;

Alternatively, the plurality of first candidate objects may be objects selected from data that has not been presented to the target user;

Alternatively, the plurality of first candidate objects may be objects randomly selected from data that has not been presented to the target user;

wherein a so-called "presentation" may be described as a presentation, display, etc.;

wherein the term "not yet presented to the target user" is understood to mean that it has not yet been presented in one recommendation result at the same time;

In one possible implementation, the first recommendation model may be a machine learning model, which may consist of, for example, a single-level linear or nonlinear operation (e.g., support vector machine (support vector machines, SVM)) or may be a deep network, i.e., a machine learning model consisting of multiple levels of nonlinear operation. Examples of deep networks are neural networks with one or more hidden layers, and such machine learning models may be trained, for example, by adjusting the weights of the neural networks in accordance with a back propagation learning algorithm or the like;

Processing the plurality of first candidate objects through the first recommendation model to obtain a first recommendation result;

In one possible implementation, the plurality of first candidate objects may be processed through the first recommendation model, that is, the plurality of first candidate objects are taken as input of the first recommendation model, and a feedforward process of the first recommendation model is performed;

Processing the plurality of first candidate objects through a second recommendation model to obtain a second recommendation result; the second recommendation model is trained based on operation data of a target user, the operation data comprises a plurality of second candidate objects and real selection results of the target user for the plurality of second candidate objects, the second candidate objects are different from the first candidate objects, and the results obtained by processing the plurality of second candidate objects by the second recommendation model are third recommendation results;

the second recommendation model may be a model obtained based on random flow training, and because the number of random flows is smaller, the recommendation accuracy of the second recommendation model is lower (for example, the variance of the recommendation result is larger);

The plurality of second candidate objects may be the random traffic, that is, the plurality of second candidate objects may be data that has been presented to the target user, and the target user has operated on the plurality of second candidate objects, and the operation data may include the plurality of second candidate objects and a true selection result of the target user for the plurality of second candidate objects;

In one possible implementation, the true selection result may indicate whether the second candidate object is a positive sample or a negative sample, i.e. a sample type label (label feature), and whether a sample belongs to a positive sample or a negative sample may be obtained by identifying a sample type label in the sample, e.g. when the sample type label of a sample is 1, this indicates that the sample is a positive sample, and when the sample type label of a sample is 0, this indicates that the sample is a negative sample. The sample type label of one sample is determined by the operation information of a user on an object of the feature description in the one sample;

Predicting an error of the second recommendation result according to the similarity between the first candidate objects and the second candidate objects and the first difference between the third recommendation result and the real selection result, wherein the error is inversely related to the similarity, and the error is positively related to the first difference;

In one possible implementation, the second recommendation model may obtain a second recommendation result when processing the plurality of first candidate objects, and the second recommendation result cannot be considered to represent the real intention of the user (that is, there is an error between the second recommendation result and the real intention of the user) because the recommendation accuracy of the second recommendation model is low.

In order to predict that an error exists between the second recommendation result and the real intention of the user, in the embodiment of the present application, the error of the second recommendation result may be predicted based on the similarity between the plurality of first candidate objects and the plurality of second candidate objects, and the first difference between the third recommendation result and the real selection result.

In one possible implementation, the second recommendation model may obtain a third recommendation result when processing the plurality of second candidate objects, and the third recommendation result cannot be considered to represent the real intention of the user (that is, there is an error between the third recommendation result and the real selection result) because of the low recommendation accuracy of the second recommendation model.

It should be appreciated that the plurality of second candidates processed by the second recommendation model in obtaining the third recommendation may be identical to the plurality of second candidates used in the relaxed training of the second recommendation model, e.g., there may be an intersection or no intersection.

In one aspect, the first difference between the third recommended result and the actual selection result may express a model error of the second recommended model, that is, an error of the second recommended result, to some extent. Alternatively, the error of the second recommendation result is positively correlated with the first difference, so that the positive correlation is understood to be that the larger the first difference between the third recommendation result and the actual selection result is, the larger the error of the second recommendation result is (in the case that other information is unchanged);

it should be appreciated that the first difference herein may be measured based on Euclidean distance (eucledian distance), manhattan distance (MANHATTAN DISTANCE), minkowski distance (minkowski distance), cosine similarity (cosine similarity), jaccard coefficients, pearson correlation coefficients (pearson correlation coefficient), and the like, and is not limited thereto;

On the other hand, when the degree of similarity between the plurality of first candidate objects and the plurality of second candidate objects is large, it can be said that the error of the second recommendation result is large (because, when the degree of similarity between the plurality of first candidate objects and the plurality of second candidate objects is small, equivalent to a training sample not adopted when the plurality of first candidate objects are trained for the second prediction model, the data processing accuracy of the second prediction model is lower with respect to the data processing accuracy when the plurality of first candidate objects are processed, that is, the error of the second recommendation result is larger), that is, the error of the second recommendation result is inversely correlated with the degree of similarity between the plurality of first candidate objects and the plurality of second candidate objects, so-called negative correlation, can be understood that the larger the degree of similarity between the plurality of first candidate objects and the plurality of second candidate objects is, the smaller the error of the second recommendation result (in the case where other information is unchanged);

It should be appreciated that the similarity herein may be measured based on Euclidean distance (eucledian distance), manhattan distance (MANHATTAN DISTANCE), minkowski distance (minkowski distance), cosine similarity (cosine similarity), jaccard coefficients, pearson correlation coefficients (pearson correlation coefficient), and the like, and is not limited thereto;

a target loss is determined based on a third difference between the first recommendation and the second recommendation, and the error, and the first recommendation model is updated according to the target loss.

The embodiment of the application provides a recommendation model training method, which comprises the following steps: acquiring a first recommendation model and a plurality of first candidate objects; processing the plurality of first candidate objects through the first recommendation model to obtain a first recommendation result; processing the plurality of first candidate objects through a second recommendation model to obtain a second recommendation result; the second recommendation model is trained based on operation data of a target user, the operation data comprises a plurality of second candidate objects and real selection results of the target user for the plurality of second candidate objects, the second candidate objects are different from the first candidate objects, and the results obtained by processing the plurality of second candidate objects by the second recommendation model are third recommendation results; predicting an error of the second recommendation result according to the similarity between the first candidate objects and the second candidate objects and the first difference between the third recommendation result and the real selection result, wherein the error is inversely related to the similarity, and the error is positively related to the first difference; a target loss is determined based on a third difference between the first recommendation and the second recommendation, and the error, and the first recommendation model is updated according to the target loss. By the above manner, the third difference between the first recommended result and the second recommended result may represent the difference between the first prediction model and the second prediction model, although the prediction performance of the second prediction model itself is not high (because the number of the second candidate objects is lower as a training sample of the second prediction model), the calculated error may represent the processing error of the second prediction model, and the result obtained by combining (such as direct addition operation or other fusion operation) the third difference between the first recommended result and the second recommended result with the error may represent the difference between the prediction result and the accurate result of the first recommended result more accurately, and training the first prediction model based on the target loss constructed by the result may improve the prediction performance of the first prediction model for random flow.

In one possible implementation, the first plurality of candidate objects are objects that are not presented to the target user, and the second plurality of candidate objects are objects that have been presented to the target user.

The plurality of second candidate objects may be the random traffic, that is, the plurality of second candidate objects may be data that has been presented to the target user, and the target user has operated on the plurality of second candidate objects, and the operation data may include the plurality of second candidate objects and a true selection result of the target user for the plurality of second candidate objects.

In one possible implementation, the plurality of second candidate objects are randomly selected from a plurality of objects that have been presented to the target user, and the plurality of first candidate objects are randomly selected from a plurality of objects that have not been presented to the target user. The randomly selected object is used as a training sample, so that errors caused by offset of the recommendation model can be reduced.

In one possible implementation, the error is also inversely related to a number of second candidates in the plurality of second candidates. Although the number of second candidates among the plurality of second candidates is not high, when the number of second candidates among the plurality of second candidates is larger, the recommendation accuracy of the second recommendation model can be considered to be high, and therefore, the error can also be inversely related to the number of second candidates among the plurality of second candidates, so-called negative correlation, which can be understood as that the smaller the error of the second recommendation result (in the case where other information is unchanged) is the larger the number of second candidates among the plurality of second candidates.

In one possible implementation, the error includes a corresponding bias term for the second recommendation, a corresponding variance term for the second recommendation, the bias term being inversely related to the similarity, and the variance term being inversely related to a number of second candidates in the plurality of second candidates, and the first difference sum.

In one possible implementation, the first recommendation result and the second recommendation result respectively include a recommendation score for each of the first candidate objects; or the first recommendation result and the second recommendation result respectively comprise a target recommendation object selected from the plurality of first candidate objects.

Wherein the recommendation score may represent a predictive score of the first recommendation model for each first candidate object. The target recommended object may be determined according to a specific setting of the first recommendation model, for example, the target recommended object may be recommended according to a preset number and a score ranking. For example: and setting the model as a first candidate object with the recommended score of ten, and determining the first candidate object with the recommended score of ten as a target recommended object.

In one possible implementation, the method further comprises: processing the plurality of second candidate objects through the first recommendation model to obtain a fourth recommendation result; the determining a target loss based on a third difference between the first recommendation and the second recommendation, and the error, comprising: a target loss is determined based on a third difference between the first recommendation and the second recommendation, a fourth difference between the fourth recommendation and the actual selection, and the error. The first predictive model may also be trained using log data and tagged random traffic. When the first predictive model is trained using log data, the log data may be processed based on the first predictive model and the difference between the processing result and the actual tag of the log data may be taken as part of the target loss. When the tagged random traffic is used to train the first predictive model, the plurality of second candidate objects may be processed by the first predictive model to obtain a fourth recommended result, and a fourth difference between the fourth recommended result and the true selected result may be taken as part of the target loss. The fourth difference is used as a part of the target loss, so that the difference between the output of the first recommendation model and the accurate label can be expressed more accurately, and further the recommendation precision of the first recommendation model can be improved by updating the first recommendation model based on the target loss.

In one possible implementation, the method further comprises: obtaining a user attribute of the target user, wherein the user attribute comprises at least one of the following: gender, age, occupation, income, hobbies, education level; the processing the plurality of first recommended objects by the first recommendation model includes: processing the plurality of first recommended objects and the user attribute through the first recommendation model; the processing the plurality of first recommended objects through the second recommendation model includes: processing the plurality of first recommended objects and the user attribute through the second recommendation model.

In one possible implementation, the input data of the feed-forward process of the first recommendation model may include, in addition to the first plurality of candidates, user attributes of the target user, wherein the user attributes may include at least one of: gender, age, occupation, income, hobbies, education level;

Wherein the attribute information of the target user may be at least one of attribute related to user preference characteristics, gender, age, occupation, income, hobbies and education-receiving degree, wherein the gender may be male or female, the age may be a number between 0 and 100, the occupation may be teacher, programmer, chef, etc., the hobbies may be basketball, tennis, running, etc., the education-receiving degree may be primary school, junior middle school, high school, university, etc.; the present application is not limited to a specific type of attribute information of the target user.

In one possible implementation, the obtaining of the plurality of first candidate objects may be understood as obtaining feature information of each first candidate object in the plurality of first candidate objects, where the feature information may include one or more of, for example, a name of the candidate object (or referred to as an object Identification (ID)), an Identification (ID) of an APP recommendation result to which the object belongs (e.g., a utility, audio-visual entertainment, etc.), a profile of the candidate object, a size of the candidate object (e.g., when the candidate object is an APP, the size of the candidate object may be an installation package size of the candidate object), a developer of the candidate object, a tag of the object (e.g., the tag may indicate a class of the candidate object), a comment of the candidate object (e.g., a plausibility of the candidate object), or the like, and may include other attribute information of the object instead of including the information listed herein.

In one possible implementation, the method further comprises: the operation data are acquired by the terminal equipment based on the operation of the target user on a target interface, wherein the target interface comprises a first interface and a second interface, the first interface comprises a control, the control is used for indicating whether to start collection of random traffic, the operation comprises a first operation of the target user aiming at the first control, the second interface is an interface displayed in response to the first operation, the first operation is used for indicating to start collection of the random traffic, the second interface comprises a plurality of second candidate objects, the operation further comprises a second operation of the target user aiming at a plurality of second candidate objects, and the second operation is used for determining the real selection result.

In a second aspect, the present application provides a recommendation model training apparatus, the apparatus comprising:

the acquisition module is used for acquiring the first recommendation model and a plurality of first candidate objects;

The feedforward module is used for processing the plurality of first candidate objects through the first recommendation model to obtain a first recommendation result;

An error determining module, configured to predict an error of the second recommendation result according to a similarity between the plurality of first candidate objects and the plurality of second candidate objects, and a first difference between the third recommendation result and the actual selection result, where the error is inversely related to the similarity, and the error is positively related to the first difference;

and the updating module is used for determining a target loss based on a third difference between the first recommendation result and the second recommendation result and the error, and updating the first recommendation model according to the target loss.

In the present application, the third difference between the first recommended result and the second recommended result may represent the difference between the first predicted result and the second predicted result, although the prediction performance of the second predicted model itself is not high (because the number of the second candidate objects is lower as a training sample of the second predicted model), the calculated error may represent the processing error of the second predicted model, and the result obtained by combining (such as direct addition operation or other fusion operation) the third difference between the first recommended result and the second recommended result with the error may represent the difference between the predicted result and the accurate result of the first recommended result more accurately, and training the first predicted model based on the target loss constructed by the result may improve the prediction performance of the first predicted model for the random flow.

In one possible implementation, the first recommendation model is an initialized model.

In one possible implementation, the plurality of second candidate objects are randomly selected from a plurality of objects that have been presented to the target user, and the plurality of first candidate objects are randomly selected from a plurality of objects that have not been presented to the target user.

In one possible implementation, the error is also inversely related to a number of second candidates in the plurality of second candidates.

In one possible implementation, the first recommendation result and the second recommendation result respectively include a recommendation score for each of the first candidate objects; or alternatively

The first recommendation result and the second recommendation result respectively include a target recommendation object selected from the plurality of first candidate objects.

In one possible implementation, the feedforward module is further configured to:

Processing the plurality of second candidate objects through the first recommendation model to obtain a fourth recommendation result;

The updating module is specifically configured to:

a target loss is determined based on a third difference between the first recommendation and the second recommendation, a fourth difference between the fourth recommendation and the actual selection, and the error.

In one possible implementation, the acquiring module is further configured to:

obtaining a user attribute of the target user, wherein the user attribute comprises at least one of the following: gender, age, occupation, income, hobbies, education level;

the feedforward module is specifically configured to:

Processing the plurality of first recommended objects and the user attribute through the first recommendation model;

Processing the plurality of first recommended objects and the user attribute through the second recommendation model.

In one possible implementation, the first candidate object and the second candidate object include at least one of the following information:

the name of the candidate, the developer of the candidate, the size of the installation package of the candidate, the class of the candidate, and the good evaluation of the candidate.

In one possible implementation, the acquiring module is further configured to:

The operation data are acquired by the terminal equipment based on the operation of the target user on a target interface, wherein the target interface comprises a first interface and a second interface, the first interface comprises a control, the control is used for indicating whether to start collection of random traffic, the operation comprises a first operation of the target user aiming at the first control, the second interface is an interface displayed in response to the first operation, the first operation is used for indicating to start collection of the random traffic, the second interface comprises a plurality of second candidate objects, the operation further comprises a second operation of the target user aiming at a plurality of second candidate objects, and the second operation is used for determining the real selection result.

In a third aspect, embodiments of the present application provide a computing device that may include a memory for storing a program, a processor for executing the program in the memory to perform any of the alternative methods of the first aspect described above, and a bus system.

In a fourth aspect, embodiments of the present application provide a computer readable storage medium having a computer program stored therein, which when run on a computer causes the computer to perform the first aspect and any optional method described above, and the second aspect and any optional method described above.

In a fifth aspect, embodiments of the present application provide a computer program product comprising code which, when executed, is adapted to carry out the first aspect and any of the optional methods described above.

In a sixth aspect, the present application provides a chip system comprising a processor for supporting an execution device or training device to perform the functions involved in the above aspects, e.g. to send or process data involved in the above method; or, information. In one possible design, the chip system further includes a memory for holding program instructions and data necessary for the execution device or the training device. The chip system can be composed of chips, and can also comprise chips and other discrete devices.

Drawings

FIG. 1 is a schematic diagram of a structure of an artificial intelligence main body frame;

FIG. 2 is a schematic diagram of a system architecture according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a system architecture according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a recommended stream scene according to an embodiment of the present application;

FIG. 5 is a flowchart of a recommendation model training method according to an embodiment of the present application;

FIG. 6a is a schematic illustration of a first interface;

FIG. 6b is a schematic illustration of a second interface;

FIG. 6c is a flowchart of a recommendation model training method according to an embodiment of the present application;

FIG. 7 is a flowchart of a recommendation model training method according to an embodiment of the present application;

FIG. 8 is a schematic flow chart of a training device for recommendation model according to an embodiment of the present application;

FIG. 9 is a schematic diagram of an implementation device according to an embodiment of the present application;

FIG. 10 is a schematic diagram of a training apparatus according to an embodiment of the present application;

fig. 11 is a schematic diagram of a chip according to an embodiment of the present application.

Detailed Description

Embodiments of the present invention will be described below with reference to the accompanying drawings in the embodiments of the present invention. The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments of the invention only and is not intended to be limiting of the invention.

Embodiments of the present application are described below with reference to the accompanying drawings. As one of ordinary skill in the art can know, with the development of technology and the appearance of new scenes, the technical scheme provided by the embodiment of the application is also applicable to similar technical problems.

The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely illustrative of the manner in which embodiments of the application have been described in connection with the description of the objects having the same attributes. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Referring to fig. 1, a schematic structural diagram of an artificial intelligence main body framework is shown in fig. 1, and the artificial intelligence main body framework is described below from two dimensions of "intelligent information chain" (horizontal axis) and "IT value chain" (vertical axis). Where the "intelligent information chain" reflects a list of processes from the acquisition of data to the processing. For example, there may be general procedures of intelligent information awareness, intelligent information representation and formation, intelligent reasoning, intelligent decision making, intelligent execution and output. In this process, the data undergoes a "data-information-knowledge-wisdom" gel process. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry from the underlying infrastructure of personal intelligence, information (provisioning and processing technology implementation), to the industrial ecological process of the system.

(1) Infrastructure of

The infrastructure provides computing capability support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the base platform. Communicating with the outside through the sensor; the computing power is provided by a smart chip (CPU, NPU, GPU, ASIC, FPGA and other hardware acceleration chips); the basic platform comprises a distributed computing framework, a network and other relevant platform guarantees and supports, and can comprise cloud storage, computing, interconnection and interworking networks and the like. For example, the sensor and external communication obtains data that is provided to a smart chip in a distributed computing system provided by the base platform for computation.

(2) Data

The data of the upper layer of the infrastructure is used to represent the data source in the field of artificial intelligence. The data relate to graphics, images, voice and text, and also relate to the internet of things data of the traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.

(3) Data processing

Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.

Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.

Reasoning refers to the process of simulating human intelligent reasoning modes in a computer or an intelligent system, and carrying out machine thinking and problem solving by using formal information according to a reasoning control strategy, and typical functions are searching and matching.

Decision making refers to the process of making decisions after intelligent information is inferred, and generally provides functions of classification, sequencing, prediction and the like.

(4) General capability

After the data has been processed, some general-purpose capabilities can be formed based on the result of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.

(5) Intelligent product and industry application

The intelligent product and industry application refers to products and applications of an artificial intelligent system in various fields, is encapsulation of an artificial intelligent overall solution, and realizes land application by making intelligent information decisions, and the application fields mainly comprise: intelligent terminal, intelligent transportation, intelligent medical treatment, autopilot, smart city etc.

The embodiment of the application can be applied to the field of information recommendation, wherein the scenes comprise but are not limited to scenes related to e-commerce product recommendation, search engine result recommendation, application market recommendation, music recommendation, video recommendation and the like, and the recommended objects in various application scenes are all called as 'objects' hereinafter so as to facilitate subsequent description, namely in different recommendation scenes, the recommended objects can be APP, video, or music, or a certain commodity (such as a presentation interface of an online shopping platform, different commodities can be displayed for presentation according to different users), and the essence can also be presented through the recommendation result of a recommendation model. These recommendation scenarios typically involve user behavior log collection, log data preprocessing (e.g. quantization, sampling, etc.), sample set training to obtain recommendation models, analysis of the objects (e.g. APP, music, etc.) involved in the scenario to which the training sample items correspond according to the recommendation models, e.g. the samples selected in the recommendation model training session come from the mobile phone APP application market user's operation behavior on the recommended APP, the recommendation models thus trained are then applicable to the mobile phone APP application market described above, or the APP application market for other types of terminals may be used to make recommendations of the terminal APP. The recommendation model finally calculates the recommendation probability or score of each object to be recommended, the recommendation system sorts the recommendation results selected according to a certain selection rule, for example, the recommendation results are sorted according to the recommendation probability or score, and the recommendation results are presented to the user through corresponding application or terminal equipment, and the user operates the objects in the recommendation results to generate links such as user behavior logs.

Referring to fig. 4, in the recommendation process, when a user interacts with the recommendation system, a recommendation request is triggered, the recommendation system inputs the request and related feature information into the deployed recommendation model, and then the click rate of the user on all candidate objects is predicted. And then, the candidate objects are arranged in a descending order according to the predicted click rate, and the candidate objects are displayed at different positions in order to serve as recommendation results for users. The user browses the presented items and user behavior such as browsing, clicking, downloading, etc. occurs. The user behaviors can be stored in a log to be used as training data, and the parameters of the recommendation model are updated irregularly through the offline training module, so that the recommendation effect of the model is improved.

For example, a user opens a mobile phone application market to trigger a recommendation module of the application market, and the recommendation module of the application market predicts the possibility of downloading given candidate applications by the user according to the historical downloading records of the user, the clicking records of the user, the self-characteristics of the applications, the time, the place and other environmental characteristic information. According to the predicted result, the application market is displayed according to the descending order of the possibility, and the effect of improving the application downloading probability is achieved. Specifically, applications that are more likely to be downloaded are ranked in a front position, and applications that are less likely to be downloaded are ranked in a rear position. The behavior of the user can be stored in a log, and parameters of the prediction model are trained and updated through the offline training module.

For example, in the application related to life mate, the cognitive brain can be built by simulating the brain mechanism through various models and algorithms based on the historical data of the user in the fields of video, music, news and the like, and the life learning system framework of the user is built. The life mate can record events occurring in the past of the user according to system data, application data and the like, understand the current intention of the user, predict future actions or behaviors of the user and finally realize intelligent service. In the current first stage, behavior data (including information such as terminal side short messages, photos and mail events) of a user are obtained according to a music APP, a video APP, a browser APP and the like, a user portrait system is built, and learning and memory modules based on user information filtering, association analysis, cross-domain recommendation, causal reasoning and the like are realized to build a user personal knowledge map.

Next, an application architecture of an embodiment of the present application is described.

Referring to fig. 2, an embodiment of the present invention provides a recommendation system architecture 200. The data collection device 260 is configured to collect samples, where one training sample may be composed of multiple feature information, and the feature information may include user feature information and object feature information, and tag features, where the user feature information is used to characterize features of a user, such as gender, age, occupation, hobbies, and the like, and the object feature information is used to characterize features of an object pushed to the user, where different recommendation systems correspond to different objects, where types of features required to be extracted by different objects are also different, for example, the object features extracted in the training sample of the APP market may be names (identifiers), types, sizes, and the like of APPs; the object features mentioned in the training sample of the e-commerce APP can be the names of commodities, the category to which the commodities belong, price intervals and the like; the label feature is used to indicate whether the sample is a positive example or a negative example, and in general, the label feature of the sample may be obtained through operation information of the recommended object by the user, the sample in which the user has operated the recommended object is a positive example, the recommended object is not operated by the user, or only the sample browsed is a negative example, for example, when the user clicks or downloads or purchases the recommended object, the label feature is 1, which indicates that the sample is a positive example, and if the user has not operated any recommended object, the label feature is 0, which indicates that the sample is a negative example. The samples may be stored in the database 230 after collection, and some or all of the characteristic information in the samples in the database 230 may also be obtained directly from the client device 240, such as user characteristic information, user operation information on the object (for determining a type identifier), object characteristic information (such as an object identifier), and so on. The training device 220 trains the acquisition model parameter matrix based on the samples in the database 230 for generating the recommendation model 201. How the training device 220 trains to obtain the model parameter matrix for generating the recommendation model 201 will be described in more detail below, the recommendation model 201 can be used to evaluate a large number of objects to obtain the scores of the respective objects to be recommended, further a specified or preset number of objects can be recommended from the evaluation results of the large number of objects, the calculation module 211 obtains the recommendation result based on the evaluation results of the recommendation model 201, and recommends the recommendation result to the client device through the I/O interface 212.

In the embodiment of the present application, the training device 220 may select positive samples and negative samples from the sample set in the database 230 and add the positive samples and the negative samples to the training set, and then train the samples in the training set by using a recommendation model (for example, the first recommendation model in the embodiment of the present application) to obtain a trained recommendation model; details of the implementation of the computing module 211 may be found in the detailed description of the method embodiment shown in fig. 5.

The training device 220 is used for constructing the recommendation model 201 after obtaining the model parameter matrix based on sample training, and then sending the recommendation model 201 to the execution device 210, or directly sending the model parameter matrix to the execution device 210, and constructing a recommendation model in the execution device 210 for recommending a corresponding system, for example, the recommendation model obtained based on sample training related to video can be used for recommending video to a user in a video website or an APP, and the recommendation model obtained based on sample training related to APP can be used for recommending APP to the user in an application market.

The execution device 210 is configured with an I/O interface 212, and performs data interaction with an external device, and the execution device 210 may obtain user characteristic information, such as a user identifier, a user identity, a gender, a occupation, a preference, etc., from the client device 240 through the I/O interface 212, and this part of information may also be obtained from a system database. The recommendation model 201 recommends a target recommended object to the user based on the user characteristic information and the object characteristic information to be recommended. The execution device 210 may be disposed in a cloud server or in a user client.

The execution device 210 may invoke data, code, etc. in the data storage system 250 and may store the output data in the data storage system 250. The data storage system 250 may be disposed in the execution device 210, may be disposed independently, or may be disposed in other network entities, and the number may be one or multiple.

The calculation module 211 processes the user feature information by using the recommendation model 201, and the object feature information to be recommended, for example, the calculation module 211 uses the recommendation model 201 to analyze and process the user feature information and the feature information of the object to be recommended, so as to obtain the score of the object to be recommended, and the object to be recommended is ranked according to the score, wherein the object ranked in front is to be the object recommended to the client device 240.

Finally, the I/O interface 212 returns the recommendation to the client device 240 for presentation to the user.

Further, the training device 220 may generate respective recommendation models 201 for different targets based on different sample characteristic information to provide better results to the user.

It should be noted that fig. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present invention, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawing is not limited in any way, for example, in fig. 2, the data storage system 250 is an external memory with respect to the execution device 210, and in other cases, the data storage system 250 may be disposed in the execution device 210.

In the embodiment of the present application, the training device 220, the executing device 210, and the client device 240 may be three different physical devices, or the training device 220 and the executing device 210 may be on the same physical device or a cluster, or the executing device 210 and the client device 240 may be on the same physical device or a cluster.

Referring to fig. 3, a system architecture 300 is provided in accordance with an embodiment of the present invention. In this architecture the execution device 210 is implemented by one or more servers, optionally in cooperation with other computing devices, such as: data storage, routers, load balancers and other devices; the execution device 210 may be disposed on one physical site or distributed across multiple physical sites. The executing device 210 may use data in the data storage system 250 or call program codes in the data storage system 250 to implement an object recommendation function, specifically, input information of objects to be recommended into a recommendation model, generate a pre-estimated score for each object to be recommended by the recommendation model, sort the objects according to the pre-estimated score from high to low, and recommend the objects to be recommended to the user according to the sorting result. For example, the first 10 objects in the ranking result are recommended to the user.

The data storage system 250 is configured to receive and store parameters of the recommendation model sent by the training device, and data for storing recommendation results obtained by the recommendation model, and may also include program code (or instructions) required for normal operation of the storage system 250. The data storage system 250 may be a distributed storage cluster formed by one device or a plurality of devices disposed outside the execution device 210, and when the execution device 210 needs to use the data on the storage system 250, the storage system 250 may send the data required by the execution device to the execution device 210, and accordingly, the execution device 210 receives and stores (or caches) the data. Of course, the data storage system 250 may also be deployed within the execution device 210, and when deployed within the execution device 210, the distributed storage system may include one or more memories, and optionally, when there are multiple memories, different memories may be used to store different types of data, such as model parameters of a recommendation model generated by the training device and data of recommendation results obtained by the recommendation model, may be stored on two different memories, respectively.

The user may operate respective user devices (e.g., local device 301 and local device 302) to interact with the execution device 210. Each local device may represent any computing device, such as a personal computer, computer workstation, smart phone, tablet, smart camera, smart car or other type of cellular phone, media consumption device, wearable device, set top box, game console, etc.

The local device of each user may interact with the performing device 210 through a communication network of any communication mechanism/communication standard, which may be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.

In another implementation, the execution device 210 may be implemented by a local device, for example, the local device 301 may obtain user characteristic information and feed back recommendation results to the user based on a recommendation model implementing a recommendation function of the execution device 210, or provide services to the user of the local device 302.

Because the embodiments of the present application relate to a large number of applications of neural networks, for convenience of understanding, related terms and related concepts of the neural networks related to the embodiments of the present application will be described below.

1. Click probability (click-throughrate, CTR)

The click probability, which may also be referred to as a click rate, refers to the ratio of the number of clicks to the number of exposures of recommended information (e.g., recommended items) on a web site or application, and is typically an important indicator in a recommendation system to measure the recommendation system.

2. Personalized recommendation system

The personalized recommendation system is a system for analyzing according to historical data (such as operation information in the embodiment of the application) of a user by utilizing a machine learning algorithm, predicting a new request and giving a personalized recommendation result.

3. Offline training (offlinetraining)

The offline training refers to a module for iteratively updating recommendation model parameters according to an algorithm learned by a device in a personalized recommendation system according to historical data (such as operation information in the embodiment of the application) of a user until the recommendation model parameters reach set requirements.

4. Online prediction (onlineinference)

The online prediction refers to predicting the preference degree of the user for the recommended item in the current context according to the characteristics of the user, the item and the context based on an offline trained model, and predicting the probability of selecting the recommended item by the user.

5. The inverse facts technology: the unobserved world is learned and inferred by the counterfactual, the imagination space is expanded, the constraint of the real world is eliminated, and new things are created.

6. Random flow rate: for user requests, we intervene in the recommendation system, i.e. we do not use recommendation strategies to assign items any more, but randomly sample some items from all candidate sets, randomly sort them, show them to users and collect corresponding feedback.

7. Log data: from a back-fact learning perspective, the current collection of user log data presents various bias issues (location bias, selection bias, etc.), which are considered biased data.

8. Exposure data: the recommendation system presents the data to the user.

9. Unexposed data: the recommendation system has not yet presented data to the user.

10. Full data: assuming that the recommender system can present all items to the user and collect the user's feedback on all items, this collected data is referred to as full data.

11. Position bias: the user's tendency to select an item in a better location for interaction is described, regardless of whether the item meets the user's actual needs.

12. Selecting bias: what happens is that a "population under study" is not able to represent a "target population" such that the risk/benefit measure for the "population under study" is not able to accurately characterize the "target population" resulting in the conclusion not being able to be effectively generalized.

13. Positive example: also referred to as positive samples, represent samples with user positive feedback, such as samples with download or purchase behavior.

15. Negative example: also called negative samples, represent samples with negative feedback from the user, such as poor ratings or browsing behavior only.

16. Training set: a sample set used to train the model.

17. And (3) tag: the sample is marked as positive or negative, e.g., 1 is positive and 0 is negative.

18. Life-long learning: based on the historical data of the user in the fields of video, music, news and the like, the cognitive brain is built by simulating a brain mechanism through various models and algorithms, and a user life-long learning system framework is built. Through personalized learning, the true intention understanding of the user is realized through reasoning, so that accurate service recommendation is enabled, and the user viscosity is enhanced.

19. Neural network

The neural network may be composed of neural units, which may refer to an arithmetic unit with xs (i.e., input data) and intercept 1 as inputs, and the output of the arithmetic unit may be:

Where s=1, 2, … … n, n is a natural number greater than 1, ws is the weight of xs, and b is the bias of the neural unit. f is an activation function (activation functions) of the neural unit for introducing a nonlinear characteristic into the neural network to convert an input signal in the neural unit to an output signal. The output signal of the activation function may be used as an input to a next convolutional layer, and the activation function may be a sigmoid function. A neural network is a network formed by joining together a plurality of the above-described single neural units, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of a previous layer to extract features of the local receptive field, which may be an area composed of several neural units.

20. Deep neural network

Deep neural networks (Deep Neural Network, DNN), also known as multi-layer neural networks, can be understood as neural networks having many hidden layers, many of which are not particularly metrics. From DNNs, which are divided by the location of the different layers, the neural networks inside the DNNs can be divided into three categories: input layer, hidden layer, output layer. Typically the first layer is the input layer, the last layer is the output layer, and the intermediate layers are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer. Although DNN appears to be complex, it is not really complex in terms of the work of each layer, simply the following linear relational expression: Wherein, Is the input vector which is to be used for the input,Is the output vector of the vector,Is the offset vector, W is the weight matrix (also called coefficient), and α () is the activation function. Each layer is only for input vectorsObtaining the output vector through such simple operationSince the DNN layer number is large, the coefficient W and the offset vectorAnd thus a large number. The definition of these parameters in DNN is as follows: taking the coefficient W as an example: it is assumed that in a three-layer DNN, the linear coefficients of the 4 th neuron of the second layer to the 2 nd neuron of the third layer are defined asThe superscript 3 represents the number of layers in which the coefficient W is located, and the subscript corresponds to the output third layer index 2 and the input second layer index 4. The summary is: the coefficients of the kth neuron of the L-1 th layer to the jth neuron of the L-1 th layer are defined asIt should be noted that the input layer is devoid of W parameters. In deep neural networks, more hidden layers make the network more capable of characterizing complex situations in the real world. Theoretically, the more parameters the higher the model complexity, the greater the "capacity", meaning that it can accomplish more complex learning tasks. The process of training the deep neural network, i.e. learning the weight matrix, has the final objective of obtaining a weight matrix (a weight matrix formed by a number of layers of vectors W) for all layers of the trained deep neural network.

21. Loss function

In training the deep neural network, since the output of the deep neural network is expected to be as close to the value actually expected, the weight vector of each layer of the neural network can be updated by comparing the predicted value of the current network with the actually expected target value according to the difference between the predicted value of the current network and the actually expected target value (of course, there is usually an initialization process before the first update, that is, the pre-configuration parameters of each layer in the deep neural network), for example, if the predicted value of the network is higher, the weight vector is adjusted to be predicted to be lower, and the adjustment is continued until the deep neural network can predict the actually expected target value or the value very close to the actually expected target value. Thus, it is necessary to define in advance "how to compare the difference between the predicted value and the target value", which is a loss function (loss function) or an objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, the higher the output value (loss) of the loss function is, the larger the difference is, and then the training of the deep neural network becomes a process of reducing the loss as much as possible.

22. Back propagation algorithm

An error Back Propagation (BP) algorithm may be used to correct the magnitude of the parameters in the initial model during the training process, so that the error loss of the model is smaller and smaller. Specifically, the input signal is forward-transferred until output, and error loss occurs, and parameters in the initial model are updated by back-propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation motion that dominates the error loss, aiming at deriving optimal model parameters, such as a weight matrix.

The recommended model training method provided by the embodiment of the application is described below by taking a model training stage as an example.

Referring to fig. 5, fig. 5 is an embodiment schematic diagram of a recommendation model training method provided by the embodiment of the present application, and as shown in fig. 5, the recommendation method provided by the embodiment of the present application includes:

501. a first recommendation model and a plurality of first candidate objects are acquired.

In the embodiment of the present application, the execution body of step 501 may be a cloud-side server or an end-side device with model training capability, which is not limited herein, and the specific structure may refer to, but is not limited to, the description of the training device 220 in the above embodiment.

In one possible implementation, the first recommendation model may be an initialized model, where the initialized model may be understood as a model with parameters randomly initialized, it should be understood that the first recommendation model may also be a model with fewer training times and without higher recommendation performance, or the first recommendation model may also be a model obtained by training log data, where the recommendation result of the model with respect to the full amount of data (such as non-exposure data) is not accurate (inaccuracy may be understood as a large difference from the actual selection result of the user).

In one possible implementation, the first recommendation model may be a machine learning model, which may consist of, for example, a single-level linear or nonlinear operation (e.g., support vector machine (support vector machines, SVM)) or may be a deep network, i.e., a machine learning model consisting of multiple levels of nonlinear operation. An example of a deep network is a neural network with one or more hidden layers, and such a machine learning model may be trained, for example, by adjusting the weights of the neural network in accordance with a back propagation learning algorithm or the like.

In order to reduce the impact of the position bias on the recommendation model, a random flow of data may be selected in the selection of training data, wherein the random flow of data may be understood as being presented to the target user based on a random selection of recommended objects from the candidate set of objects, which may be operated (e.g. selected or not). The above manner can be referred to as an unbiased optimization technique introducing random traffic, which aims to introduce a random traffic collected by a random policy to provide unbiased information, so as to guide the existing recommendation model based on log data to alleviate bias problems in the training process. The random strategy does not rely on any recommendation model, but rather randomly selects from a set of candidate items and presents them in a random order. Since the source of bias problems is avoided as much as possible, the random traffic collected under this strategy can be considered as a proxy for unbiased distribution, i.e. the recommendation model trained from random traffic is also relatively unbiased.

However, the collection process of random traffic is expensive because it can compromise the benefits of the platform and the user's experience. Thus, the collection of random traffic is typically limited to a small recommended proportion, resulting in the number of random traffic being sparse relative to the log data. Random traffic at this scale may not be a good alternative to ideal distribution.

In order to solve the problem that the random traffic amount is too small, some objects can be randomly selected from unexposed data (i.e. objects which are not presented to the target user), and the selection result of the target user for the objects is predicted, so that more data equivalent to the random traffic amount can be constructed as training samples. How to accurately predict the selection result of the target user for these objects will be described in the following embodiments.

Wherein the term "not yet presented to the target user" is understood to mean that it has not yet been presented in one recommendation at a time;

alternatively, the plurality of first candidate objects may be objects that are presented in one recommendation at a time, and the recommendation may be all or part of the recommendation system from which the sample originated. For example, a certain sample set includes a sample 1, a sample 2, and a sample 3, wherein the sample 1 includes a feature for describing the object 1, the sample 2 includes a feature for describing the object 2, and the sample 3 includes a feature for describing the object 3; here, object 1, object 2 and object 3 belong to three objects that are presented on one recommendation at a time. For the sake of easy understanding, the "object displayed on a recommendation result at the same time" is illustrated herein, and when we open the main page of the application market, the application market will display "utility" APP recommendation, "audio-visual entertainment" APP recommendation, etc., where the "utility" APP recommendation and the "audio-visual entertainment" APP recommendation do not belong to a recommendation result; the user uses the application market to display the recommended results in the previous day, and uses the application market to display the recommended results in the next day, so that the recommended results displayed in the previous day and the recommended results displayed in the next day do not belong to the same display.

Alternatively, the obtaining of the plurality of first candidate objects may be understood as obtaining feature information of each first candidate object in the plurality of first candidate objects, where the feature information may include one or more of a name of the candidate object (or referred to as an object Identification (ID)), an Identification (ID) of an APP recommendation result to which the object belongs (e.g., a utility tool, an audio-visual entertainment, etc.), a profile of the candidate object, a size of the candidate object (e.g., when the candidate object is an APP, the size of the candidate object may be an installation package size of the candidate object), a developer of the candidate object, a tag of the object (e.g., the tag may indicate a class of the candidate object), a comment of the candidate object (e.g., a plausibility of the candidate object), and other attribute information of the candidate object may not include these information.

502. Processing the plurality of first candidate objects through the first recommendation model to obtain a first recommendation result;

In one possible implementation, the plurality of first candidate objects may be processed by the first recommendation model, that is, the plurality of first candidate objects are taken as input of the first recommendation model, and a feed-forward process of the first recommendation model is performed.

In one possible implementation, the output obtained by performing the feedforward process of the first recommendation model may be a first recommendation result, and optionally, the first recommendation result may include a recommendation score of each of the first candidate objects; or a target recommended object selected from the plurality of first candidate objects, wherein the target recommended object may be a part of objects in the plurality of first candidate objects (for example, may be a plurality of objects with highest recommendation scores).

For example, if the plurality of first candidate objects include APP1, APP2, APP3, APP4, APP5, APP6, APP7, APP8, APP9, APP10, the user characteristic information (for example, gender is male, age is 25, occupation is a software engineer, etc.) of the user U1 and the respective object characteristic information (for example, APP identification, APP profile, etc.) of the ten APPs are input to the first recommendation model, the first recommendation model may calculate predicted scores for the ten APPs, respectively, if the scores calculated for the ten APPs are: if app1=3.7, app2=2.2, app3=4.5, app4=4.3, app5=4.8, app6=1, app7=2.5, app8=3.0, app9=3.2, app10=1.1, then the order result (or list) is obtained by sorting from high to low according to the estimated score, and finally the object in which the top 5 bits (if M is equal to 5) are arranged in advance can be regarded as the target recommended object.

503. Processing the plurality of first candidate objects through a second recommendation model to obtain a second recommendation result; the second recommendation model is trained based on operation data of a target user, the operation data comprises a plurality of second candidate objects and real selection results of the target user for the plurality of second candidate objects, the second candidate objects are different from the first candidate objects, and the results obtained by processing the plurality of second candidate objects by the second recommendation model are third recommendation results.

The second recommendation model may be a model obtained based on random flow training, and because the number of random flows is smaller, the recommendation accuracy of the second recommendation model is lower (for example, the variance of the recommendation result is larger).

In one possible implementation, the operation data may be acquired based on an interface on a terminal, where the operation data is acquired by the terminal device based on an operation of the target user on a target interface, where the target interface includes a first interface and a second interface, the first interface includes a control, where the control is used to indicate whether to start collection of random traffic, the operation includes a first operation of the target user with respect to the first control, the second interface is an interface displayed in response to the first operation, the first operation is used to indicate to start collection of the random traffic, the second interface includes the plurality of second candidate objects, and the operation further includes a second operation of the target user with respect to the plurality of second candidate objects, where the second operation is used to determine the true selection result.

Referring to fig. 6a and 6b, fig. 6a is a schematic representation of a first interface and fig. 6b is a schematic representation of a second interface.

In one possible implementation, the true selection result may indicate whether the second candidate object is a positive sample or a negative sample, i.e. a sample type label (label feature), and whether a sample belongs to a positive sample or a negative sample may be obtained by identifying a sample type label in the sample, e.g. when the sample type label of a sample is 1, this indicates that the sample is a positive sample, and when the sample type label of a sample is 0, this indicates that the sample is a negative sample. The sample type label of a sample is determined by the operation information of the user on the object described by the features in the sample, for example, the operation information is used for representing operations such as "browse", "download", "comment", "buy", etc., and the operation information means that the sample type labels are different, for example, the sample type label is used for marking negative samples when the operation is "browse", and the sample type label is used for marking positive samples when the operation acts as "download". In practical applications, which operation corresponds to the positive sample and which operation corresponds to the negative sample can be predefined.

For how to train the second predictive model based on the operational data, reference may be made to the related description of model training in the prior art, which is not repeated here.

Regarding the model structure of the second recommendation model, the feedforward process of the second recommendation model, and the description of the second recommendation result, reference may be made to the model structure of the first recommendation model, the feedforward process of the first recommendation model, and the description of the first recommendation result in the above embodiments, and the description of the same will not be repeated here.

504. Predicting an error of the second recommendation result according to the similarity between the first candidate objects and the second candidate objects and the first difference between the third recommendation result and the real selection result, wherein the error is inversely related to the similarity, and the error is positively related to the first difference;

In one aspect, the first difference between the third recommended result and the actual selection result may express a model error of the second recommended model, that is, an error of the second recommended result, to some extent. Alternatively, the error of the second recommendation is positively correlated with the first difference, so-called positive correlation, which is understood to be that the larger the first difference between the third recommendation and the actual selection result, the larger the error of the second recommendation (in case other information is unchanged).

It should be appreciated that the first difference herein may be measured based on Euclidean distance (eucledian distance), manhattan distance (MANHATTAN DISTANCE), minkowski distance (minkowski distance), cosine similarity (cosine similarity), jaccard coefficients, pearson correlation coefficients (pearson correlation coefficient), and the like, without limitation.

On the other hand, when the degree of similarity between the plurality of first candidate objects and the plurality of second candidate objects is large, it can be said that the error of the second recommendation result is large (because, when the degree of similarity between the plurality of first candidate objects and the plurality of second candidate objects is small, equivalent to a training sample that is not employed when the plurality of first candidate objects are trained for the second prediction model, the data processing accuracy of the second prediction model is lower with respect to the data feature of the plurality of first candidate objects, that is, the error of the second recommendation result is larger), that is, the error of the second recommendation result is inversely correlated with the degree of similarity between the plurality of first candidate objects and the plurality of second candidate objects, so-called negative correlation, can be understood as the larger the degree of similarity between the plurality of first candidate objects and the plurality of second candidate objects is, the smaller the error of the second recommendation result (in the case where the other information is unchanged).

It should be appreciated that the similarity herein may be measured based on Euclidean distance (eucledian distance), manhattan distance (MANHATTAN DISTANCE), minkowski distance (minkowski distance), cosine similarity (cosine similarity), jaccard coefficients, pearson correlation coefficients (pearson correlation coefficient), and the like, and is not limited in this regard.

On the other hand, although the number of second candidates among the plurality of second candidates is not high, when the number of second candidates among the plurality of second candidates is larger, the recommendation accuracy of the second recommendation model can be considered to be high, and therefore, the error can also be inversely correlated with the number of second candidates among the plurality of second candidates, so-called negative correlation, which can be understood as the smaller the error of the second recommendation result (in the case where other information is unchanged) is the larger the number of second candidates among the plurality of second candidates.

In one possible implementation, the error includes a corresponding bias term for the second recommendation, a corresponding variance term for the second recommendation, and the first difference sum, the bias term being inversely related to the similarity, the variance term being inversely related to a number of second candidates in the plurality of second candidates.

In one possible implementation, although the recommendation accuracy of the second recommendation model is low, since the second recommendation model is also obtained based on the random traffic (i.e. the operation data described above), a certain recommendation accuracy is provided (at least the data processing accuracy for the random traffic is higher than for the first recommendation model), so that the third difference between the first recommendation result and the second recommendation result can be taken as a part of the loss for updating the first recommendation model, and the error can also be taken as a part of the loss.

505. A target loss is determined based on a third difference between the first recommendation and the second recommendation, and the error, and the first recommendation model is updated according to the target loss.

In the embodiment of the present application, the third difference between the first recommended result and the second recommended result may represent the difference between the first prediction model and the second prediction model, although the prediction performance of the second prediction model itself is not high (because the number of the second candidate objects is lower as a training sample of the second prediction model), the calculated error may represent the processing error of the second prediction model, and the result obtained by combining (for example, direct addition operation or other fusion operation) the third difference between the first recommended result and the second recommended result with the error may represent the difference between the prediction result and the accurate result of the first recommended result more accurately, and training the first prediction model based on the target loss constructed by the result may improve the prediction performance of the first prediction model for random flow.

In addition, the first predictive model may also be trained using log data and tagged random traffic.

When the first predictive model is trained using log data, the log data may be processed based on the first predictive model and the difference between the processing result and the actual tag of the log data may be taken as part of the target loss.

When the tagged random traffic is used to train the first predictive model, the plurality of second candidate objects may be processed by the first predictive model to obtain a fourth recommended result, and a fourth difference between the fourth recommended result and the true selected result may be taken as part of the target loss.

In particular, when the acquisition of random traffic is activated, the system can switch to a training mode based on unbiased optimization paradigm of random traffic, which is intended to enable the model trained by log data to approximate as closely as possible the full data distribution under random policy, with the following optimization objectives:

Wherein R ^t represents the real label of random traffic, Representing the prediction error of the model for random traffic.

Optionally, the error may include a corresponding bias term of the second recommendation, a corresponding variance term of the second recommendation, and the first difference sum, wherein the bias term is inversely related to the similarity, and the variance term is inversely related to a number of second candidates in the plurality of second candidates.

By way of example, the expression for this error may be as follows:

Wherein item a Representing a prediction error of the first recommendation model for the tagged random traffic (i.e., the plurality of second candidates); because the corresponding label of the user interaction corresponding to the log data under the random strategy is difficult to acquire, the item bIs not optimizable; item cRepresenting a prediction error of the first recommendation model for the log data; item dRepresenting predicted differences of the first recommendation model and the second recommendation model on the unexposed data (i.e., the first plurality of candidates); item eThe bias term and variance term represent the prediction error of the second recommended model on the unexposed data (i.e., the first plurality of candidates), with the last two terms being the bias term and variance term, respectively.

It should be appreciated that the expression for the error may be:

based on unequal The method can obtain the following steps:

After obtaining the error, a counterfacts recommendation method based on the generalized error upper bound may be performed to obtain a recommendation model, and the optimization function (i.e., the target loss) of the recommendation method may be expressed as follows:

The overall flow architecture may refer to the flow shown in fig. 6 c.

As shown in fig. 7, the input data in the framework includes random traffic data unitorm data (S _t), biased data non-unitorm data (S _c), and unobserved sample data unobserved data (S _u), the output being a recommended model. After collecting random traffic data, a pre-trained random model is first obtained. The random model and all data sources are then input into an unbiased optimization paradigm based on random traffic, related data is called according to different error terms and an optimization process is performed, and finally a more ideal recommendation model is obtained. By the method, an unbiased optimization paradigm based on random flow is designed, and theoretical completeness is provided for random flow use. The unbiased nature of this paradigm in combination with random flow enables the model to more fully approximate the unbiased distribution. In addition, a counterfactual recommendation method based on the generalization error upper bound is designed, and actual optimization is carried out on the proposed unbiased optimization paradigm. Direct optimization of an unbiased optimization paradigm is difficult, and by deriving an upper bound of generalization error for the paradigm, the proposed counterfactual recommendation method aims to optimize the upper bound sufficiently, statistically equivalent to a stepwise optimization of an unbiased optimization paradigm.

Taking life learning items as an example, in the life learning items, video APP, music APP, browser APP, application market APP and the like can be involved, user behavior data can be obtained from the end side APP, the life learning items can be used for constructing a user portrait system, user log data also need to be imported into a learning memory module, more valuable user characteristics are mined from multi-domain data, a user personal knowledge base is constructed, and a user personal knowledge map is jointly constructed by combining the user portrait system.

The single domain recommendation related to the project needs to use an algorithm in a recommendation system, the problem that the real behavior of a user cannot be reflected due to data bias existing in the recommendation system is inevitably solved, the real behavior of the user cannot be reflected if the collected log data of each domain has bias problem because the object of the project service is the user for life learning, and the constructed personal knowledge graph of the user is inaccurate.

Taking the recommendation of application market business as an example. The "recommended" home page can be seen after the APP is launched, and multiple sheets exist in this page. Taking the exquisite application as an example, a recommendation system of the application market predicts the click probability of a user on the candidate application according to the user, the candidate set commodity and the contextual characteristics, and arranges the candidate applications in descending order according to the probability, so that the application most likely to be downloaded is arranged at the forefront position. After the user sees the recommended result of the application market, according to the personal interests, the user selects operations such as browsing, clicking or downloading, and the user behaviors are all stored in the log.

The application marketplace uses these accumulated user behavior logs as training data to train click rate prediction models offline. However, the collected user behavior logs have the problems of position bias, selection bias and the like, in order to eliminate the influence of the bias on the click rate prediction model, the unbiased recommendation model is obtained by training by combining the training method of the prediction model provided by the application, the influence of false correlation on user preference estimation can be effectively avoided, and the method is beneficial to mining the causal interest of the user.

Compared with the prior art. In order to verify the accuracy of the model obtained by training the embodiment of the application, application market business data of two weeks are obtained, a complete offline experiment is performed, and the result of the current technology on the AUC index is compared. Through experimental tests, the following conclusions are drawn: there was a 4.86% increase in AUC compared to baseline. The model trained by the embodiment of the application also executes similar experimental tests on the public data set to obtain the following conclusion: there was a 3.25% improvement in AUC over baseline and a 170% improvement in nDCG. In addition, the model obtained by training in the embodiment of the application can effectively improve the recommended hit rate of long-tail articles. In addition, the model trained by the embodiment of the application is used for carrying out experiments on interest mining and user portrayal system construction based on causal reasoning. The offline experiment results were as follows: in a user portrait system, the accuracy of an algorithm based on gender prediction is improved by more than 3% compared with that of a base line, the accuracy of an age multi-classification task is improved by nearly 8% compared with that of the base line, and the accuracy variance of each age group is reduced by 50% by introducing anti-facts and causal learning. The user interest mining based on the anti-facts recommendation replaces an algorithm based on association rule learning, so that an effective action set of a user is effectively reduced, and the interpretability of the user preference label is provided.

Referring to fig. 8, fig. 8 is a schematic structural diagram of a recommended model training device according to an embodiment of the present application, and referring to fig. 8, the device 800 may include:

An obtaining module 801, configured to obtain a first recommendation model and a plurality of first candidate objects;

For a specific description of the acquiring module 801, reference may be made to the description of step 501 in the above embodiment, which is not repeated here,

A feed-forward module 802, configured to process the plurality of first candidate objects through the first recommendation model to obtain a first recommendation result;

For a specific description of the feedforward module 802, reference may be made to the descriptions of step 502 and step 503 in the above embodiments, which are not repeated here.

An error determining module 803, configured to predict an error of the second recommendation result according to a similarity between the plurality of first candidate objects and the plurality of second candidate objects, and a first difference between the third recommendation result and the actual selection result, where the error is inversely related to the similarity, and the error is positively related to the first difference;

For a specific description of the error determining module 803, reference may be made to the description of step 504 in the above embodiment, which is not repeated here.

An updating module 804, configured to determine a target loss based on a third difference between the first recommendation result and the second recommendation result and the error, and update the first recommendation model according to the target loss.

For a specific description of the update module 804, reference may be made to the description of step 505 in the above embodiment, which is not repeated here.

The updating module is specifically configured to:

In one possible implementation, the acquiring module is further configured to:

the feedforward module is specifically configured to:

In one possible implementation, the acquiring module is further configured to:

Referring to fig. 9, fig. 9 is a schematic structural diagram of an execution device provided by an embodiment of the present application, and the execution device 900 may be embodied as a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a server, etc., which is not limited herein. The execution device 900 may be configured with the data processing apparatus described in the corresponding embodiment of fig. 10, to implement the functions of data processing in the corresponding embodiment of fig. 10. Specifically, the execution device 900 includes: a receiver 901, a transmitter 902, a processor 903, and a memory 904 (where the number of processors 903 in the execution device 900 may be one or more), where the processor 903 may include an application processor 9031 and a communication processor 9032. In some embodiments of the application, the receiver 901, transmitter 902, processor 903, and memory 904 may be connected by a bus or other means.

Memory 904 may include read-only memory and random access memory, and provides instructions and data to the processor 903. A portion of the memory 904 may also include non-volatile random access memory (NVRAM). The memory 904 stores a processor and operating instructions, executable modules or data structures, or a subset thereof, or an extended set thereof, wherein the operating instructions may include various operating instructions for implementing various operations.

The processor 903 controls the operation of the execution device. In a specific application, the individual components of the execution device are coupled together by a bus system, which may include, in addition to a data bus, a power bus, a control bus, a status signal bus, etc. For clarity of illustration, however, the various buses are referred to in the figures as bus systems.

The method disclosed in the above embodiments of the present application may be applied to the processor 903 or implemented by the processor 903. The processor 903 may be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry of hardware in the processor 903 or instructions in the form of software. The processor 903 may be a general-purpose processor, a digital signal processor (DIGITAL SIGNAL processing, DSP), a microprocessor or microcontroller, a visual processor (vision processing unit, VPU), a tensor processor (tensor processing unit, TPU), or the like, which is suitable for AI operation, and may further include an Application Specific Integrated Circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 903 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present application may be embodied directly in the execution of a hardware decoding processor, or in the execution of a combination of hardware and software modules in a decoding processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in the memory 904 and the processor 903 reads the information in the memory 904 and in combination with its hardware executes the model trained in the corresponding embodiment of fig. 5.

The receiver 901 may be used to receive input numeric or character information and to generate signal inputs related to performing relevant settings and function control of the device. The transmitter 902 is operable to output numeric or character information via a first interface; the transmitter 902 is further operable to send instructions to the disk stack via the first interface to modify data in the disk stack; the transmitter 902 may also include a display device such as a display screen.

Referring to fig. 10, fig. 10 is a schematic structural diagram of the training device according to the embodiment of the present application, specifically, the training device 1000 is implemented by one or more servers, where the training device 1000 may have a relatively large difference due to different configurations or performances, and may include one or more central processing units (central processing units, CPU) 1010 (e.g., one or more processors) and a memory 1032, and one or more storage mediums 1030 (e.g., one or more mass storage devices) storing application programs 1042 or data 1044. Wherein memory 1032 and storage medium 1030 may be transitory or persistent. The program stored on storage medium 1030 may include one or more modules (not shown), each of which may include a series of instruction operations for use in training apparatus. Still further, the central processor 1010 may be configured to communicate with a storage medium 1030 to perform a series of instruction operations in the storage medium 1030 on the exercise device 1000.

The training device 1000 may also include one or more power supplies 1026, one or more wired or wireless network interfaces 1050, one or more input/output interfaces 1058; or, one or more operating systems 1041, such as Windows Server, mac OS XTM, unixTM, linuxTM, freeBSDTM, etc.

Specifically, the training device may perform the steps from step 501 to step 505 in the above embodiment.

Embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to perform the steps as performed by the aforementioned performing device or causes the computer to perform the steps as performed by the aforementioned training device.

The embodiment of the present application also provides a computer-readable storage medium having stored therein a program for performing signal processing, which when run on a computer, causes the computer to perform the steps performed by the aforementioned performing device or causes the computer to perform the steps performed by the aforementioned training device.

The execution device, training device or terminal device provided in the embodiment of the present application may be a chip, where the chip includes: a processing unit, which may be, for example, a processor, and a communication unit, which may be, for example, an input/output interface, pins or circuitry, etc. The processing unit may execute the computer-executable instructions stored in the storage unit to cause the chip in the execution device to perform the data processing method described in the above embodiment, or to cause the chip in the training device to perform the data processing method described in the above embodiment. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, or the like, and the storage unit may also be a storage unit in the wireless access device side located outside the chip, such as a read-only memory (ROM) or other type of static storage device that may store static information and instructions, a random access memory (random access memory, RAM), or the like.

Specifically, referring to fig. 11, fig. 11 is a schematic structural diagram of a chip provided in an embodiment of the present application, where the chip may be represented as a neural network processor NPU1100, and the NPU1100 is mounted as a coprocessor on a main CPU (Host CPU), and the Host CPU distributes tasks. The core part of the NPU is an arithmetic circuit 1103, and the controller 1104 controls the arithmetic circuit 1103 to extract matrix data in the memory and perform multiplication.

The NPU 1100 may implement the recommended model training method provided in the embodiment depicted in fig. 5 through inter-cooperation between the various devices inside.

More specifically, in some implementations, the arithmetic circuit 1103 in the NPU 1100 includes a plurality of processing units (PEs) inside. In some implementations, the operational circuit 1103 is a two-dimensional systolic array. The arithmetic circuit 1103 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1103 is a general purpose matrix processor.

For example, assume that there is an input matrix a, a weight matrix B, and an output matrix C. The arithmetic circuit fetches the data corresponding to the matrix B from the weight memory 1102 and buffers the data on each PE in the arithmetic circuit. The arithmetic circuit takes matrix a data from the input memory 1101 and performs matrix operation with matrix B, and the obtained partial result or final result of the matrix is stored in an accumulator (accumulator) 1108.

The unified memory 1106 is used for storing input data and output data. The weight data is carried directly through the memory cell access controller (Direct Memory Access Controller, DMAC) 1105, the DMAC into the weight memory 1102. The input data is also carried into the unified memory 1106 through the DMAC.

BIU is Bus Interface Unit, bus interface unit 1110, for the AXI bus to interact with the DMAC and finger memory (Instruction Fetch Buffer, IFB) 1109.

The bus interface unit 1110 (Bus Interface Unit, abbreviated as BIU) is configured to fetch the instruction from the external memory by the instruction fetch memory 1109, and further configured to fetch the raw data of the input matrix a or the weight matrix B from the external memory by the memory unit access controller 1105.

The DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1106 or to transfer weight data to the weight memory 1102 or to transfer input data to the input memory 1101.

The vector calculation unit 1107 includes a plurality of operation processing units, and further processes the output of the operation circuit 1103, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like, as needed. The method is mainly used for non-convolution/full-connection layer network calculation in the neural network, such as Batch Normalization (batch normalization), pixel-level summation, up-sampling of a characteristic plane and the like.

In some implementations, the vector computation unit 1107 can store the vector of processed outputs to the unified memory 1106. For example, the vector calculation unit 1107 may perform a linear function; alternatively, a nonlinear function is applied to the output of the arithmetic circuit 1103, such as linear interpolation of the feature planes extracted by the convolutional layer, and then such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 1107 generates normalized values, pixel-level summed values, or both. In some implementations, the vector of processed outputs can be used as an activation input to the operational circuitry 1103, e.g., for use in subsequent layers in a neural network.

An instruction fetch memory (instruction fetch buffer) 1109 connected to the controller 1104 for storing instructions used by the controller 1104;

The unified memory 1106, the input memory 1101, the weight memory 1102 and the finger memory 1109 are all On-Chip memories. The external memory is proprietary to the NPU hardware architecture.

The processor mentioned in any of the above may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the above-mentioned programs.

It should be further noted that the above-described apparatus embodiments are merely illustrative, and that the units described as separate units may or may not be physically separate, and that units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the embodiment of the device provided by the application, the connection relation between the modules represents that the modules have communication connection, and can be specifically implemented as one or more communication buses or signal lines.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general purpose hardware, or of course by means of special purpose hardware including application specific integrated circuits, special purpose CPUs, special purpose memories, special purpose components, etc. Generally, functions performed by computer programs can be easily implemented by corresponding hardware, and specific hardware structures for implementing the same functions can be varied, such as analog circuits, digital circuits, or dedicated circuits. But a software program implementation is a preferred embodiment for many more of the cases of the present application. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a readable storage medium, such as a floppy disk, a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disk of a computer, etc., comprising several instructions for causing a computer device (which may be a personal computer, a training device, a network device, etc.) to perform the method according to the embodiments of the present application.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.

The computer program product includes one or more computer instructions. When loaded and executed on a computer, produces a flow or function in accordance with embodiments of the present application, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center via a wired (e.g., coaxial cable, optical fiber, digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be stored by a computer or a data storage device such as a training device, a data center, or the like that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (Solid STATE DISK, SSD)), etc.

Claims

A recommendation model training method, the method comprising:

Acquiring a first recommendation model and a plurality of first candidate objects;

Processing the plurality of first candidate objects through the first recommendation model to obtain a first recommendation result;

Processing the plurality of first candidate objects through a second recommendation model to obtain a second recommendation result; the second recommendation model is trained based on operation data of a target user, the operation data comprises a plurality of second candidate objects and real selection results of the target user for the plurality of second candidate objects, the second candidate objects are different from the first candidate objects, and the results obtained by processing the plurality of second candidate objects by the second recommendation model are third recommendation results;

Predicting an error of the second recommendation result according to the similarity between the first candidate objects and the second candidate objects and the first difference between the third recommendation result and the real selection result, wherein the error is inversely related to the similarity, and the error is positively related to the first difference;

a target loss is determined based on a third difference between the first recommendation and the second recommendation, and the error, and the first recommendation model is updated according to the target loss.
The method of claim 1, wherein the first recommendation model is an initialized model.
The method according to claim 1 or 2, wherein the first plurality of candidate objects are objects that are not presented to the target user and the second plurality of candidate objects are objects that have been presented to the target user.
A method according to any one of claims 1 to 3, wherein the plurality of second candidate objects are randomly selected from a plurality of objects that have been presented to the target user, and the plurality of first candidate objects are randomly selected from a plurality of objects that have not been presented to the target user.
The method of any one of claims 1 to 4, wherein the error is further inversely related to a number of second candidates in the plurality of second candidates.
The method of any of claims 1-5, wherein the error comprises a corresponding bias term for the second recommendation, a corresponding variance term for the second recommendation, the bias term being inversely related to the similarity, and the variance term being inversely related to a number of second candidates in the plurality of second candidates, and the first difference sum.
The method of any one of claims 1 to 6, wherein the first recommendation and the second recommendation each comprise a recommendation score for each of the first candidate objects; or alternatively

The first recommendation result and the second recommendation result respectively include a target recommendation object selected from the plurality of first candidate objects.
The method according to any one of claims 1 to 7, further comprising:

Processing the plurality of second candidate objects through the first recommendation model to obtain a fourth recommendation result;

The determining a target loss based on a third difference between the first recommendation and the second recommendation, and the error, comprising:

a target loss is determined based on a third difference between the first recommendation and the second recommendation, a fourth difference between the fourth recommendation and the actual selection, and the error.
The method according to any one of claims 1 to 8, further comprising:

obtaining a user attribute of the target user, wherein the user attribute comprises at least one of the following: gender, age, occupation, income, hobbies, education level;

the processing the plurality of first recommended objects by the first recommendation model includes:

Processing the plurality of first recommended objects and the user attribute through the first recommendation model;

The processing the plurality of first recommended objects through the second recommendation model includes:

Processing the plurality of first recommended objects and the user attribute through the second recommendation model.
The method according to any one of claims 1 to 9, further comprising:

The operation data are acquired by the terminal equipment based on the operation of the target user on a target interface, wherein the target interface comprises a first interface and a second interface, the first interface comprises a control, the control is used for indicating whether to start collection of random traffic, the operation comprises a first operation of the target user aiming at the first control, the second interface is an interface displayed in response to the first operation, the first operation is used for indicating to start collection of the random traffic, the second interface comprises a plurality of second candidate objects, the operation further comprises a second operation of the target user aiming at a plurality of second candidate objects, and the second operation is used for determining the real selection result.
A recommendation model training device, the device comprising:

the acquisition module is used for acquiring the first recommendation model and a plurality of first candidate objects;

The feedforward module is used for processing the plurality of first candidate objects through the first recommendation model to obtain a first recommendation result;

Processing the plurality of first candidate objects through a second recommendation model to obtain a second recommendation result; the second recommendation model is trained based on operation data of a target user, the operation data comprises a plurality of second candidate objects and real selection results of the target user for the plurality of second candidate objects, the second candidate objects are different from the first candidate objects, and the results obtained by processing the plurality of second candidate objects by the second recommendation model are third recommendation results;

An error determining module, configured to predict an error of the second recommendation result according to a similarity between the plurality of first candidate objects and the plurality of second candidate objects, and a first difference between the third recommendation result and the actual selection result, where the error is inversely related to the similarity, and the error is positively related to the first difference;

and the updating module is used for determining a target loss based on a third difference between the first recommendation result and the second recommendation result and the error, and updating the first recommendation model according to the target loss.
The apparatus of claim 11, wherein the first recommendation model is an initialized model.
The apparatus of claim 11 or 12, wherein the first plurality of candidate objects are objects that are not presented to the target user and the second plurality of candidate objects are objects that have been presented to the target user.
The apparatus of any of claims 11 to 13, wherein the second plurality of candidates are randomly selected from a plurality of objects that have been presented to the target user, and the first plurality of candidates are randomly selected from a plurality of objects that have not been presented to the target user.
The apparatus of any of claims 11 to 14, wherein the error is further inversely related to a number of second candidates in the plurality of second candidates.
The apparatus of any of claims 11 to 15, wherein the error comprises a corresponding bias term for the second recommendation, a corresponding variance term for the second recommendation, the bias term being inversely related to the similarity, and the variance term being inversely related to a number of second candidates in the plurality of second candidates, and the first difference sum.
The apparatus of any one of claims 11 to 16, wherein the first recommendation and the second recommendation each comprise a recommendation score for each of the first candidate objects; or alternatively

The first recommendation result and the second recommendation result respectively include a target recommendation object selected from the plurality of first candidate objects.
The apparatus of any of claims 11 to 17, wherein the feed forward module is further configured to:

Processing the plurality of second candidate objects through the first recommendation model to obtain a fourth recommendation result;

The updating module is specifically configured to:

a target loss is determined based on a third difference between the first recommendation and the second recommendation, a fourth difference between the fourth recommendation and the actual selection, and the error.
The apparatus of any one of claims 11 to 18, wherein the acquisition module is further configured to:

obtaining a user attribute of the target user, wherein the user attribute comprises at least one of the following: gender, age, occupation, income, hobbies, education level;

the feedforward module is specifically configured to:

Processing the plurality of first recommended objects and the user attribute through the first recommendation model;

Processing the plurality of first recommended objects and the user attribute through the second recommendation model.
The apparatus of any one of claims 11 to 19, wherein the acquisition module is further configured to:

The operation data are acquired by the terminal equipment based on the operation of the target user on a target interface, wherein the target interface comprises a first interface and a second interface, the first interface comprises a control, the control is used for indicating whether to start collection of random traffic, the operation comprises a first operation of the target user aiming at the first control, the second interface is an interface displayed in response to the first operation, the first operation is used for indicating to start collection of the random traffic, the second interface comprises a plurality of second candidate objects, the operation further comprises a second operation of the target user aiming at a plurality of second candidate objects, and the second operation is used for determining the real selection result.
A computing device, the computing device comprising a memory and a processor; the memory stores code, the processor being configured to retrieve the code and perform the method of any of claims 1 to 10.
A computer storage medium storing one or more instructions which, when executed by one or more computers, cause the one or more computers to implement the method of any one of claims 1 to 10.
A computer program product comprising code for implementing the method of any of claims 1 to 10 when said code is executed.