CN117349505A

CN117349505A - Multi-target recommendation method and device

Info

Publication number: CN117349505A
Application number: CN202210726363.8A
Authority: CN
Inventors: 梁瀚明; 马骊; 傅妍玫; 赵忠; 赵光耀; 户维波; 何新昇
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2024-01-05

Abstract

The embodiments of the application provide a multi-target recommendation method and device, which are used to eliminate the influence of user bias, enhance the capability of personalized ranking for users, and improve the accuracy of the estimated results. The method comprises the following steps: acquiring a first feature of a recommending body and a second feature of an object to be recommended; inputting the first feature and the second feature into a multi-target recommendation model to obtain an estimated score of the object to be recommended, wherein the multi-target recommendation model is obtained by training on a sample pair set, one sample pair in the sample pair set comprises two different sample data of the same recommendation main body and ordering information of the two different sample data, the sample data comprises main body feature information of the sample recommendation main body and object feature information of the object to be recommended, and the main body feature information comprises historical behavior features of the sample recommendation main body and attribute features of the sample recommendation main body; and recommending the object to be recommended to the recommending body according to the estimated score of the object to be recommended. The embodiments of the application can be applied to artificial intelligence scenarios.

Description

Multi-target recommendation method and device

Technical Field

The application relates to the technical field of artificial intelligence, in particular to a multi-target recommendation method and device.

Background

Recommendation, as a technical means for alleviating information overload and mining users' potential demands, plays an important role in fields such as e-commerce, news, and movie recommendation. To cope with the application scenarios of users, multi-objective recommendation models are widely used in recommendation systems. By simultaneously optimizing multiple business targets, a multi-target recommendation model obtains additional benefits that a single-target model cannot; the business targets include click rate, conversion rate, click conversion rate and the like.

However, the commonly used multi-objective recommendation model usually performs a loss calculation for each business objective, which results in a user bias having a larger influence on the prediction results of the multi-objective recommendation model.

There is thus a great need for a multi-objective recommendation model that eliminates user bias.

Disclosure of Invention

The embodiment of the application provides a multi-target recommendation method and device, which are used for eliminating the influence of user bias, so that a training result and online prediction obtain consistent estimated scores, the capability of a multi-target recommendation model for personalized sequencing of users is further enhanced, and the accuracy of the multi-target estimated results is improved.

In view of this, the present application provides, in one aspect, a multi-objective recommendation method, including:

acquiring a first characteristic of a recommending body and a second characteristic of an object to be recommended;

inputting the first feature and the second feature into a multi-target recommendation model to obtain the estimated score of the object to be recommended, wherein the multi-target recommendation model is obtained by training according to a sample pair set, one sample pair in the sample pair set comprises two different sample data of the same recommendation main body and ordering information of the two different sample data, the sample data comprises main body feature information of the sample recommendation main body and object feature information of the object to be recommended, and the main body feature information of the sample recommendation main body comprises historical behavior features of the sample recommendation main body aiming at the object to be recommended and attribute features of the sample recommendation main body; recommending the object to be recommended to the recommending body according to the estimated score of the object to be recommended.

Another aspect of the present application provides a multi-objective recommendation apparatus, including:

the acquisition module is used for acquiring the first characteristics of the recommended main body and the second characteristics of the object to be recommended;

the processing module is used for inputting the first feature and the second feature into a multi-target recommendation model to obtain the estimated score of the object to be recommended, the multi-target recommendation model is obtained by training according to a sample pair set, one sample pair in the sample pair set comprises two different sample data of the same recommendation main body and ordering information of the two different sample data, the sample data comprises main body feature information of the sample recommendation main body and object feature information of the object to be recommended, and the main body feature information of the sample recommendation main body comprises historical behavior features of the sample recommendation main body aiming at the object to be recommended and attribute features of the sample recommendation main body;

and the recommending module is used for recommending the object to be recommended to the recommending main body according to the estimated score of the object to be recommended.

In one possible design, in another implementation of another aspect of the embodiments of the present application, the obtaining module is further configured to obtain the set of sample pairs;

the processing module is further configured to: input the sample data of each sample pair in the sample pair set into an initial target model to obtain sample estimated scores; calculate, according to the sample estimated scores, a loss value of the initial target model for each sample pair in the sample pair set; and update model parameters of the initial target model according to the loss value of each sample pair to obtain the multi-target recommendation model.

In one possible design, in another implementation of another aspect of the embodiments of the present application, the processing module is specifically configured to obtain a weight value of each sample pair in the set of sample pairs; and calculate, according to the sample estimated scores and the weight value, a loss value of the initial target model for each sample pair in the sample pair set.

In one possible design, in another implementation manner of another aspect of the embodiments of the present application, the processing module is specifically configured to obtain a tag value of sample data of each sample pair in the set of sample pairs, where the tag value is determined according to an operation target of the recommended object and a target value corresponding to the operation target of the recommended object; and determining a weight value of each sample pair in the sample pair set according to the label value.

In another implementation manner of another aspect of the embodiments of the present application, the processing module is specifically configured to obtain a tag value of sample data of each sample pair in the set of sample pairs and an exposure position corresponding to the sample data of each sample pair in the set of sample pairs, where the tag value is determined according to an operation target of the recommended object and a target value corresponding to the operation target of the recommended object, and the exposure position is used to indicate a display position of the recommended object on a display page; and determining a weight value of each sample pair in the sample pair set according to the label value and the exposure position.

In one possible design, in another implementation manner of another aspect of the embodiments of the present application, the processing module is specifically configured to set an operation target for the recommended object and a target value corresponding to the operation target according to a rule of equalizing a loss value of the pair of samples; and assigning a tag value to the sample data of each sample pair in the set of sample pairs according to the operational target and the target value.
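
The weighting designs above can be illustrated with a minimal sketch. This section does not state how the weight value is computed from the tag values or the exposure positions, so everything below is an assumption made for illustration: the weight is taken as the gap between the two tag values, the exposure position enters only through a caller-supplied multiplier `position_factor`, and the pair loss is assumed to have a logistic form. The function names `pair_weight` and `weighted_pairwise_loss` are hypothetical.

```python
import math


def pair_weight(label_i: float, label_j: float, position_factor: float = 1.0) -> float:
    """Hypothetical weight of one sample pair.

    Assumption: the weight grows with the gap between the two tag values.
    How the exposure position is used is not specified in the source, so it
    is left as a non-negative multiplier supplied by the caller.
    """
    if position_factor < 0:
        raise ValueError("position_factor must be non-negative")
    return abs(label_i - label_j) * position_factor


def weighted_pairwise_loss(score_i: float, score_j: float, weight: float) -> float:
    """Weighted logistic pair loss, weight * log(1 + exp(-(s_i - s_j))).

    Sample i is assumed to be the sample that should rank higher.
    The softplus is evaluated in a numerically stable form.
    """
    diff = score_i - score_j
    return weight * (max(-diff, 0.0) + math.log1p(math.exp(-abs(diff))))
```

With this form, a pair whose tag values are far apart contributes more to the total loss than a pair whose tag values are close, and a pair ordered correctly by the model contributes less than the same pair ordered incorrectly.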

In one possible design, in another implementation of another aspect of the embodiments of the present application, the multi-objective recommendation model is a single-tower structure model based on the Pairwise loss.

Another aspect of the present application provides a computer device comprising: a memory, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is used for executing the program in the memory, and for performing the methods of the above aspects according to the instructions in the program code;

the bus system is used to connect the memory and the processor so that the memory and the processor can communicate.

Another aspect of the present application provides a computer-readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the methods of the above aspects.

In another aspect of the present application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the methods provided in the above aspects.

From the above technical solutions, the embodiments of the present application have the following advantages: the sample data is grouped by user, so that each sample pair contains two different sample data of the same user together with the ordering information of the two sample data. The data used in the training process is therefore consistent with the data in online prediction, which can eliminate the influence of user bias, make the estimated scores obtained in training more consistent with those obtained in online prediction, further enhance the capability of the multi-target recommendation model for personalized ranking of users, and improve the multi-target estimated results.

Drawings

FIG. 1 is a schematic view of an implementation environment of a training method of a multi-objective recommendation model according to an embodiment of the present application;

FIG. 2 is a schematic flow chart of an implementation of a training method of a multi-objective recommendation model according to an embodiment of the present application;

FIG. 3 is a schematic diagram of one embodiment of a training method of a multi-objective recommendation model according to an embodiment of the present application;

FIG. 4 is a schematic diagram of one embodiment of a multi-objective recommendation method according to the embodiments of the present application;

FIG. 5 is an interface display diagram of a multi-objective recommendation method according to an embodiment of the present application;

FIG. 6 is a schematic diagram of an embodiment of a multi-objective recommendation device according to an embodiment of the present application;

FIG. 7 is a schematic diagram of another embodiment of a multi-objective recommendation device according to an embodiment of the present application;

FIG. 8 is a schematic diagram of another embodiment of a multi-objective recommendation device according to an embodiment of the present application.

Detailed Description

The terms "first," "second," "third," "fourth" and the like in the description and in the claims of this application and in the above-described figures, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, such that the embodiments of the present application described herein can, for example, be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "includes" and any variations thereof are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Certain terms used in this application are first described below.

Recommendation system: an information filtering system for predicting a user's rating of, or preference for, an item.

Multi-objective ranking model: a ranking model that has multiple tasks and multiple business objectives at the same time.

Loss value: a non-negative real number to which a sample generated by the recommendation system is mapped, representing the loss of that sample.

Pointwise loss: a loss value constructed from one sample.

Pairwise loss: a loss value constructed from two samples.

Recall: quickly selecting, from the full library of items, a set of candidate items related to the user's interests.

Ranking: scoring the recalled items and taking the first n items by score as the recommendation result, where n is a positive integer.

Multilayer perceptron (Multilayer Perceptron, MLP): a multi-layer fully connected neural network, also commonly referred to as a deep neural network (Deep Neural Network, DNN).

Recommendation, as a technical means for alleviating information overload and mining users' potential demands, plays an important role in fields such as e-commerce, news, and movie recommendation. To cope with the application scenarios of users, multi-objective recommendation models are widely used in recommendation systems. By simultaneously optimizing multiple business targets, a multi-target recommendation model obtains additional benefits that a single-target model cannot; the business targets include click rate, conversion rate, click conversion rate and the like.

In one possible implementation, the multi-objective recommendation models currently in common use weight the samples, i.e., weight the loss of a single-objective model. For example, if the model has three business targets (click, duration and interaction), the sample weights can be designed as follows: "the weight of an interaction positive sample is 3; the weight of a duration positive sample is 2; the weight of the other samples is 1". This is likely to raise the estimated scores of the weighted samples of the same user, so the effect of the weighted loss is difficult to guarantee.

In another possible implementation, a multi-tower multi-target model is used for training. The bottom input layers of the multi-tower multi-target model are shared fully connected layers, and each business target corresponds to one tower that estimates the score of that business target. The loss of each business target is a Pointwise loss, and the total loss of the overall model is a weighted sum of the losses of the multiple business targets. In this scheme, the loss weight of each business target is one set of parameters and the loss weight fusion is another set of parameters, so the model has many parameters and the cost of tuning them is high. Meanwhile, because a loss is calculated separately for each business target, user bias has a larger influence on the prediction results of the multi-target recommendation model.
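
The two existing approaches described above can be sketched as follows. The sample weights 3, 2 and 1 come from the example in the text. Everything else is an assumption made for illustration: binary cross-entropy as the Pointwise loss, normalization by the sum of the weights, the rule that an interaction positive takes precedence over a duration positive, and the hypothetical function names.

```python
import math
from typing import Dict, Sequence


def sample_weight(is_interaction_positive: bool, is_duration_positive: bool) -> float:
    """Sample weights from the example: interaction 3, duration 2, others 1.

    Assumption: when both flags are set, the interaction weight wins.
    """
    if is_interaction_positive:
        return 3.0
    if is_duration_positive:
        return 2.0
    return 1.0


def pointwise_bce(label: float, prob: float) -> float:
    """Pointwise loss of one sample (binary cross-entropy, an assumed choice)."""
    p = min(max(prob, 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
    return -(label * math.log(p) + (1.0 - label) * math.log(1.0 - p))


def weighted_pointwise_loss(labels: Sequence[float], probs: Sequence[float],
                            weights: Sequence[float]) -> float:
    """First approach: a single-objective loss with per-sample weights."""
    total = sum(w * pointwise_bce(y, p) for y, p, w in zip(labels, probs, weights))
    return total / sum(weights)


def multi_tower_total_loss(tower_losses: Dict[str, float],
                           target_weights: Dict[str, float]) -> float:
    """Second approach: weighted sum of the Pointwise losses of all towers."""
    return sum(target_weights[name] * loss for name, loss in tower_losses.items())
```

In both cases the loss of each sample is computed on its own, without reference to other samples of the same user, which is the property the text identifies as leaving user bias in the predictions.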

In order to solve the above problems, the present application provides the following technical solutions: acquiring a first characteristic of a recommending body and a second characteristic of an object to be recommended; inputting the first feature and the second feature into a multi-target recommendation model to obtain the estimated score of the object to be recommended, wherein the multi-target recommendation model is obtained by training according to a sample pair set, one sample pair in the sample pair set comprises two different sample data of the same recommendation main body and ordering information of the two different sample data, the sample data comprises main body feature information of the sample recommendation main body and object feature information of the object to be recommended, and the main body feature information of the sample recommendation main body comprises historical behavior features of the sample recommendation main body aiming at the object to be recommended and attribute features of the sample recommendation main body; recommending the object to be recommended to the recommending body according to the estimated score of the object to be recommended.

In the above technical scheme, the sample data is grouped by user, so that each sample pair contains two different sample data of the same user together with the ordering information of the two sample data; that is, the multi-target recommendation model is trained with loss values constructed from two samples (also called the Pairwise loss). Throughout model training, the training data is therefore consistent with the data in online prediction, which can eliminate the influence of user bias, make the estimated scores obtained in training more consistent with those obtained in online prediction, enhance the capability of the multi-target recommendation model for personalized ranking of users, and improve the multi-target estimated results.
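
The online prediction steps of the scheme (acquire the two features, obtain an estimated score from the model, recommend by score) might be sketched as below. This is only an illustration under stated assumptions: `recommend`, `toy_model` and the feature layout are hypothetical names, and the dot product merely stands in for a trained multi-target recommendation model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
ScoreFn = Callable[[Features, Features], float]


def recommend(first_feature: Features, candidates: Dict[str, Features],
              model: ScoreFn, top_k: int) -> List[Tuple[str, float]]:
    """Score every object to be recommended for one recommending body and
    return the top_k (object id, estimated score) entries, best first."""
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    scored = [(object_id, model(first_feature, second_feature))
              for object_id, second_feature in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def toy_model(first_feature: Features, second_feature: Features) -> float:
    """Hypothetical stand-in for the trained model: a plain dot product."""
    return sum(a * b for a, b in zip(first_feature, second_feature))


if __name__ == "__main__":
    objects = {"a": [0.2, 0.9], "b": [0.8, 0.1], "c": [0.5, 0.5]}
    print(recommend([1.0, 0.0], objects, toy_model, top_k=2))
```

Because all candidates of one request belong to the same recommending body, only the relative order of their estimated scores matters at this stage, which is what a model trained on same-user sample pairs is optimized for.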

It can be appreciated that, before multi-objective recommendation is performed, the embodiments of the present application may also provide a training method of the multi-objective recommendation model. In an exemplary scheme, the training method of the multi-objective recommendation model provided in the embodiments of the present application can be executed by a computer device. The implementation environment of the training method is introduced next. FIG. 1 is a schematic diagram of an implementation environment of a training method of a multi-objective recommendation model provided in an embodiment of the present application. Referring to FIG. 1, the implementation environment includes a terminal device 101 and a server 102. The terminal device 101 and the server 102 can be directly or indirectly connected through wired or wireless communication, which is not limited herein.

In some embodiments, the terminal device 101 is a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, a smart voice interaction device, a smart home appliance, a vehicle-mounted terminal, or the like, but is not limited thereto. A client supporting content recommendation is installed and runs on the terminal device 101. The client may run on the terminal device 101 in the form of a browser or in the form of a stand-alone application (APP); the specific presentation form of the client is not limited herein. In some embodiments, the server 102 is a stand-alone physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDN), and big data and artificial intelligence platforms. The server 102 is configured to provide background services for applications that support virtual scenarios. In some embodiments, the server 102 takes on the primary computing work and the terminal device 101 takes on the secondary computing work; for example, the terminal device 101 provides sample data to the server 102, and the server 102 carries out the training process of the multi-objective recommendation model. Alternatively, the server 102 and the terminal device 101 perform collaborative computing using a distributed computing architecture.

It will be appreciated that the number of terminal devices 101 described above may be larger or smaller. For example, there may be only one terminal device 101, or there may be tens, hundreds, or more. That is, the embodiment of the present application does not limit the number or the device type of the terminal devices 101.

In the training architecture of the training method of the multi-objective recommendation model provided in the embodiment of the present application, as shown in FIG. 2, the multi-objective recommendation model is a single-tower structure based on the Pairwise loss; that is, each sample pair constructed from the sample data includes two different sample data belonging to the same user, together with ordering information of the two sample data. Based on this training architecture, the sample pairs in the sample pair set constructed from the sample data are passed in turn through the initial target model of the multi-target recommendation model. When a sample pair passes through the initial target model, the specific calculation process can be as follows: sample i and sample j of the sample pair are each input into the initial target model, which outputs an estimated score i and an estimated score j respectively; a loss value is calculated from the estimated score i and the estimated score j based on the definition of the Pairwise loss; finally, the parameters of the initial target model are adjusted backward according to the loss value. This is repeated until the loss value is smaller than a preset threshold or reaches a preset convergence condition, at which point the adjusted parameters are taken as the final model parameters, the resulting model is output as the multi-target recommendation model used for online prediction, and the model training process ends.
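
The calculation process described for FIG. 2 might look like the following minimal sketch. The text only refers to "the definition of the Pairwise loss", so the logistic form log(1 + exp(-(s_i - s_j))) used here is an assumption, as are the linear scorer standing in for the initial target model, the plain gradient-descent update, and all function and parameter names.

```python
import math
from typing import List, Sequence, Tuple

# Each pair is (features of sample i, features of sample j); by the ordering
# information, sample i is assumed to be the one that should score higher.
Pair = Tuple[Sequence[float], Sequence[float]]


def score(params: Sequence[float], x: Sequence[float]) -> float:
    """Single-tower scorer (a linear stand-in for the initial target model)."""
    return sum(w * v for w, v in zip(params, x))


def pairwise_logistic_loss(score_i: float, score_j: float) -> float:
    """log(1 + exp(-(s_i - s_j))), evaluated in a numerically stable form."""
    diff = score_i - score_j
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


def _sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(pairs: List[Pair], dim: int, lr: float = 0.1,
          max_epochs: int = 200, loss_threshold: float = 0.05) -> List[float]:
    """Pass each pair through the model, compute the pair loss, adjust the
    parameters, and stop once the mean loss falls below a preset threshold."""
    params = [0.0] * dim
    for _ in range(max_epochs):
        total = 0.0
        for x_i, x_j in pairs:
            s_i, s_j = score(params, x_i), score(params, x_j)
            total += pairwise_logistic_loss(s_i, s_j)
            # d(loss)/d(s_i - s_j) = -sigmoid(-(s_i - s_j))
            grad_scale = -_sigmoid(-(s_i - s_j))
            for k in range(dim):
                params[k] -= lr * grad_scale * (x_i[k] - x_j[k])
        if total / len(pairs) < loss_threshold:
            break
    return params
```

Both samples of a pair go through the same scorer with the same parameters, which is what makes the structure single-tower; only the difference of the two estimated scores enters the loss, so any score offset common to one user's samples cancels out.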

In some embodiments, the multi-objective recommendation model described above can be applied to recommendation systems that can recommend media resources, such as news, advertisements, video, etc., to users based on multi-objective recommendation methods, and can also be used to recommend goods, services, etc. to users. For example, the terminal device is a vehicle-mounted terminal, and the server pushes contents such as nearby gas stations and parking lots to a plurality of vehicle-mounted terminals based on the multi-target recommendation model. For another example, the terminal device is a smart phone, and the server recommends nearby contents such as food, scenic spots, and the like to the plurality of smart phones based on the multi-objective recommendation model.

It should be noted that, the implementation environment of the multi-objective recommendation method provided in the embodiment of the present application may be the same as or different from the implementation environment of the training method of the multi-objective recommendation model, which is not limited in the embodiment of the present application.

It will be appreciated that the specific embodiments of the present application involve data such as sample data. When the above embodiments of the present application are applied to specific products or technologies, the permission or consent of the user must be obtained, and the collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions.

With reference to the foregoing description, the training method of the multi-objective recommendation model in the present application is described below. Referring to FIG. 3, one embodiment of the training method of the multi-objective recommendation model in the embodiment of the present application includes:

301. Acquire sample data and an initial target model, wherein the sample data comprises main body feature information of a sample recommendation main body and object feature information of a sample object to be recommended, and the main body feature information of the sample recommendation main body comprises historical behavior features of the sample recommendation main body for the sample object to be recommended and attribute features of the sample recommendation main body.

In this embodiment, before training the multi-target recommendation model, the training device needs to acquire the sample data and the initial target model used for training.

In this embodiment, the sample data includes main body feature information of a sample recommendation main body and object feature information of a sample object to be recommended, where the main body feature information of the sample recommendation main body includes historical behavior features of the sample recommendation main body for the sample object to be recommended and attribute features of the sample recommendation main body. It will be appreciated that the sample recommendation main body may be a user served by an application platform. For example, the application platform may be a payment platform, and the user may be a user who makes online transactions using the payment platform; as another example, the application platform may be a video platform, and the user may be a user who watches videos on the video platform; as another example, the application platform may be an audio platform, and the user may be a user who listens to audio on the audio platform, and so on. The main body feature information of the sample recommendation main body reflects the attribute features and the historical behavior features of the user. The attributes of the user refer to basic information of the user; attribute features of the user include, but are not limited to, the user's age, gender, type of work, region, registration time and the like.

The historical behavior features of the user may specifically include historical behavior data of the user on one or more application platforms. For example, historical behavior data of the user on a payment platform may include, but is not limited to, user benefits clicked by the user, user benefits used by the user, goods purchased by the user with the user benefits, and the amounts of the goods purchased. Historical behavior data of the user on a shopping platform may include, but is not limited to, goods on which the user has performed a specified behavior (e.g., one or more of clicking, browsing, collecting, purchasing after clicking, etc.), the amount the user has spent on goods, etc. Historical behavior data of the user on a video platform may include, but is not limited to, videos the user has clicked to watch, and videos on which the user has sent bullet-screen comments, commented, or which the user has collected.

The sample object to be recommended refers to a recommended object that has an association relationship with the user, including but not limited to a recommended object that has been recommended to the user, a recommended object that the user has clicked and then purchased through, and the like. It may be, for example, merchandise, an article, a video, or audio. The object feature information of the sample object to be recommended may include, for example, relevant features of the recommended object associated with the user and relationship features between the user and the recommended object. The relationship features reflect the specific relationship between the user and the recommended object (browsing, clicking and then purchasing, collecting, etc.). The relevant features of the recommended object may specifically include attribute features of the recommended object, attribute features of other users associated with the recommended object, and attribute features of the party to which the recommended object belongs. The attribute features of the recommended object reflect basic information of the object, such as category, characteristics, and region. Other users associated with the recommended object may include, but are not limited to, other users on the application platform to whom the recommended object has been recommended, other users who have clicked the recommended object, and other users who have clicked the recommended object and then made a purchase.

For example, taking a payment platform as the application platform, a recommended object recommended to a user may be a user benefit. Accordingly, the features of the recommended object may include the category of the user benefit (such as a red packet or a shopping coupon), the attribute features of other users associated with the recommended object may include the attribute features of other users who have used the user benefit, and the attribute features of the party to which the recommended object belongs may include the region, score, collection, volume, etc. of the application platform on which the user benefit can be used.

The multi-objective recommendation model is used to make recommendations to a user with respect to a plurality of business objectives, and the business objectives can be customized according to actual requirements. For example, when a video platform recommends a video advertisement to a user, it is expected that the user clicks the video advertisement, follows it, purchases goods through it, generates a transaction amount, and so on. Accordingly, the plurality of business objectives may include, for example, but are not limited to, the user's click rate, conversion rate, stay duration, and the transaction amount generated by the user through the video advertisement.

For another example, when the payment platform issues a user benefit to a user, the user may click the user benefit after seeing it, may obtain the user benefit after clicking it, and may use the user benefit after obtaining it. Accordingly, the plurality of business objectives may include, for example, but are not limited to, at least two of the following: the click rate, the conversion rate, the stay duration on the user benefit, the transaction amount the user generates through the user benefit, etc. A user benefit (also called a user right) refers to an offer provided by the application platform to the user, such as red packets, shopping coupons, shopping benefits, incoming ring tones, etc.

In this embodiment, the initial target model may be a multilayer perceptron (Multilayer Perceptron, MLP), a gradient boosting tree model (eXtreme Gradient Boosting, XGBoost), or a deep recommendation model (DeepFM); that is, the specific model structure is not limited herein, as long as multi-target recommendation can be achieved.
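
As one illustration of the MLP option, a single-tower scorer could be sketched as below. This is a hedged sketch, not the model of the application: feeding the concatenated subject and object features into one network is an assumed reading of "single tower", the one-hidden-layer ReLU architecture and the initialization range are arbitrary choices, and `init_mlp` and `mlp_score` are hypothetical names.

```python
import random
from typing import Dict, Sequence


def init_mlp(input_dim: int, hidden_dim: int, seed: int = 0) -> Dict[str, list]:
    """Randomly initialise a one-hidden-layer fully connected network."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.1, 0.1) for _ in range(input_dim)]
               for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "w2": [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)],
        "b2": [0.0],
    }


def mlp_score(params: Dict[str, list], subject_features: Sequence[float],
              object_features: Sequence[float]) -> float:
    """Estimated score from one tower fed with the concatenated features."""
    x = list(subject_features) + list(object_features)
    if any(len(row) != len(x) for row in params["w1"]):
        raise ValueError("feature length does not match the input layer")
    # Hidden layer: fully connected followed by ReLU.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
              for row, b in zip(params["w1"], params["b1"])]
    # Output layer: a single scalar estimated score.
    return sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"][0]
```

The same scorer would be applied to both samples of a pair during training and to every candidate object during online prediction.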

302. A set of sample pairs is constructed from the sample data, one sample pair of the set of sample pairs comprising two different sample data of the same recommended subject and ordering information of the two different sample data.

After acquiring the sample data, the training device constructs a sample pair set from the acquired sample data. Each sample pair includes two different pieces of sample data and the ordering information of the two pieces of sample data, and the two pieces of sample data belong to the same sample recommendation body. For example, a sample pair includes sample data 1 and sample data 2 of user A on a video platform, where the label corresponding to sample data 1 is "clicked the video" and the label corresponding to sample data 2 is "clicked the video and commented on it". In this case, the ordering information of sample data 1 and sample data 2 is that the exposure position of sample data 2 is higher than that of sample data 1.

In this embodiment, the construction of the sample pair set may specifically refer to the following scheme:

In one exemplary scheme, the training device groups the sample data by user, that is, the sample data of the same user is assigned to one group. Meanwhile, the training device assigns a tag value to each piece of sample data according to the preset operation targets of the sample recommended object and the preset target values of those operation targets. Finally, sample pairs are constructed from the sample data of the same user according to the tag values. In this embodiment, the operation target of the sample recommended object represents the actual operation corresponding to a business objective of the sample recommended object, and the tag value of the sample data represents the actual value corresponding to that business objective, which may also be referred to as the target value corresponding to the operation target. The operation targets corresponding to the sample recommended object differ between business objectives.

It can be appreciated that, when the tag values of the sample data are set, the tag values and the model's estimated scores can be made consistent, so that the loss of the multi-target recommendation model is small and its accuracy in online estimation is improved. Consistency between the tag value and the estimated score means that a high estimated score corresponds to a high tag value and a low estimated score corresponds to a low tag value. Accordingly, when sample pairs are constructed, sample data with a high tag value may be ordered first and sample data with a low tag value ordered after it, or sample data with a low tag value may be ordered first and sample data with a high tag value ordered after it. Which case applies is determined by the recommendation rule of the multi-target recommendation model and the rule for setting the tag values.

In an exemplary scenario, when the sample recommended object is a video, the operation targets may include "clicking on the video", "watching the video for a duration exceeding a first threshold", and "generating an interaction behavior". Suppose the recommendation rule of the multi-target recommendation model is that the higher the estimated score of a recommended object, the further forward its ordering position; in this case, the higher the estimated score, the higher the tag value is set. Suppose further that the multi-target recommendation model is required to order the three operation targets of the video as follows: "generating an interaction behavior" first, "watching the video for a duration exceeding the first threshold" second, and "clicking on the video" last. The target values of the operation targets are then ranked as follows: "generating an interaction behavior" is greater than "watching the video for a duration exceeding the first threshold", which is greater than "clicking on the video". In this case, the tag values of the samples may be set as follows: "0, the user does not click on the video; 1, the user only clicks on the video; 2, the user watches the video for a duration exceeding the first threshold; 3, the user only generates an interaction behavior; 4, the user watches the video and generates an interaction behavior". An exemplary assignment of tag values according to the operation target and the target value in each piece of sample data may then be as follows: if the operation target indicated by the sample data is "clicking on the video", the tag value corresponding to the sample data is 1; if the operation target indicated by the sample data is "watching the video for a duration exceeding the first threshold", the tag value corresponding to the sample data is 2.
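The tag value assignment for the video scenario can be sketched as a small function. This is only an illustrative sketch: the function name `assign_tag_value`, its boolean parameters, and the `higher_is_better` flag are hypothetical and do not come from the application. The flag covers the alternative case mentioned above, in which a lower tag value is ordered first, by reversing the 0 to 4 scale.

```python
def assign_tag_value(clicked, watched_over_threshold, interacted,
                     higher_is_better=True):
    """Map one sample's observed behaviour to a tag value in 0..4.

    Hypothetical helper: the 0..4 scale follows the video example,
    the argument names are assumptions made for illustration.
    """
    if watched_over_threshold and interacted:
        value = 4  # watched past the threshold and interacted
    elif interacted:
        value = 3  # only generated an interaction behaviour
    elif watched_over_threshold:
        value = 2  # watched past the first threshold
    elif clicked:
        value = 1  # only clicked on the video
    else:
        value = 0  # did not click on the video
    # Reverse the scale when a lower estimated score means a more
    # forward ordering position.
    return value if higher_is_better else 4 - value


print(assign_tag_value(True, False, False))  # only clicked
print(assign_tag_value(True, True, False))   # watched past the threshold
```

With `higher_is_better=False`, the same behaviours map to the reversed scale (for instance, "only clicked" becomes 3).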

In another exemplary scenario, when the sample recommended object is a video, the operation targets may include "clicking on the video", "watching the video for a duration exceeding a first threshold", and "generating an interaction behavior". Suppose the recommendation rule of the multi-target recommendation model is that the lower the estimated score of a recommended object, the further forward its ordering position; in this case, the lower the estimated score, the lower the tag value is set. Suppose further that the multi-target recommendation model is required to order the three operation targets of the video as follows: "generating an interaction behavior" first, "watching the video for a duration exceeding the first threshold" second, and "clicking on the video" last. The target values of the operation targets are then ranked as follows: "generating an interaction behavior" is greater than "watching the video for a duration exceeding the first threshold", which is greater than "clicking on the video". In this case, the tag values of the samples may be set as follows: "4, the user does not click on the video; 3, the user only clicks on the video; 2, the user watches the video for a duration exceeding the first threshold; 1, the user only generates an interaction behavior; 0, the user watches the video and generates an interaction behavior". An exemplary assignment of tag values according to the operation target and the target value in each piece of sample data may then be as follows: if the operation target indicated by the sample data is "clicking on the video", the tag value corresponding to the sample data is 3; if the operation target indicated by the sample data is "watching the video for a duration exceeding the first threshold", the tag value corresponding to the sample data is 2.

In another exemplary scenario, taking a payment platform as an example, the plurality of business objectives includes at least two of the following: the click-through rate, the conversion rate, and the transaction amount of the user benefits provided by the payment platform to the user. For the business objective of the click-through rate of the user benefit, the operation target corresponding to the sample data indicates whether the user clicks on the user benefit, and the tag values may be set as "0, the user does not click on the user benefit; 1, the user clicks on the user benefit". For the business objective of the conversion rate of the user benefit, the operation target corresponding to the sample data indicates whether the user uses the user benefit after clicking on it, and the tag values may be set as "2, the user does not use the user benefit; 3, the user uses the user benefit". For the business objective of the transaction amount generated by the user through the user benefit, the operation target corresponding to the sample data is the transaction amount actually generated by the user through the user benefit, and the tag values may be set as "4, the transaction amount of the user is lower than a preset threshold; 5, the transaction amount of the user is higher than or equal to the preset threshold".

In another exemplary scheme, when the sample recommended object is a video, the video has at least the following three business objectives: clicking on the video, watching the video for a duration exceeding a first threshold, and generating an interaction behavior. The interaction behavior includes following the account that published the video, liking the video, sharing the video, and the like. Suppose the recommendation rule of the multi-target recommendation model is that the higher the output estimated score, the further forward the corresponding ordering position; in this case, the higher the estimated score, the higher the tag value, and the ordering that the three business objectives need to reach is: "generating an interaction behavior" is greater than "watching the video for a duration exceeding the first threshold", which is greater than "clicking on the video". The tag values can thus be defined as: "0, the user does not click on the video; 1, the user only clicks on the video; 2, the user watches the video for a duration exceeding the first threshold; 3, the user only generates an interaction behavior; 4, the user watches the video and generates an interaction behavior". With these tag value settings, 10 types of sample pairs can be generated from the sample data of the same user on the video platform.

For example, assume that the sample data corresponding to a first user is as follows: sample data 1, whose tag value is "0, the user does not click on the video"; sample data 2, whose tag value is "1, the user only clicks on the video"; sample data 3, whose tag value is "2, the user watches the video for a duration exceeding the first threshold"; sample data 4, whose tag value is "3, the user only generates an interaction behavior"; and sample data 5, whose tag value is "4, the user watches the video and generates an interaction behavior". In this case, the following 10 possible sample pairs are constructed from the sample data of the first user: (sample data 2, sample data 1), (sample data 3, sample data 1), (sample data 4, sample data 1), (sample data 5, sample data 1), (sample data 3, sample data 2), (sample data 4, sample data 2), (sample data 5, sample data 2), (sample data 4, sample data 3), (sample data 5, sample data 3), and (sample data 5, sample data 4).
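The pair construction for the first user can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `build_pairs` and the `(user_id, sample_id, tag_value)` tuple layout are hypothetical, and skipping pairs whose two tag values are equal is an assumption made here because such pairs carry no ordering information.

```python
from collections import defaultdict
from itertools import combinations


def build_pairs(samples):
    """Group samples by user and emit (user_id, higher_id, lower_id) pairs.

    `samples` is an iterable of (user_id, sample_id, tag_value) tuples.
    Pairs are only formed between samples of the same user.
    """
    by_user = defaultdict(list)
    for user_id, sample_id, tag_value in samples:
        by_user[user_id].append((sample_id, tag_value))

    pairs = []
    for user_id, items in by_user.items():
        for (id_a, y_a), (id_b, y_b) in combinations(items, 2):
            if y_a == y_b:
                continue  # assumption: equal tag values give no ordering
            higher, lower = (id_a, id_b) if y_a > y_b else (id_b, id_a)
            pairs.append((user_id, higher, lower))
    return pairs


# Sample data 1..5 of the first user carry tag values 0..4.
first_user = [("user_1", n, n - 1) for n in range(1, 6)]
pairs = build_pairs(first_user)
print(len(pairs))  # 10 pairs, matching the enumeration above
```

Each emitted pair lists the sample with the higher tag value first, which matches the (sample data 5, sample data 2) style of the enumeration.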

303. Calculate the sample estimated scores obtained by inputting the sample data of each sample pair in the sample pair set into the initial target model.

After acquiring the corresponding training data, the training device sequentially inputs the sample pair set into the initial target model for training and calculates the sample estimated scores obtained by inputting the sample data of each sample pair into the initial target model.

In this embodiment, when calculating the estimated score of each piece of sample data during training, the training device may further adopt an independent training architecture: it first calculates the estimated scores for the sample data of the same user, then constructs the sample pairs from the sample data of that user to obtain the sample pair set and the estimated scores corresponding to each sample pair, and finally calculates the loss value from the estimated scores. Thus, although the same piece of sample data may appear in a plurality of sample pairs, the computation of model training does not increase, because the estimated scores of all the sample data have already been calculated in a single pass.
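The "score once, then pair" idea can be illustrated with a short sketch. The names `score_then_pair` and `score_fn` are hypothetical, and `score_fn` stands in for a forward pass of the initial target model; it is not an interface defined by the application.

```python
from itertools import combinations


def score_then_pair(samples, score_fn):
    """Score each sample of ONE user once, then build pairs from the scores.

    `samples` is a list of (features, tag_value). Returns a list of
    (score_high, score_low, tag_high, tag_low), where "high" refers to
    the sample with the larger tag value.
    """
    # One call to score_fn per sample, regardless of how many pairs
    # the sample later appears in.
    scored = [(score_fn(features), tag) for features, tag in samples]

    pairs = []
    for (s_a, y_a), (s_b, y_b) in combinations(scored, 2):
        if y_a == y_b:
            continue  # assumption: ties carry no ordering information
        if y_a > y_b:
            pairs.append((s_a, s_b, y_a, y_b))
        else:
            pairs.append((s_b, s_a, y_b, y_a))
    return pairs
```

For a user with five samples and five distinct tag values, `score_fn` is called five times while ten pairs are produced, which is the saving the paragraph above describes.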

304. Calculate, according to the sample estimated scores, the loss value of the initial target model for each sample pair in the sample pair set.

In this embodiment, the training device may calculate the loss value of the initial target model for each sample pair in the sample pair set as follows: the training device first acquires the weight of each sample pair, and then calculates the loss value according to the two sample estimated scores corresponding to the sample pair and the weight. For example, after inputting a first sample pair and obtaining the first sample estimated score and the second sample estimated score corresponding to the first sample pair, the training device also obtains the weight of the first sample pair; finally, the training device calculates the loss value of the multi-target recommendation model according to the first sample estimated score, the second sample estimated score, and the weight of the first sample pair. It will be appreciated that the training device may calculate the loss value of the initial target model from the sample estimated scores and the weight of the sample pair based on the definition of the pairwise loss. In one exemplary scheme, the training device may perform the calculation using Equation 1 as follows:

equation 1:

where i and j indicate the exposure positions of the objects corresponding to the sample data, y_i indicates the tag value of the i-th sample data, y_j indicates the tag value of the j-th sample data, s_i indicates the estimated score of the i-th sample data after passing through the multi-target recommendation model, s_j indicates the estimated score of the j-th sample data after passing through the multi-target recommendation model, and w_ij indicates the weight of the sample pair (i, j).
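The body of Equation 1 is not reproduced in this text, so its exact form cannot be confirmed here. As a hedged illustration only, the sketch below uses a common weighted pairwise logistic loss built from the same symbols (s_i, s_j, w_ij); the logistic form and the function name `pairwise_loss` are assumptions, not the formula of the application.

```python
import math


def pairwise_loss(s_i, s_j, w_ij):
    """Weighted pairwise logistic loss for one sample pair (i, j).

    Assumes sample i should be ranked ahead of sample j (y_i > y_j).
    The loss shrinks as s_i exceeds s_j and grows when the two
    estimated scores are in the wrong order. ASSUMED form, not Equation 1.
    """
    return w_ij * math.log1p(math.exp(-(s_i - s_j)))


print(pairwise_loss(2.0, 1.0, 1.0) < pairwise_loss(1.0, 2.0, 1.0))  # True
```

Under this assumed form, a larger w_ij makes a wrongly ordered pair contribute more to the total loss, which is the role the text assigns to the pair weight.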

In this embodiment, the weight of the sample pair may be determined according to the following manner:

In one possible implementation, the weight of a sample pair is determined from the tag values of the two different pieces of sample data in the sample pair. For example, the weight of the sample pair satisfies the following formula: w_ij = |2^{y_i} - 2^{y_j}|. For example, if the first sample pair of the first user is (sample data 5, sample data 2), the weight of the sample pair is |2^4 - 2^1| = 14; if the second sample pair of the first user is (sample data 4, sample data 2), the weight of the sample pair is |2^3 - 2^1| = 6. That is, in this scheme, the higher the tag value corresponding to the sample data, the greater the weight of the sample pair; meanwhile, the larger the difference between the tag values of the two pieces of sample data, the greater the weight of the sample pair. Thus, more accurate estimation of high-value sample data can be ensured.
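The two worked values above (14 and 6) are consistent with a weight of the form |2^{y_i} - 2^{y_j}|. A minimal sketch under that reading, with the hypothetical function name `pair_weight`:

```python
def pair_weight(y_i, y_j):
    """Weight of a sample pair from the two tag values: |2**y_i - 2**y_j|."""
    return abs(2 ** y_i - 2 ** y_j)


# Sample data 5 has tag value 4, sample data 4 has 3, sample data 2 has 1.
print(pair_weight(4, 1))  # 14
print(pair_weight(3, 1))  # 6
```

Because the weight grows exponentially with the tag value, pairs that involve the highest-value behaviours dominate the loss.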

In another possible implementation, the weight of a sample pair is determined according to the tag values of the two different pieces of sample data in the sample pair and the exposure positions of the recommended objects indicated by the two pieces of sample data. For example, the weights of the sample pairs satisfy the following formula: In this scheme, a smaller exposure position indicates a position further forward, so the further forward the exposure position of the sample recommended object indicated by the sample data, the greater the weight; and the greater the difference between the two exposure positions of the recommended objects indicated by the sample data, the greater the weight. Thus, more accurate estimation of sample data at forward positions can be ensured.

305. Update the model parameters of the initial target model according to the loss value of each sample pair to obtain the multi-target recommendation model.

In this embodiment, the model parameters of the initial target model may be updated iteratively. In an exemplary scheme, the initial target model has initial parameters before training starts. A first sample pair in the sample pair set is input into the initial target model to obtain a first loss value; the initial parameters of the initial target model are then updated by backpropagation according to the first loss value to obtain first parameters, and the initial target model is thereby updated to a first target model. A second sample pair in the sample pair set is input into the first target model to obtain a second loss value; the first parameters of the first target model are then updated by backpropagation to second parameters according to the second loss value, and the first target model is thereby updated to a second target model. This continues until the loss value obtained from the input sample pairs reaches a convergence condition, at which point the training process of the recommendation model ends, and the parameters of the target model are updated according to the final loss value to obtain the final output multi-target recommendation model.

During the iterative parameter updating of the multi-target recommendation model, when the loss value of the multi-target recommendation model reaches a preset threshold, the training device determines that the parameters of the multi-target recommendation model have converged. The training device can then end the training, output the current multi-target recommendation model, and apply it to online estimation.
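The pair-by-pair update and the threshold-based stopping rule can be sketched with a toy model. Everything below is an illustrative assumption: a linear scorer stands in for the initial target model, the loss is a weighted pairwise logistic loss, and the names `train`, `pair_loss_and_grad`, `lr`, and `loss_threshold` are hypothetical.

```python
import math


def pair_loss_and_grad(w, x_hi, x_lo, weight):
    """Loss and gradient for one pair under a linear scorer s(x) = w . x.

    x_hi belongs to the sample that should be ranked ahead of x_lo.
    """
    diff = sum(wk * (a - b) for wk, a, b in zip(w, x_hi, x_lo))
    loss = weight * math.log1p(math.exp(-diff))
    # d/d(diff) of log(1 + exp(-diff)) is -1 / (1 + exp(diff)).
    coeff = -weight / (1.0 + math.exp(diff))
    grad = [coeff * (a - b) for a, b in zip(x_hi, x_lo)]
    return loss, grad


def train(pairs, dim, lr=0.1, loss_threshold=1e-3, max_epochs=1000):
    """Update the parameters after every pair until the mean loss of an
    epoch falls below the preset threshold (or max_epochs is reached)."""
    w = [0.0] * dim  # initial parameters
    for _ in range(max_epochs):
        total = 0.0
        for x_hi, x_lo, weight in pairs:
            loss, grad = pair_loss_and_grad(w, x_hi, x_lo, weight)
            w = [wk - lr * g for wk, g in zip(w, grad)]  # one update per pair
            total += loss
        if total / len(pairs) < loss_threshold:
            break  # convergence condition reached
    return w
```

For instance, `train([([1.0, 0.0], [0.0, 1.0], 1.0)], dim=2)` returns weights that score the first feature vector above the second. A real implementation would replace the linear scorer with the chosen model and rely on its framework's automatic differentiation.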

It can be understood from the above description that, in training the multi-objective recommendation model, the loss value is calculated from the estimated scores and the weight of the sample pair based on the pairwise loss definition. That is, the only hyperparameter of the multi-objective recommendation model is the weight of the sample pair, and the weight can be tuned solely through the tag values of the sample data, so that the model includes only one set of parameters to tune, which reduces the parameter tuning cost of the multi-objective recommendation model.

In this embodiment, during training of the multi-target recommendation model, the sample data is grouped by user, so that each sample pair contains two different pieces of sample data of the same user together with their ordering information. The training process is therefore consistent with the data seen in online estimation, which can eliminate the influence of user bias, make the estimated scores obtained in training and in online estimation more consistent, and further enhance the ability of the model to rank items individually for each user, thereby improving the multi-target estimation results. Meanwhile, a tag value is assigned to each piece of sample data and the weight of the sample pair is determined from the tag values, so that the only hyperparameter of the multi-target recommendation model is the sample tag value, which reduces the parameter tuning cost. When the weight of the sample pair is determined, the exposure position of each piece of sample data in the pair may also be taken into account, so that the sample estimation is more accurate.

The training method of the multi-objective recommendation model in the embodiment of the present application is described above, and the application of the multi-objective recommendation model is described below. Referring to Fig. 4, one embodiment of the multi-objective recommendation method in this embodiment includes:

401. Acquire a first feature of a recommending body and a second feature of an object to be recommended.

In this embodiment, the recommendation body may be a user who uses a service provided by the application platform. For example, the application platform may be a paymate, and the user may be a user who uses the paymate to conduct online transactions; as another example, the application platform may be a video platform, the user may be a user viewing video using the video platform, and so on. The first feature is used to reflect the attribute features of the user. The attribute of the user refers to basic information of the user, and attribute characteristics of the user include, but are not limited to, age, gender, work type, affiliated area, registration time and the like of the user. It will be appreciated that in this embodiment, the first feature may further include tag information for the user, for example, the tag of the user may include: "like history", "like music", "pay for game advertisement with a high probability", and the like.

The object to be recommended refers to a recommended object having an association relationship with the user, and includes, for example, but is not limited to, a recommended object recommended to the user, a recommended object clicked by the user, a recommended object that the user has clicked and purchased, and the like. For example, it may be merchandise, an article, a video, audio, etc. The second feature of the object to be recommended may specifically include the attribute features of the object to be recommended, the attribute features of other users associated with the object to be recommended, and the attribute features of the party to which the object to be recommended belongs. The attribute features of the object to be recommended reflect basic information of the object to be recommended, such as its category, characteristics, and region. Other users associated with the object to be recommended may include, but are not limited to, other users on the application platform to whom the object has been recommended, other users who have clicked on the object, and other users who have clicked on the object and made a purchase. For example, when the application platform is a payment platform, the object to be recommended to the user may be a user benefit. Accordingly, the features of the object to be recommended may include the category of the user benefit (such as a red packet category or a shopping coupon category), the attribute features of other users associated with the object to be recommended may include the attribute features of other users who have used the user benefit, and the attribute features of the party to which the object to be recommended belongs may include the region, rating, number of favorites, transaction volume, etc. of the application platform on which the user benefit can be used.

For another example, when the application platform is an audio playing platform, the object to be recommended to the user may be audio. Accordingly, the features of the object to be recommended include the type of the audio (such as songs, audiobooks, and other audio), and the attribute features of other users associated with the object to be recommended may include the region, rating, number of favorites, purchase amount, etc. of the application platform playing the audio.

402. Inputting the first feature and the second feature into a multi-target recommendation model to obtain the estimated score of the object to be recommended, wherein the multi-target recommendation model is obtained by training according to a sample pair set, one sample pair in the sample pair set comprises two different sample data of the same recommendation main body and ordering information of the two different sample data, the sample data comprises main body feature information of the sample recommendation main body and object feature information of the object to be recommended, and the main body feature information of the sample recommendation main body comprises historical behavior features of the sample recommendation main body aiming at the object to be recommended and attribute features of the sample recommendation main body.

In this embodiment, after the first feature and the second feature are acquired, the first feature and the second feature are input into a trained multi-objective recommendation model.

The multi-objective recommendation model in this embodiment is obtained by training the method of the embodiment shown in fig. 3.

403. Recommending the object to be recommended to the recommending body according to the estimated score of the object to be recommended.

The objects to be recommended are sorted according to their estimated scores, recommended to the terminal device of the user, and displayed. It can be appreciated that, in this embodiment, the multi-target recommendation device sorts the objects to be recommended by estimated score according to the rule set when the multi-target recommendation model was trained. For example, if during training the sample data with a lower sample estimated score was set to be ranked further forward and the corresponding tag value was lower, the multi-target recommendation device sorts the objects to be recommended by estimated score from low to high. If during training the sample data with a higher sample estimated score was set to be ranked further forward and the corresponding tag value was higher, the multi-target recommendation device sorts the objects to be recommended by estimated score from high to low.

In an exemplary scenario, if the multi-objective recommendation device is used to recommend news articles to the user, the business objectives may be of multiple types, such as clicking on a news article, reading a news article for at least half of its estimated reading time, commenting on a news article, and so on. The first feature of the user includes: 23 years old, female, and working in Shenzhen; the historical attention information includes celebrities, history, current affairs, movies, and the like; and the user labels include "likes history", "likes music", and "pays for game advertisements with a high probability". The news articles currently to be recommended include: celebrity news, an account of historical events that happened on this day, the policy of the day, a movie review, and so on. Assume that the multi-target recommendation device obtains the following estimated scores for the news articles from the multi-target recommendation model: the estimated score of the celebrity news is 8, the estimated score of the account of historical events that happened on this day is 9, the estimated score of the policy of the day is 6, and the estimated score of the movie review is 7. When the user browses the news interface of a certain social application on a smartphone, the recommendation interface of the multi-target recommendation device may be as shown in Fig. 5: the top item of the interface is "events that happened on June 9 in history", followed in order by the news articles "XX stars entering XX movie theatres", "the opinion about movie YY", and "the present high-examination questions are harder".
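The ordering in the news example can be reproduced with a short sketch. The function name `rank_objects`, the `higher_is_better` flag, and the dictionary keys are hypothetical; the flag reflects the two sorting rules described for step 403.

```python
def rank_objects(scores, higher_is_better=True):
    """Return object names sorted by estimated score.

    `higher_is_better` should match the rule used when the model was
    trained: True sorts from high to low, False from low to high.
    """
    ordered = sorted(scores.items(), key=lambda kv: kv[1],
                     reverse=higher_is_better)
    return [name for name, _ in ordered]


# Estimated scores from the news example.
news_scores = {
    "celebrity news": 8,
    "historical events of this day": 9,
    "policy of the day": 6,
    "movie review": 7,
}
print(rank_objects(news_scores))
```

With the high-to-low rule, the historical-events article (score 9) comes first, followed by the celebrity news (8), the movie review (7), and the policy article (6).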

Referring to fig. 6, fig. 6 is a schematic diagram of an embodiment of a multi-objective recommendation device according to an embodiment of the present application, where the multi-objective recommendation device 20 includes:

an obtaining module 201, configured to obtain a first feature of a recommendation body and a second feature of an object to be recommended;

the processing module 202 is configured to input the first feature and the second feature into a multi-target recommendation model to obtain an estimated score of the object to be recommended, where the multi-target recommendation model is obtained by training according to a sample pair set, one sample pair in the sample pair set includes two different sample data of the same recommendation body and ordering information of the two different sample data, the sample data includes main feature information of the sample recommendation body and object feature information of the object to be recommended, and the main feature information of the sample recommendation body includes historical behavior features of the sample recommendation body for the object to be recommended and attribute features of the sample recommendation body;

and the recommending module 203 is configured to recommend the object to be recommended to the recommending body according to the estimated score of the object to be recommended.

In an embodiment of the present application, a multi-objective recommendation apparatus is provided. With this apparatus, the sample data is grouped by user, so that each sample pair contains two different pieces of sample data of the same user together with their ordering information. The training process is therefore consistent with the data seen in online estimation, which can eliminate the influence of user bias, make the estimated scores obtained in training and in online estimation more consistent, and further enhance the ability of the multi-target recommendation model to rank items individually for each user, thereby improving the multi-target estimation results.

Optionally, based on the embodiment corresponding to Fig. 6, in another embodiment of the multi-objective recommendation apparatus 20 provided in the embodiment of the present application,

the obtaining module 201 is further configured to obtain the set of sample pairs;

the processing module 202 is further configured to calculate a sample pre-estimated score obtained by inputting the sample data of each sample pair in the set of sample pairs into the initial target model; calculating a loss value of the initial target model passing through each sample pair in the sample pair set according to the sample pre-estimated score; and updating the model parameters of the initial target model according to the loss value of each sample pair to obtain the multi-target recommended model.

In an embodiment of the present application, a multi-objective recommendation apparatus is provided. With this apparatus, during model training the sample data is grouped by user, so that each sample pair contains two different pieces of sample data of the same user together with their ordering information. The training process is therefore consistent with the data seen in online estimation, which can eliminate the influence of user bias, make the estimated scores obtained in training and in online estimation more consistent, and further enhance the ability of the multi-target recommendation model to rank items individually for each user, thereby improving the multi-target estimation results. Meanwhile, the sample data of each sample pair is input into the initial target model to obtain two sample estimated scores, the loss value of the initial target model is calculated from the sample estimated scores, and the model parameters are updated iteratively according to the loss value, thereby adjusting the parameters of the initial target model.

Optionally, in another embodiment of the multi-objective recommendation device 20 provided in the embodiment of the present application, based on the embodiment corresponding to fig. 6, the processing module 202 is specifically configured to obtain a weight value of each sample pair in the sample pair set; and calculating a loss value of the initial target model passing through each sample pair in the sample pair set according to the pre-estimated score and the weight value.

In an embodiment of the present application, a multi-objective recommendation apparatus is provided. By adopting the device, the loss is calculated according to the weight and the estimated score of the sample pair, so that the loss is consistent with the estimated score, and the accuracy of online estimation is improved.
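
As a hedged illustration of a loss computed from the estimated scores and the weight value, the sketch below scales a logistic pairwise loss by the pair weight. The logistic form and the function name `weighted_pair_loss` are assumptions; the present application does not fix the exact formula in this passage.

```python
import math


def weighted_pair_loss(score_hi, score_lo, weight):
    """Pair weight times log(1 + exp(-(score_hi - score_lo))).

    score_hi is the estimated score of the sample that should rank higher.
    """
    diff = score_hi - score_lo
    # Numerically stable evaluation of log(1 + exp(-diff)).
    base = max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))
    return weight * base
```

A correctly ordered pair (`score_hi` well above `score_lo`) contributes a loss close to zero, while a misordered pair contributes a loss that grows with both the score gap and the weight.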

The processing module 202 is specifically configured to obtain a tag value of the sample data of each sample pair in the set of sample pairs, where the tag value is determined according to an operation target of the recommended object and a target value corresponding to the operation target of the recommended object; and to determine a weight value of each sample pair in the set of sample pairs according to the tag value.

In an embodiment of the present application, a multi-objective recommendation apparatus is provided. With this apparatus, the weight of a sample pair is determined using the label values of the sample data, so that high-value sample data can be estimated more accurately; the label values of the sample data can also be adjusted as needed, so that the influence of each sample on the loss is kept at a balanced level and user bias is eliminated.
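
One possible way to derive a pair weight from label values is sketched below. Taking the absolute gap between the two label values as the weight is an assumption for illustration, as is the name `order_and_weight`; the present application only states that the weight is determined according to the label value.

```python
def order_and_weight(sample_a, sample_b):
    """Return (higher, lower, weight) for two samples of the same user.

    Each sample is a (features, label_value) tuple. The weight is assumed
    to be the absolute gap between the two label values. Returns None when
    the label values are equal, since the pair then carries no ordering.
    """
    (_, label_a), (_, label_b) = sample_a, sample_b
    if label_a == label_b:
        return None
    weight = abs(label_a - label_b)
    if label_a > label_b:
        return sample_a, sample_b, weight
    return sample_b, sample_a, weight
```

Under this assumption, a pair that contrasts a high-value sample with a plain exposure receives a larger weight than a pair of two similar samples, so misordering it costs more.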

The processing module 202 is specifically configured to obtain a tag value of the sample data of each sample pair in the set of sample pairs and an exposure position corresponding to the sample data of each sample pair in the set of sample pairs, where the tag value is determined according to an operation target of the recommended object and a target value corresponding to the operation target of the recommended object, and the exposure position indicates the display position of the recommended object on a display page; and to determine a weight value of each sample pair in the set of sample pairs according to the tag value and the exposure position.
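
A minimal sketch of a weight that depends on both the tag value and the exposure position follows. The logarithmic position factor, the choice to use the position of the higher-ranked sample, and the names `position_factor` and `pair_weight` are hypothetical; the passage above does not specify how the two quantities are combined.

```python
import math


def position_factor(position):
    """Hypothetical boost for a 1-indexed exposure position.

    Deeper positions are seen less often, so an interaction there is
    assumed to be stronger evidence and receives a larger factor.
    """
    if position < 1:
        raise ValueError("position is 1-indexed")
    return math.log2(1 + position)


def pair_weight(tag_hi, tag_lo, position_hi):
    """Tag-value gap scaled by the position factor of the higher sample."""
    return (tag_hi - tag_lo) * position_factor(position_hi)
```

For example, with a tag-value gap of 2, this sketch gives a weight of 2.0 when the higher-ranked sample was shown in position 1 and 4.0 when it was shown in position 3.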

In an embodiment of the present application, a multi-objective recommendation apparatus is provided. With this apparatus, the weight of a sample pair is determined using the label values of the sample data, so that high-value sample data can be estimated more accurately; the label values of the sample data can also be adjusted as needed, so that the influence of each sample on the loss is kept at a balanced level and user bias is eliminated.

Optionally, based on the embodiment corresponding to fig. 6, in another embodiment of the multi-objective recommendation apparatus 20 provided in the embodiment of the present application, the processing module 202 is specifically configured to set an operation target for the recommended object and a target value corresponding to the operation target according to the principle of balancing the loss values of the sample pairs; and to assign a tag value to the sample data of each sample pair in the set of sample pairs according to the operation target and the target value.

In an embodiment of the present application, a multi-objective recommendation apparatus is provided. With this apparatus, the sample tag values are set according to the recommended object, so that the sample tag values are the only group of parameters to be tuned in the multi-target recommendation model, which reduces the parameter adjustment cost of the multi-target recommendation model.
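
The tag assignment described above can be sketched as a single lookup table from operation targets to target values. The operation names (`click`, `like`, `share`), their values, and the rule of taking the highest value are hypothetical and chosen only for illustration.

```python
# Hypothetical operation targets and target values; in practice these would
# be tuned so that no single target dominates the loss.
TARGET_VALUES = {"click": 1.0, "like": 2.0, "share": 4.0}


def tag_value(operations, target_values=TARGET_VALUES):
    """Tag value of one sample: the highest target value among the
    operations the user performed, or 0.0 for an exposure with no
    recognised operation."""
    return max((target_values.get(op, 0.0) for op in operations), default=0.0)
```

Because the table is the only group of tunable values in this sketch, rebalancing the targets means editing one dictionary rather than retuning separate per-target weights.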

Alternatively, in another embodiment of the multi-objective recommendation apparatus 20 provided in the embodiment of the present application, based on the embodiment corresponding to fig. 6, the multi-objective recommendation model is a single tower structure model based on the Pairwise loss.

In an embodiment of the present application, a multi-objective recommendation apparatus is provided. By adopting the device, the multi-target recommendation model adopts a single-tower structure, so that the prediction result can have consistency, and the applicability of the multi-target recommendation model is improved.
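
As a rough illustration, a single-tower model is assumed here to mean that the subject features and the object features are concatenated and passed through one network that outputs one estimated score. The layer sizes, the ReLU activation, and the names `init_tower` and `tower_score` are assumptions, not details taken from the present application.

```python
import random


def init_tower(in_dim, hidden_dim, seed=0):
    """Create the parameters of a small one-hidden-layer tower."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
          for _ in range(hidden_dim)]
    w2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
    return {"w1": w1, "b1": [0.0] * hidden_dim, "w2": w2, "b2": 0.0}


def tower_score(params, subject_features, object_features):
    """Concatenate both feature vectors and return one estimated score."""
    x = list(subject_features) + list(object_features)
    hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)  # ReLU
              for row, b in zip(params["w1"], params["b1"])]
    return sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]


def rank_objects(params, subject_features, candidates):
    """Sort candidate object feature vectors by descending estimated score."""
    return sorted(candidates,
                  key=lambda obj: tower_score(params, subject_features, obj),
                  reverse=True)
```

Since every candidate is scored by the same tower and reduced to one number, the ranking used for recommendation follows directly from the estimated scores without merging several per-target outputs.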

Referring to fig. 7, fig. 7 is a schematic diagram of a server structure provided in an embodiment of the present application. The server 300 may vary considerably depending on its configuration or performance, and may include one or more central processing units (CPU) 322 (e.g., one or more processors), a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) storing application programs 342 or data 344. The memory 332 and the storage medium 330 may be transitory or persistent storage. The program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations on the server. Still further, the central processor 322 may be configured to communicate with the storage medium 330 and to execute, on the server 300, the series of instruction operations stored in the storage medium 330.

The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input/output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.

The steps performed by the multi-objective recommendation apparatus in the above embodiments may be based on the server structure shown in fig. 7.

The multi-objective recommendation apparatus provided in the present application may be used in a terminal device. Referring to fig. 8, for convenience of explanation only the portion related to the embodiment of the present application is shown; for specific technical details that are not disclosed, refer to the method portion of the embodiments of the present application. In the embodiment of the present application, the terminal device is described by taking a smart phone as an example:

Fig. 8 is a block diagram illustrating part of the structure of a smart phone related to the terminal device provided in an embodiment of the present application. Referring to fig. 8, the smart phone includes: a radio frequency (RF) circuit 410, a memory 420, an input unit 430, a display unit 440, a sensor 450, an audio circuit 460, a wireless fidelity (WiFi) module 470, a processor 480, and a power supply 490. Those skilled in the art will appreciate that the smart phone structure shown in fig. 8 does not limit the smart phone, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components.

The following describes each component of the smart phone in detail with reference to fig. 8:

The RF circuit 410 may be used for receiving and transmitting signals while information is being sent and received or during a call. In particular, after downlink information from the base station is received, it is passed to the processor 480 for processing; in addition, uplink data is sent to the base station. In general, the RF circuit 410 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 410 may also communicate with networks and other devices via wireless communications. The wireless communications may use any communication standard or protocol including, but not limited to, global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), long term evolution (LTE), email, short message service (SMS), and the like.

The memory 420 may be used to store software programs and modules, and the processor 480 may perform various functional applications and data processing of the smartphone by executing the software programs and modules stored in the memory 420. The memory 420 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data (such as audio data, phonebooks, etc.) created according to the use of the smart phone, etc. In addition, memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device.

The input unit 430 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the smart phone. In particular, the input unit 430 may include a touch panel 431 and other input devices 432. The touch panel 431, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on the touch panel 431 or thereabout using any suitable object or accessory such as a finger, a stylus, etc.), and drive the corresponding connection device according to a predetermined program. Alternatively, the touch panel 431 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 480, and can receive commands from the processor 480 and execute them. In addition, the touch panel 431 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. The input unit 430 may include other input devices 432 in addition to the touch panel 431. In particular, other input devices 432 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.

The display unit 440 may be used to display information input by a user or information provided to the user and various menus of the smart phone. The display unit 440 may include a display panel 441, and optionally, the display panel 441 may be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like. Further, the touch panel 431 may cover the display panel 441, and when the touch panel 431 detects a touch operation on or near it, the touch operation is transmitted to the processor 480 to determine the type of the touch event, and the processor 480 then provides a corresponding visual output on the display panel 441 according to the type of the touch event. Although in fig. 8 the touch panel 431 and the display panel 441 are two separate components that implement the input and output functions of the smart phone, in some embodiments the touch panel 431 and the display panel 441 may be integrated to implement the input and output functions of the smart phone.

The smart phone may also include at least one sensor 450, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which may adjust the brightness of the display panel 441 according to the brightness of ambient light, and a proximity sensor, which may turn off the display panel 441 and/or the backlight when the smart phone is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize the posture of the smart phone (such as switching between landscape and portrait orientation, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer and tap detection). Other sensors that may also be configured in the smart phone, such as gyroscopes, barometers, hygrometers, thermometers, and infrared sensors, are not described in detail herein.

The audio circuit 460, a speaker 461, and a microphone 462 can provide an audio interface between the user and the smart phone. The audio circuit 460 may convert received audio data into an electrical signal and transmit it to the speaker 461, which converts the electrical signal into a sound signal for output. Conversely, the microphone 462 converts collected sound signals into electrical signals, which are received by the audio circuit 460 and converted into audio data; the audio data is output to the processor 480 for processing and then transmitted via the RF circuit 410 to, for example, another smart phone, or is output to the memory 420 for further processing.

WiFi belongs to a short-distance wireless transmission technology, and a smart phone can help a user to send and receive emails, browse webpages, access streaming media and the like through a WiFi module 470, so that wireless broadband Internet access is provided for the user. Although fig. 8 shows a WiFi module 470, it is understood that it does not belong to the necessary constitution of a smart phone, and can be omitted entirely as needed within the scope of not changing the essence of the invention.

The processor 480 is a control center of the smart phone, connects various parts of the entire smart phone using various interfaces and lines, and performs various functions and processes data of the smart phone by running or executing software programs and/or modules stored in the memory 420 and invoking data stored in the memory 420, thereby performing overall monitoring of the smart phone. Optionally, the processor 480 may include one or more processing units; alternatively, the processor 480 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 480.

The smart phone also includes a power supply 490 (e.g., a battery) for powering the various components, optionally in logical communication with the processor 480 through a power management system that performs functions such as managing charge, discharge, and power consumption.

Although not shown, the smart phone may further include a camera, a bluetooth module, etc., which will not be described herein.

The steps performed by the multi-objective recommendation apparatus in the above-described embodiments may be based on the terminal device structure shown in fig. 8.

Also provided in embodiments of the present application is a computer-readable storage medium having a computer program stored therein, which when run on a computer, causes the computer to perform the methods as described in the foregoing embodiments.

Also provided in embodiments of the present application is a computer program product comprising a program which, when run on a computer, causes the computer to perform the methods described in the foregoing embodiments.

It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, which are not repeated herein.

In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.

The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part of it contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

The above embodiments are merely for illustrating the technical solution of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the corresponding technical solutions.

Claims

1. A multi-objective recommendation method, comprising:

acquiring a first feature of a recommending body and a second feature of an object to be recommended;

inputting the first features and the second features into a multi-target recommendation model to obtain the estimated score of the object to be recommended, wherein the multi-target recommendation model is obtained by training according to a sample pair set, one sample pair in the sample pair set comprises two different sample data of the same recommendation main body and ordering information of the two different sample data, the sample data comprises main body feature information of the sample recommendation main body and object feature information of the object to be recommended, and the main body feature information of the sample recommendation main body comprises historical behavior features of the sample recommendation main body aiming at the object to be recommended and attribute features of the sample recommendation main body;

recommending the object to be recommended to the recommending body according to the estimated score of the object to be recommended.

2. The method according to claim 1, wherein the method further comprises:

acquiring the sample pair set;

calculating sample pre-estimated scores obtained by inputting sample data of each sample pair in the sample pair set into an initial target model;

calculating a loss value of the initial target model passing through each sample pair in the sample pair set according to the sample pre-estimated score;

and updating the model parameters of the initial target model according to the loss value of each sample pair to obtain the multi-target recommended model.

3. The method of claim 2, wherein calculating a loss value for the initial target model through each sample pair in the set of sample pairs based on the sample pre-estimated scores comprises:

acquiring a weight value of each sample pair in the sample pair set;

and calculating a loss value of the initial target model passing through each sample pair in the sample pair set according to the estimated score and the weight value.

4. A method according to claim 3, wherein said obtaining a weight value for each sample pair in said set of sample pairs comprises:

acquiring a tag value of sample data of each sample pair in the sample pair set, wherein the tag value is determined according to an operation target of the recommended object and a target value corresponding to the operation target of the recommended object;

and determining a weight value of each sample pair in the sample pair set according to the label value.

5. A method according to claim 3, wherein said obtaining a weight value for each sample pair in said set of sample pairs comprises:

acquiring a tag value of sample data of each sample pair in the sample pair set and an exposure position corresponding to the sample data of each sample pair in the sample pair set, wherein the tag value is determined according to an operation target of the recommended object and a target value corresponding to the operation target of the recommended object, and the exposure position is used for indicating a display position of the recommended object on a display page;

and determining a weight value of each sample pair in the sample pair set according to the label value and the exposure position.

6. The method of claim 4 or 5, wherein obtaining the tag value of the sample data for each sample pair in the set of sample pairs comprises:

setting an operation target aiming at the recommended object and a target value corresponding to the operation target according to the principle of balancing the loss value of the sample pair;

and assigning a label value to the sample data of each sample pair in the sample pair set according to the operation target and the target value.

7. The method of any one of claims 1 to 5, wherein the multi-objective recommendation model is a single tower structure model based on a Pairwise loss.

8. A multi-objective recommendation device, comprising:

the processing module is used for inputting the first characteristics and the second characteristics into a multi-target recommendation model to obtain the estimated score of the object to be recommended, the multi-target recommendation model is obtained by training according to a sample pair set, one sample pair in the sample pair set comprises two different sample data of the same recommendation main body and ordering information of the two different sample data, the sample data comprises main body characteristic information of the sample recommendation main body and object characteristic information of the object to be recommended, and the main body characteristic information of the sample recommendation main body comprises historical behavior characteristics of the sample recommendation main body aiming at the object to be recommended and attribute characteristics of the sample recommendation main body;

9. A computer device, comprising: a memory, a processor, and a bus system;

wherein the memory is used for storing programs;

the processor is used for executing the program in the memory, including performing the method of any one of claims 1 to 7 according to instructions in the program code;

the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.

10. A computer readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 7.