CN117112880A

CN117112880A - Information recommendation and multi-target recommendation model training method and device and computer equipment

Info

Publication number: CN117112880A
Application number: CN202210520359.6A
Authority: CN
Inventors: 赵光耀; 何新昇; 赵忠; 梁瀚明; 马骊; 户维波
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2022-05-13
Filing date: 2022-05-13
Publication date: 2023-11-24

Abstract

The present application relates to an information recommendation method, apparatus, computer device, storage medium and computer program product. The method comprises the following steps: performing bottom-layer feature extraction based on object attribute features to obtain object extraction features, and performing deflection prediction for each business target based on the object extraction features to obtain the deflection degree of the object to be recommended on each business target; performing interaction prediction for each business target based on the combined features, the object extraction features and the information features to be recommended to obtain the interaction possibility of the object to be recommended on each business target; correcting the interaction possibility corresponding to each business target according to the deflection degree of that business target to obtain each target interaction possibility, and fusing the target interaction possibilities to obtain the fusion recommendation degree corresponding to the information to be recommended; and recommending the information to be recommended to the terminal corresponding to the object to be recommended based on the fusion recommendation degree. The method can improve the accuracy of information recommendation.

Description

Information recommendation and multi-target recommendation model training method and device and computer equipment

Technical Field

The present application relates to the field of internet technologies, and in particular, to an information recommendation method, an information recommendation device, a multi-target recommendation model training device, a computer device, a storage medium, and a computer program product.

Background

With the development of artificial intelligence technology, intelligent recommendation technology has emerged. Information recommendation is generally performed with a multi-objective recommendation model, which predicts a plurality of business targets simultaneously and then fuses the prediction results of the business targets to determine whether to recommend the information. At present, the attributes of the recommended object and the information to be recommended are generally modeled globally to obtain the multi-objective recommendation model, which is then used for information recommendation. However, a model obtained through global modeling can only learn global information, and the information to be recommended cannot be determined accurately from global information alone, so the accuracy of information recommendation is low.

Disclosure of Invention

In view of the foregoing, it is desirable to provide an information recommendation, multi-objective recommendation model training method, apparatus, computer device, computer readable storage medium, and computer program product that can improve the accuracy of information recommendation.

In one aspect, the application provides an information recommendation method. The method comprises the following steps:

acquiring object attribute characteristics and information characteristics to be recommended corresponding to the object to be recommended;

extracting bottom layer features based on object attribute features to obtain object extraction features, and carrying out deflection prediction on each business target based on the object extraction features to obtain deflection degree of the object to be recommended on each business target;

performing feature combination on the object attribute features and the information features to be recommended to obtain combined features, and performing interactive prediction on each business target based on the combined features, the object extraction features and the information features to be recommended to obtain the interactive possibility of the object to be recommended on each business target;

correcting the interaction possibility corresponding to each business target according to the deflection degree of that business target to obtain each target interaction possibility, and fusing the target interaction possibilities to obtain the fusion recommendation degree corresponding to the information to be recommended;

and recommending the information to be recommended to the terminal corresponding to the object to be recommended when the fusion recommendation degree meets the preset recommendation condition.
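
By way of illustration only, the correction, fusion and recommendation steps above can be sketched as follows. The text does not state how the deflection degree corrects the interaction possibility, how the corrected values are fused, or what the preset recommendation condition is; the subtraction of logits, the weighted sum, the threshold, and all names and numbers below are assumptions made for this sketch.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse_recommendation(deflection_logits, interaction_logits, fusion_weights):
    """Correct each target's interaction score by its deflection, then fuse.

    ASSUMPTION: correction subtracts the object-only deflection logit from the
    interaction logit; fusion is a weighted sum. Neither is given in the text.
    """
    corrected = [
        sigmoid(inter - defl)
        for inter, defl in zip(interaction_logits, deflection_logits)
    ]
    fused = sum(w * p for w, p in zip(fusion_weights, corrected))
    return corrected, fused


def meets_recommendation_condition(fused: float, threshold: float = 0.5) -> bool:
    """ASSUMPTION: the preset recommendation condition is a simple threshold."""
    return fused >= threshold


# Hypothetical scores for three business targets: click, read, interact.
deflection_logits = [1.2, -0.3, 0.4]
interaction_logits = [2.0, 0.1, 0.4]
fusion_weights = [0.5, 0.3, 0.2]

corrected, fused = fuse_recommendation(
    deflection_logits, interaction_logits, fusion_weights
)
print(corrected, fused, meets_recommendation_condition(fused))
```

In this sketch a target on which the object is already strongly inclined contributes less after correction, so the fused value depends more on the information to be recommended than on the object alone.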

On the other hand, the application also provides an information recommendation device. The device comprises:

the feature acquisition module is used for acquiring the object attribute features and the information features to be recommended corresponding to the object to be recommended;

the deflection prediction module is used for extracting bottom layer characteristics based on object attribute characteristics to obtain object extraction characteristics, and performing deflection prediction on each business target based on the object extraction characteristics to obtain the deflection degree of the object to be recommended on each business target;

the interaction prediction module is used for carrying out feature combination on the object attribute features and the information features to be recommended to obtain combined features, carrying out interaction prediction on each business target based on the combined features, the object extraction features and the information features to be recommended to obtain interaction possibility of the objects to be recommended on each business target;

the fusion module is used for correcting the interaction possibility corresponding to each business target through the deflection degree of that business target to obtain each target interaction possibility, and fusing the target interaction possibilities to obtain the fusion recommendation degree corresponding to the information to be recommended;

and the recommending module is used for recommending the information to be recommended to the terminal corresponding to the object to be recommended when the fusion recommending degree accords with the preset recommending condition.

On the other hand, the application also provides computer equipment. The computer device comprises a memory storing a computer program and a processor which when executing the computer program performs the steps of:

In another aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:

In another aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the steps of:

According to the information recommendation method, the device, the computer equipment, the storage medium and the computer program product, bottom-layer feature extraction is performed on the object attribute features to obtain the object extraction features, and deflection prediction is performed for each business target based on the object extraction features to obtain the deflection degree of the object to be recommended on each business target. Interaction prediction is then performed for each business target based on the combined features, the object extraction features and the information features to be recommended to obtain the interaction possibility of the object to be recommended on each business target. The interaction possibility corresponding to each business target is corrected using the deflection degree of that business target to obtain each target interaction possibility, and the target interaction possibilities are fused to obtain the fusion recommendation degree corresponding to the information to be recommended. That is, during multi-target prediction the object attribute part is separated out and its deflection degree is used to correct the interaction possibility, which enhances the representation capability of the part relating to the information to be recommended; fusing the target interaction possibilities then yields a more accurate fusion recommendation degree. Finally, when the fusion recommendation degree meets the preset recommendation condition, the information to be recommended is recommended to the terminal corresponding to the object to be recommended, thereby improving the accuracy of information recommendation.

In one aspect, the application provides a multi-objective recommendation model training method. The method comprises the following steps:

acquiring a positive sample and a negative sample corresponding to a training recommended object, wherein the positive sample comprises training recommended object characteristics and positive recommendation information characteristics, and the negative sample comprises training recommended object characteristics and negative recommendation information characteristics;

inputting the training recommended object characteristics and the positive recommendation information characteristics into a first initial multi-target recommendation model to perform deflection prediction and interaction prediction for each business target, obtaining the training deflection degree and the positive interaction possibility of the training recommended object on each business target, correcting the corresponding positive interaction possibility through the training deflection degree of each business target to obtain each target positive interaction possibility, and fusing the target positive interaction possibilities to obtain the positive fusion recommendation degree corresponding to the positive recommendation information;

inputting the training recommended object characteristics and the negative recommendation information characteristics into the first initial multi-target recommendation model to perform deflection prediction and interaction prediction for each business target, obtaining the training deflection degree and the negative interaction possibility of the training recommended object on each business target, correcting the corresponding negative interaction possibility through the training deflection degree of each business target to obtain each target negative interaction possibility, and fusing the target negative interaction possibilities to obtain the negative fusion recommendation degree corresponding to the negative recommendation information;

performing model loss calculation based on the positive fusion recommendation degree and the negative fusion recommendation degree to obtain model loss information;

and reversely updating the first initial multi-target recommendation model based on the model loss information to obtain a first updated multi-target recommendation model, taking the first updated multi-target recommendation model as the first initial multi-target recommendation model, and returning to the step of obtaining the positive sample and the negative sample corresponding to the training recommendation object for execution until the first training completion condition is reached, so as to obtain the first multi-target recommendation model.
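
As a hedged illustration of the model loss calculation in the steps above, the following sketch computes a pairwise loss from the positive and negative fusion recommendation degrees. The text only states that the loss is based on the two degrees; the BPR-style form, the function names `pairwise_loss` and `batch_loss`, and the sample values are assumptions and are not taken from the application.

```python
import math


def pairwise_loss(positive_degree: float, negative_degree: float) -> float:
    """Return -log(sigmoid(positive - negative)), computed stably.

    ASSUMPTION: a BPR-style pairwise loss; the text only states that the loss
    is computed from the positive and negative fusion recommendation degrees.
    """
    margin = positive_degree - negative_degree
    # softplus(-margin) = max(-margin, 0) + log(1 + exp(-|margin|))
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


def batch_loss(pairs) -> float:
    """Average the pairwise loss over (positive, negative) degree pairs."""
    return sum(pairwise_loss(p, n) for p, n in pairs) / len(pairs)


# Hypothetical fusion recommendation degrees for three sample pairs.
pairs = [(0.9, 0.2), (0.6, 0.5), (0.3, 0.7)]
print(batch_loss(pairs))
```

A loss of this shape is small when the positive sample is ranked above the negative sample and grows as the order is reversed, which is one way a reverse update could push the model to separate the two.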

On the other hand, the application also provides a multi-target recommendation model training device. The device comprises:

the sample pair acquisition module is used for acquiring positive samples and negative samples corresponding to the training recommended objects, wherein the positive samples comprise training recommended object characteristics and positive recommended information characteristics, and the negative samples comprise training recommended object characteristics and negative recommended information characteristics;

the positive prediction module is used for inputting the training recommended object characteristics and the positive recommendation information characteristics into the first initial multi-target recommendation model to perform deflection prediction and interaction prediction for each business target, obtaining the training deflection degree and the positive interaction possibility of the training recommended object on each business target, correcting the corresponding positive interaction possibility through the training deflection degree of each business target to obtain each target positive interaction possibility, and fusing the target positive interaction possibilities to obtain the positive fusion recommendation degree corresponding to the positive recommendation information;

the negative prediction module is used for inputting the training recommended object characteristics and the negative recommendation information characteristics into the first initial multi-target recommendation model to perform deflection prediction and interaction prediction for each business target, obtaining the training deflection degree and the negative interaction possibility of the training recommended object on each business target, correcting the corresponding negative interaction possibility through the training deflection degree of each business target to obtain each target negative interaction possibility, and fusing the target negative interaction possibilities to obtain the negative fusion recommendation degree corresponding to the negative recommendation information;

the loss calculation module is used for calculating model loss based on the positive fusion recommendation degree and the negative fusion recommendation degree to obtain model loss information;

the first iteration module is used for reversely updating the first initial multi-target recommendation model based on the model loss information to obtain a first updated multi-target recommendation model, taking the first updated multi-target recommendation model as the first initial multi-target recommendation model, and returning to the step of obtaining the positive sample and the negative sample corresponding to the training recommendation object for execution until the first training completion condition is reached, so that the first multi-target recommendation model is obtained.

According to the multi-target recommendation model training method, device, computer equipment, storage medium and computer program product, the multi-target prediction is carried out on the positive sample to obtain the positive fusion recommendation degree, the accuracy of the positive fusion recommendation degree is improved, then the multi-target prediction is carried out on the negative sample to obtain the negative fusion recommendation degree, the accuracy of the negative fusion recommendation degree is improved, then model loss calculation is carried out by using the positive fusion recommendation degree and the negative fusion recommendation degree to obtain model loss information, the accuracy of the model loss information is improved, then the model loss information is used for training a first initial multi-target recommendation model, and when the first training completion condition is reached, the first multi-target recommendation model is obtained, so that the accuracy of the first multi-target recommendation model is improved, and the accuracy of information recommendation is improved by the first multi-target recommendation model.

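In another aspect, the application also provides a multi-objective recommendation model training method. The method comprises the following steps:

acquiring a training sample, wherein the training sample comprises training recommendation object information, training recommendation information and each interaction label;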

inputting training recommendation object characteristics and training recommendation information characteristics into a second initial multi-target recommendation model to conduct deflection prediction and interaction prediction on each business target, obtaining training deflection degree and training interaction possibility of each business target by a training recommendation object, correcting corresponding training interaction possibility through the training deflection degree of each business target, obtaining training interaction possibility of each target, fusing the training interaction possibility of each target, obtaining training fusion recommendation degree corresponding to training recommendation information, fusing the training deflection degree of each business target, obtaining training deflection recommendation degree corresponding to training recommendation information, calculating the sum of the training deflection recommendation degree and the training fusion recommendation degree, and obtaining training target recommendation degree;

fusion loss calculation is carried out based on each interactive label and the recommended degree of the training target to obtain fusion loss information, and deviation adjustment is carried out on the fusion loss information by using each training deviation degree to obtain target loss information;

and reversely updating the second initial multi-target recommendation model based on the target loss information to obtain a second updated multi-target recommendation model, taking the second updated multi-target recommendation model as the second initial multi-target recommendation model, and returning to the step of obtaining the training sample for execution until a second training completion condition is reached, so as to obtain the second multi-target recommendation model.

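On the other hand, the application also provides a multi-target recommendation model training device. The device comprises:

the sample acquisition module is used for acquiring a training sample, wherein the training sample comprises training recommendation object information, training recommendation information and each interaction label;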

the training module is used for inputting the training recommendation object characteristics and the training recommendation information characteristics into a second initial multi-target recommendation model to conduct deflection prediction and interaction prediction on each business target, obtaining training deflection degree and training interaction possibility of each business target by the training recommendation object, correcting the corresponding training interaction possibility through the training deflection degree of each business target, obtaining each target training interaction possibility, fusing the target training interaction possibility, obtaining training fusion recommendation degree corresponding to training recommendation information, fusing the training deflection degree of each business target, obtaining training deflection recommendation degree corresponding to training recommendation information, calculating the sum of the training deflection recommendation degree and the training fusion recommendation degree, and obtaining the training target recommendation degree;

the target loss calculation module is used for carrying out fusion loss calculation based on each interactive label and the recommended degree of the training target to obtain fusion loss information, and carrying out deflection adjustment on the fusion loss information by using each training deflection degree to obtain target loss information;

the second iteration module is used for reversely updating the second initial multi-target recommendation model based on the target loss information to obtain a second updated multi-target recommendation model, taking the second updated multi-target recommendation model as the second initial multi-target recommendation model, and returning to the step of obtaining the training sample for execution until a second training completion condition is reached to obtain the second multi-target recommendation model.

According to the multi-target recommendation model training method, device, computer equipment, storage medium and computer program product, the training recommendation object characteristics and the training recommendation information characteristics are input into the second initial multi-target recommendation model to perform deflection prediction and interaction prediction for each business target, obtaining the training deflection degree and the training interaction possibility of the training recommendation object on each business target. The corresponding training interaction possibility is corrected through the training deflection degree of each business target to obtain each target training interaction possibility, and the target training interaction possibilities are fused to obtain the training fusion recommendation degree corresponding to the training recommendation information. The training deflection degrees of the business targets are fused to obtain the training deflection recommendation degree corresponding to the training recommendation information, and the sum of the training deflection recommendation degree and the training fusion recommendation degree is calculated to obtain the training target recommendation degree, which improves the accuracy of the training target recommendation degree. Fusion loss information is then calculated using the training target recommendation degree, and deflection adjustment is performed on the fusion loss information using the training deflection degrees, which improves the accuracy of the resulting target loss information. The second initial multi-target recommendation model is trained with the target loss information, and the second multi-target recommendation model is obtained when the second training completion condition is reached, thereby improving the accuracy of the second multi-target recommendation model.

Drawings

FIG. 1 is an application environment diagram of an information recommendation method in one embodiment;

FIG. 2 is a flow chart of an information recommendation method according to an embodiment;

FIG. 3 is a flow chart illustrating the obtaining of the fusion recommendation degree in one embodiment;

FIG. 4 is a diagram illustrating a network architecture of a first multi-objective predictive network in one embodiment;

FIG. 5 is a schematic diagram of a network architecture of an interactive converged network in one embodiment;

FIG. 6 is a schematic diagram of a network structure for biasing recommendation in one embodiment;

FIG. 7 is a flow chart of a multi-objective recommendation model training method in one embodiment;

FIG. 8 is a flow diagram of obtaining model loss information in one embodiment;

FIG. 9 is a flowchart of a multi-objective recommendation model training method according to another embodiment;

FIG. 10 is a flow chart illustrating the obtaining of the training target recommendation degree according to one embodiment;

FIG. 11 is a flow diagram of obtaining target loss information in one embodiment;

FIG. 12 is a flow diagram of obtaining fusion loss information in one embodiment;

FIG. 13 is a flow chart of obtaining sample weights in one embodiment;

FIG. 14 is a schematic diagram illustrating a second multi-objective recommendation model for performing rectification in one embodiment;

FIG. 15 is a schematic diagram of a news recommendation page in the embodiment of FIG. 14;

FIG. 16-A is a diagram showing a comparison of the click rate of the text in one embodiment;

FIG. 16-B is a diagram showing the comparison of the test result indicators in the embodiment of FIG. 16-A;

FIG. 17-A is a diagram showing a comparison of the click rate of the text in another embodiment;

FIG. 17-B is a diagram showing the comparison of the test result indicators in the embodiment of FIG. 17-A;

FIG. 18 is a block diagram showing an information recommending apparatus in one embodiment;

FIG. 19 is a block diagram of a multi-objective recommendation model training apparatus in one embodiment;

FIG. 20 is a block diagram of a multi-objective recommendation model training apparatus in accordance with another embodiment;

FIG. 21 is an internal block diagram of a computer device in one embodiment;

FIG. 22 is an internal structural view of a computer device in another embodiment.

Detailed Description

The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

The information recommendation method provided by the embodiment of the application can be applied to the application environment shown in fig. 1, in which the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on the cloud or other servers. The server 104 acquires object attribute characteristics and information characteristics to be recommended corresponding to the object to be recommended from the data storage system; the server 104 performs bottom layer feature extraction based on the object attribute features to obtain object extraction features, and performs deflection prediction on each business target based on the object extraction features to obtain deflection degree of the object to be recommended on each business target; the server 104 performs feature combination on the object attribute features and the information features to be recommended to obtain combined features, and performs interaction prediction on each business target based on the combined features, the object extraction features and the information features to be recommended to obtain the interaction possibility of the object to be recommended on each business target; the server 104 corrects the interaction possibility corresponding to each business target through the deflection degree of that business target to obtain each target interaction possibility, and fuses the target interaction possibilities to obtain a fusion recommendation degree corresponding to the information to be recommended; when the fusion recommendation degree meets the preset recommendation condition, the server 104 recommends the information to be recommended to the terminal 102 corresponding to the object to be recommended.
The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, internet of things device, or portable wearable device, where the internet of things device may be a smart speaker, smart television, smart air conditioner, smart vehicle device, or the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.

In one embodiment, as shown in fig. 2, an information recommendation method is provided. The method is described by taking its application to the server in fig. 1 as an example; it is to be understood that the method can also be applied to a terminal, or to a system including a terminal and a server, in which case it is implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:

Step 202, obtaining object attribute features and information features to be recommended corresponding to the object to be recommended.

The object to be recommended refers to an object to which information needs to be recommended. The object may be a real object or a virtual object; real objects include, but are not limited to, people and animals, and a virtual object may be an object that webcasts using an avatar. Object attribute features refer to features corresponding to attributes of the object to be recommended and can be used to characterize the object to be recommended. The object attribute features include, but are not limited to, basic attribute features, social relationship features, and consumption capability features of the object to be recommended. The basic attribute features represent the basic attributes of the object to be recommended, the social relationship features represent its social relationships, and the consumption capability features represent its consumption capability. The information features to be recommended refer to features of the information to be recommended, that is, the information for which it must be determined whether to recommend it to the object to be recommended. The information may be multimedia data, including but not limited to video, image, text, and voice.

Specifically, the server may obtain, from a database, the object attribute features and the information features to be recommended corresponding to the object to be recommended. The server may also obtain them from a server providing a data service, from data uploaded by the terminal, or from the Internet. In one embodiment, the server may obtain object attribute information and information to be recommended corresponding to the object to be recommended, and then perform data preprocessing on the object attribute information and the information to be recommended to obtain the object attribute features and the information features to be recommended.

Step 204, extracting bottom layer features based on the object attribute features to obtain object extraction features, and carrying out deflection prediction on each business target based on the object extraction features to obtain the deflection degree of the object to be recommended on each business target.

The object extraction feature is obtained by extracting semantic features from the object attribute features and is a low-dimensional representation of the object. A business target refers to a goal that the recommendation business is expected to reach after recommendation, and may include clicking, reading, interacting, exposure, and the like. The deflection degree represents the tendency (bias) of the object to be recommended toward a business target: the higher the deflection degree for a business target, the stronger the tendency of the object to be recommended toward that business target, and the more likely that business target is to be realized. Different objects to be recommended tend toward different business targets. For example, older recommended objects tend to read for long periods of time, and younger recommended objects tend to interact.

Specifically, the server uses a deep neural network to perform the bottom layer feature extraction, that is, the object attribute features are compressed through the deep neural network to obtain the object extraction features. The object extraction features are then used to perform multi-target deflection prediction through a deep neural network, that is, deflection prediction is performed for each business target to obtain the deflection degree of the object to be recommended on each business target.
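
For illustration, the compression of object attribute features followed by per-target deflection prediction can be sketched with a tiny feed-forward network. The layer sizes, the random initialisation, the sigmoid output, the three target names and the helper names `init_layer`, `dense` and `predict_deflection` are all assumptions chosen for this sketch; the application does not specify the network at this point.

```python
import math
import random


def relu(v: float) -> float:
    return max(v, 0.0)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Create (weights, biases) for one dense layer with small random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, weights, biases, activation):
    """Apply one fully connected layer to the vector x."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def predict_deflection(object_attrs, bottom, heads):
    """Compress object attributes, then score every business target."""
    extracted = dense(object_attrs, *bottom, relu)  # low-dimensional object feature
    degrees = {
        target: dense(extracted, w, b, sigmoid)[0]
        for target, (w, b) in heads.items()
    }
    return extracted, degrees


rng = random.Random(0)  # fixed seed so the sketch is reproducible
bottom = init_layer(6, 3, rng)  # hypothetical: 6 attribute features -> 3 dimensions
heads = {t: init_layer(3, 1, rng) for t in ("click", "read", "interact")}
extracted, degrees = predict_deflection([0.2, 1.0, 0.0, 0.5, 0.3, 0.9], bottom, heads)
print(extracted, degrees)
```

Note that only object attribute features enter this branch, so the resulting deflection degrees do not depend on the information to be recommended.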

Step 206, performing feature combination on the object attribute features and the information features to be recommended to obtain combined features, and performing interaction prediction on each business target based on the combined features, the object extraction features and the information features to be recommended to obtain interaction possibility of the object to be recommended on each business target.

The feature combination is a composite feature formed by combining individual features, either by multiplying them or by computing their Cartesian product. Feature combinations help represent non-linear relationships. The combined feature refers to a feature obtained by performing feature combination, also known as feature crossing.

The interaction possibility, also referred to as the interaction probability or interaction likelihood, represents how likely the object to be recommended is to perform the interaction corresponding to a business target. The higher the interaction possibility for a business target, the more likely the object to be recommended is to interact with the information to be recommended in the way that business target describes. For example, the higher the interaction possibility of clicking, the higher the probability that the corresponding information to be recommended is clicked by the object to be recommended.

Specifically, the server multiplies the object attribute features and the information features to be recommended, or calculates their Cartesian product, to obtain the combined features; the server may also cross the object attribute features with the information features to be recommended to obtain the combined features. The server then performs multi-target interaction prediction on the combined features, the object extraction features and the information features to be recommended through a deep neural network, that is, it performs interaction prediction on each business target and outputs the interaction possibility of the object to be recommended on each business target. In one embodiment, the networks used for multi-target bias prediction and multi-target interaction prediction have the same structure but different parameters.
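The two ways of forming combined features mentioned above, multiplication and the Cartesian product, can be sketched as follows. The feature values and the `&` separator used to name crossed categories are illustrative assumptions.

```python
from itertools import product


def elementwise_product(object_feat, item_feat):
    """Combine two equal-length dense vectors by multiplying matching positions."""
    if len(object_feat) != len(item_feat):
        raise ValueError("element-wise product needs vectors of equal length")
    return [u * v for u, v in zip(object_feat, item_feat)]


def cartesian_cross(object_cats, item_cats):
    """Cross every categorical value of the object with every value of the item."""
    return [f"{u}&{v}" for u, v in product(object_cats, item_cats)]


dense_combined = elementwise_product([0.5, 2.0, 1.0], [4.0, 0.5, 3.0])
# dense_combined == [2.0, 1.0, 3.0]

crossed = cartesian_cross(["age=18-24", "city=SZ"], ["topic=sports", "type=video"])
# crossed holds 2 x 2 = 4 crossed categories, e.g. "age=18-24&topic=sports"
```

In practice each crossed category would typically be mapped to an embedding before being fed to the interaction prediction network; that step is omitted here.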

Step 208, correcting the interaction probability corresponding to each business target according to the deviation degree of each business target to obtain each target interaction probability, and fusing each target interaction probability to obtain the fusion recommendation degree corresponding to the information to be recommended.

The target interaction possibility is the interaction possibility after adjustment by the deviation degree. It represents the part of the interaction possibility that is independent of the object to be recommended, that is, the part attributable to the information to be recommended. Correcting the interaction possibility of each business target with that target's deviation degree improves the accuracy of the resulting target interaction possibility. Each business target has a corresponding target interaction possibility. The fusion recommendation degree represents the degree to which the information to be recommended should be recommended to the object to be recommended: the higher the fusion recommendation degree, the more suitable the information is for recommendation to that object.

Specifically, the server may correct the interaction probability of each business target by using that target's deviation degree, for example by calculating the difference between the interaction probability of the business target and the corresponding deviation degree to obtain the target interaction probability. The server may then fuse the target interaction probabilities to obtain the fusion recommendation degree corresponding to the information to be recommended. The fusion may be performed by a deep neural network, or the server may weight each target interaction probability with a preset weight of the corresponding business target and calculate the weighted sum to obtain the fusion recommendation degree.
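The difference-based correction and the weighted-sum variant of the fusion can be sketched as follows. The scores and the preset weights are made-up values chosen only for illustration.

```python
def correct_and_fuse(interaction, bias, weights):
    """Subtract each target's bias, then fuse the results with a weighted sum."""
    target_scores = {t: interaction[t] - bias[t] for t in interaction}
    fused = sum(weights[t] * target_scores[t] for t in target_scores)
    return target_scores, fused


interaction = {"click": 0.75, "duration": 0.5, "interaction": 0.25}
bias = {"click": 0.25, "duration": 0.25, "interaction": 0.125}
weights = {"click": 0.5, "duration": 0.25, "interaction": 0.25}  # assumed preset weights

target_scores, fused = correct_and_fuse(interaction, bias, weights)
# target_scores == {"click": 0.5, "duration": 0.25, "interaction": 0.125}
# fused == 0.5*0.5 + 0.25*0.25 + 0.25*0.125 == 0.34375
```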

Step 210, recommending the information to be recommended to the terminal corresponding to the object to be recommended when the fusion recommendation degree meets the preset recommendation condition.

The preset recommendation condition is a condition set in advance for making a recommendation. It may be that the fusion recommendation degree reaches a recommendation degree threshold, or that the fusion recommendation degree reaches a target sorting position in the sequence obtained by sorting the fusion recommendation degrees of all pieces of information to be recommended.

Specifically, the server determines whether the fusion recommendation degree meets the preset recommendation condition. For example, the server may compare the fusion recommendation degree with the recommendation degree threshold, and the condition is met when the fusion recommendation degree exceeds the threshold. Alternatively, the server may determine the position of the fusion recommendation degree in the fusion recommendation degree sequence, and the condition is met when that position is a target sorting position. When the condition is met, the server recommends the information to be recommended to the terminal corresponding to the object to be recommended. The terminal includes, but is not limited to, a mobile phone, a computer, an intelligent voice interaction device, an intelligent household appliance, a vehicle-mounted terminal, an aircraft and the like.
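Both forms of the preset recommendation condition, a threshold and a target sorting position, can be sketched as follows. The candidate identifiers, the scores, the threshold of 0.5 and `top_k=3` are illustrative assumptions.

```python
def meets_threshold(score, threshold):
    """Threshold condition: the fused score must exceed the threshold."""
    return score > threshold


def meets_rank(item_id, scores, top_k):
    """Ranking condition: item_id must be among the top_k highest fused scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return item_id in ranked[:top_k]


scores = {"news_a": 0.91, "news_b": 0.40, "news_c": 0.77, "news_d": 0.15}

by_threshold = [i for i, s in scores.items() if meets_threshold(s, 0.5)]
# by_threshold == ["news_a", "news_c"]

by_rank = [i for i in scores if meets_rank(i, scores, top_k=3)]
# by_rank == ["news_a", "news_b", "news_c"]
```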

According to the information recommendation method, apparatus, computer device, storage medium and computer program product described above, bottom layer features are extracted from the object attribute features to obtain object extraction features, and deflection prediction is performed on each business target based on the object extraction features to obtain the deflection degree of each business target. Interaction prediction is performed on each business target based on the combined features, the object extraction features and the information features to be recommended to obtain the interaction possibility of each business target. The interaction possibility of each business target is then corrected by using the deflection degree of that target to obtain each target interaction possibility, and the target interaction possibilities are fused to obtain the fusion recommendation degree corresponding to the information to be recommended. In other words, the object attribute part is separated out during multi-target prediction and the interaction possibility is corrected with the deflection degree, which enhances the representation capability of the information part to be recommended and makes the resulting fusion recommendation degree more accurate. Finally, when the fusion recommendation degree meets the preset recommendation condition, the information to be recommended is recommended to the terminal corresponding to the object to be recommended, thereby improving the accuracy of information recommendation.

In one embodiment, as shown in fig. 3, the information recommendation method further includes:

Step 302, inputting object attribute characteristics and information characteristics to be recommended into a first multi-target recommendation model;

Step 304, extracting bottom layer features based on object attribute features through a first multi-target recommendation model to obtain object extraction features, and carrying out deflection prediction on each business target based on the object extraction features to obtain deflection degree of the object to be recommended on each business target;

Step 306, performing feature combination on the object attribute features and the information features to be recommended through a first multi-target recommendation model to obtain combined features, and performing interaction prediction on each business target based on the combined features, the object extraction features and the information features to be recommended to obtain interaction possibility of the object to be recommended on each business target;

Step 308, correcting the interaction probability corresponding to each business target by using the deviation degree of each business target through the first multi-target recommendation model to obtain each target interaction probability, and fusing each target interaction probability to obtain the fused recommendation degree corresponding to the information to be recommended.

The first multi-target recommendation model is a multi-target prediction model established by using a deep neural network and is used for recommending based on a plurality of business targets. The first multi-target recommendation model may be trained using training sample pairs, each comprising a positive sample and a negative sample. The positive sample is a sample containing recommendation information that has been recommended to a training object. The negative sample is a sample containing recommendation information that has not been recommended to the training object. The training objective of the first multi-target recommendation model is to maximize the difference between the fusion recommendation degree corresponding to the positive sample and the fusion recommendation degree corresponding to the negative sample.

Specifically, the server trains the first multi-target recommendation model in advance by using training samples, and then deploys it. After acquiring the object attribute features and the information features to be recommended, the server calls the first multi-target recommendation model and supplies these features as its input. The model performs deflection prediction on each business target to obtain the deflection degree of the object to be recommended on each business target, and at the same time performs interaction prediction on each business target to obtain the interaction possibility of the object to be recommended on each business target. It then corrects the interaction possibilities and fuses the results to obtain the fusion recommendation degree corresponding to the information to be recommended. Calling the trained first multi-target recommendation model to obtain the fusion recommendation degree improves recommendation efficiency.

In one embodiment, the first multi-objective recommendation model includes a first multi-objective prediction network and an interaction fusion network, the first multi-objective prediction network including a first bias prediction sub-network and a first interaction prediction sub-network:

Step 304, extracting bottom layer features based on object attribute features to obtain object extraction features, and performing deflection prediction on each business target based on the object extraction features to obtain deflection degree of the object to be recommended on each business target, including:

inputting the object attribute feature into a first bias prediction sub-network in a first multi-target prediction network; and performing bottom-layer feature extraction on the object attribute features through the first deflection prediction sub-network to obtain object extraction features, and performing deflection prediction on each business target based on the object extraction features to obtain the deflection degree of the object to be recommended on each business target.

The first multi-objective prediction network refers to a deep neural network for multi-objective prediction. The first deflection prediction sub-network, also referred to as the first bias prediction sub-network, refers to a deep neural network for predicting the deflection degree of each business target, and is a sub-network of the first multi-objective prediction network. The deep neural network may be a DNN (Deep Neural Network) network, a CNN (Convolutional Neural Network) network, an RNN (Recurrent Neural Network) network, or the like.

Specifically, the server uses a first deflection prediction sub-network in the first multi-objective recommendation model to extract bottom layer characteristics, then carries out deflection prediction on each business objective, and outputs the deflection degree of the object to be recommended on each business objective.

Step 306, performing interaction prediction on each business object based on the combined feature, the object extraction feature and the information feature to be recommended to obtain interaction possibility of the object to be recommended on each business object, comprising the steps of:

inputting the combined characteristics, the object extraction characteristics and the information characteristics to be recommended into a first interaction prediction sub-network in a first multi-target prediction network; and carrying out interaction prediction on each business target by using the combined characteristics, the object extraction characteristics and the information characteristics to be recommended through the first interaction prediction sub-network to obtain the interaction possibility of the object to be recommended on each business target.

The first interaction prediction sub-network refers to a deep neural network for performing interaction possibility prediction of each business target.

Specifically, while predicting the deviation degree of each business target, the server can also perform interaction prediction on the business targets: the combined features, the object extraction features and the information features to be recommended are input into the first interaction prediction sub-network in the first multi-target prediction network, which outputs the interaction possibility of the object to be recommended on each business target.

Step 308, fusing the interaction possibilities of the targets to obtain a fused recommendation degree corresponding to the information to be recommended, including the steps of:

and splicing all the target interaction possibilities to obtain a spliced vector, and inputting the spliced vector into an interaction fusion network to fuse, so as to obtain a fusion recommendation degree corresponding to the information to be recommended.

The spliced vector is a vector obtained by concatenating its components end to end. The interaction fusion network is a deep neural network used to fuse the target interaction possibilities.

Specifically, the server takes each target interaction possibility as a vector element, concatenates the elements end to end in sequence to obtain the spliced vector, and then inputs the spliced vector into the interaction fusion network for fusion, which outputs the fusion recommendation degree corresponding to the information to be recommended.
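The splice-then-fuse step can be sketched with a very small network. The fixed splice order, the single hidden ReLU layer and all parameter values below are illustrative assumptions; the actual structure and learned parameters of the interaction fusion network are not specified here.

```python
def fuse(spliced, hidden_w, hidden_b, out_w, out_b):
    """One hidden ReLU layer followed by a single linear output unit."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, spliced)) + b)
              for row, b in zip(hidden_w, hidden_b)]
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b


target_scores = {"click": 0.5, "duration": 0.25, "interaction": 0.125}
order = ["click", "duration", "interaction"]   # fixed splice order
spliced = [target_scores[t] for t in order]    # end-to-end concatenation

# Hypothetical parameters; a trained fusion network would supply learned values.
hidden_w = [[1.0, 1.0, 0.0], [0.0, -1.0, 2.0]]
hidden_b = [0.0, 0.0]
fusion_score = fuse(spliced, hidden_w, hidden_b, out_w=[1.0, 0.5], out_b=0.0)
# hidden units: relu(0.5 + 0.25) = 0.75 and relu(-0.25 + 0.25) = 0.0
# fusion_score == 1.0*0.75 + 0.5*0.0 == 0.75
```

Keeping the splice order fixed matters because the fusion network's weights are tied to positions in the spliced vector.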

In a specific embodiment, the first multi-objective prediction network may be a multi-objective prediction network using the PLE (Progressive Layered Extraction, a hierarchical-extraction multi-objective learning network structure) structure. Fig. 4 is a network structure diagram of the first multi-objective prediction network, which is used here for news recommendation. News recommendation has multiple business targets, including a click business target, a duration business target and an interaction business target. The first multi-target prediction network comprises two layers, each of which is a multi-gate mixture-of-experts network, a common network structure for multi-target learning. Each expert network is a DNN; the plurality of expert networks extract different features, and the gates assign a weight to each expert. The features of the object to be recommended are input into the first multi-target prediction network, low-level features are extracted by the first-layer multi-gate mixture-of-experts network to obtain the object extraction features, and each deflection degree prediction task is then predicted through a multi-task network to output each deflection degree. At the same time, the combined features, the object extraction features and the information features to be recommended are spliced and input into the second-layer multi-gate mixture-of-experts network to extract low-level features, and each interaction possibility prediction task is predicted to obtain each interaction possibility. Each prediction task thus yields a corresponding interaction possibility and deflection degree. The first multi-target prediction network finally outputs the deflection score and the interaction score corresponding to the click business target, the deflection score and the interaction score corresponding to the duration business target, and the deflection score and the interaction score corresponding to the interaction business target.

Further, fig. 5 is a network structure diagram of the interaction fusion network, in which the interaction possibility and the deflection degree may be represented by probabilities or scores. The deflection score is subtracted from the interaction score of the click business target to obtain the click target interaction score, the deflection score is subtracted from the interaction score of the duration business target to obtain the duration target interaction score, and the deflection score is subtracted from the interaction score of the interaction business target to obtain the interaction target interaction score. The click target interaction score, the duration target interaction score and the interaction target interaction score are then spliced and input into the interaction fusion network (a DNN) for fusion to obtain the output fusion score.
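The role of gating in a multi-gate mixture-of-experts layer can be illustrated with a minimal sketch. It assumes the expert outputs and the gate logits are already available; in a real network both would be computed from the input by learned sub-networks. The function names and the values are illustrative assumptions, and the sketch shows only the weighting step, not the full PLE structure.

```python
import math


def softmax(logits):
    """Turn gate logits into weights that are positive and sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gated_mixture(expert_outputs, gate_logits):
    """Weight each expert's output vector by its gate value and sum them."""
    gate = softmax(gate_logits)
    width = len(expert_outputs[0])
    return [sum(g * out[i] for g, out in zip(gate, expert_outputs))
            for i in range(width)]


# Three hypothetical expert outputs (each a 2-dim feature) for one input.
experts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# One gate per business target: equal logits weight every expert equally,
# while a large first logit makes the first expert dominate.
click_feature = gated_mixture(experts, [0.0, 0.0, 0.0])
duration_feature = gated_mixture(experts, [5.0, 0.0, 0.0])
```

Because each business target has its own gate, the same set of experts can yield a different feature for the click, duration and interaction prediction tasks.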

In the above embodiment, the first multi-target prediction network performs multi-target prediction to obtain the target interaction possibility corresponding to each business target, and the interaction fusion network then fuses the target interaction possibilities, which improves the accuracy of the resulting fusion recommendation degree.

In one embodiment, correcting the interaction probability corresponding to each business object according to the deviation degree of each business object to obtain each object interaction probability, including the steps of:

and calculating the difference between the interaction possibility of each business target and the corresponding deviation degree to obtain the target interaction possibility of each business target.

Specifically, the server subtracts the deviation degree corresponding to the business target from the interaction probability of the business target to obtain the target interaction probability of the business target, traverses each business target to obtain the target interaction probability of each business target, and improves the accuracy of the obtained target interaction probability. In a specific embodiment, the target interaction likelihood may be calculated using equation (1) below.

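$$\text{logit}^{Task}_{debias} = \text{logit}^{Task} - \text{logit}^{Task}_{bias} \qquad \text{Formula (1)}$$

Wherein, $\text{logit}^{Task}_{debias}$ refers to the target interaction likelihood of the business target, $\text{logit}^{Task}$ refers to the interaction likelihood of the business target, $\text{logit}^{Task}_{bias}$ refers to the degree of bias of the business target, and $Task$ refers to the business target.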

In one embodiment, the information recommendation method further comprises the steps of:

fusing the deflection degrees of all the business targets to obtain deflection recommendation degrees corresponding to the information to be recommended; and calculating the sum of the deviation recommendation degree and the fusion recommendation degree to obtain the target recommendation degree.

The bias recommendation degree is the possibility of recommending the information to be recommended, obtained from the bias degrees. The target recommendation degree represents the possibility of recommending the information to be recommended to the object to be recommended.

Specifically, the server may further fuse the bias degrees of the business targets to obtain the bias recommendation degree corresponding to the information to be recommended, either by fusing the bias degrees through a deep neural network or by weighting them with preset weights of the business targets. The server then adds the bias recommendation degree to the fusion recommendation degree to obtain the target recommendation degree.

In a specific embodiment, the target recommendation level may be calculated using the following formula (2).

$$\text{logit} = \text{logit}_{bias} + \text{logit}_{debias} \qquad \text{Formula (2)}$$

Where $\text{logit}$ refers to the target recommendation degree, $\text{logit}_{bias}$ refers to the bias recommendation degree, and $\text{logit}_{debias}$ refers to the fusion recommendation degree.

Step 210, when the fusion recommendation degree meets the preset recommendation condition, recommending the information to be recommended to the terminal corresponding to the object to be recommended, including the steps of:

and when the target recommendation degree accords with the preset target recommendation condition, recommending the information to be recommended to the terminal corresponding to the object to be recommended.

The preset target recommendation condition refers to a preset condition for recommending information to be recommended to a terminal corresponding to an object to be recommended, and the preset target recommendation condition includes, but is not limited to, reaching a preset recommendation threshold or reaching a preset sorting position.

Specifically, the server judges whether the target recommendation degree meets a preset target recommendation condition, and when the target recommendation degree meets the preset target recommendation condition, for example, when the target recommendation degree reaches a preset recommendation threshold, the server recommends the information to be recommended to the terminal corresponding to the object to be recommended.

In one embodiment, the bias degree of each business target is fused to obtain a bias recommendation degree corresponding to the information to be recommended, and the method further includes the steps of:

and splicing the deflection degrees of the business targets to obtain deflection splicing vectors, and inputting the deflection splicing vectors into a deflection fusion network to fuse so as to obtain deflection recommendation degrees corresponding to the information to be recommended.

The deflection fusion network, also referred to as the bias fusion network, is a deep neural network for fusing the bias degrees of the business targets.

Specifically, the server takes the deflection degree of each business target as an element in the vector, sequentially performs head-to-tail splicing to obtain a deflection splicing vector, and then inputs the deflection splicing vector into a deflection fusion network to fuse by using network parameters to obtain the deflection recommendation degree corresponding to the output information to be recommended.

In a specific embodiment, fig. 6 is a schematic diagram of the network structure for obtaining the bias recommendation degree. The server splices the bias score of the click business target, the bias score of the duration business target and the bias score of the interaction business target, and inputs the spliced bias vector into the bias fusion network to obtain the output bias fusion score. The server then calculates the sum of the bias fusion score and the interaction fusion score to obtain a target score, and determines according to the target score whether to recommend the information to the terminal corresponding to the object to be recommended. For example, if the target score exceeds a score threshold of 95, the server recommends the information to the terminal corresponding to the object to be recommended. As another example, when the target score of the information to be recommended ranks in the top three of the target scores of all the information to be recommended, the information is recommended to the terminal corresponding to the object to be recommended.
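The summation of the two fused scores and the two example decision rules (a threshold of 95, or a top-three rank) can be sketched as follows. The candidate identifiers and all score values are made up for illustration. The bias fusion score is held constant across candidates on the assumption that it is computed from the object attribute features alone.

```python
def target_score(bias_fusion_score, interaction_fusion_score):
    """Formula (2): the target score is the sum of the two fused scores."""
    return bias_fusion_score + interaction_fusion_score


# Hypothetical fused scores per candidate: (bias fusion, interaction fusion).
candidates = {
    "news_a": (60.0, 36.5),
    "news_b": (60.0, 20.0),
    "news_c": (60.0, 31.0),
    "news_d": (60.0, 12.5),
}

totals = {k: target_score(b, d) for k, (b, d) in candidates.items()}
# totals == {"news_a": 96.5, "news_b": 80.0, "news_c": 91.0, "news_d": 72.5}

above_threshold = [k for k, v in totals.items() if v > 95]
# above_threshold == ["news_a"]

top_three = sorted(totals, key=totals.get, reverse=True)[:3]
# top_three == ["news_a", "news_c", "news_b"]
```

Under that assumption the bias term shifts every candidate of the same object by the same amount, so it affects a threshold decision but not the relative ranking of the candidates.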

In one embodiment, the server may also directly use the interaction fusion score to compare with the preset target recommendation condition, and recommend the information to be recommended to the terminal corresponding to the object to be recommended when the interaction fusion score meets the preset target recommendation condition.

In the above embodiment, the attributes of the object to be recommended and the information to be recommended are separated during multi-target fusion, and multi-target fusion is performed on each part to obtain the deflection recommendation degree and the fusion recommendation degree respectively. This enhances the representation capability of the information to be recommended and improves the accuracy of both degrees. Their sum is then calculated as the target recommendation degree, which improves the accuracy of the target recommendation degree.

In one embodiment, as shown in fig. 7, a multi-objective recommendation model training method is provided. The method is described by taking its application to the server in fig. 1 as an example; it is to be understood that the method can also be applied to a terminal, or to a system including a terminal and a server and implemented through interaction between the terminal and the server. In this embodiment, the method includes the steps of:

Step 702, a positive sample and a negative sample corresponding to the training recommended object are obtained, wherein the positive sample comprises training recommended object characteristics and positive recommendation information characteristics, and the negative sample comprises training recommended object characteristics and negative recommendation information characteristics.

The training recommendation object refers to an object to be recommended that is used in training. The training recommendation object features refer to features corresponding to the attribute information of the training recommendation object. The forward recommendation information feature refers to a feature of information that has been recommended to the training recommendation object. The negative recommendation information feature refers to a feature of information that has not been recommended to the training recommendation object, that is, information whose recommendation degree did not meet the preset recommendation condition.

Specifically, the server may obtain positive samples and negative samples corresponding to the training recommended object from the database, or may obtain positive samples and negative samples corresponding to the training recommended object from a server providing the data service. The server may also obtain positive samples and negative samples corresponding to the training recommended object from the terminal.

Step 704, inputting the training recommendation object features and the forward recommendation information features into a first initial multi-target recommendation model to conduct deflection prediction and interaction prediction on each business target, obtaining training deflection degrees and forward interaction possibilities of the training recommendation objects on each business target, correcting the corresponding forward interaction possibilities through the training deflection degrees of each business target, obtaining forward interaction possibilities of each target, fusing the forward interaction possibilities of each target, and obtaining forward fused recommendation degrees corresponding to forward recommendation information.

The first initial multi-target recommendation model refers to a first multi-target recommendation model initialized by model parameters. The training bias degree refers to a bias degree predicted during training, and the forward interaction probability refers to an interaction probability predicted during training by using a positive sample. The target forward interaction probability refers to the target interaction probability predicted by using a positive sample during training. The forward fusion recommendation degree refers to a fusion recommendation degree obtained by using positive sample prediction in training.

Specifically, the training recommended object features and the forward recommended information features are input into a first initial multi-target recommended model, the first initial multi-target recommended model uses the training recommended object features to conduct bottom feature extraction to obtain training object extraction features, deflection prediction is conducted on each business target based on the training object extraction features, and training deflection degree of each business target by the training recommended object is obtained. And combining the training recommended object features and the forward recommended information features to obtain forward combined features, and performing interaction prediction on each business target by using the forward combined features, the training object extraction features and the forward recommended information features to obtain the forward interaction possibility of the training recommended object on each business target. Correcting the corresponding forward interaction probability through the training deviation degree of each business target to obtain the forward interaction probability of each target, and fusing the forward interaction probability of each target to obtain the forward fusion recommendation degree corresponding to the forward recommendation information.

Step 706, inputting the training recommendation object features and the negative recommendation information features into the first initial multi-target recommendation model to conduct deflection prediction and interaction prediction on each business target, obtaining training deflection degree and negative interaction probability of each business target by the training recommendation object, correcting the corresponding negative interaction probability by the training deflection degree of each business target, obtaining negative interaction probability of each target, fusing the negative interaction probability of each target, and obtaining the negative fusion recommendation degree corresponding to the negative recommendation information.

Wherein, the negative interaction probability refers to the interaction probability predicted by using a negative sample in training. The target negative interaction probability refers to the target interaction probability predicted by using a negative sample in training. The negative fusion recommendation degree refers to the fusion recommendation degree obtained by using negative sample prediction in training.

Specifically, the training recommended object features and the negative recommendation information features are input into a first initial multi-target recommendation model, the first initial multi-target recommendation model uses the training recommended object features to conduct bottom feature extraction to obtain training object extraction features, deflection prediction is conducted on each business target based on the training object extraction features, and training deflection degree of the training recommended objects on each business target is obtained. And combining the training recommended object features and the negative recommended information features to obtain negative combined features, and performing interactive prediction on each business target by using the negative combined features, the training object extraction features and the negative recommended information features to obtain the negative interaction possibility of the training recommended object on each business target. And correcting the corresponding negative interaction probability through the training deviation degree of each business target to obtain the negative interaction probability of each target, and fusing the negative interaction probability of each target to obtain the negative fusion recommendation degree corresponding to the negative recommendation information.

Step 708, performing model loss calculation based on the positive fusion recommendation degree and the negative fusion recommendation degree to obtain model loss information.

The model loss information is used for representing errors between the positive fusion recommendation degree and the negative fusion recommendation degree.

Specifically, the server calculates an error between the positive fusion recommendation degree and the negative fusion recommendation degree by using the loss function, and model loss information is obtained.

Step 710, reversely updating the first initial multi-target recommendation model based on the model loss information to obtain a first updated multi-target recommendation model, taking the first updated multi-target recommendation model as the first initial multi-target recommendation model, and returning to the step of obtaining the positive sample and the negative sample corresponding to the training recommendation object for execution until the first training completion condition is reached, thereby obtaining the first multi-target recommendation model.

The first training completion condition refers to the condition under which training yields the first multi-target recommendation model, and includes, but is not limited to, the training reaching a maximum number of iterations, the model loss reaching a preset threshold, or the model parameters no longer changing. The first multi-target recommendation model is the multi-target recommendation model obtained through training and is used for recommending information to a recommendation object.

Specifically, when model loss information is obtained, the server judges whether a first training completion condition is reached, and when the training completion condition is not reached, the initial model parameters are reversely updated by using a gradient descent algorithm, namely, model parameters in a first initial multi-target recommendation model are reversely updated by using the model loss information, and an updated first initial multi-target recommendation model, namely, a first updated multi-target recommendation model is obtained. And then taking the first updated multi-target recommendation model as a first initial multi-target recommendation model, and returning to the step of acquiring the positive sample and the negative sample corresponding to the training recommendation object for execution until the first training completion condition is reached, and taking the first initial multi-target recommendation model when the first training completion condition is reached as the first multi-target recommendation model.
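
The iterative update described above can be sketched as a plain gradient descent loop with the three completion conditions named in the text. This is only an illustrative sketch: the function names, the learning rate, the thresholds and the toy quadratic loss are assumptions made for the example and do not come from the application.

```python
from typing import Callable, List, Tuple


def train_until_done(
    params: List[float],
    loss_and_grad: Callable[[List[float]], Tuple[float, List[float]]],
    learning_rate: float = 0.1,
    max_iterations: int = 1000,
    loss_threshold: float = 1e-6,
    min_param_change: float = 1e-12,
) -> Tuple[List[float], int]:
    """Update parameters by gradient descent until a completion condition holds."""
    for iteration in range(1, max_iterations + 1):
        loss, grad = loss_and_grad(params)
        # Completion condition: the model loss has reached the preset threshold.
        if loss <= loss_threshold:
            return params, iteration
        updated = [p - learning_rate * g for p, g in zip(params, grad)]
        change = max(abs(u - p) for u, p in zip(updated, params))
        params = updated  # the updated model becomes the new initial model
        # Completion condition: the model parameters no longer change.
        if change <= min_param_change:
            return params, iteration
    # Completion condition: the maximum iteration upper limit has been reached.
    return params, max_iterations


def toy_loss_and_grad(params: List[float]) -> Tuple[float, List[float]]:
    # Hypothetical stand-in for the model loss information: sum of (p - 3)^2.
    loss = sum((p - 3.0) ** 2 for p in params)
    grad = [2.0 * (p - 3.0) for p in params]
    return loss, grad


final_params, steps = train_until_done([0.0], toy_loss_and_grad)
```

In an actual implementation the toy loss would be replaced by the model loss information computed from a batch of positive and negative samples.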

According to the multi-target recommendation model training method, the positive sample is subjected to multi-target prediction to obtain the positive fusion recommendation degree, the accuracy of the positive fusion recommendation degree is improved, then the negative sample is subjected to multi-target prediction to obtain the negative fusion recommendation degree, the accuracy of the negative fusion recommendation degree is improved, then the model loss calculation is carried out by using the positive fusion recommendation degree and the negative fusion recommendation degree to obtain the model loss information, the accuracy of the model loss information is improved, then the first initial multi-target recommendation model is trained by using the model loss information, and when the first training completion condition is reached, the first multi-target recommendation model is obtained, so that the accuracy of the first multi-target recommendation model is improved, and the accuracy of information recommendation is improved by the first multi-target recommendation model.

In one embodiment, the first initial multi-objective recommendation model includes a first training multi-objective prediction network and a first training interaction fusion network;

step 704, inputting the training recommended object features and the forward recommended information features into a first initial multi-target recommendation model to conduct deflection prediction and interaction prediction on each business target, obtaining training deflection degree and forward interaction possibility of each business target by the training recommended object, correcting the corresponding forward interaction possibility through the training deflection degree of each business target, obtaining forward interaction possibility of each target, fusing the forward interaction possibility of each target, and obtaining forward fused recommendation degree corresponding to forward recommended information, and the method comprises the following steps:

inputting the training recommendation object features and the forward recommendation information features into the first training multi-target prediction network to conduct deflection prediction and interaction prediction on each business target, so as to obtain the training deflection degree and the forward interaction possibility of the training recommendation object on each business target; and correcting the corresponding forward interaction possibility by the training deflection degree of each business target to obtain each target forward interaction possibility, and inputting the target forward interaction possibilities into the first training interaction fusion network for fusion to obtain the forward fusion recommendation degree corresponding to the forward recommendation information.

The first training multi-target prediction network refers to a first multi-target prediction network which needs to be trained. The first training interaction fusion network refers to an interaction fusion network needing training.

Specifically, the server inputs the training recommended object characteristics and the forward recommended information characteristics into a first training multi-target prediction network to conduct deflection prediction and interaction prediction on each business target, so as to obtain training deflection degree and forward interaction possibility of each business target by the training recommended object, and then fuses the forward interaction possibility of each target through a first training interaction fusion network, so as to obtain forward fusion recommendation degree corresponding to forward recommended information.

In a specific embodiment, the network structure of the first training multi-objective prediction network may be as shown in fig. 4. The network structure of the first training interaction fusion network may be as shown in fig. 5.

In one embodiment, as shown in fig. 8, step 708, performing model loss calculation based on the positive fusion recommendation level and the negative fusion recommendation level to obtain model loss information includes:

Step 802, determining a positive sequence position corresponding to the positive sample based on the positive fusion recommendation degree, and performing discounted cumulative gain calculation based on the positive sequence position to obtain the positive ranking quality corresponding to the positive sample.

The positive sequence position refers to the position of the positive fusion recommendation degree corresponding to the positive sample in the fusion recommendation degree sequence. The fusion recommendation degree sequence is obtained by sorting the fusion recommendation degrees corresponding to a batch of training samples. Each training sample in the batch is either a positive sample or a negative sample. The positive ranking quality is used to characterize the accuracy of the ranking of the positive sample. The ranking quality may be calculated using a ranking evaluation index; for example, NDCG (Normalized Discounted Cumulative Gain) or the like may be used as the evaluation index.

Specifically, the server uses the positive fusion recommendation degree to obtain the positive sequence position corresponding to the positive sample. The model is trained in batches, and each batch uses a plurality of training samples at the same time. A corresponding fusion recommendation degree is predicted for each training sample, and the fusion recommendation degrees of the training samples in the batch are sorted from large to small to obtain the fusion recommendation degree sequence. The position of the positive fusion recommendation degree corresponding to the positive sample in the fusion recommendation degree sequence is then taken as the positive sequence position. Finally, discounted cumulative gain calculation is performed by using the positive sequence position to obtain the positive ranking quality corresponding to the positive sample.

Step 804, determining a negative sequence position corresponding to the negative sample based on the negative fusion recommendation degree, and performing discounted cumulative gain calculation based on the negative sequence position to obtain the negative ranking quality corresponding to the negative sample.

The negative sequence position refers to the position of the negative fusion recommendation degree corresponding to the negative sample in the fusion recommendation degree sequence. The negative ranking quality is used to characterize the accuracy of the ranking of the negative sample.

Specifically, the server obtains the negative sequence position corresponding to the negative sample according to the negative fusion recommendation degree; that is, the position of the negative fusion recommendation degree in the fusion recommendation degree sequence is determined and taken as the negative sequence position. Discounted cumulative gain calculation is then performed by using the negative sequence position to obtain the negative ranking quality corresponding to the negative sample.

Step 806, calculating the difference between the positive ranking quality and the negative ranking quality to obtain a relative quality.

Wherein the relative quality is used to characterize the difference in positive and negative sample ordering accuracy, the smaller the difference the higher the evaluation.

Specifically, the server calculates the difference between the positive ranking quality and the negative ranking quality, resulting in a relative quality.
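
Steps 802 to 806 can be illustrated with the short sketch below. It assumes the common discounted cumulative gain position discount 1 / log2(1 + position) as the ranking quality, and it takes the absolute value of the difference so that the relative quality is non-negative; both choices, and all function names, are assumptions made for the example rather than details confirmed by the text.

```python
import math
from typing import List


def rank_positions(scores: List[float]) -> List[int]:
    """1-based position of each score after sorting the batch from large to small."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    positions = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        positions[index] = position
    return positions


def ranking_quality(position: int) -> float:
    """Assumed DCG-style discount for a sample at the given 1-based position."""
    return 1.0 / math.log2(1.0 + position)


def relative_quality(rank_positive: int, rank_negative: int) -> float:
    """Difference between positive and negative ranking quality (absolute value assumed)."""
    return abs(ranking_quality(rank_positive) - ranking_quality(rank_negative))


# Fusion recommendation degrees of one hypothetical batch of four samples.
batch_scores = [0.2, 0.9, 0.5, 0.7]
positions = rank_positions(batch_scores)  # [4, 1, 3, 2]

# Treat sample 1 as the positive sample and sample 0 as the negative sample.
w_ndcg = relative_quality(positions[1], positions[0])  # about 0.569
```

Under this assumed discount, a pair whose members sit far apart near the top of the sequence receives a larger relative quality than a pair whose members sit close together further down.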

Step 808, calculating the sample pair loss based on the positive fusion recommendation degree and the negative fusion recommendation degree to obtain sample pair loss information, and weighting the sample pair loss information by using the relative quality to obtain model loss information.

The sample pair loss information is used for representing the error between the positive fusion recommendation degree and the negative fusion recommendation degree.

Specifically, the server may calculate the loss between the positive fusion recommendation degree and the negative fusion recommendation degree by using a loss function to obtain the sample pair loss information, where the loss function may be a logarithmic loss function or a cross entropy loss function. The sample pair loss information is then weighted by using the relative quality to obtain the model loss information.

In one specific embodiment, the relative quality may be calculated using formula (3), and the sample pair loss information may be calculated using formula (4) shown below.

Where $w_{ndcg}$ denotes the relative quality, $rank_{positive}$ denotes the positive sequence position, from which the positive ranking quality is calculated, and $rank_{negative}$ denotes the negative sequence position, from which the negative ranking quality is calculated.

$loss_{pair} = -\log \mathrm{sigmoid}(logit_{positive} - logit_{negative})$  Formula (4)

Where $loss_{pair}$ denotes the sample pair loss information, $logit_{positive}$ denotes the positive fusion recommendation degree, and $logit_{negative}$ denotes the negative fusion recommendation degree.

In the above embodiment, the relative quality and the sample pair loss are calculated, and the sample pair loss is then weighted by the relative quality to obtain the model loss information, thereby improving the accuracy of the obtained model loss information.
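
A minimal sketch of formula (4) and of the weighting by the relative quality follows. The function names are hypothetical, and the numerically stable form of the log-sigmoid is an implementation choice of the example, not something stated in the text.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def sample_pair_loss(logit_positive: float, logit_negative: float) -> float:
    """Formula (4): loss_pair = -log sigmoid(logit_positive - logit_negative)."""
    return -log_sigmoid(logit_positive - logit_negative)


def weighted_pair_loss(w_ndcg: float, logit_positive: float, logit_negative: float) -> float:
    """Sample pair loss weighted by the relative quality w_ndcg."""
    return w_ndcg * sample_pair_loss(logit_positive, logit_negative)


# The loss is small when the positive sample scores above the negative one
# and large when the order is reversed.
well_ordered = sample_pair_loss(2.0, 0.0)
mis_ordered = sample_pair_loss(0.0, 2.0)
```

When both fusion recommendation degrees are equal, the sample pair loss equals log 2, which is the point at which the model expresses no preference between the two samples.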

In one embodiment, step 808, weighting the sample pair loss information by using the relative quality to obtain model loss information, includes:

obtaining the sample type corresponding to the positive sample and the negative sample, and obtaining a sample pair weight based on the sample type; and weighting the sample pair loss information by using the sample pair weight and the relative quality to obtain the model loss information.

The sample type refers to the type of the sample pair. It is determined by the highest-priority business target corresponding to the positive sample and the negative sample, which is the same for both samples.

Specifically, the server obtains the highest-priority business target corresponding to the positive sample and the negative sample, and takes this business target as the sample type of the sample pair. A higher priority indicates that the corresponding business target is more important, and the priority may be set as required. For example, the interaction target may be given the highest priority, followed in turn by the duration target and the click target; if the highest-priority business target of the sample pair is the interaction target, the sample type of the sample pair is the interaction type. The server then obtains the preset weight corresponding to the highest-priority business target as the sample pair weight, and calculates the product of the sample pair weight, the relative quality and the sample pair loss information to obtain the model loss information.

In a specific embodiment, the sample pair weight may be obtained using formula (5) below, and the model loss information may be calculated using formula (6) below.

$w_{pair} = \begin{cases} 8, & \text{interaction sample type} \\ 2, & \text{duration sample type} \\ 1, & \text{click sample type} \end{cases}$  Formula (5)

Where $w_{pair}$ denotes the sample pair weight. It takes the value 8 when the sample pair is of the interaction sample type, 2 when the sample pair is of the duration sample type, and 1 when the sample pair is of the click sample type.

$loss1 = w_{ndcg} \cdot w_{pair} \cdot loss_{pair}$  Formula (6)

Where $loss1$ is the model loss information. Calculating the product of the relative quality, the sample pair weight and the sample pair loss information to obtain the model loss information improves the accuracy of the obtained model loss information.
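
The weighting scheme of this embodiment can be sketched as follows. The weights 8, 2 and 1 and the priority order are taken from the text; the string keys, the function names and the representation of a sample pair as a set of business targets are assumptions made for the example.

```python
from typing import Iterable

# Business targets in descending priority, as in the example given in the text.
PRIORITY = ["interaction", "duration", "click"]

# Preset sample pair weights by sample type.
PAIR_WEIGHTS = {"interaction": 8.0, "duration": 2.0, "click": 1.0}


def sample_pair_type(targets: Iterable[str]) -> str:
    """Return the highest-priority business target associated with the sample pair."""
    present = set(targets)
    for target in PRIORITY:
        if target in present:
            return target
    raise ValueError("sample pair has no known business target")


def model_loss(w_ndcg: float, pair_type: str, loss_pair: float) -> float:
    """Formula (6): loss1 = w_ndcg * w_pair * loss_pair."""
    return w_ndcg * PAIR_WEIGHTS[pair_type] * loss_pair


pair_type = sample_pair_type({"click", "duration"})  # "duration"
loss1 = model_loss(w_ndcg=0.5, pair_type="interaction", loss_pair=0.25)  # 1.0
```

Because the interaction type carries eight times the weight of the click type, a mis-ordered interaction pair contributes far more to the model loss than an equally mis-ordered click pair.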

In one embodiment, as shown in fig. 9, a multi-target recommendation model training method is provided. The method is described by taking its application to the server in fig. 1 as an example; it is to be understood that the method can also be applied to a terminal, or to a system including a terminal and a server and implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:

Step 902, obtaining a training sample, where the training sample includes training recommendation object information, training recommendation information, and each interaction label.

The training sample is a sample used in training, and there may be a plurality of training samples. Training recommendation information refers to information that has historically been recommended to the training recommendation object, including but not limited to video, images, text and speech. Training recommendation object information refers to information used in training that can characterize the training recommendation object, including but not limited to basic attribute information, social relationship information, consumption capability information and behavior information of the recommendation object. The basic attribute information is used for representing basic attributes of the recommendation object, the social relationship information is used for representing social relationships of the recommendation object, the consumption capability information is used for representing the consumption capability of the recommendation object, and the behavior information is used for representing the behavior of the recommendation object. Interaction refers to the interaction of the recommendation object with the recommended information, including but not limited to clicking, reading, interacting and the like. The training recommendation object refers to the recommendation object used in training, which may be a real object or a virtual object; real objects include, but are not limited to, persons and animals. A virtual object may refer to an object that is webcast using an avatar.

The interaction label is a label used in training that characterizes the interaction result of the training recommendation object on the training recommendation information. The training recommendation object may interact with the training recommendation information in different ways, and different interactions produce different interaction results, that is, different interaction labels. Each interaction may correspond to a business target; for example, when the interaction label is a click label, the business target is that the training recommendation object clicks the training recommendation information.

Specifically, the server may obtain a training sample from a database, where the training sample includes training recommendation object information, the training recommendation information corresponding to the training recommendation object information, and each interaction label corresponding to the training recommendation object information; each interaction label is generated after the interaction result of the training recommendation object on the training recommendation information is obtained. The server may also obtain training samples from a training sample set, from a service party providing a data service, or from training samples uploaded by a terminal.

Step 904, inputting the training recommendation object characteristics and the training recommendation information characteristics into a second initial multi-target recommendation model to conduct deflection prediction and interaction prediction on each business target, obtaining training deflection degrees and training interaction possibilities of each business target by the training recommendation object, correcting the corresponding training interaction possibilities through the training deflection degrees of each business target, obtaining each target training interaction possibility, fusing the target training interaction possibilities, obtaining training fusion recommendation degrees corresponding to training recommendation information, fusing the training deflection degrees of each business target, obtaining training deflection recommendation degrees corresponding to training recommendation information, calculating the sum of the training deflection recommendation degrees and the training fusion recommendation degrees, and obtaining the training target recommendation degrees.

The second initial multi-target recommendation model is a multi-target recommendation model whose model parameters have been initialized; it is an artificial intelligence model built with a neural network and is used for information recommendation prediction. Model parameter initialization may be random initialization, zero initialization, Gaussian distribution initialization, etc. The second initial multi-target recommendation model differs from the first initial multi-target recommendation model in network structure. The training deflection degree refers to the deflection degree obtained by multi-business-target prediction during training, and the training interaction possibility refers to the interaction possibility obtained by multi-business-target prediction during training. The training fusion recommendation degree refers to the fusion recommendation degree obtained when the second initial multi-target recommendation model is trained. The training deflection recommendation degree refers to the deflection recommendation degree obtained during training. The training target recommendation degree refers to the target recommendation degree obtained during training.

Specifically, the server inputs the training recommendation object features and the training recommendation information features into the second initial multi-target recommendation model, which performs multi-task learning. The model performs deflection prediction by using the training recommendation object features to obtain the training deflection degree of the training recommendation object on each business target, and performs interaction prediction on each business target by using the training recommendation object features and the training recommendation information features to obtain the training interaction possibility of the training recommendation object on each business target. The corresponding training interaction possibility is then corrected by the training deflection degree of each business target to obtain each target training interaction possibility, and the target training interaction possibilities are fused to obtain the training fusion recommendation degree corresponding to the training recommendation information. The training deflection degrees of the business targets are fused to obtain the training deflection recommendation degree corresponding to the training recommendation information, and the sum of the training deflection recommendation degree and the training fusion recommendation degree is calculated to obtain the training target recommendation degree.

Step 906, performing fusion loss calculation based on each interactive label and the recommended degree of the training target to obtain fusion loss information, and performing deflection adjustment on the fusion loss information by using each training deflection degree to obtain target loss information.

The fusion loss information is used for representing errors between the target recommendation degree and the real recommendation result of the training output. The target loss information is obtained by weighting the fusion loss information by using the training bias degree.

Specifically, the server may use a loss function to calculate the error between each interaction label and the fusion recommendation degree, so as to obtain the loss of each learning task in multi-task learning, and then perform fusion loss calculation to obtain the fusion loss information. The server then obtains the weight of the training sample from each training deflection degree, and weights the fusion loss information by the weight of the training sample to obtain the target loss information.

Step 908, reversely updating the second initial multi-target recommendation model based on the target loss information to obtain a second updated multi-target recommendation model, taking the second updated multi-target recommendation model as the second initial multi-target recommendation model, and returning to the step of obtaining the training sample for execution until the second training completion condition is reached, so as to obtain the second multi-target recommendation model.

The second training completion condition refers to the condition under which training of the second initial multi-target recommendation model is complete, and may include the training reaching a maximum number of iterations, the training loss information reaching a preset threshold, the parameters of the model no longer changing, and the like. The second updated multi-target recommendation model refers to the multi-target recommendation model after its model parameters have been updated. The second multi-target recommendation model refers to the trained second initial multi-target recommendation model and is used for information recommendation.

Specifically, the server may determine whether the second training completion condition is reached. When the second training completion condition is not reached, the server reversely updates the model parameters in the second initial multi-target recommendation model by using a gradient descent algorithm based on the target loss information to obtain the second updated multi-target recommendation model. The second updated multi-target recommendation model is then taken as the second initial multi-target recommendation model, and the step of obtaining the training sample is executed iteratively until the second training completion condition is reached; the second initial multi-target recommendation model at that point is taken as the second multi-target recommendation model. The trained second multi-target recommendation model can then be used for information recommendation; that is, the second multi-target recommendation model is deployed and called directly when it is needed.

According to the multi-target recommendation model training method, the training recommendation object characteristics and the training recommendation information characteristics are input into the second initial multi-target recommendation model to conduct deflection prediction and interaction prediction on each business target, the training deflection degree and the training interaction possibility of each business target by the training recommendation object are obtained, the corresponding training interaction possibility is rectified through the training deflection degree of each business target, the target training interaction possibility is obtained, the target training interaction possibility is fused, the training fusion recommendation degree corresponding to training recommendation information is obtained, the training deflection degree of each business target is fused, the training deflection recommendation degree corresponding to training recommendation information is obtained, the sum of the training deflection recommendation degree and the training fusion recommendation degree is calculated, the training target recommendation degree is obtained, the accuracy of the training target recommendation degree is improved, then the training target recommendation degree is used to calculate fusion loss information, the fusion loss information is used to obtain the target loss information, the accuracy of the obtained target loss information is improved, the second initial multi-target recommendation model is trained by using the target loss information, and the accuracy of the second multi-target recommendation model is improved when the second completion condition is reached.

In one embodiment, the second initial multi-objective recommendation model includes a second training multi-objective prediction network, a second training interaction fusion network, and a training bias fusion network;

as shown in fig. 10, step 904, namely, inputting training recommendation object features and training recommendation information features into a second initial multi-target recommendation model to perform deflection prediction and interaction prediction on each service target, obtaining training deflection degrees and training interaction possibilities of each service target by the training recommendation object, correcting the corresponding training interaction possibilities by the training deflection degrees of each service target, obtaining target training interaction possibilities, fusing the target training interaction possibilities, obtaining training fusion recommendation degrees corresponding to training recommendation information, fusing the training deflection degrees of each service target, obtaining training deflection recommendation degrees corresponding to training recommendation information, calculating the sum of the training deflection recommendation degrees and the training fusion recommendation degrees, and obtaining training target recommendation degrees, including:

step 1002, inputting the training recommended object features and the training recommended information features into a second training multi-target prediction network to perform deflection prediction and interaction prediction on each business target, so as to obtain training deflection degree and training interaction possibility of the training recommended object on each business target.

The second training multi-target prediction network refers to a second multi-target prediction network that needs to be trained, and the network structure of the second multi-target prediction network is the same as the network structure of the first multi-target prediction network, and the network parameters are different.

Specifically, the server may input the training recommendation object features and the training recommendation information features as integral features of the training samples into the second training multi-target prediction network to perform deflection prediction and interaction prediction on each service target, so as to obtain training deflection degree and training interaction possibility of the output training recommendation object on each service target.

And step 1004, correcting the training interaction probability corresponding to each service target according to the training deviation degree of each service target to obtain each target training interaction probability, and inputting each target training interaction probability into a second training interaction fusion network to fuse so as to obtain the training fusion recommendation degree corresponding to the training recommendation information.

The second training interaction fusion network refers to a second interaction fusion network needing training, and the network structure of the second training interaction fusion network is different from that of the first training interaction fusion network. The target training interaction possibility refers to the target interaction possibility obtained during training.

Specifically, the server rectifies the input before fusion, namely rectifies the training interaction possibilities corresponding to each service target according to the training deviation degree of each service target, so as to obtain the training interaction possibilities of each target, and then inputs the training interaction possibilities of each target into a second training interaction fusion network for fusion, so as to obtain the training fusion recommendation degree corresponding to the training recommendation information.

Step 1006, inputting the training deviation degree of each business target into a training deviation fusion network for fusion to obtain the training deviation recommendation degree corresponding to the training recommendation information, and calculating the sum of the training deviation recommendation degree and the training fusion recommendation degree to obtain the training target recommendation degree.

The training bias fusion network is a bias fusion network used when training the second multi-target prediction network.

Specifically, the server directly inputs the training deflection degree of each business target into the training deflection fusion network for fusion to obtain the training deflection recommendation degree corresponding to the training recommendation information, and then corrects the output after fusion, that is, calculates the sum of the training deflection recommendation degree and the training fusion recommendation degree to obtain the training target recommendation degree. In a specific embodiment, the network structure of the second training multi-target prediction network may be as shown in fig. 4, and the network structure of the second training interaction fusion network may be as shown in fig. 6.
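
The data flow of steps 1004 and 1006 can be sketched as below. The correction operation and the two fusion networks are passed in as functions because their exact form is not given in this passage; the subtraction and the plain sums used in the demonstration are hypothetical stand-ins chosen only to make the example runnable.

```python
from typing import Callable, Dict


def training_target_recommendation(
    deflection: Dict[str, float],
    interaction: Dict[str, float],
    correct: Callable[[float, float], float],
    fuse_interaction: Callable[[Dict[str, float]], float],
    fuse_deflection: Callable[[Dict[str, float]], float],
) -> float:
    """Correct before fusion, fuse both branches, then add the two fused scores."""
    # Correct the interaction possibility of each business target by its deflection degree.
    corrected = {t: correct(interaction[t], deflection[t]) for t in interaction}
    fusion_recommendation = fuse_interaction(corrected)
    # Fuse the deflection degrees themselves into a deflection recommendation degree.
    deflection_recommendation = fuse_deflection(deflection)
    # Training target recommendation degree is the sum of the two.
    return deflection_recommendation + fusion_recommendation


# Hypothetical per-target outputs of the multi-target prediction network.
deflection = {"click": 0.1, "duration": 0.2}
interaction = {"click": 0.6, "duration": 0.5}

target_degree = training_target_recommendation(
    deflection,
    interaction,
    correct=lambda possibility, bias: possibility - bias,  # assumed correction
    fuse_interaction=lambda values: sum(values.values()),  # assumed fusion
    fuse_deflection=lambda values: sum(values.values()),   # assumed fusion
)
```

In a trained model the two fusion callables would be the second training interaction fusion network and the training deflection fusion network rather than fixed sums.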

In the above embodiment, multi-target prediction is performed by the second training multi-target prediction network, interaction fusion is performed by the second training interaction fusion network, and deviation fusion is performed by the training deviation fusion network, so that the training target recommendation degree is finally obtained and its accuracy is improved.
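The data flow of steps 1004 to 1006 can be sketched as follows. This is a minimal illustration and not the networks of the application: the helper `weighted_sum` is a hypothetical stand-in for both fusion networks, all weights and inputs are made up, and the correction is assumed to be a subtraction, following the difference described later for the fusion module.

```python
from typing import Sequence


def weighted_sum(values: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical stand-in for a fusion network: a single weighted sum."""
    return sum(v * w for v, w in zip(values, weights))


def training_target_recommendation(
    interaction: Sequence[float],
    deviation: Sequence[float],
    interaction_fusion_weights: Sequence[float],
    deviation_fusion_weights: Sequence[float],
) -> float:
    # Correction before fusion (assumed to be a subtraction): remove each
    # business target's deviation degree from its interaction possibility.
    corrected = [p - d for p, d in zip(interaction, deviation)]
    # Fuse the corrected possibilities (training fusion recommendation degree).
    fusion_degree = weighted_sum(corrected, interaction_fusion_weights)
    # Fuse the deviation degrees themselves (training deviation recommendation degree).
    deviation_degree = weighted_sum(deviation, deviation_fusion_weights)
    # Correction after fusion: the two fused values are summed.
    return fusion_degree + deviation_degree


if __name__ == "__main__":
    score = training_target_recommendation(
        interaction=[0.8, 0.5], deviation=[0.3, 0.5],
        interaction_fusion_weights=[1.0, 1.0], deviation_fusion_weights=[1.0, 1.0],
    )
    print(score)
```

In a real model both stand-ins would be trainable networks; the sketch only shows where the two corrections sit relative to the two fusion steps.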

In one embodiment, as shown in fig. 11, step 906, performing fusion loss calculation based on each interaction label and the training target recommendation degree to obtain fusion loss information, and performing bias adjustment on the fusion loss information by using each training bias degree to obtain target loss information, includes:

Step 1102, performing interaction loss calculation for each business target based on each interaction label and the fusion recommendation degree to obtain each piece of interaction loss information.

The interaction loss information is used for representing errors between the fusion recommendation degree and the interaction label, and the smaller the interaction loss information is, the closer the fusion recommendation degree is to the real interaction result corresponding to the interaction label.

Specifically, the server may use a cross-entropy loss function to calculate the loss between each interaction label and the fusion recommendation degree, obtaining the interaction loss information corresponding to each interaction label. Each business target has a corresponding interaction label, and the interaction loss information is calculated separately for each business target. Different cross-entropy loss functions may be used for different tasks. When the interaction label belongs to a binary classification task, a binary cross-entropy loss function may be used to calculate the loss information; for example, a click prediction task is a binary classification task, so its loss information may be calculated with a binary cross-entropy loss function. When the interaction label belongs to a multi-class task, a multi-class cross-entropy loss function may be used; for example, when the duration prediction task is formulated as a multi-class task, its loss information may be calculated with a multi-class cross-entropy loss function. In one embodiment, the server may also use a linear regression loss function, a least-squares loss function, or the like to calculate the loss information between the interaction label and the fusion recommendation degree.
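As an illustration of the binary case, the per-target interaction loss can be sketched with the standard binary cross-entropy formula. The function names, the clipping constant `eps`, and the example labels are assumptions made for this sketch only.

```python
import math
from typing import Dict


def binary_cross_entropy(label: int, predicted: float, eps: float = 1e-12) -> float:
    """Standard binary cross-entropy for one sample; predicted is clipped to (0, 1)."""
    p = min(max(predicted, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def interaction_losses(labels: Dict[str, int], fusion_degree: float) -> Dict[str, float]:
    """One loss per business target, each against the same fusion recommendation degree."""
    return {task: binary_cross_entropy(y, fusion_degree) for task, y in labels.items()}


if __name__ == "__main__":
    # Hypothetical labels: the object clicked but did not interact.
    print(interaction_losses({"click": 1, "interaction": 0}, fusion_degree=0.8))
```

The loss is small when the fusion recommendation degree agrees with the label (here the click label) and large when it disagrees (here the interaction label).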

Step 1104, acquiring the interaction weights corresponding to the interaction labels, and performing fusion loss calculation based on the interaction weights and the interaction loss information to obtain fusion loss information.

The interaction weight is used for representing the importance degree of the business target corresponding to the interaction label. Different business targets have different interaction weights, and the interaction weights are preset and can be adjusted according to requirements.

Specifically, the server may store the interaction weights in a database in advance and, when they are needed, obtain the interaction weight corresponding to each interaction label directly from the database. The server may also obtain, in real time, the interaction weights corresponding to the interaction labels uploaded by a terminal, or obtain them from a service party. The server then weights each piece of interaction loss information by its interaction weight and calculates the weighted sum to obtain the fusion loss information.

In a specific embodiment, the interaction labels include, but are not limited to, click labels, reading duration labels, interaction labels, and exposure labels. The server uses a binary cross-entropy loss function to calculate the cross-entropy loss between the click label and the fusion recommendation degree, obtaining the click prediction loss information corresponding to the click label. In the same way, it calculates the cross-entropy loss between the reading duration label and the fusion recommendation degree to obtain the reading duration prediction loss information, the cross-entropy loss between the interaction label and the fusion recommendation degree to obtain the interaction prediction loss information, and the cross-entropy loss between the exposure label and the fusion recommendation degree to obtain the exposure prediction loss information. The server then weights the loss information corresponding to the click label, the reading duration label, the interaction label, and the exposure label by the corresponding interaction weights and calculates the weighted sum to obtain the fusion loss information.

Step 1106, determining the sample weight corresponding to the training sample based on the interaction labels and the interaction bias degrees, and performing bias adjustment on the fusion loss information based on the sample weight to obtain target loss information.

The sample weight is used for representing the importance degree of the training sample relative to the training recommended object, and the higher the sample weight is, the more the interaction tendency represented by the training sample can represent the interaction tendency of the training recommended object.

Specifically, the server uses each interaction label and each interaction bias degree to determine the sample weight of the training sample, and then weights the fusion loss information by the sample weight to obtain the target loss information corresponding to the training sample.

In the above embodiment, fusion loss calculation is performed through each interaction weight and each interaction loss information to obtain fusion loss information, then each interaction label and each interaction deviation degree are used to determine the sample weight corresponding to the training sample, and deviation adjustment is performed on the fusion loss information based on the sample weight to obtain target loss information, so that accuracy of the obtained target loss information is improved.

In one embodiment, as shown in fig. 12, step 1104, obtaining the interaction weights corresponding to each interaction label, and performing fusion loss calculation based on each interaction weight and each piece of interaction loss information to obtain fusion loss information, includes:

Step 1202, obtaining interaction priorities corresponding to the interaction labels, and determining sample types corresponding to the training samples based on the interaction priorities.

The interaction priority is used for representing the importance of each business target relative to the information recommendation service; the higher the interaction priority, the more important the business target. The interaction priority may be set according to requirements. For example, the interaction task may be given the highest priority, followed in order by the duration task and the click task.

Specifically, the server may obtain the interaction priority corresponding to each interaction tag from the database, or may obtain the interaction priority corresponding to each interaction tag uploaded by the terminal, and then use the interaction task corresponding to the interaction tag with the highest interaction priority as the sample type corresponding to the training sample. For example, if the priority of the interaction task is highest, the sample type corresponding to the training sample is an interaction sample.
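A minimal sketch of this sample-type decision is given below. It assumes binary labels, takes the priority order from the example above, and treats a sample with no positive label as an exposure sample; that fallback and all names are assumptions for illustration.

```python
from typing import Dict, Sequence

# Priority order from the example in the text: interaction, then duration, then click.
DEFAULT_PRIORITY = ("interaction", "duration", "click")


def sample_type(
    labels: Dict[str, int],
    priority: Sequence[str] = DEFAULT_PRIORITY,
    fallback: str = "exposure",
) -> str:
    """Return the highest-priority business target whose label is positive."""
    for task in priority:
        if labels.get(task, 0) == 1:
            return task
    # Assumption: a sample with no positive label was only exposed.
    return fallback


if __name__ == "__main__":
    print(sample_type({"click": 1, "duration": 1, "interaction": 0}))
```

A sample that was both clicked and read long enough is therefore treated as a duration sample, because duration ranks above click in the assumed priority order.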

Step 1204, looking up each interaction weight corresponding to the sample type from the preset interaction matrix based on the sample type.

The preset interaction matrix is a preset matrix that stores the interaction weights. The rows of the matrix represent the sample types, and the columns represent the business targets. For example, with 4 sample types and 3 business targets, the preset interaction matrix is a 4×3 matrix.

Specifically, the server looks up, according to the sample type, the interaction weights in the row of the preset interaction matrix that corresponds to that sample type.

In a specific embodiment, the preset interaction matrix may be as shown in table 1 below.

Table 1. Preset interaction matrix

Sample type	W click	W duration	W interaction
Interaction	0	0	1.0
Duration	0	0.7	0.6
Click	0.1	0.9	0
Exposure	1.0	0	0

The interaction weight of the interaction sample comprises click weight 0, duration weight 1 and interaction weight 1. The interaction weight of the duration sample comprises click weight 0, duration weight 0.7 and interaction weight 0.6. The interaction weight of the click sample comprises click weight 0.1, duration weight 0.9 and interaction weight 0. The interaction weight of the exposure sample comprises click weight 1, duration weight 0 and interaction weight 0.

Step 1206, weighting the corresponding interaction loss information with each interaction weight to obtain each piece of weighted loss information, and calculating the sum of the weighted loss information to obtain the fusion loss information.

The weighted loss information is loss information obtained by weighting the interactive loss information by using the interactive weight.

Specifically, the server weights each piece of interaction loss information by the corresponding interaction weight. For example, for a click sample, the server calculates the product of the interaction loss information of the click business target and the click weight 0.1 to obtain the weighted loss information of the click business target, the product of the interaction loss information of the duration business target and the duration weight 0.9 to obtain the weighted loss information of the duration business target, and the product of the interaction loss information of the interaction business target and the interaction weight 0 to obtain the weighted loss information of the interaction business target. The server then calculates the sum of all the weighted loss information to obtain the fusion loss information.

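In a specific embodiment, the fusion loss information may be calculated using formula (7) as shown below.

$$\mathrm{loss}_{matrix} = \sum_{task} w_{task} \cdot \mathrm{loss}_{task} \qquad \text{Formula (7)}$$

where $\mathrm{loss}_{matrix}$ refers to the fusion loss information, $\mathrm{loss}_{task}$ refers to the interaction loss information corresponding to a business target, and $w_{task}$ refers to the interaction weight corresponding to that business target, taken from the row of the preset interaction matrix for the sample type.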

In the above embodiment, the sample type corresponding to the training sample is determined by using the interaction priority, then each interaction weight corresponding to the sample type is searched from the preset interaction matrix, and finally each interaction weight is used to weight each interaction loss information respectively, so as to obtain each weighted loss information, calculate the information sum of each weighted loss information, obtain the fusion loss information, and improve the accuracy of the obtained fusion loss information.
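Steps 1204 and 1206 amount to a row lookup followed by a weighted sum. The sketch below copies the weights from Table 1; the dictionary layout, the function name, and the example loss values are assumptions for illustration.

```python
from typing import Dict

# Rows are sample types, columns are business targets (values from Table 1).
PRESET_INTERACTION_MATRIX: Dict[str, Dict[str, float]] = {
    "interaction": {"click": 0.0, "duration": 0.0, "interaction": 1.0},
    "duration": {"click": 0.0, "duration": 0.7, "interaction": 0.6},
    "click": {"click": 0.1, "duration": 0.9, "interaction": 0.0},
    "exposure": {"click": 1.0, "duration": 0.0, "interaction": 0.0},
}


def fusion_loss(sample_type: str, losses: Dict[str, float]) -> float:
    """Weight each per-target loss by the row for the sample type and sum the results."""
    weights = PRESET_INTERACTION_MATRIX[sample_type]
    return sum(weights[task] * losses[task] for task in weights)


if __name__ == "__main__":
    # Hypothetical per-target losses for one click sample.
    example_losses = {"click": 0.5, "duration": 0.2, "interaction": 0.9}
    print(fusion_loss("click", example_losses))
```

For the click sample in this example the interaction loss contributes nothing, since its weight in the click row is 0.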

In one embodiment, as shown in fig. 13, step 1106, determining a sample weight corresponding to the training sample based on each interaction tag and each interaction bias degree, and performing bias adjustment on the fusion loss information based on the sample weight to obtain target loss information, including:

Step 1302, obtaining the interaction priority corresponding to each interaction label, and determining the sample type corresponding to the training sample based on the interaction priority.

Step 1304, obtaining a corresponding training sample sequence based on each interaction label, and determining a target sample sequence from the training sample sequences based on the sample type.

A training sample sequence is a sequence obtained by sorting the training samples according to their interaction bias degrees. The training samples are sorted separately for each business target, giving one training sample sequence per business target; each business target also corresponds to one interaction label. The target sample sequence is the training sample sequence selected from these sequences according to the sample type, that is, the sequence sorted by the interaction bias degree of the business target that corresponds to the sample type.

Specifically, the server may obtain the interaction priorities corresponding to the interaction tags from the database, and then determine the sample types corresponding to the training samples according to the interaction priorities. And then acquiring corresponding training sample sequences based on each interactive label, and determining a target sample sequence from each training sample sequence based on the sample type.

Step 1306, determining a sequence position corresponding to the training sample from the target sample sequence, and obtaining a sample weight corresponding to the training sample as a first target sample weight when the sequence position exceeds a preset position threshold.

Step 1308, when the sequence position does not exceed the preset position threshold, obtaining the sample weight corresponding to the training sample as the second target sample weight.

The first target sample weight and the second target sample weight are preset sample weights, and the first target sample weight is greater than the second target sample weight. The preset position threshold is a preset ranking threshold used for determining the sample weight. For example, the preset position threshold may be the first 20% of the sequence.

Specifically, the server determines the sequence position of the training sample in the target sample sequence, that is, its rank in that sequence, and compares the sequence position with the preset position threshold. When the sequence position exceeds the preset position threshold, that is, the training sample is ranked near the front, the first target sample weight is used as the sample weight of the training sample. In this case the training sample hits the bias of the recommended object and better represents that bias, so it is given a higher weight. When the sequence position does not exceed the preset position threshold, the training sample is ranked further back, which indicates that it is an ordinary sample, so it is given a lower weight.

In a specific embodiment, the target loss information may be calculated using formula (8) as shown below.

$$\mathrm{loss2} = w_{bias} \cdot \mathrm{loss}_{matrix} \qquad \text{Formula (8)}$$

where $\mathrm{loss2}$ refers to the target loss information, $\mathrm{loss}_{matrix}$ refers to the fusion loss information, and $w_{bias}$ refers to the sample weight corresponding to the training sample, whose value may be as shown in formula (9) below:

$$w_{bias} = \begin{cases} 4, & \text{if the training sample is in the first 20\% of the training sample sequence} \\ 1, & \text{otherwise} \end{cases} \qquad \text{Formula (9)}$$

That is, when the sequence position of the training sample falls within the first twenty percent of the training sample sequence, a sample weight of 4 is selected; when it does not, a sample weight of 1 is selected.

In the above embodiment, the target sample sequence is determined from each training sample sequence through the sample type, then the sequence position corresponding to the training sample is determined from the target sample sequence, and finally the sample weight corresponding to the training sample is determined according to the sequence position, so that the obtained sample weight can enhance the tendency of the training recommended object, thereby improving the model training effect and improving the accuracy of information recommendation of the model obtained by training.
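The position-based weighting can be sketched as below, using the example values given in the text (first twenty percent, weights 4 and 1). The 0-based `rank` convention and the function names are assumptions for illustration.

```python
def sample_weight(
    rank: int,
    sequence_length: int,
    top_fraction: float = 0.2,
    first_weight: float = 4.0,
    second_weight: float = 1.0,
) -> float:
    """Return the higher weight when the sample ranks in the leading fraction.

    rank is the 0-based position in the target sample sequence, which is
    sorted by interaction bias degree from large to small.
    """
    cutoff = top_fraction * sequence_length
    return first_weight if rank < cutoff else second_weight


def target_loss(fusion_loss_value: float, rank: int, sequence_length: int) -> float:
    """Target loss: the fusion loss scaled by the sample weight."""
    return sample_weight(rank, sequence_length) * fusion_loss_value


if __name__ == "__main__":
    # In a sequence of 10 samples, ranks 0 and 1 fall in the first twenty percent.
    print([sample_weight(r, 10) for r in range(4)])
    print(target_loss(0.5, rank=0, sequence_length=10))
```

Scaling the loss by 4 for the leading samples makes their gradients count four times as much as those of ordinary samples during training.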

In one embodiment, step 1304, that is, obtaining a corresponding training sample sequence based on each interaction tag, includes the steps of:

Acquiring each training sample and each interaction deviation degree corresponding to each training sample; and sequencing the training samples according to the interaction deviation degree corresponding to each training sample based on the interaction labels to obtain training sample sequences corresponding to the interaction labels.

Specifically, the server may acquire each training sample, and sequentially input each training sample into the second initial information recommendation model to obtain each interaction deviation degree corresponding to each training sample. The server can also directly obtain each training sample and each interaction deviation degree corresponding to each training sample from the database. And then sequencing all training samples according to all interaction bias degrees corresponding to the business targets, namely sequencing from large to small, and finally obtaining a training sample sequence of the business targets corresponding to each interaction label. The training samples are sequenced by using the interactive deviation degree, so that training sample sequences corresponding to the interactive labels are obtained, and the accuracy of the obtained training sample sequences is improved.
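Building one sorted sequence per business target can be sketched as follows. The input layout (sample identifier mapped to per-target bias degrees) and the sample identifiers are hypothetical.

```python
from typing import Dict, List


def build_sample_sequences(bias_degrees: Dict[str, Dict[str, float]]) -> Dict[str, List[str]]:
    """Sort sample ids by interaction bias degree, from large to small, per business target.

    bias_degrees maps a sample id to a {business target: bias degree} dictionary.
    """
    targets = sorted({t for degrees in bias_degrees.values() for t in degrees})
    sequences: Dict[str, List[str]] = {}
    for target in targets:
        sequences[target] = sorted(
            bias_degrees,
            key=lambda sample_id: bias_degrees[sample_id][target],
            reverse=True,
        )
    return sequences


if __name__ == "__main__":
    degrees = {
        "s1": {"click": 0.2, "duration": 0.9},
        "s2": {"click": 0.7, "duration": 0.1},
        "s3": {"click": 0.5, "duration": 0.4},
    }
    print(build_sample_sequences(degrees))
```

The position of a sample in the sequence for its sample type is the sequence position that is later compared with the preset position threshold.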

In a specific embodiment, the information recommendation method is applied to a news recommendation scenario. Specifically, when the news platform performs news recommendation, it obtains the news to be recommended and the information of the object to be recommended and inputs them into a news recommendation model, which is a multi-target recommendation model. Fig. 14 is a schematic diagram of deviation correction in the second multi-target recommendation model. The news to be recommended and the information of the object to be recommended are input into the multi-target prediction network of the second multi-target recommendation model to obtain the click deviation score and click recommendation score corresponding to the click business target, the duration deviation score and duration recommendation score corresponding to the duration business target, and the interaction deviation score and interaction recommendation score corresponding to the interaction business target. The input is then corrected before fusion: the click deviation score is subtracted from the click recommendation score to obtain a click target recommendation score, the duration deviation score is subtracted from the duration recommendation score to obtain a duration target recommendation score, and the interaction deviation score is subtracted from the interaction recommendation score to obtain an interaction target recommendation score. The click target recommendation score, the duration target recommendation score, and the interaction target recommendation score are fused to obtain an interaction fusion score, and the click deviation score, the duration deviation score, and the interaction deviation score are fused to obtain a deviation fusion score.

The fused output is then corrected, that is, the sum of the interaction fusion score and the deviation fusion score is calculated to obtain the final fusion recommendation score. When the fusion recommendation score exceeds a preset recommendation threshold, the news platform recommends the news to be recommended to the terminal of the recommended object corresponding to the information of the object to be recommended. Alternatively, the fusion recommendation score of each candidate news item may be obtained through the second multi-target recommendation model, the candidate news items are sorted by fusion recommendation score from large to small to obtain a candidate news sequence, and the top-ranked candidates are selected from the sequence as the news to be recommended, for example the top 10 candidates. The selected news to be recommended is then sent to the terminal of the recommended object. When the terminal receives the recommended news, it displays the news on a news page. Fig. 15 is a schematic diagram of the news recommendation page, in which the recommended news is displayed as an image-text information feed.
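The two selection rules described here, a score threshold and a top-ranked cut, can be sketched as follows. The candidate identifiers, scores, and threshold value are made up for illustration.

```python
from typing import Dict, List


def exceeds_threshold(fusion_score: float, threshold: float) -> bool:
    """Threshold rule: recommend when the fusion recommendation score exceeds the threshold."""
    return fusion_score > threshold


def top_k_news(fusion_scores: Dict[str, float], k: int = 10) -> List[str]:
    """Ranking rule: sort candidates by score from large to small and keep the first k."""
    ranked = sorted(fusion_scores, key=lambda news_id: fusion_scores[news_id], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical fusion recommendation scores for four candidate news items.
    scores = {"news_a": 0.91, "news_b": 0.35, "news_c": 0.77, "news_d": 0.60}
    print(top_k_news(scores, k=2))
    print([n for n, s in scores.items() if exceeds_threshold(s, 0.5)])
```

The threshold rule decides each item independently, whereas the ranking rule always returns at most `k` items regardless of their absolute scores.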

In a specific embodiment, the information recommendation method is applied to the news function of an instant messaging application, that is, the first multi-target recommendation model is evaluated for news recommendation through an A/B test. The test results include the comparison diagram of image-text click volume shown in fig. 16-A and the comparison diagram of test result metrics shown in fig. 16-B. The line graph in fig. 16-A compares the image-text click volume of the experimental group with that of the control group. The bar graph in fig. 16-A shows the relative difference, that is, the ratio of the experimental group value minus the control group value to the control group value. As the figures show, after the first multi-target recommendation model is used for news recommendation, the image-text click volume and the other metrics of the news function are significantly improved. That is, the first multi-target recommendation model of the present application significantly improves the accuracy of news recommendation.
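The relative difference plotted in the bar graph follows directly from its definition. The sketch below uses made-up click counts, not figures from the test.

```python
def relative_difference(experimental: float, control: float) -> float:
    """(experimental - control) / control, as defined for the bar graph."""
    if control == 0:
        raise ValueError("control value must be non-zero")
    return (experimental - control) / control


if __name__ == "__main__":
    # Hypothetical daily click counts for the two groups.
    print(relative_difference(experimental=110.0, control=100.0))
```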

In a specific embodiment, the information recommendation method is applied to the news function of an instant messaging application, that is, the second multi-target recommendation model is evaluated for news recommendation through an A/B test. The test results include the comparison diagram of image-text click volume shown in fig. 17-A and the comparison diagram of test result metrics shown in fig. 17-B. The line graph in fig. 17-A compares the image-text click volume of the experimental group with that of the control group. The bar graph in fig. 17-A shows the relative difference, that is, the ratio of the experimental group value minus the control group value to the control group value. As the figures show, after the second multi-target recommendation model is used for news recommendation, the image-text click volume and the other metrics of the news function are significantly improved. That is, the second multi-target recommendation model of the present application significantly improves the accuracy of news recommendation.

It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times. These sub-steps or stages are also not necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, the embodiments of the present application also provide an information recommendation device for implementing the information recommendation method described above, and a multi-target recommendation model training device for implementing the multi-target recommendation model training method described above. The solution provided by each device is similar to the solution described for the corresponding method, so for the specific limitations in the device embodiments provided below, reference may be made to the limitations of the information recommendation method or the multi-target recommendation model training method above, which are not repeated here.

In one embodiment, as shown in fig. 18, there is provided an information recommendation apparatus 1800 including: a feature acquisition module 1802, a bias prediction module 1804, an interaction prediction module 1806, a fusion module 1808, and a recommendation module 1810, wherein:

the feature acquisition module 1802 is configured to acquire an object attribute feature and an information feature to be recommended, where the object attribute feature corresponds to an object to be recommended;

the bias prediction module 1804 is configured to perform bottom-layer feature extraction based on the object attribute feature to obtain an object extraction feature, and perform bias prediction on each business target based on the object extraction feature to obtain a bias degree of the object to be recommended on each business target;

the interaction prediction module 1806 is configured to perform feature combination on the object attribute feature and the information feature to be recommended to obtain a combined feature, and perform interaction prediction on each service target based on the combined feature, the object extraction feature and the information feature to be recommended to obtain interaction possibility of the object to be recommended on each service target;

the fusion module 1808 is configured to rectify the interaction probabilities corresponding to the business targets according to the deviation degrees of the business targets, obtain each target interaction probability, and fuse the target interaction probabilities to obtain a fusion recommendation degree corresponding to the information to be recommended;

the recommendation module 1810 is configured to recommend the information to be recommended to the terminal corresponding to the object to be recommended when the fusion recommendation degree meets the preset recommendation condition.

In one embodiment, the information recommendation apparatus 1800 further includes:

the model recommendation module is configured to input the object attribute features and the information features to be recommended into the first multi-target recommendation model; perform bottom-layer feature extraction based on the object attribute features through the first multi-target recommendation model to obtain object extraction features, and perform bias prediction on each business target based on the object extraction features to obtain the bias degree of the object to be recommended on each business target; combine the object attribute features and the information features to be recommended through the first multi-target recommendation model to obtain combined features, and perform interaction prediction on each business target based on the combined features, the object extraction features, and the information features to be recommended to obtain the interaction possibility of the object to be recommended on each business target; and correct the interaction possibility corresponding to each business target by using the bias degree of each business target through the first multi-target recommendation model to obtain each target interaction possibility, and fuse the target interaction possibilities to obtain the fusion recommendation degree corresponding to the information to be recommended;

the model recommendation module is also used for inputting object attribute characteristics into a first deflection prediction sub-network in the first multi-target prediction network; performing bottom-layer feature extraction on the object attribute features through a first deflection prediction sub-network to obtain object extraction features, and performing deflection prediction on each business target based on the object extraction features to obtain deflection degree of the object to be recommended on each business target;

the model recommendation module is also used for inputting the combined characteristic, the object extraction characteristic and the information characteristic to be recommended into a first interaction prediction sub-network in the first multi-target prediction network; performing interaction prediction on each business target by using the combined characteristics, the object extraction characteristics and the information characteristics to be recommended through a first interaction prediction sub-network to obtain interaction possibility of the object to be recommended on each business target;

the model recommendation module is also used for splicing the interaction possibilities of the targets to obtain splicing vectors, and inputting the splicing vectors into the interaction fusion network for fusion to obtain fusion recommendation degrees corresponding to the information to be recommended.

In one embodiment, the fusion module 1808 is further configured to calculate the difference between the interaction possibility of each business target and the corresponding bias degree to obtain the target interaction possibility of each business target.

the bias fusion module is configured to fuse the bias degrees of the business targets to obtain the bias recommendation degree corresponding to the information to be recommended, and to calculate the sum of the bias recommendation degree and the fusion recommendation degree to obtain a target recommendation degree;

the recommending module 1810 is further configured to recommend the information to be recommended to a terminal corresponding to the object to be recommended when the target recommending degree meets a preset target recommending condition.

In one embodiment, the bias fusion module is further configured to splice the bias degrees of the business targets to obtain a bias splicing vector, and input the bias splicing vector into a bias fusion network for fusion to obtain the bias recommendation degree corresponding to the information to be recommended.

In one embodiment, as shown in fig. 19, a multi-objective recommendation model training apparatus 1900 is provided, comprising: a sample pair acquisition module 1902, a positive prediction module 1904, a negative prediction module 1906, a loss calculation module 1908, and a first iteration module 1910, wherein:

The sample pair obtaining module 1902 is configured to obtain a positive sample and a negative sample corresponding to a training recommendation object, where the positive sample includes a training recommendation object feature and a positive recommendation information feature, and the negative sample includes a training recommendation object feature and a negative recommendation information feature;

The positive prediction module 1904 is configured to input the training recommendation object features and the positive recommendation information features into the first initial multi-target recommendation model to perform bias prediction and interaction prediction for each business target, obtaining the training bias degree and the positive interaction likelihood of the training recommendation object for each business target; to correct the corresponding positive interaction likelihood using the training bias degree of each business target, obtaining the target positive interaction likelihoods; and to fuse the target positive interaction likelihoods, obtaining the positive fusion recommendation degree corresponding to the positive recommendation information;

The negative prediction module 1906 is configured to input the training recommendation object features and the negative recommendation information features into the first initial multi-target recommendation model to perform bias prediction and interaction prediction for each business target, obtaining the training bias degree and the negative interaction likelihood of the training recommendation object for each business target; to correct the corresponding negative interaction likelihood using the training bias degree of each business target, obtaining the target negative interaction likelihoods; and to fuse the target negative interaction likelihoods, obtaining the negative fusion recommendation degree corresponding to the negative recommendation information;

The loss calculation module 1908 is configured to perform model loss calculation based on the positive fusion recommendation degree and the negative fusion recommendation degree, so as to obtain model loss information;

The first iteration module 1910 is configured to update the first initial multi-target recommendation model through back-propagation based on the model loss information to obtain a first updated multi-target recommendation model, take the first updated multi-target recommendation model as the first initial multi-target recommendation model, and return to the step of obtaining the positive sample and the negative sample corresponding to the training recommendation object, iterating until a first training completion condition is reached, thereby obtaining the first multi-target recommendation model.

The positive prediction module 1904 is further configured to input the training recommendation object features and the positive recommendation information features into the first training multi-target prediction network to perform bias prediction and interaction prediction for each business target, obtaining the training bias degree and the positive interaction likelihood of the training recommendation object for each business target; and to correct the corresponding positive interaction likelihood using the training bias degree of each business target to obtain the target positive interaction likelihoods, and input the target positive interaction likelihoods into the first training interaction fusion network for fusion, obtaining the positive fusion recommendation degree corresponding to the positive recommendation information.

In one embodiment, the loss calculation module 1908 is further configured to: determine the positive sequence position corresponding to the positive sample based on the positive fusion recommendation degree, and perform a discounted cumulative gain calculation based on the positive sequence position to obtain the positive ranking quality corresponding to the positive sample; determine the negative sequence position corresponding to the negative sample based on the negative fusion recommendation degree, and perform a discounted cumulative gain calculation based on the negative sequence position to obtain the negative ranking quality corresponding to the negative sample; calculate the difference between the positive ranking quality and the negative ranking quality to obtain a relative quality; and perform a sample pair loss calculation based on the positive fusion recommendation degree and the negative fusion recommendation degree to obtain sample pair loss information, and weight the sample pair loss information by the relative quality to obtain the model loss information.

In one embodiment, the loss calculation module 1908 is further configured to obtain the sample types corresponding to the positive sample and the negative sample, and obtain a sample pair weight based on the sample types; and to weight the sample pair loss information using the sample pair weight and the relative quality to obtain the model loss information.
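The weighted pair loss described in the two embodiments above can be sketched as follows. This is only an illustration under stated assumptions: the ranking quality at a position is taken to be the standard discounted cumulative gain discount 1 / log2(1 + position) with 1-based positions, the relative quality is taken as the absolute value of the difference, and the sample pair loss is a pairwise logistic loss. None of these specific formulas is confirmed by the application, and all function names are hypothetical.

```python
import math
from typing import Sequence


def discount(position: int) -> float:
    """DCG-style discount for a 1-based ranking position (assumed form)."""
    return 1.0 / math.log2(1 + position)


def rank_position(score: float, all_scores: Sequence[float]) -> int:
    """1-based position of a score when all scores are sorted in descending order."""
    return 1 + sum(1 for s in all_scores if s > score)


def pair_loss(pos_score: float, neg_score: float) -> float:
    """Pairwise logistic loss (assumption): small when the positive scores higher."""
    return math.log1p(math.exp(-(pos_score - neg_score)))


def weighted_pair_loss(pos_score: float, neg_score: float,
                       all_scores: Sequence[float],
                       pair_weight: float = 1.0) -> float:
    """Weight the pair loss by the relative ranking quality and a sample pair weight."""
    q_pos = discount(rank_position(pos_score, all_scores))
    q_neg = discount(rank_position(neg_score, all_scores))
    relative_quality = abs(q_pos - q_neg)  # absolute value is an assumption
    return pair_weight * relative_quality * pair_loss(pos_score, neg_score)


# Hypothetical fusion recommendation degrees for one ranked list.
scores = [2.0, 1.0, 0.5]
loss = weighted_pair_loss(pos_score=2.0, neg_score=0.5, all_scores=scores)
```

Pairs whose members sit far apart in the ranking receive a larger weight, so the update concentrates on orderings that matter most to the ranked list.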

In one embodiment, as shown in fig. 20, there is provided a multi-objective recommendation model training apparatus 2000 comprising: a sample acquisition module 2002, a training module 2004, a target loss calculation module 2006, and a second iteration module 2008, wherein:

The sample acquisition module 2002 is configured to acquire a training sample, where the training sample includes training recommendation object information, training recommendation information, and each interaction label;

The training module 2004 is configured to input the training recommendation object features and the training recommendation information features into a second initial multi-target recommendation model to perform bias prediction and interaction prediction for each business target, obtaining the training bias degree and the training interaction likelihood of the training recommendation object for each business target; to correct the corresponding training interaction likelihood using the training bias degree of each business target, obtaining the target training interaction likelihoods; to fuse the target training interaction likelihoods, obtaining the training fusion recommendation degree corresponding to the training recommendation information; to fuse the training bias degrees of the business targets, obtaining the training bias recommendation degree corresponding to the training recommendation information; and to calculate the sum of the training bias recommendation degree and the training fusion recommendation degree, obtaining the training target recommendation degree;

The target loss calculation module 2006 is configured to perform a fusion loss calculation based on each interaction label and the training target recommendation degree to obtain fusion loss information, and to perform bias adjustment on the fusion loss information using each training bias degree to obtain target loss information;

The second iteration module 2008 is configured to reversely update the second initial multi-target recommendation model based on the target loss information to obtain a second updated multi-target recommendation model, take the second updated multi-target recommendation model as the second initial multi-target recommendation model, and return to the step of obtaining the training sample for execution until a second training completion condition is reached, thereby obtaining the second multi-target recommendation model.

The training module 2004 is further configured to input the training recommendation object features and the training recommendation information features into the second training multi-target prediction network to perform bias prediction and interaction prediction for each business target, obtaining the training bias degree and the training interaction likelihood of the training recommendation object for each business target; to correct the corresponding training interaction likelihood using the training bias degree of each business target to obtain the target training interaction likelihoods, and input the target training interaction likelihoods into the second training interaction fusion network for fusion, obtaining the training fusion recommendation degree corresponding to the training recommendation information; and to input the training bias degrees of the business targets into the training bias fusion network for fusion, obtaining the training bias recommendation degree corresponding to the training recommendation information, and calculate the sum of the training bias recommendation degree and the training fusion recommendation degree, obtaining the training target recommendation degree.

In one embodiment, the target loss calculation module 2006 is further configured to perform an interaction loss calculation for each business target based on each interaction label and the fusion recommendation degree, obtaining the respective interaction loss information; to acquire the interaction weights corresponding to the interaction labels, and perform a fusion loss calculation based on the interaction weights and the interaction loss information to obtain the fusion loss information; and to determine the sample weight corresponding to the training sample based on the interaction labels and the interaction bias degrees, and perform bias adjustment on the fusion loss information based on the sample weight to obtain the target loss information.

In one embodiment, the target loss calculation module 2006 is further configured to obtain the interaction priority corresponding to each interaction label, and determine the sample type corresponding to the training sample based on the interaction priorities; to look up, in a preset interaction matrix, the interaction weights corresponding to the sample type; and to weight the corresponding interaction loss information by each interaction weight to obtain the respective weighted loss information, and calculate the sum of the weighted loss information to obtain the fusion loss information.
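The lookup-and-sum step above can be sketched in a few lines of Python. The sketch assumes that the sample type is the highest-priority interaction label that is positive for the sample, and that the preset interaction matrix maps each sample type to one weight per business target. The label names, priorities, matrix values, and function names are all hypothetical.

```python
from typing import Dict


def sample_type(labels: Dict[str, int], priority: Dict[str, int]) -> str:
    """Pick the positive interaction label with the highest priority (assumed rule)."""
    positives = [name for name, value in labels.items() if value == 1]
    if not positives:
        return "none"
    return max(positives, key=lambda name: priority[name])


def fusion_loss(losses: Dict[str, float], labels: Dict[str, int],
                priority: Dict[str, int],
                matrix: Dict[str, Dict[str, float]]) -> float:
    """Weight each target's interaction loss by the row for the sample type, then sum."""
    weights = matrix[sample_type(labels, priority)]
    return sum(weights[target] * losses[target] for target in losses)


# Hypothetical interaction labels, priorities and preset interaction matrix.
labels = {"click": 1, "like": 1, "share": 0}
priority = {"click": 1, "like": 2, "share": 3}
matrix = {
    "none":  {"click": 1.0, "like": 1.0, "share": 1.0},
    "click": {"click": 1.0, "like": 0.5, "share": 0.5},
    "like":  {"click": 0.5, "like": 1.0, "share": 0.2},
    "share": {"click": 0.2, "like": 0.5, "share": 1.0},
}
losses = {"click": 0.2, "like": 0.4, "share": 0.1}  # per-target interaction losses

total = fusion_loss(losses, labels, priority, matrix)
```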

In one embodiment, the target loss calculation module 2006 is further configured to obtain the interaction priority corresponding to each interaction label, and determine the sample type corresponding to the training sample based on the interaction priorities; to acquire the training sample sequence corresponding to each interaction label, and determine a target sample sequence from the training sample sequences based on the sample type; to determine the sequence position of the training sample in the target sample sequence, and when the sequence position exceeds a preset position threshold, set the sample weight corresponding to the training sample to a first target sample weight; and when the sequence position does not exceed the preset position threshold, set the sample weight corresponding to the training sample to a second target sample weight.

In one embodiment, the target loss calculation module 2006 is further configured to obtain the training samples and the interaction bias degrees corresponding to each training sample; and, for each interaction label, to sort the training samples by their corresponding interaction bias degrees, obtaining the training sample sequence corresponding to that interaction label.
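The position-based sample weighting in the two embodiments above might look like the sketch below. It assumes that samples are sorted by bias degree in descending order and that positions are 0-based; the application does not state the sort direction or indexing, so both are assumptions, as are the sample identifiers, weights, and function names.

```python
from typing import Dict, List


def build_sequence(bias_by_sample: Dict[str, float]) -> List[str]:
    """Sort sample ids by interaction bias degree, highest first (assumed order)."""
    return sorted(bias_by_sample, key=lambda sid: bias_by_sample[sid], reverse=True)


def sample_weight(sample_id: str, sequence: List[str], position_threshold: int,
                  first_weight: float, second_weight: float) -> float:
    """Return the first weight when the position exceeds the threshold, else the second."""
    position = sequence.index(sample_id)  # 0-based position in the target sequence
    return first_weight if position > position_threshold else second_weight


# Hypothetical bias degrees of three training samples for one interaction label.
bias_by_sample = {"a": 0.9, "b": 0.1, "c": 0.5}
sequence = build_sequence(bias_by_sample)

weight_b = sample_weight("b", sequence, position_threshold=1,
                         first_weight=2.0, second_weight=1.0)
weight_a = sample_weight("a", sequence, position_threshold=1,
                         first_weight=2.0, second_weight=1.0)
```

The resulting weight would then scale the fusion loss information of that sample, which is one way to realize the bias adjustment described earlier.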

Each module in the above-described information recommendation apparatus and multi-objective recommendation model training apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory in the computer device in software form, so that the processor can call and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 21. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing training sample data, recommended object attribute information, information to be recommended and the like. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by a processor, implements an information recommendation method or a multi-objective recommendation model training method.

In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 22. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by a processor, implements an information recommendation method or a multi-objective recommendation model training method. 
The display unit of the computer device is used to form a visual image and may be a display screen, a projection device, or a virtual reality imaging device. The display screen may be a liquid crystal display or an electronic ink display. The input device of the computer device may be a touch layer covering the display screen; a key, trackball, or touchpad arranged on the housing of the computer device; or an external keyboard, touchpad, mouse, or the like.

It will be appreciated by those skilled in the art that the structures shown in fig. 21 and fig. 22 are merely block diagrams of the parts of the structure related to the solution of the present application and do not limit the computer device to which the solution of the present application is applied; a particular computer device may include more or fewer components than shown, may combine certain components, or may have a different arrangement of components.

In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.

In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.

It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties, and the collection, use and processing of the related data must comply with the relevant laws, regulations and standards of the relevant countries and regions. The user may refuse the pushed information, and may do so conveniently.

Those skilled in the art will appreciate that all or part of the processes of the methods described above may be implemented by a computer program, which may be stored on a non-transitory computer-readable storage medium and which, when executed, may include the processes of the method embodiments described above. Any reference to memory, a database, or another medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database. The processor referred to in the embodiments provided herein may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.

The foregoing examples illustrate only a few embodiments of the application and are described in relative detail, but they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of protection of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.

Claims

1. An information recommendation method, the method comprising:

performing bottom-layer feature extraction based on object attribute features to obtain object extraction features, and performing bias prediction for each business target based on the object extraction features to obtain the bias degree of the object to be recommended for each business target;

performing feature combination on the object attribute features and the information features to be recommended to obtain combined features, and performing interaction prediction for each business target based on the combined features, the object extraction features and the information features to be recommended to obtain the interaction likelihood of the object to be recommended for each business target;

correcting the interaction likelihood corresponding to each business target using the bias degree of that business target to obtain target interaction likelihoods, and fusing the target interaction likelihoods to obtain the fusion recommendation degree corresponding to the information to be recommended;

and when the fusion recommendation degree meets a preset recommendation condition, recommending the information to be recommended to the terminal corresponding to the object to be recommended.

2. The method according to claim 1, characterized in that the method further comprises:

inputting the object attribute characteristics and the information characteristics to be recommended into a first multi-target recommendation model;

performing, through the first multi-target recommendation model, bottom-layer feature extraction based on the object attribute features to obtain object extraction features, and performing bias prediction for each business target based on the object extraction features to obtain the bias degree of the object to be recommended for each business target;

performing, through the first multi-target recommendation model, feature combination on the object attribute features and the information features to be recommended to obtain combined features, and performing interaction prediction for each business target based on the combined features, the object extraction features and the information features to be recommended to obtain the interaction likelihood of the object to be recommended for each business target;

correcting, through the first multi-target recommendation model, the interaction likelihood corresponding to each business target using the bias degree of that business target to obtain target interaction likelihoods, and fusing the target interaction likelihoods to obtain the fusion recommendation degree corresponding to the information to be recommended.

3. The method of claim 2, wherein the first multi-target recommendation model comprises a first multi-target prediction network and an interaction fusion network, the first multi-target prediction network comprising a first bias prediction sub-network and a first interaction prediction sub-network;

the performing bottom-layer feature extraction based on the object attribute features to obtain object extraction features, and performing bias prediction for each business target based on the object extraction features to obtain the bias degree of the object to be recommended for each business target, comprises:

inputting the object attribute features into the first bias prediction sub-network in the first multi-target prediction network;

performing bottom-layer feature extraction on the object attribute features through the first bias prediction sub-network to obtain the object extraction features, and performing bias prediction for each business target based on the object extraction features to obtain the bias degree of the object to be recommended for each business target;

the performing interaction prediction for each business target based on the combined features, the object extraction features and the information features to be recommended to obtain the interaction likelihood of the object to be recommended for each business target comprises:

inputting the combined features, the object extraction features and the information features to be recommended into the first interaction prediction sub-network in the first multi-target prediction network;

performing interaction prediction for each business target using the combined features, the object extraction features and the information features to be recommended through the first interaction prediction sub-network to obtain the interaction likelihood of the object to be recommended for each business target;

the fusing the target interaction likelihoods to obtain the fusion recommendation degree corresponding to the information to be recommended comprises:

splicing the target interaction likelihoods to obtain a spliced vector, and inputting the spliced vector into the interaction fusion network for fusion to obtain the fusion recommendation degree corresponding to the information to be recommended.

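4. The method according to any one of claims 1 to 3, wherein the correcting the interaction likelihood corresponding to each business target using the bias degree of that business target to obtain target interaction likelihoods comprises:

calculating the difference between the interaction likelihood of each business target and the corresponding bias degree to obtain the target interaction likelihood of each business target.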

5. The method according to claim 1, characterized in that the method further comprises:

fusing the bias degrees of the business targets to obtain the bias recommendation degree corresponding to the information to be recommended;

calculating the sum of the bias recommendation degree and the fusion recommendation degree to obtain a target recommendation degree;

when the fusion recommendation degree meets a preset recommendation condition, recommending the information to be recommended to the terminal corresponding to the object to be recommended comprises the following steps:

and when the target recommendation degree meets a preset target recommendation condition, recommending the information to be recommended to the terminal corresponding to the object to be recommended.

6. The method of claim 5, wherein the fusing the bias degrees of the business targets to obtain the bias recommendation degree corresponding to the information to be recommended comprises:

splicing the bias degrees of the business targets to obtain a bias spliced vector, and inputting the bias spliced vector into a bias fusion network for fusion to obtain the bias recommendation degree corresponding to the information to be recommended.

7. A multi-objective recommendation model training method, the method comprising:

inputting the training recommendation object features and the positive recommendation information features into a first initial multi-target recommendation model to perform bias prediction and interaction prediction for each business target, obtaining the training bias degree and the positive interaction likelihood of the training recommendation object for each business target, correcting the corresponding positive interaction likelihood using the training bias degree of each business target to obtain target positive interaction likelihoods, and fusing the target positive interaction likelihoods to obtain the positive fusion recommendation degree corresponding to the positive recommendation information;

inputting the training recommendation object features and the negative recommendation information features into the first initial multi-target recommendation model to perform bias prediction and interaction prediction for each business target, obtaining the training bias degree and the negative interaction likelihood of the training recommendation object for each business target, correcting the corresponding negative interaction likelihood using the training bias degree of each business target to obtain target negative interaction likelihoods, and fusing the target negative interaction likelihoods to obtain the negative fusion recommendation degree corresponding to the negative recommendation information;

and updating the first initial multi-target recommendation model through back-propagation based on the model loss information to obtain a first updated multi-target recommendation model, taking the first updated multi-target recommendation model as the first initial multi-target recommendation model, and returning to the step of obtaining the positive sample and the negative sample corresponding to the training recommendation object, iterating until a first training completion condition is reached, to obtain the first multi-target recommendation model.

8. The method of claim 7, wherein the first initial multi-objective recommendation model comprises a first training multi-objective prediction network and a first training interaction fusion network;

the inputting the training recommendation object features and the positive recommendation information features into a first initial multi-target recommendation model to perform bias prediction and interaction prediction for each business target, obtaining the training bias degree and the positive interaction likelihood of the training recommendation object for each business target, correcting the corresponding positive interaction likelihood using the training bias degree of each business target to obtain target positive interaction likelihoods, and fusing the target positive interaction likelihoods to obtain the positive fusion recommendation degree corresponding to the positive recommendation information, comprises:

inputting the training recommendation object features and the positive recommendation information features into the first training multi-target prediction network to perform bias prediction and interaction prediction for each business target, obtaining the training bias degree and the positive interaction likelihood of the training recommendation object for each business target;

correcting the corresponding positive interaction likelihood using the training bias degree of each business target to obtain the target positive interaction likelihoods, and inputting the target positive interaction likelihoods into the first training interaction fusion network for fusion to obtain the positive fusion recommendation degree corresponding to the positive recommendation information.

9. The method of claim 7, wherein the performing model loss calculation based on the positive fusion recommendation degree and the negative fusion recommendation degree to obtain model loss information comprises:

determining a positive sequence position corresponding to the positive sample based on the positive fusion recommendation degree, and performing a discounted cumulative gain calculation based on the positive sequence position to obtain the positive ranking quality corresponding to the positive sample;

determining a negative sequence position corresponding to the negative sample based on the negative fusion recommendation degree, and performing a discounted cumulative gain calculation based on the negative sequence position to obtain the negative ranking quality corresponding to the negative sample;

calculating the difference between the positive ranking quality and the negative ranking quality to obtain a relative quality;

and performing a sample pair loss calculation based on the positive fusion recommendation degree and the negative fusion recommendation degree to obtain sample pair loss information, and weighting the sample pair loss information by the relative quality to obtain the model loss information.

10. The method of claim 9, wherein the weighting the sample pair loss information by the relative quality to obtain the model loss information comprises:

obtaining sample types corresponding to the positive sample and the negative sample, and obtaining a sample pair weight based on the sample types;

and weighting the sample pair loss information using the sample pair weight and the relative quality to obtain the model loss information.

11. A multi-objective recommendation model training method, the method comprising:

inputting the training recommendation object features and the training recommendation information features into a second initial multi-target recommendation model to perform bias prediction and interaction prediction for each business target, obtaining the training bias degree and the training interaction likelihood of the training recommendation object for each business target, correcting the corresponding training interaction likelihood using the training bias degree of each business target to obtain target training interaction likelihoods, fusing the target training interaction likelihoods to obtain the training fusion recommendation degree corresponding to the training recommendation information, fusing the training bias degrees of the business targets to obtain the training bias recommendation degree corresponding to the training recommendation information, and calculating the sum of the training bias recommendation degree and the training fusion recommendation degree to obtain the training target recommendation degree;

performing a fusion loss calculation based on each interaction label and the training target recommendation degree to obtain fusion loss information, and performing bias adjustment on the fusion loss information using each training bias degree to obtain target loss information;

and updating the second initial multi-target recommendation model through back-propagation based on the target loss information to obtain a second updated multi-target recommendation model, taking the second updated multi-target recommendation model as the second initial multi-target recommendation model, and returning to the step of obtaining the training sample, iterating until a second training completion condition is reached, to obtain a second multi-target recommendation model.

12. The method of claim 11, wherein the second initial multi-objective recommendation model comprises a second training multi-objective prediction network, a second training interaction fusion network, and a training bias fusion network;

and the inputting the training recommendation object characteristics and the training recommendation information characteristics into a second initial multi-target recommendation model to conduct deflection prediction and interaction prediction on each business target, obtaining the training deflection degree and the training interaction possibility of the training recommendation object on each business target, correcting the corresponding training interaction possibility by using the training deflection degree of each business target to obtain each target training interaction possibility, fusing the target training interaction possibilities to obtain the training fusion recommendation degree corresponding to the training recommendation information, fusing the training deflection degrees of the business targets to obtain the training deflection recommendation degree corresponding to the training recommendation information, and calculating the sum of the training deflection recommendation degree and the training fusion recommendation degree to obtain the training target recommendation degree, comprises:

inputting the training recommended object characteristics and the training recommended information characteristics into the second training multi-target prediction network to conduct deflection prediction and interaction prediction on each business target, and obtaining training deflection degree and training interaction possibility of the training recommended object on each business target;

correcting the training interaction probability corresponding to each business target according to the training deviation degree of each business target to obtain each target training interaction probability, inputting each target training interaction probability into the second training interaction fusion network to fuse, and obtaining the training fusion recommendation degree corresponding to the training recommendation information;

inputting the training deviation degree of each business target into a training deviation fusion network for fusion to obtain the training deviation recommended degree corresponding to the training recommended information, and calculating the sum of the training deviation recommended degree and the training fusion recommended degree to obtain the training target recommended degree.
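
For illustration only, the data flow through the three networks may be sketched as follows. The sketch replaces each fusion network with a fixed weighted sum and assumes that the correction subtracts each business target's deviation degree from its interaction possibility; the claims do not disclose the form of the correction or the internals of the networks, and all names are hypothetical.

```python
from typing import Sequence


def correct(interactions: Sequence[float], deviations: Sequence[float]) -> list:
    """Assumed correction: subtract each target's deviation degree."""
    return [p - d for p, d in zip(interactions, deviations)]


def linear_fuse(values: Sequence[float], weights: Sequence[float]) -> float:
    """Stand-in for a fusion network: a fixed weighted sum."""
    return sum(w * v for w, v in zip(weights, values))


def training_target_degree(interactions: Sequence[float],
                           deviations: Sequence[float],
                           interaction_weights: Sequence[float],
                           deviation_weights: Sequence[float]) -> float:
    """Sum of the fused corrected interactions and the fused deviations."""
    corrected = correct(interactions, deviations)
    fusion_degree = linear_fuse(corrected, interaction_weights)
    deviation_degree = linear_fuse(deviations, deviation_weights)
    return fusion_degree + deviation_degree
```

In this reading, the deviation branch contributes to the training target recommendation degree only during training, which is consistent with the separate training bias fusion network recited in the claim.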

13. The method of claim 11, wherein the performing fusion loss calculation based on the interaction labels and the training target recommendation degree to obtain fusion loss information, and performing deviation adjustment on the fusion loss information by using the training deflection degrees to obtain target loss information, comprises:

performing interaction loss calculation of each business target based on each interaction label and the fusion recommendation degree to obtain each interaction loss information;

acquiring interaction weights corresponding to the interaction labels, and performing fusion loss calculation based on the interaction weights and the interaction loss information to obtain fusion loss information;

and determining sample weights corresponding to the training samples based on the interaction labels and the interaction bias degrees, and performing bias adjustment on the fusion loss information based on the sample weights to obtain target loss information.
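
The three steps above may be sketched as follows, purely as an illustration. The sketch assumes that each interaction loss is a binary cross-entropy between one interaction label and a single recommendation degree lying in (0, 1), and that the sample weight scales the fusion loss multiplicatively; neither assumption is stated in the claims, and the function names are hypothetical.

```python
import math
from typing import Sequence


def bce(label: int, prob: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one business target (assumed loss form)."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def target_loss(labels: Sequence[int], degree: float,
                interaction_weights: Sequence[float],
                sample_weight: float) -> float:
    """Per-target losses -> weighted fusion loss -> sample-weighted target loss."""
    losses = [bce(y, degree) for y in labels]
    fusion = sum(w * l for w, l in zip(interaction_weights, losses))
    return sample_weight * fusion
```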

14. The method of claim 13, wherein the acquiring interaction weights corresponding to the interaction labels, and performing fusion loss calculation based on the interaction weights and the interaction loss information to obtain fusion loss information, comprises:

acquiring interaction priorities corresponding to the interaction labels, and determining sample types corresponding to the training samples based on the interaction priorities;

searching each interaction weight corresponding to the sample type from a preset interaction matrix based on the sample type;

and weighting the corresponding interaction loss information by using the interaction weights to obtain weighted loss information, and calculating the information sum of the weighted loss information to obtain the fusion loss information.
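
As a non-limiting sketch of the lookup and weighting recited above, the following assumes that the sample type is the highest-priority business target having a positive interaction label. The business target names, the priorities, and every entry of the interaction matrix are hypothetical values chosen for illustration.

```python
from typing import Dict

# Hypothetical interaction priorities (larger means higher priority).
TARGET_PRIORITY = {"click": 1, "like": 2, "purchase": 3}

# Hypothetical preset interaction matrix:
# row = sample type, column = business target.
INTERACTION_MATRIX = {
    "none":     {"click": 1.0, "like": 1.0, "purchase": 1.0},
    "click":    {"click": 1.0, "like": 0.5, "purchase": 0.5},
    "like":     {"click": 0.5, "like": 2.0, "purchase": 0.5},
    "purchase": {"click": 0.5, "like": 0.5, "purchase": 3.0},
}


def sample_type(labels: Dict[str, int]) -> str:
    """Assumed rule: the highest-priority target with a positive label."""
    positives = [t for t, y in labels.items() if y == 1]
    if not positives:
        return "none"
    return max(positives, key=TARGET_PRIORITY.__getitem__)


def fusion_loss(labels: Dict[str, int], losses: Dict[str, float]) -> float:
    """Look up the row for the sample type and sum the weighted losses."""
    weights = INTERACTION_MATRIX[sample_type(labels)]
    return sum(weights[t] * losses[t] for t in losses)
```

For example, a sample labelled with both a click and a purchase is treated as a "purchase" sample here, so its purchase loss is weighted most heavily.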

15. The method of claim 13, wherein determining the sample weight corresponding to the training sample based on the respective interaction labels and the respective interaction bias levels, and biasing the fusion loss information based on the sample weight to obtain target loss information comprises:

acquiring corresponding training sample sequences based on the interaction labels, and determining a target sample sequence from the training sample sequences based on the sample type;

determining a sequence position corresponding to the training sample from the target sample sequence, and obtaining a sample weight corresponding to the training sample as a first target sample weight when the sequence position exceeds a preset position threshold;

and when the sequence position does not exceed the preset position threshold value, obtaining the sample weight corresponding to the training sample as a second target sample weight.

16. The method of claim 15, wherein the acquiring corresponding training sample sequences based on the interaction labels comprises:

acquiring each training sample and each interaction deviation degree corresponding to each training sample;

and sequencing the training samples according to the interaction deviation degree corresponding to each training sample based on the interaction labels to obtain training sample sequences corresponding to the interaction labels.
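
The sequencing of claim 16 and the threshold comparison of claim 15 may be illustrated together as follows. The descending sort order, the zero-based sequence position, and the dictionary layout of a training sample are assumptions made for this sketch; the claims specify none of them.

```python
from typing import Dict, List, Sequence


def build_sequences(samples: Sequence[dict],
                    targets: Sequence[str]) -> Dict[str, List[str]]:
    """One sequence of sample ids per interaction label, sorted by deviation degree."""
    return {
        t: [s["id"] for s in
            sorted(samples, key=lambda s: s["deviation"][t], reverse=True)]
        for t in targets
    }


def sample_weight(sample_id: str, sample_type: str,
                  sequences: Dict[str, List[str]], threshold: int,
                  first_weight: float, second_weight: float) -> float:
    """Choose the weight by comparing the sequence position with the threshold."""
    # The target sample sequence is the one matching the sample type.
    position = sequences[sample_type].index(sample_id)  # zero-based (assumed)
    return first_weight if position > threshold else second_weight
```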

17. An information recommendation device, characterized in that the device comprises:

the deflection prediction module is used for extracting bottom layer characteristics based on the object attribute characteristics to obtain object extraction characteristics, and performing deflection prediction on each business target based on the object extraction characteristics to obtain the deflection degree of the object to be recommended on each business target;

the interaction prediction module is used for carrying out feature combination on the object attribute features and the information features to be recommended to obtain combined features, and carrying out interaction prediction on each business target based on the combined features, the object extraction features and the information features to be recommended to obtain interaction possibility of the object to be recommended on each business target;

and the recommending module is used for recommending the information to be recommended to the terminal corresponding to the object to be recommended when the fusion recommending degree accords with a preset recommending condition.

18. A multi-objective recommendation model training apparatus, the apparatus comprising:

the sample pair acquisition module is used for acquiring positive samples and negative samples corresponding to the training recommended objects, wherein the positive samples comprise the training recommended object characteristics and the positive recommended information characteristics, and the negative samples comprise the training recommended object characteristics and the negative recommended information characteristics;

the forward prediction module is used for inputting the training recommended object characteristics and the forward recommended information characteristics into a first initial multi-target recommendation model to conduct deflection prediction and interaction prediction on each business target, obtaining training deflection degree and forward interaction possibility of the training recommended object on each business target, correcting the corresponding forward interaction possibility through the training deflection degree of each business target, obtaining forward interaction possibility of each target, fusing the forward interaction possibility of each target, and obtaining forward fusion recommended degree corresponding to the forward recommended information;

the negative prediction module is used for inputting the training recommendation object characteristics and the negative recommendation information characteristics into a first initial multi-target recommendation model to conduct deflection prediction and interaction prediction on each business target, obtaining training deflection degree and negative interaction possibility of the training recommendation object on each business target, correcting the corresponding negative interaction possibility through the training deflection degree of each business target, obtaining negative interaction possibility of each target, fusing the negative interaction possibility of each target, and obtaining negative fusion recommendation degree corresponding to the negative recommendation information;

the first iteration module is used for reversely updating the first initial multi-target recommendation model based on the model loss information to obtain a first updated multi-target recommendation model, taking the first updated multi-target recommendation model as the first initial multi-target recommendation model, and returning to the step of obtaining the positive sample and the negative sample corresponding to the training recommendation object for execution until a first training completion condition is reached to obtain the first multi-target recommendation model.

19. A multi-objective recommendation model training apparatus, the apparatus comprising:

the training module is used for inputting the training recommendation object characteristics and the training recommendation information characteristics into a second initial multi-target recommendation model to conduct deflection prediction and interaction prediction on each business target, obtaining training deflection degrees and training interaction possibilities of the training recommendation objects on each business target, correcting the corresponding training interaction possibilities through the training deflection degrees of each business target, obtaining each target training interaction possibility, fusing the target training interaction possibilities, obtaining training fusion recommendation degrees corresponding to training recommendation information, fusing the training deflection degrees of each business target, obtaining training deflection recommendation degrees corresponding to training recommendation information, calculating the sum of the training deflection recommendation degrees and the training fusion recommendation degrees, and obtaining training target recommendation degrees;

the target loss calculation module is used for carrying out fusion loss calculation based on the interaction labels and the training target recommendation degree to obtain fusion loss information, and carrying out deflection adjustment on the fusion loss information by using the training deflection degree to obtain target loss information;

and the second iteration module is used for reversely updating the second initial multi-target recommendation model based on the target loss information to obtain a second updated multi-target recommendation model, taking the second updated multi-target recommendation model as the second initial multi-target recommendation model, and returning to the step of obtaining the training sample for execution until a second training completion condition is reached to obtain a second multi-target recommendation model.

20. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 16 when the computer program is executed.

21. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 16.

22. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 16.