CN109345302B - Machine learning model training method and device, storage medium and computer equipment - Google Patents


Info

Publication number
CN109345302B
Authority
CN
China
Prior art keywords
data
model
training
sample
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811133216.XA
Other languages
Chinese (zh)
Other versions
CN109345302A (en)
Inventor
谭奔
牟蕾
尤爱华
刘大鹏
肖磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201811133216.XA
Publication of CN109345302A
Application granted
Publication of CN109345302B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0255Targeted advertisements based on user history
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/02Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241Advertisements
    • G06Q30/0251Targeted advertisements
    • G06Q30/0269Targeted advertisements based on user profile or attribute
    • G06Q30/0271Personalized advertisement


Abstract

The application relates to a machine learning model training method, apparatus, storage medium, and computer device. The method comprises: obtaining model training data and corresponding labels, the model training data comprising feature data of a user sample and feature data of a recommended object sample; determining weights for the model training data according to the generation time of the model training data; inputting the model training data into a machine learning model to obtain user behavior prediction results corresponding to the user sample and the recommended object sample; determining a training target according to the differences between the user behavior prediction results and the labels, and according to the weights; and adjusting the model parameters of the machine learning model in the direction that optimizes the training target and continuing training until the training stop condition is met. A machine learning model trained by the scheme provided by the application has a good prediction effect.

Description

Machine learning model training method and device, storage medium and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a machine learning model training method, apparatus, storage medium, and computer device.
Background
With the development of computer technology, in the process of recommending objects, a machine learning model is generally used for performing recommendation prediction on the recommended objects. Before recommendation prediction of a recommendation object is carried out through a machine learning model, the machine learning model for the recommendation prediction needs to be trained.
However, in the conventional machine learning model training process, the difference between training samples is not considered, so that when the machine learning model trained by the conventional method is used for recommendation prediction, the problem of inaccurate prediction may occur.
Disclosure of Invention
Therefore, it is necessary to provide a machine learning model training method, apparatus, storage medium, and computer device, in order to solve the problem that inaccurate prediction may occur when a machine learning model trained by a conventional method is used for performing recommendation prediction.
A machine learning model training method, comprising:
obtaining model training data and corresponding labels; the model training data comprises characteristic data of a user sample and characteristic data of a recommended object sample;
determining the weight of the model training data according to the generation time of the model training data;
inputting the model training data into a machine learning model to obtain user behavior prediction results corresponding to the user sample and the recommended object sample;
determining a training target according to the differences between the user behavior prediction results and the labels, and according to the weights;
and adjusting the model parameters of the machine learning model according to the direction of optimizing the training target, and continuing training until the training stopping condition is met.
A machine learning model training apparatus comprising:
the acquisition module is used for acquiring model training data and corresponding labels; the model training data comprises characteristic data of a user sample and characteristic data of a recommended object sample;
the determining module is used for determining the weights of the model training data according to the generation time of the model training data; inputting the model training data into a machine learning model to obtain user behavior prediction results corresponding to the user samples and the recommended object samples; and determining a training target according to the differences between the user behavior prediction results and the labels, and according to the weights;
and the adjusting module is used for adjusting the model parameters of the machine learning model according to the direction for optimizing the training target and continuing training until the training stopping condition is met.
A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above-described machine learning model training method.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above-described machine learning model training method.
According to the machine learning model training method, apparatus, computer-readable storage medium, and computer device above, the feature data of the user sample and the feature data of the recommended object sample are jointly used as model training data, so that a machine learning model capable of predicting the user's behavior result for a recommended object is trained. When the training target is determined, the influence that changes in the model training data over time have on the prediction result is taken into account, and each piece of model training data is given a weight according to its generation time. The model can therefore focus its learning on data generated within a certain time period while still learning from all the data, which improves the accuracy of model estimation; the effect is particularly obvious when the user's behavior toward recommended objects changes significantly over time.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a method for machine learning model training in one embodiment;
FIG. 2 is a diagram illustrating the structure of a machine learning model in one embodiment;
FIG. 3 is a flow diagram that illustrates a methodology for using a machine learning model in one embodiment;
FIG. 4 is a diagram of an application environment for training and using machine learning models, according to one embodiment;
FIG. 5 is a schematic flow diagram of the training and use of machine learning models in one embodiment;
FIG. 6 is a block diagram showing the structure of a machine learning model training apparatus according to an embodiment;
FIG. 7 is a block diagram showing the construction of a machine learning model training apparatus according to another embodiment;
FIG. 8 is a block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
As shown in FIG. 1, in one embodiment, a machine learning model training method is provided. The embodiment is mainly illustrated by applying the method to computer equipment. The computer device may specifically be a server or a terminal. Referring to fig. 1, the machine learning model training method specifically includes the following steps:
s102, obtaining model training data and corresponding labels; the model training data includes feature data of the user sample and feature data of the recommended object sample.
The model training data is characteristic data of a sample used for training a model, and is data input to the model during model training. The label corresponding to the model training data is the expected output of the model during model training. Different models differ in learning ability or use, and the samples required for training, the characteristic data of the samples, and the output of the models also differ. For example, a sample required for model training for recognizing a face image is the face image, the feature data of the sample is the face feature data, and the output of the model is a face recognition result. For another example, the samples required for identifying the model of the sound are audio data, the feature data of the samples are acoustic feature data, and the output of the model is a sound identification result.
In this embodiment, the computer device is intended to train a machine learning model that can predict the user behavior result of the user for the recommended object. Then, as can be understood by those skilled in the art, in the embodiment of the present invention, the samples include a user sample and a recommended object sample; the model training data, namely the characteristic data of the sample, comprises the characteristic data of the user sample and the characteristic data of the recommended object sample. In this way, the computer device can train the model to learn the feature data of the user sample and the feature data of the recommended object sample to predict the user behavior result of the user on the recommended object.
The recommendation object is an object for making a recommendation to a user. The recommendation object may specifically be promotion information, an application program, video, audio, news, or a commodity. The promotion information may specifically be an advertisement.
And S104, determining the weight of the model training data according to the generation time of the model training data.
The weight of the model training data is used for reflecting the importance degree of the model training data in the training process. As can be appreciated by those skilled in the art, the user behavior result of the recommended object by the user may change along with the change of time. For example, the user interest feature may change with time, and the user interest feature is a user feature that may affect the user behavior result of the user on the recommendation object. As another example, after many days of ad placement, the click-through rate or conversion rate may tend to increase or decrease over time. Therefore, when the computer device trains the machine learning model by adopting the model training data, different sample weights can be given to the model training data generated at different time points, so that the samples generated in a certain time period can be mainly learned on the basis of learning all the samples, and the accuracy of model estimation is improved. It is understood that there are various arrangements for giving different sample weights to the model training data generated at different time points, and the specific arrangement can refer to the related description in the following embodiments.
Specifically, the computer device may set a mapping relationship in which the weight of the model training data changes with the generation time of the model training data in advance based on a priori knowledge or empirical data. In this way, when the computer device generates the model training data or when the model training data is required to train the model, the weight of the model training data can be determined according to the mapping relationship. The mapping relationship may be a functional relationship. The functional relationship can be a positive correlation functional relationship or a negative correlation functional relationship; that is, the weight of the model training data may be a positive correlation change or a negative correlation change with the generation time change of the model training data.
And S106, inputting the model training data into the machine learning model to obtain user behavior prediction results corresponding to the user samples and the recommended object samples.
Machine learning (abbreviated ML) allows a model to acquire particular capabilities through sample learning. The machine learning model can adopt a neural network model, a support vector machine, a logistic regression model, a random forest model, a gradient boosting tree model, or the like. The machine learning model in the embodiment of the invention is a model which, after training, can predict the behavior result of a user for a recommended object.
The user behavior prediction result is a prediction of whether the user will perform a certain behavior on the recommended object. The user behavior prediction result is the output of the machine learning model and corresponds one-to-one with the model training data input into the machine learning model; the user behavior prediction result corresponding to the user sample and the recommended object sample represents the predicted result of the user from which the model training data originates performing a certain behavior on the recommended object from which the model training data originates.
The user behavior prediction result may be a user behavior prediction probability, for example the probability of the user viewing the recommended object or the probability of the user converting the recommended object; specifically, the probability of the user clicking the promotion information, the probability of the user performing account registration, the probability of the user performing an online transaction, or the like.
In a specific embodiment, the recommendation object is promotion information; the user behavior prediction result comprises a popularization information viewing prediction result and/or a popularization information conversion prediction result. Specifically, the popularization information viewing prediction result is a prediction result of viewing the popularization information by the user, and the popularization information conversion prediction result is a prediction result of converting the popularization information by the user. The user conversion promotion information may specifically be a behavior of performing registration or transaction and the like through the promotion information.
And S108, determining a training target according to the differences between the user behavior prediction results and the labels, and according to the weights.
Wherein the training target is data for measuring the quality of the model. The training target is typically a function whose argument is the difference between the actual output of the model and the expected output of the model. In embodiments of the present invention, considering the effect of time on the samples, the training target is typically set as a function of two arguments: the difference between the actual output and the expected output of the model, and the weights of the model training data.
It should be noted that training the machine learning model requires a large number of samples, that is, there are multiple pieces of model training data. The computer device may input all model training data into the machine learning model and determine the training target after the user behavior prediction results corresponding to all model training data are obtained; or determine a training target each time one piece of model training data is input into the machine learning model and its corresponding user behavior prediction result is obtained; or input a group of model training data into the machine learning model and determine a training target after the user behavior prediction results corresponding to that group are obtained.
In one embodiment, the computer device may determine the training target each time one piece of model training data is processed through the machine learning model. Determining the training target according to the difference between the user behavior prediction result and the label, and the weight, then includes: generating a loss function whose argument is the difference between the user behavior prediction result of the current model training data and its label; and multiplying the loss function by the weight of the corresponding model training data to obtain the training target. The training target is thus derived from the data of a single sample only and is a single-sample training target.
In one embodiment, the computer device may group the model training data, each group including a plurality of pieces of model training data. The computer device can then determine the training target each time a group of model training data is processed through the machine learning model. Determining the training target according to the differences between the user behavior prediction results and the labels, and the weights, then includes: generating, for each piece of model training data, a loss function whose argument is the difference between its user behavior prediction result and its label; and summing the loss functions weighted by the weights of the corresponding model training data to obtain the training target. The training target is thus derived from the data of part of the samples and is a local training target.
In one embodiment, the computer device may determine the training target after processing all model training data through the machine learning model. Determining the training target according to the differences between the user behavior prediction results and the labels, and the weights, then includes: generating, for each piece of model training data, a loss function whose argument is the difference between its user behavior prediction result and its label; and summing the loss functions weighted by the weights of the corresponding model training data to obtain the training target. The training target is thus derived from the data of all samples and is a global training target.
Where the loss function is used to measure the difference between the actual output of the model and the expected output of the model. The loss function may specifically be a squared loss function, an exponential loss function, a negative likelihood function, or the like.
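As an illustration only, the following Python sketch (the function names and the choice of squared loss are assumptions, not prescribed by the application) shows how the single-sample, local, and global training targets described above could be computed from per-sample losses and time-based weights:

```python
def squared_loss(y_true, y_pred):
    """One possible per-sample loss; an exponential loss or negative
    log-likelihood could be used instead, as noted above."""
    return (y_true - y_pred) ** 2

def single_sample_target(y_true, y_pred, weight):
    # Single-sample training target: loss of one piece of model training
    # data multiplied by its time-based weight.
    return weight * squared_loss(y_true, y_pred)

def weighted_sum_target(y_trues, y_preds, weights):
    # Local training target (over a group of samples) or global training
    # target (over all samples): weighted sum of the per-sample losses.
    return sum(w * squared_loss(y, p)
               for y, p, w in zip(y_trues, y_preds, weights))
```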
And S110, adjusting the model parameters of the machine learning model in the direction that optimizes the training target and continuing training until the training stop condition is met, then finishing the training.
Wherein the training stop condition is a condition for ending the model training. The training stopping condition may be that a preset number of iterations is reached, or that the predicted performance index of the model after the model parameters are adjusted reaches a preset index.
Specifically, according to the direction of optimizing the training target, adjusting the model parameters of the machine learning model and continuing training until the training stopping condition is met, and the method comprises the following steps: and according to the direction of the minimized training target, adjusting the model parameters of the machine learning model and continuing training until the training stopping condition is met, and finishing the training.
In one embodiment, the training target is a global training target, and the global optimal solution can be obtained by optimizing the global training target in a batch gradient descent manner, that is, by minimizing the loss function of all samples.
In one particular embodiment, the global training objective is represented by the following equation:
    Σ_{i=1..N} v_i · L(y_i, f(x_i, W)) + R(W)

where N is the number of pieces of model training data, x_i is the i-th piece of model training data, y_i is the label corresponding to the i-th piece of model training data, v_i is the weight of the i-th piece of model training data, f(x_i, W) is the user behavior prediction result corresponding to the i-th piece of model training data, and W denotes the model parameters of the machine learning model. L(y_i, f(x_i, W)) is the loss function corresponding to the i-th sample. R(W) is a regular term whose purpose is to improve the generalization capability of the model and prevent overfitting.
It will be appreciated that optimizing the global training objective means seeking a W such that the loss of the actual output f(x_i, W) of the machine learning model over all model training data is minimal. However, when the global training target is optimized this way, all model training data are used in each iteration, which is suitable for scenes with a small amount of model training data. When the amount of model training data is large, the calculation amount of each iteration is large, which affects the iteration speed and in turn the model training efficiency.
In this embodiment, all model training data are used for every update of the model parameters, so what is solved is the global optimal solution, and the trained model has a good prediction effect.
In one embodiment, the training target is a single-sample training target, and the single-sample training target may be optimized by stochastic gradient descent, that is, by minimizing the loss function of each sample in turn to approach the global optimal solution. Although the loss obtained in each individual iteration does not necessarily move toward the global optimum, the overall direction is toward the global optimal solution, and the final result is usually near the global optimal solution; this is suitable for scenes with large-scale samples.
For example, when the sample size is large (e.g., hundreds of thousands), a near-optimal solution may be reached after iterating over only a fraction of the samples (on the order of tens of thousands). In contrast, batch gradient descent needs all of the hundreds of thousands of training samples for a single iteration, one iteration cannot reach the optimum, and 10 iterations require traversing the training samples 10 times, so the computational cost is high.
In other embodiments, the training targets are local training targets, which may be optimized by mini-batch gradient descent. Specifically, a subset of the samples is used for each update of the model parameters, which mitigates the drawbacks of the two approaches above while combining their advantages.
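Purely as a sketch of the mini-batch variant described above, with a weighted logistic regression standing in for f(x, W); the function names and default values are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_weighted_sgd(X, y, v, lr=0.1, batch_size=32, epochs=10, l2=1e-4):
    """X: (N, m) sample feature vectors; y: (N,) labels in {0, 1};
    v: (N,) time-based sample weights. Logistic regression is used here as a
    stand-in model; the application allows a neural network, SVM, logistic
    regression, random forest, or gradient boosting tree model."""
    n, m = X.shape
    W = np.zeros(m)
    for _ in range(epochs):
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            p = sigmoid(X[idx] @ W)
            # Gradient of the weighted negative log-likelihood loss plus the
            # L2 regular term R(W), averaged over the mini-batch.
            grad = X[idx].T @ (v[idx] * (p - y[idx])) / len(idx) + l2 * W
            W -= lr * grad
    return W
```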
According to the machine learning model training method, the feature data of the user sample and the feature data of the recommended object sample are jointly used as model training data, so that a machine learning model capable of predicting the user's behavior result for a recommended object is trained. When the training target is determined, the influence that changes in the model training data over time have on the prediction result is taken into account, and each piece of model training data is given a weight according to its generation time. The model can therefore focus its learning on data generated within a certain time period while still learning from all the data, which improves the accuracy of model estimation; the effect is particularly obvious when the recommended objects change significantly over time.
In one embodiment, obtaining model training data and corresponding labels includes: collecting characteristic data of a user sample and characteristic data of a recommended object sample to generate model training data; when target behavior data of a recommended object sample exists in the feature data of the user sample, setting a label corresponding to the model training data as a positive sample training label; and when the target behavior data of the recommended object sample does not exist in the feature data of the user sample, setting the label corresponding to the model training data as a negative sample training label.
Wherein the feature data of the user sample is data reflecting the user feature. The characteristic data of the user sample comprises user behavior data, user interest data, user attribute data and the like.
User behavior data is data that reflects characteristics of user behavior, such as social behavior data, promotion information interaction data, web browsing data, or application usage data. User interest data is data reflecting the user's interest characteristics and may be extracted from user behavior data. For example, the computer device may analyze the advertisement pictures clicked by a user and identify the objects, such as shoes or clothing, in those pictures. If the advertisements recently clicked by a user all contain clothing, this reflects to some extent that the user has recently been interested in clothing.
The user attribute data is data reflecting basic attributes of the user, such as name, gender, age, educational background, device information, or location information of the user terminal. It can be understood that the device information may reflect the attributes of the device user to some extent, for example users of different brands of mobile phones have different interest preferences; the user's location information can locate where the user is and thereby help analyze his or her interest preferences.
The feature data of the recommended object sample is data reflecting the feature of the recommended object. The feature data of the recommended object sample includes recommended object attribute data, recommended object content data, and the like. The recommendation object content data includes recommendation object text feature data and recommendation object image feature data. The recommended object attribute data is data reflecting basic attributes of the recommended object. Such as recommending an object type, etc.
The target behavior reflected by the target behavior data is the behavior that the computer device intends to train the machine learning model to predict. For example, if the target behavior data reflects advertisement click behavior, the computer device intends to train the machine learning model to learn to predict whether the user will click on the advertisement.
The target behavior reflected by the target behavior data may be a recommended object viewing behavior or a recommended object conversion behavior. When the target behavior data reflect different target behaviors, the meanings of labels corresponding to the model training data are different; the trained machine learning model also has different prediction capabilities. For example, the target behavior data reflects advertisement click behavior, the label corresponding to the model training data indicates that the user clicks an advertisement, and the trained machine learning model has prediction capability for predicting the click rate of the user on the advertisement. For another example, the target behavior data reflects an advertisement conversion behavior, the label corresponding to the model training data indicates that the user converts the advertisement, and the predictive ability of the trained machine learning model is to predict the conversion rate of the user to the advertisement.
Specifically, the computer device may pull feature data of the user sample from a server corresponding to each user behavior scenario. The computer equipment can respectively send the user identification of the user sample to the server corresponding to each user behavior scene, after the server corresponding to each user behavior scene receives the user identification, the server searches the characteristic data corresponding to the user identification, and then feeds the searched characteristic data back to the computer equipment. The computer device can also pull the characteristic data of the user sample from the user terminal corresponding to the user sample. In this way, the computer device collects characteristic data of the user sample.
Further, the computer device may determine a recommended object sample from the user behavior data in the feature data of the user sample, and extract the feature data of the recommended object sample. In this way, the computer device collects the feature data of the recommended object sample. For example, if a piece of user behavior data is "user A clicks on advertisement a, advertisement a: xxxxx (advertisement-specific information)", a recommended object sample, advertisement a, may be determined, and the feature data of advertisement a may be extracted from the specific advertisement-related information.
Specifically, the computer device can uniformly determine the recommended object sample from all the user behavior data without distinguishing the user sample, and extract the feature data of the recommended object sample; the computer device then generates model training data from the feature data of any user sample and the feature data of any recommended object sample. The computer equipment can also distinguish user samples, determine a recommended object sample according to the user behavior data of each user sample, and extract the feature data of the recommended object sample; the computer device then generates model training data according to the feature data of the user sample and the extracted feature data of any recommended object sample. Of course, the computer device may collect the feature data of the recommended object sample in another way. The embodiment of the present invention is not limited thereto.
Further, for each model training data, the computer device may check to see whether there is target behavior data for a sample of recommended objects from which the model training data originated in the feature data of the sample of users that generated the model training data. For example, the model training data 1 is generated according to the feature data of the user sample a and the feature data of the recommended object sample a; then, the model training data 1 may be considered to be derived from the user sample a and the recommended object sample a; the computer device checks whether the characteristic data of the user sample A has target behavior data of the recommended object sample a.
When target behavior data of a recommended object sample exists in the feature data of the user sample, judging that the model training data is positive sample data, and setting a label corresponding to the model training data as a positive sample training label; and when the target behavior data of the recommended object sample does not exist in the feature data of the user sample, judging that the model training data is the negative sample data, and setting a label corresponding to the model training data as a negative sample training label.
In this embodiment, whether target behavior data of a recommended object sample from which the model training data originates is present or not is automatically checked in feature data of a user sample for generating the model training data, and a training label is set for the model training data, so that time consumption of manual calibration is avoided, and accuracy of the training label is guaranteed.
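A minimal sketch of this automatic labelling rule, assuming (purely for illustration) that each user sample's feature data carries its behavior records as a list of dictionaries with object_id and behavior fields:

```python
def label_training_data(user_feature_data, recommended_object_id, target_behavior):
    """Return 1 (positive sample training label) when the user sample's
    feature data contains target behavior data for the recommended object
    sample, otherwise 0 (negative sample training label)."""
    for record in user_feature_data.get("behavior_records", []):
        if (record.get("object_id") == recommended_object_id
                and record.get("behavior") == target_behavior):
            return 1
    return 0
```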
In one embodiment, the model training data is a sample feature vector; collecting feature data of a user sample and feature data of a recommended object sample to generate model training data, wherein the model training data comprises the following steps: collecting user behavior data and user attribute data of a user sample; collecting recommendation object content data and recommendation object attribute data of a recommendation object sample; mapping user behavior data and user attribute data of the user sample into vector elements respectively, and mapping recommendation object content data and recommendation object attribute data of the recommendation object sample into vector elements respectively; and combining the corresponding each vector element of the user sample and the corresponding each vector element of the recommendation object sample to generate a sample feature vector.
The sample feature vector is a vector representing features of the sample, that is, features of the sample are represented in a vector form. The sample feature vector represents a low-dimensional representation of the features of the sample, representing the features of the sample in mathematical form. Vector elements are constituent units of a vector, and vector elements may also be vectors.
Specifically, the computer device may set in advance a mapping relationship that maps non-vector data into vector data. In this way, after the computer device collects the user behavior data and the user attribute data of the user sample and the recommended object content data and the recommended object attribute data of the recommended object sample, the user behavior data and the user attribute data of the user sample can be respectively mapped into vector elements according to the preset mapping relationship, and the recommended object content data and the recommended object attribute data of the recommended object sample can be respectively mapped into vector elements; and combining the corresponding each vector element of the user sample and the corresponding each vector element of the recommendation object sample to generate a sample feature vector.
For example, non-vector data is mapped into vector data: "shoes" may be mapped to [0, 1, 0, ..., 0], "clothes" to [1, 1, 0, ..., 0], "trousers" to [0, 1, 1, ..., 0], and so on. The sample feature vector is X = (x_1, x_2, ..., x_m), where x_1, x_2, ..., x_m are the m vector elements forming the sample feature vector. Each vector element may itself be a vector representing feature data of the user sample or feature data of the recommended object sample, for example the attribute features of the user sample, the behavior features of the user sample, the interest features of the user sample, the attribute features of the recommended object sample, the image features of the recommended object sample, or the text features of the recommended object sample.
In this embodiment, the feature data of the user sample and the feature data of the recommended object sample are mapped into vector elements, and the vector elements are combined to obtain a sample feature vector, which is then used as the input of the model, so that the data volume is greatly reduced, and the training efficiency of the model is improved.
In this embodiment, the computer device may also combine the sample feature vector and its corresponding label into a two-tuple <X, y>, where X is the sample feature vector and y is the label corresponding to X. y ∈ {1, 0} indicates whether the user sample from which X originates has performed a certain behavior on the recommendation object sample from which X originates. For example, in click behavior prediction, y ∈ {1, 0} indicates whether the user clicked on the advertisement; in conversion behavior prediction, y ∈ {1, 0} indicates whether the user converted the advertisement.
Of course, the computer device may also directly use model training data in a non-vector form as an input of the model, which is not limited in the embodiment of the present invention.
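For illustration only, the following sketch maps categorical feature data to vector elements and concatenates them into a sample feature vector; the vocabulary-based multi-hot encoding and the field names are assumptions, not the encoding prescribed by the application:

```python
import numpy as np

def multi_hot(values, vocabulary):
    """Map categorical values (e.g. 'shoes', 'clothes') to a fixed-length
    vector element, in the spirit of the [0, 1, 0, ..., 0] example above."""
    vec = np.zeros(len(vocabulary))
    for value in values:
        if value in vocabulary:
            vec[vocabulary.index(value)] = 1.0
    return vec

def build_sample_feature_vector(user_features, object_features, vocabulary):
    # Combine the vector elements of the user sample and of the recommended
    # object sample into one sample feature vector X = (x_1, ..., x_m).
    elements = [
        multi_hot(user_features.get("attributes", []), vocabulary),
        multi_hot(user_features.get("behaviors", []), vocabulary),
        multi_hot(user_features.get("interests", []), vocabulary),
        multi_hot(object_features.get("attributes", []), vocabulary),
        multi_hot(object_features.get("content", []), vocabulary),
    ]
    return np.concatenate(elements)
```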
In one embodiment, inputting the model training data into the machine learning model to obtain the user behavior prediction results corresponding to the user samples and the recommended object samples, includes: determining the object type to which the recommended object sample belongs; and inputting the model training data into the machine learning model, and controlling to output the user behavior prediction results corresponding to the user samples and the recommended object samples through an output unit corresponding to the object type in the machine learning model.
The object type is a classification obtained by classifying recommended objects, such as a promotion information type, news type, video type, or audio type. In the embodiment of the invention, the classification of recommended objects can be a multi-level classification. For example, the promotion information type may include a commodity promotion information type or an application promotion information type; the news type may include a current-affairs news type or an entertainment news type.
It will be appreciated that the likelihood of a user performing a particular action on different types of recommended objects may vary. For example, a user may click to view an advertisement of the merchandise promotional information type, but may not click to view an advertisement of the application promotional information type. That is, the relationship between model training data derived from different object type recommendation objects and the output of the model is different. The computer device may then set up different output units for different object types when building the model.
In this way, after the computer device inputs the model training data into the machine learning model, the computer device can control the output unit corresponding to the object type of the recommended object sample from which the model training data originates in the machine learning model to output the user behavior prediction results corresponding to the user sample from which the model training data originates and the recommended object sample. That is to say, the computer device learns the model training data according to object types by defining the structure of the machine learning model, and can learn different object types to obtain different model parameters.
In one embodiment, the machine learning model includes an input layer, an intermediate hidden layer, and an output layer; the output layer includes output units corresponding to respective ones of the plurality of object types. According to the direction of optimizing the training target, adjusting the model parameters of the machine learning model and continuing training until the training stopping condition is met, wherein the method comprises the following steps: and adjusting model parameters of an input layer, an intermediate hidden layer and an output unit corresponding to the determined object type in the machine learning model according to the direction of optimizing the training target, and continuing training until the training stopping condition is met.
Specifically, the computer device may train the machine learning model by object type while classifying the model training data by the object type to which the derived recommended object sample belongs, and locally adjust model parameters related to the corresponding object type when adjusting the machine learning model. Namely, model parameters of an input layer, an intermediate hidden layer and an output unit corresponding to the object type of the recommended object sample from which the model training data is derived in the machine learning model are adjusted. Wherein, the number of the middle hidden layers can be one or more.
FIG. 2 illustrates a structural diagram of a machine learning model in one embodiment. In this embodiment, one intermediate hidden layer is taken as an example. Referring to FIG. 2, each node in the input layer represents a vector element x_i of the sample feature vector (plus the offset b of the layer), each edge between an input-layer node and an intermediate-hidden-layer node represents the connection weight between the nodes, and each intermediate-hidden-layer node receives the signals from the input-layer nodes, generates an output value after applying an activation function, and passes it to the nodes of the next layer.
Assuming that the activation function is σ(x), the output of intermediate-hidden-layer node i is:

    h_i = σ( Σ_{j=1..m} w^(1)_{ij} · x_j + b^(1)_i )

where the sample feature vector comprises m vector elements, w^(1)_{ij} represents the connection weight between input-layer node j and intermediate-hidden-layer node i, and b^(1)_i denotes the offset from the input layer to intermediate-hidden-layer node i.
The output of output-layer node i is:

    o_i = σ( Σ_{j=1..n} w^(2)_{ij} · h_j + b^(2)_i )

where the intermediate hidden layer comprises n nodes and the output layer comprises c nodes (n may take the same value as m), w^(2)_{ij} represents the connection weight between intermediate-hidden-layer node j and output-layer node i, and b^(2)_i denotes the offset from the intermediate hidden layer to output-layer node i. Different output-layer nodes correspond to different object types. For example, when the recommended object sample from which the model training data originates belongs to the commodity promotion information type, the user behavior prediction result is output by output-layer node 1; when it belongs to the application promotion information type, the user behavior prediction result is output by output-layer node 2, and so on.
In another embodiment, assuming that the intermediate hidden layer is multi-layer and that the (l−1)-th layer has m nodes in total, the output of the j-th node of the l-th layer is:

    h^l_j = σ( Σ_{i=1..m} w^l_{ji} · h^(l−1)_i + b^(l−1)_j )

where W^l is the matrix formed by all the elements w^l_{ji}, H^(l−1) is the matrix formed by all the elements h^(l−1)_i, B^(l−1) is the matrix formed by all the elements b^(l−1)_j, l ∈ {2, ..., q}, and q is the number of layers of the model. Substituting the node values of the input layer and the intermediate hidden layers in turn into the calculation formula of the final output layer yields the calculation relationship from the input layer to the output layer. In the embodiment of the present invention, the model output is represented by f(X, W), where W is the set composed of all W^l and B^(l−1).
Given a large number of sample feature vectors, i.e. a large number of two-tuples <X_i, y_i>, i = 1, ..., n, the learning process of the machine learning model is to solve the following optimization equation:

    min_W Σ_{i=1..n} L(y_i, f(X_i, W)) + R(W)        (5)

where L(y_i, f(X_i, W)) is the loss function on two-tuple i, and R(W) is a regularization term to avoid overfitting the model. As can be understood, the summation Σ_{i=1..n} L(y_i, f(X_i, W)) is the training objective, and optimizing the training objective amounts to solving optimization equation (5), whose goal is to find a W such that the loss of the neural network output value f(X, W) over all two-tuples is minimal.
In the embodiment, the independent output unit is arranged for each type of recommended object in the output layer, so that samples of different object types can be trained respectively in the training model, and the model parameters of different object types are obtained through training, thereby improving the model training efficiency and the model prediction effect.
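The structure just described, shared input and intermediate hidden layers with one output unit per object type, can be sketched as follows; this is a simplified single-hidden-layer forward pass under assumed dimensions and initialization, not the application's reference implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class MultiTypeModel:
    """Shared input/hidden layers; c output units, one per object type."""
    def __init__(self, m, n, c, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(scale=0.1, size=(n, m))  # input -> hidden connection weights
        self.b1 = np.zeros(n)                         # offsets of the hidden layer
        self.W2 = rng.normal(scale=0.1, size=(c, n))  # hidden -> output connection weights
        self.b2 = np.zeros(c)                         # offsets of the output layer

    def predict(self, x, object_type):
        h = sigmoid(self.W1 @ x + self.b1)            # h_i = sigma(sum_j w_ij x_j + b_i)
        o = sigmoid(self.W2 @ h + self.b2)            # one output node per object type
        return o[object_type]                         # output unit matching the object's type
```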
In one embodiment, determining weights for model training data based on times of generation of the model training data includes: inquiring generation time of model training data; acquiring a time attenuation function configured corresponding to the type of the object; weights for the model training data are calculated based on the generation time and the time decay function.
The generation time of the model training data is a time when a certain behavior of the user sample from which the model training data is derived with respect to the recommended object sample from which the model training data is derived occurs. For example, the generation time of the model training data of the advertisement viewed by the user is the time for the user to click the advertisement after browsing the advertisement; and the generation time of the model training data of the user conversion advertisement is the time of the user for converting the advertisement after browsing the advertisement.
Different object types can be configured with the same time decay function; different time decay functions may also be configured. The time decay function may directly use the generation time of the model training data as an argument, or may use the time interval between the generation time of the model training data and the current time node as an argument.
In a specific embodiment, the time decay function is an exponential function, and the specific formula is as follows:
    v(t) = N_0 · e^(−λt)        (6)

where t is the time interval between the generation time of the model training data and the current time, N_0 is the initial amount at t = 0, and λ is the decay constant. The value of λ may differ for different object types, i.e. different object types may be configured with different time decay functions.
In one embodiment, the time decay function is a piecewise function: when the time argument of the time decay function falls within the time interval formed by a preset time node and the current time node, the time decay function is a constant function; when the time argument falls outside that interval, the time decay function is negatively correlated with the time argument.
The preset time node is a time node used for segmenting the independent variable. The argument is divided into two parts by a preset time node.
In one specific embodiment, the specific formula of the time decay function is as follows:
    v(t) = N_0                      (t ≤ k)
    v(t) = N_0 · e^(−λ(t−k))        (t > k)        (7)

where k is a preset time node. The values of λ and k may differ for different object types, i.e. different object types may be configured with different time decay functions.
In this embodiment, a protection zone is set for data from the most recent period of time: within it the weight is kept at the initial amount without decay, which preserves the importance of recent data. The machine learning model can thus learn the data generated in the recent period in a focused manner on the basis of learning all data, improving the accuracy of model prediction.
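A direct sketch of the two time decay functions (6) and (7); λ, k and N_0 are per-object-type configuration constants, and the default values below are placeholders rather than values given in the application:

```python
import math

def exponential_decay(t, n0=1.0, lam=0.1):
    """Equation (6): v(t) = N0 * exp(-lambda * t), with t the interval between
    the generation time of the model training data and the current time."""
    return n0 * math.exp(-lam * t)

def protected_decay(t, n0=1.0, lam=0.1, k=24.0):
    """Equation (7): constant N0 inside the protection zone t <= k, exponential
    decay beyond it, so the most recent data keeps its full weight."""
    return n0 if t <= k else n0 * math.exp(-lam * (t - k))
```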
It will be appreciated that model training data is typically extracted from data collected over a fairly long period of time, such as a week. If the data changes greatly during that week and all the data carry the same weight, the algorithm tends to fit all the data and cannot adapt well to the latest data, resulting in a large estimation deviation on new data from the most recent period. However, the historical data cannot be completely discarded during model training either, as that would ignore data patterns that recur over time. Therefore, in the embodiment of the present invention the concept of sample weight is introduced, and the magnitude of the sample weight is determined by the generation time of the sample; that is, each two-tuple has its own weight value. Taking the weight of each two-tuple into account, the optimization equation can be redefined as:

    min_W Σ_{i=1..n} v_i · L(y_i, f(X_i, W)) + R(W)

According to this formula, in the embodiment of the invention the two-tuples with large weights are given priority when solving the optimization equation, so the neural network learns all two-tuples while learning the important ones with emphasis.
Assuming there are c types of advertisements, for the i-th type of advertisement the computer device generates n_i pieces of model input data, i.e. n_i two-tuples <X^i_j, y^i_j>, j = 1, ..., n_i. Over all samples, the optimization equation is specifically:

    min_W Σ_{i=1..c} Σ_{j=1..n_i} v^i_j · L(y^i_j, f(X^i_j, W)) + R(W)

where v^i_j is the weight of the j-th piece of model training data of the i-th object type.
In the embodiment, the influence of the change of the sample along with time on the prediction result is considered, and the weight is given to the sample according to the time of the sample, so that the data generated in a certain time period can be mainly learned on the basis of learning all data, and the accuracy of model estimation is improved.
In one embodiment, after the training is finished when the training stop condition is satisfied, the machine learning model training method further includes a using step of the machine learning model, where the using step specifically includes the following steps:
s302, feature data of the target user and feature data of each candidate recommendation object are obtained.
The target user is a user to be subjected to object recommendation. The candidate recommendation objects are objects provided by a recommendation object provider for recommendation to the user.
Specifically, the computer device may be a server corresponding to an application program that can perform object recommendation. After the user terminal logs in the application program capable of object recommendation through the user identifier, the user identifier can be sent to the server corresponding to the application program. The server can query the characteristic data corresponding to the user identification to obtain the characteristic data of the target user. The server inquires the feature data of the recommended object recommended by the server corresponding to the application program to obtain the feature data of each candidate recommended object.
In further embodiments, the computer device may also be a recommendation server of a third party. The third party is a third party different from the recommendation object provider and the recommendation object recommender. After the user terminal logs in the application program capable of object recommendation through the user identifier, the user identifier can be sent to the server corresponding to the application program, and the server corresponding to the application program can forward the user identifier to the recommendation server. The recommendation server can inquire the characteristic data corresponding to the user identification to obtain the characteristic data of the target user. And the recommending server inquires the characteristic data of the recommending object recommended by the server corresponding to the application program so as to obtain the characteristic data of each candidate recommending object. The server corresponding to the application program capable of recommending the object is used for providing a recommended object recommending slot. The server for providing the recommendation object can provide the recommendation object to a server corresponding to the application program capable of object recommendation, so as to recommend the object to the user through the recommendation object recommendation slot provided by the server.
And S304, generating model input data by respectively matching the feature data of the target user with the feature data of each recommended object.
Specifically, after acquiring the feature data of the target user and the feature data of each candidate recommended object, the computer device generates model input data by pairing the feature data of the target user with the feature data of each recommended object in turn. That is, the model input data correspond one-to-one to the recommended objects, and the number of pieces of model input data is the same as the number of recommended objects.
And S306, inputting the input data of each model into the trained machine learning model to obtain user behavior prediction results corresponding to the target user and each recommended object.
Specifically, the computer device may input each item of model input data separately into the trained machine learning model. For each item of model input data, the machine learning model outputs a user behavior prediction result corresponding to the target user and to the recommended object from which that model input data originated.
In one embodiment, multiple machine learning models are trained. Different machine learning models have different prediction capabilities, although the model input data is the same. The trained machine learning models may include a viewing behavior prediction model and/or a conversion behavior prediction model.
In one embodiment, S306 includes: inputting each item of model input data into the trained machine learning model; determining, through the machine learning model, the output unit corresponding to each item of model input data, where the output unit corresponding to each item of model input data corresponds to the object type of the recommended object corresponding to that model input data; and, for each item of model input data, outputting, through the corresponding output unit in the machine learning model, a user behavior prediction result corresponding to the target user and the recommended object corresponding to that model input data.
Specifically, when the machine learning model processes the model input data, it may determine, according to the feature data being processed or to be processed, the object type to which the recommended object from which the model input data originates belongs, so as to determine the model unit through which the machine learning model processes that model input data. In this way, for each item of model input data, the user behavior prediction result corresponding to the target user and the recommended object corresponding to that model input data is output through the corresponding output unit in the machine learning model.
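To make this routing concrete, the sketch below (an assumption of this description, not the patent's prescribed architecture) defines a network with a shared input layer and hidden layers plus one output unit per object type, and selects the output unit according to the object type of the recommended object behind each model input; layer sizes, activation choices, and names such as MultiHeadPredictor are illustrative.

```python
# Illustrative only: shared layers plus per-object-type output units ("heads").
import torch
import torch.nn as nn

class MultiHeadPredictor(nn.Module):
    def __init__(self, input_dim, hidden_dim, object_types):
        super().__init__()
        # Input layer and intermediate hidden layer shared across all object types.
        self.shared = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # One output unit per object type, each producing a behavior probability.
        self.heads = nn.ModuleDict({t: nn.Linear(hidden_dim, 1) for t in object_types})

    def forward(self, x, object_type):
        hidden = self.shared(x)
        # Route the model input through the output unit matching its object type.
        return torch.sigmoid(self.heads[object_type](hidden))
```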
S308, selecting recommended objects from the candidate recommendation objects for recommendation according to the user behavior prediction results.
The user behavior prediction result may specifically be a user behavior prediction probability. When there is a single machine learning model, there is one user behavior prediction result for each item of model input data, that is, one user behavior prediction result for each candidate recommendation object. The computer device may sort the candidate recommendation objects in descending order of their user behavior prediction results and select the top-ranked objects for recommendation.
In further embodiments, when there are multiple machine learning models, there are multiple user behavior prediction results for each item of model input data, that is, multiple user behavior prediction results for each candidate recommendation object. The computer device performs a weighted summation of the prediction results of each candidate recommendation object to obtain a combined user behavior prediction result, sorts the candidate recommendation objects in descending order of their combined results, and selects the top-ranked objects for recommendation.
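A possible form of this selection step is sketched below; the per-model fusion weights and the helper name select_top_candidates are assumptions, and a single-model setup is simply the special case of one prediction per candidate.

```python
# Sketch of S308: weighted fusion of per-model predictions, then top-N selection.
def select_top_candidates(candidates, predictions, model_weights, top_n=1):
    """candidates: list of candidate recommendation object ids.
    predictions: dict mapping candidate id -> list of per-model predicted probabilities.
    model_weights: one fusion weight per trained machine learning model."""
    combined = {}
    for cand in candidates:
        # Weighted summation of the candidate's user behavior prediction results.
        combined[cand] = sum(w * p for w, p in zip(model_weights, predictions[cand]))
    # Sort candidates by the combined prediction, largest first, and keep the top N.
    return sorted(candidates, key=lambda c: combined[c], reverse=True)[:top_n]
```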
In the above embodiment, when the machine learning model is trained, the influence of the generation time of the model training data is considered, so that the prediction accuracy of the trained model is greatly improved when the trained model is used for prediction.
FIG. 4 illustrates a schematic diagram of an application environment for training and using a machine learning model in one embodiment. Referring to fig. 4, the application environment includes a user terminal 410, a server 420 corresponding to a recommendation object provider, a server 430 corresponding to a recommendation object recommender, and a recommendation server 440. In this embodiment, the recommendation object may specifically be promotion information, such as an advertisement; the server corresponding to the recommendation object provider may be a server corresponding to a promotion information provider, such as an advertiser server; the server corresponding to the recommendation object recommender may specifically be a server corresponding to the promotion information publisher, such as an advertisement slot providing server; and the recommendation server is a server that trains and uses the machine learning model. Of course, the server 430 corresponding to the recommendation object recommender and the recommendation server 440 may be the same server, and the server 420 corresponding to the recommendation object provider and the recommendation server 440 may also be the same server.
FIG. 5 illustrates a flow diagram for training and using a machine learning model in one embodiment. Referring to fig. 5, the process of training and using the machine learning model includes: a data collection phase, a data processing phase, a model training phase and a model using phase.
In the data collection stage, the recommendation server can pull the feature data of user samples from the server or user terminal corresponding to each user behavior scenario, determine recommended object samples from the user behavior data in the feature data of the user samples, and extract the feature data of the recommended object samples.
In the data processing stage, the recommendation server maps the user behavior data and user attribute data of each user sample into vector elements, and maps the recommendation object content data and recommendation object attribute data of each recommendation object sample into vector elements; it then combines the vector elements corresponding to the user sample with the vector elements corresponding to the recommendation object sample to generate a sample feature vector. The recommendation server may add a corresponding label to each sample feature vector and determine the object type corresponding to each sample feature vector. The recommendation server then calculates the weight of each sample feature vector according to the generation time of the sample feature vector and the time attenuation function matched with the object type corresponding to that sample feature vector.
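The following sketch illustrates, under assumed field names and a hashing-based encoding that the patent does not prescribe, how raw user and object data could be mapped into vector elements and combined into a sample feature vector.

```python
# Hypothetical data-processing sketch: map raw values to vector elements, then combine.
import numpy as np

def map_to_vector(values, dim=16):
    """Map a list of raw categorical values to a fixed-length block of vector elements."""
    vec = np.zeros(dim)
    for v in values:
        vec[hash(str(v)) % dim] += 1.0  # illustrative hashing trick, not a prescribed encoding
    return vec

def build_sample_feature_vector(user_sample, object_sample):
    parts = [
        map_to_vector(user_sample["behavior_data"]),     # user behavior data
        map_to_vector(user_sample["attribute_data"]),    # user attribute data
        map_to_vector(object_sample["content_data"]),    # recommendation object content data
        map_to_vector(object_sample["attribute_data"]),  # recommendation object attribute data
    ]
    # Combine the user sample's and the object sample's vector elements.
    return np.concatenate(parts)
```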
In the model training phase, the recommendation server sets the model structure of the machine learning model to include an input layer, an intermediate hidden layer and an output layer, where the output layer comprises output units corresponding to the respective object types. The recommendation server may perform machine learning model training by object type: for the sample feature vectors corresponding to each object type, the sample feature vector is input into the machine learning model to obtain a user behavior prediction result; a training target is determined according to the difference between the user behavior prediction result and the label, and the weight; and, in the direction of optimizing the training target, the model parameters of the input layer, the intermediate hidden layer and the output unit corresponding to the object type of the recommended object sample from which the sample feature vector originated are adjusted, and training continues until the training stop condition is satisfied.
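Building on the MultiHeadPredictor sketched earlier, one way to realize this training phase is shown below; the optimizer, the binary cross-entropy loss and the batch layout are assumptions consistent with predicting a behavior probability against a 0/1 label, not details fixed by the patent.

```python
# Hypothetical training step: weighted per-sample losses summed into the training target.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, batch):
    """batch: list of (feature_vector, label, weight, object_type) tuples,
    where label is a 0/1 float tensor and weight comes from the time decay function."""
    optimizer.zero_grad()
    losses = []
    for x, label, weight, object_type in batch:
        pred = model(x.unsqueeze(0), object_type).squeeze()
        # Per-sample loss scaled by the time-based weight of the training data.
        losses.append(weight * F.binary_cross_entropy(pred, label))
    # Training target: weighted sum of the losses; minimizing it adjusts the shared
    # layers and the output units of the object types present in the batch.
    target = torch.stack(losses).sum()
    target.backward()
    optimizer.step()
    return target.item()
```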
In the model using stage, the recommendation server acquires the feature data of the target user and the feature data of each candidate recommendation object; generates model input data by respectively combining the feature data of the target user with the feature data of each recommended object; inputs each item of model input data into the trained machine learning model; determines, through the machine learning model, the output unit corresponding to each item of model input data, where the output unit corresponding to each item of model input data corresponds to the object type of the recommended object corresponding to that model input data; and, for each item of model input data, outputs through the corresponding output unit in the machine learning model a user behavior prediction result corresponding to the target user and the recommended object corresponding to that model input data. The recommendation server then selects recommended objects from the candidate recommendation objects according to the user behavior prediction results and notifies the server 430 corresponding to the recommendation object recommender to recommend the selected objects.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated otherwise herein, the steps are not strictly limited to this order and may be performed in other orders. Moreover, at least some of the steps in the above embodiments may include multiple sub-steps or multiple stages, which are not necessarily performed at the same moment but may be performed at different moments, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
As shown in FIG. 6, in one embodiment, a machine learning model training apparatus 600 is provided. Referring to fig. 6, the machine learning model training apparatus 600 includes: an obtaining module 601, a determining module 602, and an adjusting module 603.
An obtaining module 601, configured to obtain model training data and a corresponding label; the model training data includes feature data of the user sample and feature data of the recommended object sample.
A determining module 602, configured to determine the weight of the model training data according to the generation time of the model training data; input the model training data into a machine learning model to obtain user behavior prediction results corresponding to the user samples and the recommended object samples; and determine a training target according to the differences between the user behavior prediction results and the labels, and the weights.
And the adjusting module 603 is configured to adjust model parameters of the machine learning model according to the direction of the optimized training target, and continue training until the training stop condition is met.
In one embodiment, the obtaining module 601 is further configured to collect feature data of a user sample and feature data of a recommended object sample to generate model training data; when target behavior data of a recommended object sample exists in the feature data of the user sample, setting a label corresponding to the model training data as a positive sample training label; and when the target behavior data of the recommended object sample does not exist in the feature data of the user sample, setting the label corresponding to the model training data as a negative sample training label.
In one embodiment, the model training data is a sample feature vector. The obtaining module 601 is further configured to collect user behavior data and user attribute data of the user sample; collecting recommendation object content data and recommendation object attribute data of a recommendation object sample; mapping the user behavior data and the user attribute data of the user sample into vector elements respectively, and mapping the recommended object content data and the recommended object attribute data of the recommended object sample into vector elements respectively; and combining the corresponding each vector element of the user sample and the corresponding each vector element of the recommendation object sample to generate a sample feature vector.
In one embodiment, the determining module 602 is further configured to determine a type of object to which the recommended object sample belongs; and inputting the model training data into the machine learning model, and controlling to output the user behavior prediction results corresponding to the user samples and the recommended object samples through an output unit corresponding to the object types in the machine learning model.
In one embodiment, the determining module 602 is further configured to query a generation time of the model training data; acquiring a time attenuation function configured corresponding to the object type; weights for the model training data are calculated based on the generation time and the time decay function.
In one embodiment, the time decay function is a piecewise function: when the time independent variable of the time attenuation function falls within the time interval formed by a preset time node and the current time node, the time attenuation function is a constant function; when the time independent variable of the time attenuation function is outside the time interval formed by the preset time node and the current time node, the time attenuation function is a function that is negatively related to the time independent variable.
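For illustration, the piecewise time decay function described here might look like the following sketch, where the length of the protected interval (cutoff_days) and the exponential decay rate are assumed parameters rather than values given by this disclosure.

```python
# Sketch of a piecewise time decay function: constant inside the recent interval,
# negatively related to the time argument outside it.
import math

def time_decay_weight(age_in_days, cutoff_days=7.0, decay_rate=0.1):
    """age_in_days: time elapsed between the data's generation time and now."""
    if age_in_days <= cutoff_days:
        # Within the interval formed by the preset time node and the current time:
        # the weight keeps its initial value (constant function).
        return 1.0
    # Outside that interval: the weight decays as the time argument grows.
    return math.exp(-decay_rate * (age_in_days - cutoff_days))
```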
In one embodiment, the machine learning model includes an input layer, an intermediate hidden layer, and an output layer; the output layer includes output units corresponding to respective ones of the plurality of object types. The adjusting module 603 is further configured to adjust model parameters of an input layer, an intermediate hidden layer, and an output unit corresponding to the determined object type in the machine learning model according to the direction of the optimization training target, and continue training until the training is finished when a training stop condition is satisfied.
In one embodiment, the number of model training data is plural. The determining module 602 is further configured to generate a corresponding loss function by using the difference between the user behavior prediction result of each model training data and the label as an argument; and weighting and summing the loss functions according to the weight of the corresponding model training data to obtain a training target. The adjusting module 603 is further configured to adjust model parameters of the machine learning model according to the direction of the minimized training target and continue training until the training stop condition is met.
As shown in fig. 7, in an embodiment, the machine learning model training apparatus 600 further includes a use module 604, configured to obtain feature data of the target user and feature data of each candidate recommendation object; respectively generating model input data by the characteristic data of the target user and the characteristic data of each recommended object; inputting the input data of each model into the trained machine learning model to obtain user behavior prediction results corresponding to the target user and each recommended object; and selecting recommendation objects from the candidate recommendation objects for recommendation according to the prediction results of the user behaviors.
In one embodiment, the usage module 604 is further configured to input each model input data into the trained machine learning model; determining output units corresponding to input data of each model through a machine learning model; the output unit corresponding to each model input data corresponds to the object type of the recommended object corresponding to each model input data; and for each model input data, outputting a user behavior prediction result corresponding to the target user and the recommended object corresponding to the model input data through a corresponding output unit in the machine learning model.
In one embodiment, the recommendation object includes promotional information; the user behavior prediction result comprises a popularization information viewing prediction result and/or a popularization information conversion prediction result.
FIG. 8 is a diagram illustrating an internal structure of a computer device in one embodiment. As shown in fig. 8, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the machine learning model training method. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the machine learning model training method. It will be appreciated by those skilled in the art that the structure shown in fig. 8 is a block diagram of only the portion of the structure relevant to the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the machine learning model training apparatus provided herein may be implemented in the form of a computer program that is executable on a computer device such as that shown in fig. 8. The memory of the computer device may store various program modules constituting the machine learning model training apparatus, such as the obtaining module 601, the determining module 602, and the adjusting module 603 shown in fig. 6. The program modules constitute computer programs that cause the processor to execute the steps of the machine learning model training methods of the embodiments of the present application described in the present specification.
For example, the computer device shown in fig. 8 may obtain model training data and corresponding labels through the obtaining module 601 in the machine learning model training apparatus 600 shown in fig. 6, the model training data including feature data of the user sample and feature data of the recommended object sample. Through the determining module 602, it may determine the weight of the model training data according to the generation time of the model training data, input the model training data into a machine learning model to obtain user behavior prediction results corresponding to the user samples and the recommended object samples, and determine a training target according to the differences between the user behavior prediction results and the labels, and the weights. Through the adjusting module 603, it may adjust the model parameters of the machine learning model in the direction of optimizing the training target and continue training until the training stop condition is satisfied.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above-described machine learning model training method. Here, the steps of the machine learning model training method may be steps in the machine learning model training methods of the above embodiments.
In one embodiment, a computer readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the steps of the above-described machine learning model training method. Here, the steps of the machine learning model training method may be steps in the machine learning model training methods of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium, and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of these technical features are described; however, as long as a combination of these technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (22)

1. A machine learning model training method, comprising:
obtaining model training data and corresponding labels; the model training data comprises characteristic data of a user sample and characteristic data of a recommended object sample;
inquiring generation time of the model training data;
acquiring a time attenuation function configured corresponding to the object type to which the recommended object sample belongs;
calculating the weight of the model training data according to the generation time and the time attenuation function; for model training data whose time interval between the generation time and the current time is less than or equal to a preset time node, the corresponding weight remains at its initial value without decaying; for model training data whose time interval is greater than the preset time node, the corresponding weight is negatively correlated with the time interval;
inputting the model training data into a machine learning model to obtain user behavior prediction results corresponding to the user samples and the recommended object samples;
constructing a training target by taking, as arguments, the differences between the user behavior prediction results of the respective model training data and the labels, together with the weights of the corresponding model training data;
and according to the direction of optimizing the training target, adjusting the model parameters of the machine learning model and continuing training until the training stopping condition is met.
2. The method of claim 1, wherein obtaining model training data and corresponding labels comprises:
collecting characteristic data of a user sample and characteristic data of a recommended object sample to generate model training data;
when target behavior data of the recommended object sample exists in the feature data of the user sample, setting a label corresponding to the model training data as a positive sample training label;
and when the target behavior data of the recommended object sample does not exist in the feature data of the user sample, setting the label corresponding to the model training data as a negative sample training label.
3. The method of claim 2, wherein the model training data is a sample feature vector; the collecting the feature data of the user sample and the feature data of the recommended object sample to generate model training data comprises the following steps:
collecting user behavior data and user attribute data of a user sample;
collecting recommendation object content data and recommendation object attribute data of a recommendation object sample;
mapping the user behavior data and the user attribute data of the user sample into vector elements respectively, and mapping the recommended object content data and the recommended object attribute data of the recommended object sample into vector elements respectively;
and combining the vector elements corresponding to the user samples and the vector elements corresponding to the recommended object samples to generate sample feature vectors.
4. The method of claim 1, wherein inputting the model training data into a machine learning model to obtain the user behavior prediction results corresponding to the user samples and the recommended object samples comprises:
determining the object type of the recommended object sample;
and inputting the model training data into a machine learning model, and controlling to output user behavior prediction results corresponding to the user samples and the recommended object samples through an output unit corresponding to the object type in the machine learning model.
5. The method of claim 4, wherein the machine learning model comprises an input layer, an intermediate hidden layer, and an output layer; the output layer comprises output units corresponding to a plurality of object types;
adjusting the model parameters of the machine learning model and continuing training according to the direction of optimizing the training target until the training stopping condition is met, wherein the training is finished by the following steps:
and according to the direction of optimizing the training target, adjusting model parameters of an input layer, a middle hidden layer and an output unit corresponding to the determined object type in the machine learning model, and continuing training until a training stop condition is met.
6. The method of claim 1, wherein the time decay function is a piecewise function; when the time independent variable of the time attenuation function is in the time interval formed by a preset time node and the current time node, the time decay function is a constant function; and when the time independent variable of the time attenuation function is outside the time interval formed by the preset time node and the current time node, the time attenuation function is a function that is inversely related to the time independent variable.
7. The method of claim 1, wherein there are a plurality of model training data, and the step of constructing a training target by taking, as arguments, the differences between the user behavior prediction results of the respective model training data and the labels, together with the weights of the corresponding model training data, comprises:
taking the difference between the user behavior prediction result of each model training data and the label as an independent variable to generate a corresponding loss function;
weighting and summing the loss functions according to the weight of the corresponding model training data to obtain a training target;
adjusting the model parameters of the machine learning model and continuing training according to the direction of optimizing the training target until the training stopping condition is met, wherein the training is finished by the following steps:
and according to the direction of minimizing the training target, adjusting the model parameters of the machine learning model and continuing training until the training stopping condition is met, and finishing the training.
8. The method of claim 1, wherein after finishing training when a training stop condition is satisfied, the method further comprises:
acquiring feature data of a target user and feature data of each candidate recommendation object;
respectively generating model input data by the characteristic data of the target user and the characteristic data of each recommended object;
inputting the model input data into the trained machine learning model to obtain user behavior prediction results corresponding to the target user and each recommended object;
and selecting recommended objects from the candidate recommended objects for recommendation according to the user behavior prediction results.
9. The method of claim 8, wherein inputting each of the model input data into a trained machine learning model to obtain a user behavior prediction result corresponding to each of the target user and each of the recommended objects comprises:
inputting the model input data into the trained machine learning model respectively;
determining an output unit corresponding to each model input data through the machine learning model; the output unit corresponding to each model input data corresponds to the object type of the recommendation object corresponding to each model input data;
and for each model input data, outputting a user behavior prediction result corresponding to the target user and a recommended object corresponding to the model input data through a corresponding output unit in the machine learning model.
10. The method according to any one of claims 1 to 9, wherein the recommendation object includes promotion information; the user behavior prediction result comprises a popularization information viewing prediction result and/or a popularization information conversion prediction result.
11. A machine learning model training apparatus comprising:
the acquisition module is used for acquiring model training data and corresponding labels; the model training data comprises characteristic data of a user sample and characteristic data of a recommended object sample;
the determining module is used for inquiring the generation time of the model training data, acquiring a time attenuation function configured corresponding to the object type to which the recommended object sample belongs, and calculating the weight of the model training data according to the generation time and the time attenuation function; for model training data whose time interval between the generation time and the current time is less than or equal to a preset time node, the corresponding weight remains at its initial value without decaying; for model training data whose time interval is greater than the preset time node, the corresponding weight is negatively correlated with the time interval;
the determining module is further configured to input the model training data into a machine learning model to obtain user behavior prediction results corresponding to the user samples and the recommended object samples, and to construct a training target by taking, as arguments, the differences between the user behavior prediction results of the respective model training data and the labels, together with the weights of the corresponding model training data;
and the adjusting module is used for adjusting the model parameters of the machine learning model according to the direction for optimizing the training target and continuing training until the training stopping condition is met.
12. The apparatus according to claim 11, wherein the obtaining module is further configured to collect feature data of a user sample and feature data of a recommended object sample to generate model training data; when the characteristic data of the user sample comprises target behavior data of the recommended object sample, setting a label corresponding to the model training data as a positive sample training label; and when the target behavior data of the recommended object sample does not exist in the feature data of the user sample, setting the label corresponding to the model training data as a negative sample training label.
13. The apparatus of claim 12, wherein the model training data is a sample feature vector; the acquisition module is also used for collecting user behavior data and user attribute data of the user sample; collecting recommendation object content data and recommendation object attribute data of a recommendation object sample; mapping user behavior data and user attribute data of the user sample into vector elements respectively, and mapping recommendation object content data and recommendation object attribute data of the recommendation object sample into vector elements respectively; and combining the corresponding each vector element of the user sample and the corresponding each vector element of the recommendation object sample to generate a sample feature vector.
14. The apparatus of claim 11, wherein the determining module is further configured to determine a type of object to which the recommended object sample belongs; and inputting the model training data into a machine learning model, and controlling to output user behavior prediction results corresponding to the user sample and the recommended object sample through an output unit corresponding to the object type in the machine learning model.
15. The apparatus of claim 14, wherein the machine learning model comprises an input layer, an intermediate hidden layer, and an output layer; the output layer comprises output units corresponding to a plurality of object types;
and the adjusting module is further used for adjusting the model parameters of an input layer, an intermediate hidden layer and an output unit corresponding to the determined object type in the machine learning model according to the direction for optimizing the training target, and continuing training until the training stopping condition is met.
16. The apparatus of claim 11, wherein the time decay function is a piecewise function; when the time independent variable of the time attenuation function is in the time interval formed by a preset time node and the current time node, the time decay function is a constant function; and when the time independent variable of the time attenuation function is outside the time interval formed by the preset time node and the current time node, the time attenuation function is a function that is inversely related to the time independent variable.
17. The apparatus of claim 11, wherein the number of model training data is plural; the determining module is further configured to generate a corresponding loss function by using the difference between the user behavior prediction result of each model training data and the label as an independent variable; weighting and summing the loss functions according to the weight of the corresponding model training data to obtain a training target;
and the adjusting module is further used for adjusting the model parameters of the machine learning model according to the direction of minimizing the training target and continuing training until the training stopping condition is met.
18. The apparatus of claim 11, further comprising:
the using module is used for acquiring the feature data of the target user and the feature data of each candidate recommendation object; respectively generating model input data by the characteristic data of the target user and the characteristic data of each recommended object; inputting the model input data into the trained machine learning model to obtain user behavior prediction results corresponding to the target user and each recommended object; and selecting recommended objects from the candidate recommended objects for recommendation according to the user behavior prediction results.
19. The apparatus of claim 18, wherein the using module is further configured to input each of the model input data into a trained machine learning model; determining an output unit corresponding to each model input data through the machine learning model; the output unit corresponding to each model input data corresponds to the object type of the recommended object corresponding to each model input data; and for each model input data, outputting a user behavior prediction result corresponding to the target user and a recommended object corresponding to the model input data through a corresponding output unit in the machine learning model.
20. The apparatus according to any one of claims 11 to 19, wherein the recommendation object includes promotion information; the user behavior prediction result comprises a popularization information viewing prediction result and/or a popularization information conversion prediction result.
21. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 10.
22. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 10.
CN201811133216.XA 2018-09-27 2018-09-27 Machine learning model training method and device, storage medium and computer equipment Active CN109345302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811133216.XA CN109345302B (en) 2018-09-27 2018-09-27 Machine learning model training method and device, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811133216.XA CN109345302B (en) 2018-09-27 2018-09-27 Machine learning model training method and device, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN109345302A CN109345302A (en) 2019-02-15
CN109345302B true CN109345302B (en) 2023-04-18

Family

ID=65306833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811133216.XA Active CN109345302B (en) 2018-09-27 2018-09-27 Machine learning model training method and device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN109345302B (en)

Families Citing this family (76)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111580839B (en) * 2019-02-18 2021-06-29 珠海格力电器股份有限公司 Upgrading method and device for electrical equipment
CN111598599B (en) * 2019-02-21 2023-04-25 阿里巴巴集团控股有限公司 User characterization method, device, electronic equipment and computer readable medium
CN111612167B (en) * 2019-02-26 2024-04-16 京东科技控股股份有限公司 Combined training method, device, equipment and storage medium of machine learning model
JP2020140540A (en) * 2019-02-28 2020-09-03 富士通株式会社 Determination program, determination method, and information processing device
CN109993627B (en) * 2019-02-28 2021-09-07 华为技术有限公司 Recommendation method, recommendation model training device and storage medium
CN110031624A (en) * 2019-02-28 2019-07-19 中国科学院上海高等研究院 Tumor markers detection system based on multiple neural networks classifier, method, terminal, medium
CN109919235B (en) * 2019-03-13 2021-08-20 北京邮电大学 Deep learning image classification model training method based on manual intervention sample set weight
CN110096346B (en) * 2019-03-29 2021-06-15 广州思德医疗科技有限公司 Multi-computing-node training task processing method and device
CN111859592A (en) * 2019-04-08 2020-10-30 阿里巴巴集团控股有限公司 Parameter determination method and device, electronic equipment and computer-readable storage medium
CN111797858A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Model training method, behavior prediction method, device, storage medium and equipment
CN110263265B (en) * 2019-04-10 2024-05-07 腾讯科技(深圳)有限公司 User tag generation method, device, storage medium and computer equipment
CN110188917A (en) * 2019-04-16 2019-08-30 上海上湖信息技术有限公司 Client's conversion ratio analysis method and device, readable storage medium storing program for executing
CN109934422B (en) * 2019-04-25 2021-06-29 中国人民解放军国防科技大学 Neural network wind speed prediction method based on time series data analysis
CN111917809B (en) * 2019-05-09 2021-11-26 腾讯科技(深圳)有限公司 Multimedia data pushing method and device
CN110119477B (en) * 2019-05-14 2024-02-27 腾讯科技(深圳)有限公司 Information pushing method, device and storage medium
CN110335064A (en) * 2019-06-05 2019-10-15 平安科技(深圳)有限公司 Product method for pushing, device, computer equipment and storage medium
CN112101692B (en) * 2019-06-18 2023-11-24 中国移动通信集团浙江有限公司 Identification method and device for mobile internet bad quality users
CN112149833B (en) * 2019-06-28 2023-12-12 北京百度网讯科技有限公司 Prediction method, device, equipment and storage medium based on machine learning
CN110442790B (en) * 2019-08-07 2024-05-10 深圳市雅阅科技有限公司 Method, device, server and storage medium for recommending multimedia data
CN110442804A (en) * 2019-08-13 2019-11-12 北京市商汤科技开发有限公司 A kind of training method, device, equipment and the storage medium of object recommendation network
CN110825956A (en) * 2019-09-17 2020-02-21 中国平安人寿保险股份有限公司 Information flow recommendation method and device, computer equipment and storage medium
CN112541513B (en) * 2019-09-20 2023-06-27 百度在线网络技术(北京)有限公司 Model training method, device, equipment and storage medium
CN110971659A (en) * 2019-10-11 2020-04-07 贝壳技术有限公司 Recommendation message pushing method and device and storage medium
CN110852171A (en) * 2019-10-14 2020-02-28 清华大学深圳国际研究生院 Scene description robot system and method for online training
CN113508378A (en) * 2019-10-31 2021-10-15 华为技术有限公司 Recommendation model training method, recommendation device and computer readable medium
CN110856253B (en) * 2019-11-15 2021-03-23 北京三快在线科技有限公司 Positioning method, positioning device, server and storage medium
CN112925924A (en) * 2019-12-05 2021-06-08 北京达佳互联信息技术有限公司 Multimedia file recommendation method and device, electronic equipment and storage medium
CN112989179B (en) * 2019-12-13 2023-07-28 北京达佳互联信息技术有限公司 Model training and multimedia content recommendation method and device
CN111275205B (en) * 2020-01-13 2024-03-22 优地网络有限公司 Virtual sample generation method, terminal equipment and storage medium
CN111274330B (en) * 2020-01-15 2022-08-26 腾讯科技(深圳)有限公司 Target object determination method and device, computer equipment and storage medium
CN111242320A (en) * 2020-01-16 2020-06-05 京东数字科技控股有限公司 Machine learning method and device, electronic equipment and storage medium
CN111414535B (en) * 2020-03-02 2023-05-05 支付宝(杭州)信息技术有限公司 Method and device for recommending target object to user
CN111368195B (en) * 2020-03-03 2024-02-13 上海喜马拉雅科技有限公司 Model training method, device, equipment and storage medium
CN111507471B (en) * 2020-03-03 2023-11-17 上海喜马拉雅科技有限公司 Model training method, device, equipment and storage medium
CN111415015B (en) * 2020-03-27 2021-06-04 支付宝(杭州)信息技术有限公司 Business model training method, device and system and electronic equipment
CN113301067A (en) * 2020-04-01 2021-08-24 阿里巴巴集团控股有限公司 Cloud configuration recommendation method and device for machine learning application
CN113538079A (en) * 2020-04-17 2021-10-22 北京金山数字娱乐科技有限公司 Recommendation model training method and device, and recommendation method and device
CN113542046B (en) * 2020-04-21 2023-01-10 百度在线网络技术(北京)有限公司 Flow estimation method, device, equipment and storage medium
CN113674008A (en) * 2020-05-14 2021-11-19 北京达佳互联信息技术有限公司 Directional label recommendation method, device, server and storage medium
US11599746B2 (en) * 2020-06-30 2023-03-07 Microsoft Technology Licensing, Llc Label shift detection and adjustment in predictive modeling
CN111813992A (en) * 2020-07-14 2020-10-23 四川长虹电器股份有限公司 Sorting system and method for movie recommendation candidate set
CN111966850A (en) * 2020-07-21 2020-11-20 珠海格力电器股份有限公司 Picture screening method and device, electronic equipment and storage medium
CN111859133B (en) * 2020-07-21 2023-11-14 有半岛(北京)信息科技有限公司 Recommendation method and release method and device of online prediction model
EP4181026A4 (en) * 2020-07-24 2023-08-02 Huawei Technologies Co., Ltd. Recommendation model training method and apparatus, recommendation method and apparatus, and computer-readable medium
CN111681059B (en) * 2020-08-14 2020-11-13 支付宝(杭州)信息技术有限公司 Training method and device of behavior prediction model
CN112085566B (en) * 2020-09-07 2023-04-28 中国平安财产保险股份有限公司 Product recommendation method and device based on intelligent decision and computer equipment
CN112069302B (en) * 2020-09-15 2024-03-08 腾讯科技(深圳)有限公司 Training method of conversation intention recognition model, conversation intention recognition method and device
CN112269794A (en) * 2020-09-16 2021-01-26 连尚(新昌)网络科技有限公司 Method and equipment for violation prediction based on block chain
CN112348564A (en) * 2020-09-27 2021-02-09 北京淇瑀信息科技有限公司 Method and device for automatically managing advertisement delivery and electronic equipment
CN112182381B (en) * 2020-09-28 2022-09-13 上海嗨普智能信息科技股份有限公司 Data processing method, electronic device, and medium
CN112116168B (en) * 2020-09-29 2023-08-04 中国银行股份有限公司 User behavior prediction method and device and electronic equipment
CN113516496B (en) * 2020-09-30 2023-12-12 腾讯科技(深圳)有限公司 Advertisement conversion rate estimation model construction method, device, equipment and medium thereof
WO2022094398A1 (en) * 2020-11-02 2022-05-05 Shutterstock, Inc. Systems, methods, computing platforms, and storage media for providing image recommendations
CN112491820B (en) * 2020-11-12 2022-07-29 新华三技术有限公司 Abnormity detection method, device and equipment
CN114630356B (en) * 2020-12-11 2024-02-27 中移(成都)信息通信科技有限公司 Base station determining method, device, equipment and storage medium
CN112598471A (en) * 2020-12-25 2021-04-02 北京知因智慧科技有限公司 Product recommendation method and device and electronic equipment
CN112925926B (en) * 2021-01-28 2022-04-22 北京达佳互联信息技术有限公司 Training method and device of multimedia recommendation model, server and storage medium
CN113159101A (en) * 2021-02-23 2021-07-23 北京三快在线科技有限公司 Model obtaining method, user processing method, device and electronic equipment
CN112948289B (en) * 2021-03-03 2022-09-30 上海天旦网络科技发展有限公司 Cache prediction scheduling method, system and medium based on machine learning
CN112950291B (en) * 2021-03-31 2023-07-21 北京奇艺世纪科技有限公司 Model deviation optimization method, device, equipment and computer readable medium
CN113204654B (en) * 2021-04-21 2024-03-29 北京达佳互联信息技术有限公司 Data recommendation method, device, server and storage medium
CN113516556A (en) * 2021-05-13 2021-10-19 支付宝(杭州)信息技术有限公司 Method and system for predicting or training model based on multi-dimensional time series data
CN113256335B (en) * 2021-05-27 2021-10-12 腾讯科技(深圳)有限公司 Data screening method, multimedia data delivery effect prediction method and device
CN113312512B (en) * 2021-06-10 2023-10-31 北京百度网讯科技有限公司 Training method, recommending device, electronic equipment and storage medium
JP7485638B2 (en) 2021-06-18 2024-05-16 Lineヤフー株式会社 Terminal device, terminal device control method, and terminal device control program
CN113420887B (en) * 2021-06-22 2024-04-05 平安资产管理有限责任公司 Prediction model construction method, prediction model construction device, computer equipment and readable storage medium
CN113361721B (en) * 2021-06-29 2023-07-18 北京百度网讯科技有限公司 Model training method, device, electronic equipment, storage medium and program product
CN113469438B (en) * 2021-06-30 2024-01-05 北京达佳互联信息技术有限公司 Data processing method, device, equipment and storage medium
CN113627900A (en) * 2021-08-10 2021-11-09 未鲲(上海)科技服务有限公司 Model training method, device and storage medium
CN113468434B (en) * 2021-09-06 2021-12-24 北京搜狐新动力信息技术有限公司 Resource recommendation method, device, readable medium and equipment
CN114547349A (en) * 2022-02-18 2022-05-27 百果园技术(新加坡)有限公司 Method, device, equipment and storage medium for model adjustment and business processing
CN114549473B (en) * 2022-02-23 2024-04-19 中国民用航空总局第二研究所 Road surface detection method and system with autonomous learning rapid adaptation capability
CN114936323B (en) * 2022-06-07 2023-06-30 北京百度网讯科技有限公司 Training method and device of graph representation model and electronic equipment
CN114882961B (en) * 2022-06-09 2023-02-07 佛山众陶联供应链服务有限公司 Firing curve prediction method based on raw material weight as model parameter selection condition
IL295124A (en) * 2022-07-27 2024-02-01 Citrusx Ltd Computerized analysis of a trained machine learning model
CN116823410B (en) * 2023-08-29 2024-01-12 阿里巴巴(成都)软件技术有限公司 Data processing method, object processing method, recommending method and computing device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106485562A (en) * 2015-09-01 2017-03-08 苏宁云商集团股份有限公司 A kind of commodity information recommendation method based on user's history behavior and system
WO2017219548A1 (en) * 2016-06-20 2017-12-28 乐视控股(北京)有限公司 Method and device for predicting user attributes
CN107705183A (en) * 2017-09-30 2018-02-16 深圳乐信软件技术有限公司 Recommendation method, apparatus, storage medium and the server of a kind of commodity
CN107766561A (en) * 2017-11-06 2018-03-06 广东欧珀移动通信有限公司 Method, apparatus, storage medium and the terminal device that music is recommended
CN108446350A (en) * 2018-03-09 2018-08-24 华中科技大学 A kind of recommendation method based on topic model analysis and user's length interest

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Loss function (loss)"; -youzipi-; https://blog.csdn.net/pipisorry/article/details/23538535; 2014-04-12; page 1 of the text *
Construction of a microblog community detection model incorporating knowledge transfer learning; Liu Yuting et al.; Computer Technology and Development; 2018-05-16 (No. 09); full text *


Similar Documents

Publication Publication Date Title
CN109345302B (en) Machine learning model training method and device, storage medium and computer equipment
CN111681059B (en) Training method and device of behavior prediction model
US11574139B2 (en) Information pushing method, storage medium and server
CN109902708B (en) Recommendation model training method and related device
WO2019242331A1 (en) User behavior prediction method and apparatus, and behavior prediction model training method and apparatus
CN110263265B (en) User tag generation method, device, storage medium and computer equipment
CN110263242B (en) Content recommendation method, content recommendation device, computer readable storage medium and computer equipment
CN111966914B (en) Content recommendation method and device based on artificial intelligence and computer equipment
CN111444428A (en) Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
CN109582876B (en) Tourist industry user portrait construction method and device and computer equipment
US20190272553A1 (en) Predictive Modeling with Entity Representations Computed from Neural Network Models Simultaneously Trained on Multiple Tasks
US20210056458A1 (en) Predicting a persona class based on overlap-agnostic machine learning models for distributing persona-based digital content
CN111291264A (en) Access object prediction method and device based on machine learning and computer equipment
KR20170090562A (en) Personalized recommendation system and its method using multiple algorithms and self-learning function
CN111667024B (en) Content pushing method, device, computer equipment and storage medium
CN109801101A (en) Label determines method, apparatus, computer equipment and storage medium
CN112149352B (en) Prediction method for marketing activity clicking by combining GBDT automatic characteristic engineering
CN112487199A (en) User characteristic prediction method based on user purchasing behavior
CN111429161B (en) Feature extraction method, feature extraction device, storage medium and electronic equipment
CN111858969A (en) Multimedia data recommendation method and device, computer equipment and storage medium
Chen et al. A new approach for mobile advertising click-through rate estimation based on deep belief nets
CN111931069B (en) User interest determination method and device and computer equipment
CN115730125A (en) Object identification method and device, computer equipment and storage medium
CN112115354A (en) Information processing method, information processing apparatus, server, and storage medium
CN111291795A (en) Crowd characteristic analysis method and device, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant