CN115345311A - Data processing method and device for model training, electronic equipment and storage medium - Google Patents

Data processing method and device for model training, electronic equipment and storage medium

Info

Publication number
CN115345311A
CN115345311A · Application CN202110511932.2A
Authority
CN
China
Prior art keywords
behavior data
training
historical behavior
online
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110511932.2A
Other languages
Chinese (zh)
Inventor
廖一桥
骆明楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202110511932.2A priority Critical patent/CN115345311A/en
Publication of CN115345311A publication Critical patent/CN115345311A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43Querying
    • G06F16/435Filtering based on additional data, e.g. user or group profiles
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a data processing method and apparatus for model training, an electronic device, and a storage medium. The method comprises the following steps: acquiring a plurality of behavior data samples of a user account, the plurality of behavior data samples comprising historical behavior data samples and online behavior data samples, wherein the historical behavior data samples comprise all historical behavior data samples of the user account or a subset extracted from all the historical behavior data samples based on sample extraction logic; determining training data and a training label corresponding to each behavior data sample; and training an online recommendation model online with the training data and training labels, wherein the online recommendation model is a model that has already been trained to meet online prediction requirements, and the trained online recommendation model is used to recommend objects to the user account online. By adding historical behavior data to the training samples, neither the model structure nor the inputs of the inference stage need to change, and the pressure on the online recommendation system is greatly reduced.

Description

Data processing method and device for model training, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a data processing method and apparatus for model training, an electronic device, a computer-readable storage medium, and a computer program product.
Background
A recommendation system may recommend objects to a client based on metrics estimated by a model, such as the Click-Through Rate (CTR) and Conversion Rate (CVR). Because a user's historical behavior information reflects the user's points of interest, having the model continuously learn from this historical behavior information during training improves the recommendation accuracy of the recommendation system.
To let the model learn more of the user's historical behavior information, the related art forms an ultra-long behavior sequence from all of the user's historical behavior data and uses that sequence as training data. Accordingly, the same ultra-long behavior sequence formed from all the historical behavior data must also be used as input during inference. However, using an ultra-long behavior sequence as model input during both training and inference means the online recommendation system must bear great pressure and suffers from high memory consumption.
Disclosure of Invention
The present disclosure provides a data processing method, apparatus, electronic device, computer-readable storage medium, and computer program product for model training, so as to at least solve the problem in the related art that using an overlong behavior sequence as model input during training and inference places heavy memory consumption on an online recommendation system. The technical scheme of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a data processing method for model training, including:
acquiring a plurality of behavior data samples of a user account, wherein each behavior data sample is generated by operating each associated object by the user account, the behavior data samples comprise historical behavior data samples and online behavior data samples, and the historical behavior data samples comprise all the historical behavior data samples of the user account or part of the historical behavior data samples extracted from all the historical behavior data samples based on sample extraction logic;
determining training data and training labels corresponding to each behavior data sample;
and performing online training on an online recommendation model through the training data and the training labels, wherein the online recommendation model is a model which is trained to meet the online prediction requirement, and the trained online recommendation model is used for recommending objects to the user account online.
In one embodiment, the partial historical behavior data samples extracted from all the historical behavior data samples based on the sample extraction logic are obtained by any one of the following processes:
extracting according to the importance of the object corresponding to the historical behavior data sample to obtain the partial historical behavior data sample;
or acquiring a target object type of an object corresponding to the online behavior data sample, and extracting historical behavior data samples under the target object type from all the historical behavior data samples to serve as the part of the historical behavior data samples;
or acquiring a first similarity of the historical behavior data sample and the online behavior data sample, and extracting based on the first similarity to obtain the partial historical behavior data sample;
or acquiring a type diversity index of an object type, and extracting according to the type diversity index to obtain the partial historical behavior data sample.
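The first-similarity extraction strategy above can be sketched as follows. This is an illustrative sketch only, not the patent's implementation: the vector representation of samples and the helper name `extract_by_similarity` are assumptions.

```python
import numpy as np

def extract_by_similarity(historical, online, top_k):
    """Extract the top_k historical samples most similar to the online
    samples. Each row of `historical` and `online` is a feature vector."""
    online_center = np.mean(online, axis=0)
    # Cosine similarity between each historical row and the online center.
    sims = historical @ online_center / (
        np.linalg.norm(historical, axis=1) * np.linalg.norm(online_center) + 1e-8)
    top = np.argsort(-sims)[:top_k]
    return historical[top], sims[top]
```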
In one embodiment, the number of the extracted part of the historical behavior data samples is determined according to the training speed of the online recommendation model.
In one embodiment, the training an online recommendation model through the training data and the training labels includes:
acquiring the weight corresponding to each behavior data sample;
inputting training data corresponding to each behavior data sample into the online recommendation model to obtain a prediction result corresponding to each behavior data sample;
determining a loss value according to the prediction result, the training label and the weight corresponding to each behavior data sample;
and adjusting the model parameters of the online recommendation model according to the loss value, and continuously inputting training data corresponding to the next behavior data sample until a training stop condition is reached.
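The weighted loss-value step above can be illustrated with per-sample binary cross-entropy scaled by each sample's weight. The choice of cross-entropy and the function name are assumptions, since the embodiment does not fix a particular loss function.

```python
import numpy as np

def weighted_bce_loss(preds, labels, weights):
    """Per-sample binary cross-entropy, each term scaled by its sample weight."""
    preds = np.clip(preds, 1e-7, 1 - 1e-7)  # avoid log(0)
    per_sample = -(labels * np.log(preds) + (1 - labels) * np.log(1 - preds))
    return float(np.mean(weights * per_sample))
```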
In one embodiment, when the behavior data samples are historical behavior data samples, the obtaining the weight corresponding to each behavior data sample includes:
acquiring the time difference between the timestamp in each historical behavior data sample and the current moment, and determining the weight corresponding to each historical behavior data sample according to the time difference, wherein the weight is negatively correlated with the time difference;
or determining the weight corresponding to each historical behavior data sample according to the importance of the object corresponding to each historical behavior data sample, wherein the weight is positively correlated with the importance;
or obtaining a second similarity of each historical behavior data sample and the online behavior data sample, and determining a weight corresponding to each historical behavior data sample based on the second similarity, wherein the weight is positively correlated with the second similarity;
or acquiring a type diversity index of an object type, and determining the weight corresponding to each historical behavior data sample according to the type diversity index;
or predicting according to each historical behavior data sample through a first deep learning model to obtain corresponding weight.
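The first weighting variant above (weight negatively correlated with the time difference) can be sketched as an exponential half-life decay; the half-life parameterization is an assumption, as the embodiment only requires negative correlation.

```python
def time_decay_weight(sample_ts, now_ts, half_life_s=7 * 24 * 3600):
    """Weight that halves for every half_life_s seconds of sample age,
    so it is negatively correlated with the time difference."""
    age = now_ts - sample_ts
    return 0.5 ** (age / half_life_s)
```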
In one embodiment, all the historical behavior data samples are obtained by querying a first mapping table, and the first mapping table is obtained when the online recommendation model is trained offline and is updated in real time along with online training of the online recommendation model.
In one embodiment, the determining the training data and the training labels corresponding to each behavior data sample includes:
if the behavior data sample is an online behavior data sample, generating training data corresponding to the online behavior data sample according to the online behavior data sample;
acquiring an original label from the online behavior data sample as a training label corresponding to the online behavior data sample;
if the behavior data sample is a historical behavior data sample, generating training data corresponding to the historical behavior data sample according to the historical behavior data sample;
acquiring the time difference between the timestamp in the historical behavior data sample and the current moment;
and attenuating original labels in the historical behavior data samples according to the time difference to obtain training labels corresponding to the historical behavior data samples.
In one embodiment, the attenuating the original label in the historical behavior data sample according to the time difference to obtain the training label of the historical behavior data sample includes:
inquiring a second mapping table to obtain a training label corresponding to the time difference of the historical behavior data sample, wherein the second mapping table comprises the corresponding relation between the time difference and the training label;
or attenuating the original label according to the time difference through a preset attenuation function to obtain a training label of the historical behavior data sample;
or predicting according to the historical behavior data sample through a second deep learning model to obtain a training label of the historical behavior data sample.
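The second-mapping-table and preset-decay-function variants above might look like the following sketch; the table thresholds, decay rate, and function names are all hypothetical.

```python
import math

# Hypothetical second mapping table: (max age in seconds, attenuation factor).
DECAY_TABLE = [(3600, 1.0), (86400, 0.8), (7 * 86400, 0.5), (30 * 86400, 0.2)]

def label_from_table(original_label, age_s):
    """Table-lookup variant: the first bucket whose bound covers the age wins."""
    for bound, factor in DECAY_TABLE:
        if age_s <= bound:
            return original_label * factor
    return original_label * 0.1  # older than every bucket

def label_from_function(original_label, age_s, rate=0.1, day_s=86400.0):
    """Preset-decay-function variant: exponential decay per day of age."""
    return original_label * math.exp(-rate * age_s / day_s)
```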
According to a second aspect of the embodiments of the present disclosure, there is provided a data processing apparatus for model training, including:
the obtaining module is configured to perform obtaining of a plurality of behavior data samples of a user account, each behavior data sample is generated by the user account operating on each associated object, the plurality of behavior data samples include historical behavior data samples and online behavior data samples, and the historical behavior data samples include all the historical behavior data samples of the user account or partial historical behavior data samples extracted from all the historical behavior data samples based on sample extraction logic;
a training sample generation module configured to determine training data and training labels corresponding to each behavior data sample;
and the model training module is configured to perform online training on an online recommendation model through the training data and the training labels, the online recommendation model is a model which is trained to meet online prediction requirements, and the trained online recommendation model is used for recommending objects to the user account online.
In one embodiment, the apparatus further includes a sample extraction module configured to perform extraction according to importance of an object corresponding to the historical behavior data sample, so as to obtain the partial historical behavior data sample;
or acquiring a target object type of an object corresponding to the online behavior data sample, and extracting historical behavior data samples under the target object type from all the historical behavior data samples to serve as the part of the historical behavior data samples;
or acquiring a first similarity between the historical behavior data sample and the online behavior data sample, and extracting based on the first similarity to obtain the partial historical behavior data sample;
or acquiring a type diversity index of an object type, and extracting according to the type diversity index to obtain the partial historical behavior data sample.
In one embodiment, the number of the extracted part of the historical behavior data samples is determined according to the training speed of the online recommendation model.
In one embodiment, the model training module includes:
a weight obtaining unit configured to perform obtaining of a weight corresponding to each of the behavior data samples;
the prediction unit is configured to input training data corresponding to each behavior data sample into the online recommendation model to obtain a prediction result corresponding to each behavior data sample;
a loss value determination unit configured to perform determining a loss value according to the prediction result, the training label, and the weight corresponding to each behavior data sample;
and the parameter adjusting unit is configured to adjust the model parameters of the online recommendation model according to the loss value, and continue inputting training data corresponding to the next behavior data sample until a training stopping condition is reached.
In one embodiment, when the behavior data samples are historical behavior data samples, the weight obtaining unit is configured to perform obtaining of a time difference between a timestamp in each historical behavior data sample and a current time, and determine a weight corresponding to each historical behavior data sample according to the time difference, wherein the weight is negatively correlated with the time difference;
or determining the weight corresponding to each historical behavior data sample according to the importance of the object corresponding to each historical behavior data sample, wherein the weight is positively correlated with the importance;
or obtaining a second similarity of each historical behavior data sample and the online behavior data sample, and determining a weight corresponding to each historical behavior data sample based on the second similarity, wherein the weight is positively correlated with the second similarity;
or acquiring a type diversity index of an object type, and determining the weight corresponding to each historical behavior data sample according to the type diversity index;
or predicting according to each historical behavior data sample through a first deep learning model to obtain corresponding weight.
In one embodiment, all the historical behavior data samples are obtained by querying a first mapping table, where the first mapping table is obtained when the online recommendation model is trained offline, and is updated in real time along with online training of the online recommendation model.
In one embodiment, the training sample generation module includes:
the first training data generation unit is configured to execute the step of generating training data corresponding to the online behavior data sample according to the online behavior data sample if the behavior data sample is the online behavior data sample;
a first label determining unit, configured to perform obtaining of an original label from the online behavior data sample, as a training label corresponding to the online behavior data sample;
the second training data generation unit is configured to execute the step of generating training data corresponding to the historical behavior data sample according to the historical behavior data sample if the behavior data sample is the historical behavior data sample;
an obtaining unit configured to perform obtaining a time difference between a time stamp in the historical behavior data sample and a current time;
and the second label determining unit is configured to perform attenuation on an original label in the historical behavior data sample according to the time difference to obtain a training label corresponding to the historical behavior data sample.
In one embodiment, the second label determining unit is configured to perform query to obtain a training label corresponding to the time difference of the historical behavior data sample from a second mapping table, where the second mapping table includes a correspondence between the time difference and the training label;
or attenuating the original label according to the time difference through a preset attenuation function to obtain a training label of the historical behavior data sample;
or predicting according to the historical behavior data sample through a second deep learning model to obtain a training label of the historical behavior data sample.
According to a third aspect of an embodiment of the present disclosure, there is provided an electronic apparatus including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the data processing method of model training as described in any embodiment of the first aspect.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor of an electronic device, enable the electronic device to perform the data processing method of model training as set forth in any one of the embodiments of the first aspect.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the data processing method of model training as set forth in any one of the embodiments of the first aspect.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
for an online recommendation model that has already been trained to meet online prediction requirements, the model has learned the historical behavior information of the user account but may have lost some of it through catastrophic forgetting and similar effects. Therefore, by adding historical behavior data samples to the online training process, the online recommendation model can relearn the user's historical behavior information, improving its recommendation accuracy. In addition, adding historical behavior data samples leaves the model structure and the inputs of the inference stage unchanged, so compared with the related-art approach of training and inference over an ultra-long behavior sequence, the pressure on the online system and the memory consumption are both greatly reduced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a diagram illustrating an application environment for a data processing method of model training, according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of data processing for model training in accordance with an exemplary embodiment.
FIG. 3 is a flowchart illustrating a model training step in accordance with an exemplary embodiment.
FIG. 4 is a flowchart illustrating the determination of training data and training labels according to an exemplary embodiment.
FIG. 5 is a flowchart illustrating a method of data processing for model training, according to an example embodiment.
FIG. 6 is a block diagram illustrating a model trained data processing apparatus according to an exemplary embodiment.
FIG. 7 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in other sequences than those illustrated or described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The data processing method for model training provided by the present disclosure may be applied to the application environment shown in fig. 1, in which the terminal 110 interacts with the server 120 through a network. The terminal 110 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, or portable wearable device. The server 120 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers. An application may be installed in the terminal 110; the application may be a social application, a short-video application, an instant messaging application, and the like. The terminal 110 may provide various resources to a user through the application, such as pictures, music, videos, games, text, or web pages. The server 120 may be deployed with an online recommendation system and may recommend personalized objects to clients of the application through an online recommendation model of the online recommendation system. The online recommendation model may be any model that can be used to predict the recommendable rate of an object, such as a linear model, a neural network model, a support vector machine, a two-tower model, and the like. The object recommendable rate may be characterized using click-through rate, conversion rate, and the like.
In a specific implementation, the server 120 obtains a plurality of behavior data samples of the user account, where each behavior data sample is generated by the user account operating on each associated object. The plurality of behavior data samples comprise historical behavior data samples and online behavior data samples, and the historical behavior data samples comprise all historical behavior data samples of the user account or part of the historical behavior data samples extracted from all the historical behavior data samples based on the sample extraction logic. The server 120 determines training data and training labels for each behavior data sample. And performing online training on the online recommendation model through the training data and the training labels until a training stop condition is reached. The server 120 updates the model parameters of the online recommendation model in use by using the trained model parameters, and online recommends the object to the terminal 110 by using the trained online recommendation model. The update may be a full update or an incremental update.
Fig. 2 is a flowchart illustrating a data processing method for model training according to an exemplary embodiment, and as shown in fig. 2, the data processing method for model training may be used in a server, and includes the following steps.
In step S210, a plurality of behavior data samples of the user account are obtained, where each behavior data sample is generated by the user account operating on each associated object, and the plurality of behavior data samples include historical behavior data samples and online behavior data samples, where the historical behavior data samples include all the historical behavior data samples of the user account, or some of the historical behavior data samples extracted from all the historical behavior data samples based on the sample extraction logic.
The behavior data sample refers to a sample generated when the user account operates on an object in an application program; for example, after a user clicks video A, a behavior data sample recording that the user account clicked video A is generated. The behavior data sample may include, but is not limited to, user attribute information, operation behavior information, time information, object information of the operated object, and the like. The user attribute information may include a user account identifier, name, gender, region, occupation, and the like. The operation behavior information may represent the operation behaviors performed by the user, for example clicking or not clicking. The time information may include an operation timestamp, an operation duration, and the like. The operated object is the object on which the user performs the operation behavior; the object may be a video, an article, a commodity, music, or the like. The object information may be used to represent attributes of the object, such as object name, object identifier, and object type.
The online behavior data sample may be a sample generated by the user account in real time, for example, a sample generated after the account's current login; or a sample generated within a short window, for example, within the 2 hours before the current time. Online behavior data samples may be collected through a message queue. The message queue may be ActiveMQ (an open-source message middleware), RabbitMQ (an open-source message broker implementing the Advanced Message Queuing Protocol), Kafka (a high-throughput distributed publish-subscribe messaging system), or the like. When the user account generates an online behavior data sample, the server may add it to the message queue; online behavior data samples are then obtained by consuming messages from the queue.
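Consuming online behavior samples from a message queue can be sketched with Python's standard-library `queue` standing in for the middleware listed above; the batch-collection helper and the sample dict layout are assumptions.

```python
import queue

def collect_online_samples(mq, max_batch):
    """Drain up to max_batch online behavior samples from the message queue."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(mq.get_nowait())
        except queue.Empty:
            break  # queue exhausted before the batch filled
    return batch
```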
The historical behavior data samples may refer to behavior data samples other than online behavior data samples. The historical behavior data samples may be stored by a distributed storage system or the like. The historical behavior data samples may be further divided into long-term historical behavior data samples and short-term historical behavior data samples. The long-term historical behavior data samples and the short-term historical behavior data samples may be determined over a time dimension. The time dimension may be, but is not limited to, the occurrence time, duration, etc. of the operational behavior.
Take the occurrence time of the operation behavior as an example. The long-term historical behavior data may be data generated within a first time window and the short-term historical behavior data may be data generated within a second time window, where the first window is longer than the second and lies earlier in time. How the two windows are determined may be preconfigured. For example, they may be fixed: the first window spans from 5 months before to 2 weeks before the current time, and the second window covers the 2 weeks before the current time. The windows may also change according to the training requirements of the online recommendation model; for example, the first window may become the span from 4 months before to 1 month before the current time, with the second window covering the preceding 1 month. Further, the two windows may be derived algorithmically by statistical analysis of the current online recommendation model's training speed, its sample demand, and the like.
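Partitioning historical samples into long-term and short-term sets by the two time windows might look like the following sketch; the 5-month/2-week defaults follow the example above, and the dict layout of a sample is an assumption.

```python
def split_history(samples, now_ts, long_window_s=150 * 86400, short_window_s=14 * 86400):
    """Partition samples into short-term (age within the short window) and
    long-term (age between the two bounds); older samples fall outside both."""
    long_term, short_term = [], []
    for s in samples:
        age = now_ts - s["ts"]
        if age <= short_window_s:
            short_term.append(s)
        elif age <= long_window_s:
            long_term.append(s)
    return long_term, short_term
```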
Specifically, the behavioral data sample corresponds to an account identification of the user account. When the online recommendation model is trained online, the server can obtain online behavior data samples and historical behavior data samples of the user account according to the account identification of the user account. Wherein, the historical behavior data sample can be the whole historical behavior data sample of the user account. The total historical behavior data sample may refer to all historical behavior data samples that the user account has generated since using the application. All historical behavior data samples may be stored in a distributed storage system. The historical behavior data sample may also be a portion of the historical behavior data sample extracted from the total historical behavior data sample based on the sample extraction logic.
In step S220, training data and training labels corresponding to each behavior data sample are determined.
The training data may be a vectorized representation of the behavior data samples. Specifically, the obtained behavior data samples may be raw, unprocessed data, and the server needs to process each behavior data sample to obtain its vectorized representation. For example, each behavior data sample may be processed by one or a combination of an MLP (Multilayer Perceptron), one-hot coding, Embedding Lookup (embedded vector lookup), and the like, and the resulting vectorized representation of each behavior data sample is used as its training data.
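As a minimal sketch of the vectorization step above: one-hot coding of a categorical field is concatenated with an embedding lookup of the object identifier. The field names (`behavior`, `object_id`), the three-word behavior vocabulary, and the hand-written embedding table are all illustrative assumptions, not part of the patent.

```python
def one_hot(index, size):
    """One-hot code a categorical value with `size` possible values."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec

def embedding_lookup(table, object_id):
    """Look up an embedding vector by object identifier."""
    return table[object_id]

# Toy embedding table: object ID -> dense vector (learned in practice).
EMBEDDINGS = {"video_42": [0.1, -0.3, 0.7]}

def vectorize(sample):
    """Concatenate the one-hot behavior type with the object embedding."""
    behavior_types = ["click", "like", "share"]  # assumed vocabulary
    return (one_hot(behavior_types.index(sample["behavior"]), len(behavior_types))
            + embedding_lookup(EMBEDDINGS, sample["object_id"]))

vec = vectorize({"behavior": "click", "object_id": "video_42"})
```

In a real system the embedding table would be learned jointly with the model and possibly fed through an MLP; here it only illustrates how a raw sample becomes a fixed-length training vector.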
The training labels may be determined from the operation behavior information in each behavior data sample. For example, a sample with a click behavior may be used as a positive sample with its training label set to 1, and a sample without a click behavior may be used as a negative sample with its training label set to 0.
In step S230, an online recommendation model is trained online using the training data and the training labels. The online recommendation model is a model that has already been trained to meet the online prediction requirement, and the trained online recommendation model is used to recommend objects to the user account online.
Specifically, after each behavior data sample is processed into corresponding training data and training labels, the training data of the plurality of behavior data samples may be divided into multiple batches. The training data of each behavior data sample is input to the online recommendation model in turn, and the online recommendation model produces a prediction result for each behavior data sample. A loss function computes a loss value between each prediction result and the corresponding training label, and the model parameters of the online recommendation model are adjusted in the direction that reduces the loss value until a training stop condition is reached. The training stop condition may be that the loss value reaches a minimum, that the number of iterations reaches a preset number, or the like. The server then applies the trained model parameters to the online recommendation model in use through a full or incremental update, obtaining the trained online recommendation model, which is used to recommend objects to the user account online.
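The training loop described above can be sketched as follows. This is an illustrative toy, not the patent's implementation: a one-weight logistic model stands in for the online recommendation model, the loss is binary cross-entropy, and the stop condition is a fixed iteration cap.

```python
import math

def predict(w, x):
    """Logistic prediction: probability that the sample is positive."""
    return 1.0 / (1.0 + math.exp(-w * x))

def train(samples, labels, w=0.0, lr=0.5, epochs=100):
    """Adjust the parameter in the direction that reduces the loss."""
    for _ in range(epochs):                 # stop condition: iteration cap
        for x, y in zip(samples, labels):
            p = predict(w, x)
            # gradient of binary cross-entropy with respect to w
            w -= lr * (p - y) * x
    return w

# Training data (1-D features) with labels: 1 = clicked, 0 = not clicked.
w = train([1.0, -1.0, 2.0], [1, 0, 1])
```

After training, the trained parameter would be pushed to the serving model via a full or incremental update, as the embodiment describes.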
Further, in the subsequent online recommendation process, the server may continue to train the online recommendation model online in real time with reference to the processes of steps S210 to S230.
Further, the process described in the above embodiment is also applicable to the offline training process of the recommendation model.
In the above data processing method for model training, the online recommendation model, having been trained to meet the online prediction requirement, has already learned the historical behavior information of the user account, but that information may be lost through catastrophic forgetting and the like. Adding historical behavior data samples to the online training process therefore allows the online recommendation model to relearn the user's historical behavior information, improving its recommendation accuracy. In addition, since the model structure of the online recommendation model and the inputs of its inference stage are unchanged, adding historical behavior data samples greatly reduces the pressure on the online system and the memory consumption compared with related-art methods that train and infer over an ultra-long behavior sequence.
In an exemplary embodiment, all historical behavior data samples may be obtained in the following manner. While the online recommendation model is trained offline on the historical behavior data samples, the correspondence between account identifiers and historical behavior data samples may be stored synchronously, yielding a first mapping table. Each stored historical behavior data sample may include the user's operation behavior information, time information, an object identifier, and the like. The object identifier may be a hash value obtained by hashing, or the original, unhashed value; the server can read the corresponding object features through the object identifier.
Further, the first mapping table may be stored in a parameter server. Because the parameter server uses distributed memory to store parameters, during online training the server can quickly retrieve all historical behavior data corresponding to the account identifier of the user account from the parameter server's memory.
Further, the first mapping table may also be stored in another device, so that the server may obtain all the historical behavior data samples by reading a HIVE (a data warehouse tool) table through SQL (Structured Query Language), or the like.
Further, the server can update new behavior data samples into the first mapping table in real time along with online training of the online recommendation model, so that accuracy and consistency of data are guaranteed.
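A minimal sketch of the first mapping table described above, assuming an in-process dict in place of the parameter server or HIVE table; the function names and sample fields are illustrative, not from the patent.

```python
from collections import defaultdict

# First mapping table: account identifier -> historical behavior data samples.
first_mapping_table = defaultdict(list)

def record_sample(account_id, sample):
    """Update the table in real time as a new behavior data sample arrives."""
    first_mapping_table[account_id].append(sample)

def all_history(account_id):
    """Retrieve all historical behavior data samples for an account."""
    return first_mapping_table[account_id]

record_sample("user_1", {"object_id": "video_7", "behavior": "click", "ts": 1620000000})
record_sample("user_1", {"object_id": "video_9", "behavior": "skip", "ts": 1620000100})
```

The same two operations — append on write, retrieve-by-key on read — are what the distributed parameter-server storage provides at scale.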
Further, the amount of data learned during offline training may be much larger than the amount learned online; for example, offline training may cover 30 days of behavior data while online training proceeds in real time. The inference system may therefore have substantial idle resources during online training, and this embodiment can use those spare resources for online training without adding operating pressure to the system.
In the embodiment, the mapping table of the account identifier and the historical behavior data is established, so that the server can quickly retrieve all historical behavior data samples from the mapping table, and the efficiency of online training of the online recommendation model is improved.
In an exemplary embodiment, the portion of the historical behavior data samples extracted from the total historical behavior data samples based on the sample extraction logic is obtained by performing any one of the following processes:
(1) Extracting according to the importance of the objects corresponding to the historical behavior data samples to obtain the partial historical behavior data samples.

Importance may be represented along multiple dimensions. For example, if it is represented by an object's heat label, historical behavior data samples containing objects labeled as hot can be extracted; if it is represented by user interest, historical behavior data samples containing objects marked as interesting by a certain number of user accounts can be extracted.
(2) Acquiring the target object type of the object corresponding to the online behavior data sample, and extracting the historical behavior data samples under the target object type from all historical behavior data samples as the partial historical behavior data samples.

Specifically, there may be at least one target object type for the objects corresponding to the online behavior data samples, and the server extracts the historical behavior data samples under each target object type from all the historical behavior data. For example, if the target object type is game video, the server may extract the samples containing a game video ID from all historical behavior data samples as the partial historical behavior data samples. Extracting samples by object type lets the online recommendation model recommend objects that better match the user's current interests, which can increase the user's dwell time.
(3) Acquiring a first similarity between the historical behavior data samples and the online behavior data samples, and extracting based on the first similarity to obtain the partial historical behavior data samples.

The first similarity can be measured by cosine similarity, Hamming distance, Mahalanobis distance, and the like. Specifically, the object features of an object may be looked up from the object ID in the historical behavior data sample. A first similarity is computed between the object features of the online behavior data samples and those of the historical behavior data samples, and the historical behavior data samples with the highest first similarity are extracted as the partial historical behavior data samples. Extracting samples by similarity lets the online recommendation model recommend objects that better match the user's current interests, which can increase the user's dwell time.
(4) Acquiring a type diversity index of the object types, and extracting according to the type diversity index to obtain the partial historical behavior data samples.

The object type may be, but is not limited to, live video, game video, picture, and the like. The type diversity index can specify the object types to be extracted and, for each object type, indexes such as its proportion and quantity. Specifically, a plurality of recommendable object types are predefined. After obtaining all the historical behavior data, the server extracts, according to the proportion, quantity, and other indexes corresponding to each object type, the partial historical behavior data samples meeting the requirements. For example, if the object types in the type diversity index are game video and music video, each at a proportion of 50%, the server may extract half of the preset number of historical behavior data samples containing game video IDs and half containing music video IDs. Extracting samples by object type lets the trained online recommendation model recommend objects of various types to the user account, achieving diversified recommendation.
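Extraction manner (4) can be sketched as follows, assuming a hypothetical `type` field on each sample and a diversity index given as type-to-proportion pairs; none of these names come from the patent.

```python
def extract_by_diversity(history, diversity_index, total):
    """Take `total` samples, split across object types by the given proportions."""
    out = []
    for obj_type, share in diversity_index.items():
        matching = [s for s in history if s["type"] == obj_type]
        out.extend(matching[: int(total * share)])  # quota per object type
    return out

history = [{"type": "game", "id": 1}, {"type": "game", "id": 2},
           {"type": "music", "id": 3}, {"type": "live", "id": 4}]

# 50% game video, 50% music video, two samples in total.
part = extract_by_diversity(history, {"game": 0.5, "music": 0.5}, 2)
```

Manners (1)–(3) follow the same shape with a different filter: sort or filter the full history by importance, target object type, or first similarity, then take the preset number of samples.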
In an exemplary embodiment, the number of extracted partial historical behavior data samples may be a preset number. The preset number can be adjusted according to the training speed of the online recommendation model; for example, the faster the model trains, the smaller the preset number, ensuring that the training speed of the online recommendation model can keep up with the online behavior data. In a specific implementation, a mapping table of preset numbers to training speeds can be built in advance so that the number to extract is obtained quickly by table lookup; alternatively, the number to extract can be predicted from the current training speed by a deep learning model or the like.
In this embodiment, samples meeting requirements are extracted from all historical behavior data samples based on sample extraction logic, so that the recommended model can learn historical behavior data, and the training speed of the model can be increased.
In an exemplary embodiment, as shown in fig. 3, the step S230 of training the online recommendation model by the training data and the training labels may be implemented by:
in step S310, a weight corresponding to each behavior data sample is obtained.
The weight reflects the importance of the training data to model training. During online training of the online recommendation model, a corresponding weight can be set for each behavior data sample so that the online recommendation model can learn the historical behavior data without its learning of the online behavior data being affected.
In some possible embodiments, when the behavior data sample is an online behavior data sample, the weight of the online behavior data sample may be set to 1 by default. When the behavior data samples are historical behavior data samples, the weight corresponding to each historical behavior data sample can be obtained by any one of the following modes:
(1) The weight may be determined from the time difference between the timestamp in each historical behavior data sample and the current time. The weight is negatively correlated with the time difference; that is, the larger the time difference, the smaller the weight. For a specific implementation, reference may be made to the determination of the training labels of the historical behavior data samples, which is not detailed here. Determining the weight of each historical behavior data sample along the time dimension lets the online recommendation model learn more from the historical behavior data samples closer to the current time, so that it can recommend objects that better match the user's current interests.
(2) Determining the weight corresponding to each historical behavior data sample according to the importance of the object corresponding to each historical behavior data sample, where the weight is positively correlated with the importance.
The definition of importance may refer to the above embodiments and is not detailed here. Importance levels may be predefined, with a corresponding weight set for each level: the higher the level, the greater the weight. Specifically, the importance level of a historical behavior data sample is determined from the importance of its corresponding object, and the weight corresponding to that level is taken as the sample's weight. For example, suppose the hot level ranks above the labeled-as-interesting level, which in turn ranks above the not-labeled-as-interesting level, with weights of 0.9, 0.7, and 0.5 respectively. If the importance of a certain historical behavior data sample is hot, its weight is 0.9.
(3) Obtaining a second similarity between each historical behavior data sample and the online behavior data sample, and determining the weight corresponding to each historical behavior data sample based on the second similarity, where the weight is positively correlated with the second similarity.
The second similarity can be measured by cosine similarity, Hamming distance, Mahalanobis distance, and the like. Specifically, the object features of an object can be looked up from the object ID in the historical behavior data sample, and a second similarity is computed between the object features of the online behavior data sample and those of the historical behavior data sample. The server may assign higher weights to historical behavior data samples with higher second similarities. For example, a correspondence between second similarity and weight is established in advance: a second similarity of 0.9 to 1 maps to a weight of 0.9; 0.8 to 0.9 maps to 0.8; and so on. If the similarity of an acquired historical behavior data sample is 0.95, its weight is 0.9.
(4) Acquiring a type diversity index of the object types, and determining the weight corresponding to each historical behavior data sample according to the type diversity index.
Here the type diversity index may indicate the object types to be extracted and indexes such as the weight corresponding to each object type. Specifically, after obtaining the historical behavior data samples, the server may determine the weight of each sample according to the object type of its corresponding object. For example, if the weight for game video is set to 0.9, music video to 0.5, and live video to 0.3, then a historical behavior data sample whose object is a game video has a weight of 0.9.
(5) Predicting the corresponding weight from each historical behavior data sample through a first deep learning model.
The first deep learning model may be any model capable of predicting a weight, such as a linear model, a neural network model, or a support vector machine. It can be an offline or online model, and can be trained together with the online recommendation model to learn the association between historical behavior data samples and weights; during training, gradients are back-propagated using a cross-entropy loss function. After a historical behavior data sample is obtained, its relevant features, time difference, and the like can be used as input data, and the first deep learning model predicts the weight.
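Two of the weighting manners above can be sketched as follows: a weight negatively correlated with the time difference for manner (1), and a banded lookup from the second similarity for manner (3). The decay constant and the band boundaries (0.95 → 0.9, etc., mirroring the example above) are illustrative assumptions.

```python
def weight_from_time_diff(days):
    """Manner (1): older samples get smaller weights; a fresh sample gets 1.0."""
    return 1.0 / (1.0 + days / 30.0)  # assumed 30-day decay scale

def weight_from_similarity(sim):
    """Manner (3): map a second similarity in [0, 1] to a banded weight."""
    if sim >= 0.9:
        return 0.9
    if sim >= 0.8:
        return 0.8
    return 0.5

w_time = weight_from_time_diff(30)   # one-month-old sample
w_sim = weight_from_similarity(0.95)
```

Either function (or the importance-level and type-diversity variants) produces the per-sample weight consumed by the loss computation in steps S330 and S340.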
In step S320, the training data corresponding to each behavior data sample is input to the online recommendation model, so as to obtain a prediction result corresponding to each behavior data sample.
In step S330, a loss value is determined according to the prediction result, the training label and the weight corresponding to each behavior data sample.
In step S340, the model parameters of the online recommendation model are adjusted according to the loss value, and the training data corresponding to the next behavior data sample is continuously input until the training stop condition is reached.
Specifically, after each behavior data sample is processed into corresponding training data, training labels, and weights, the training data of the plurality of behavior data samples can be randomly divided into multiple batches. Each batch of training data is input to the online recommendation model in turn to obtain the corresponding prediction results. A loss function with a weight coefficient computes the loss value from each behavior data sample's prediction result, training label, and weight, and the model parameters are adjusted in the direction that reduces the loss value until the training stop condition is reached, yielding the trained online recommendation model.
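The loss function with a weight coefficient used in steps S330 and S340 can be sketched as per-sample binary cross-entropy scaled by the sample weight; this is one plausible instantiation, not necessarily the patent's exact loss.

```python
import math

def weighted_bce(predictions, labels, weights):
    """Mean of w_i * BCE(p_i, y_i) over the batch."""
    total = 0.0
    for p, y, w in zip(predictions, labels, weights):
        # binary cross-entropy for one sample, scaled by its weight
        total += w * -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(predictions)

# An online positive sample at full weight, a historical negative at half weight.
loss = weighted_bce([0.9, 0.2], [1, 0], [1.0, 0.5])
```

Down-weighting historical samples this way is what lets the model relearn historical behavior without drowning out the online behavior data, as the embodiment describes.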
In the embodiment, the corresponding weight is set for each behavior data sample, so that the online recommendation model can learn historical behavior data, the learning of the online recommendation model on the online behavior data is not influenced, the prediction accuracy of the model is improved, and the recommendation effect is improved.
In an exemplary embodiment, as shown in fig. 4, in step S220, determining training data and training labels corresponding to each behavior data sample may be implemented by:
in step S410, if the behavior data sample is an online behavior data sample, training data corresponding to the online behavior data sample is generated according to the online behavior data sample.
Specifically, for online behavior data samples, the server processes each online behavior data sample to obtain a vectorized representation of each online behavior data sample. For example, each online behavior data sample may be processed in one or more combination manners of MLP, one-hot coding, embedding Lookup, and the like, so as to obtain a vectorized representation of each online behavior data sample, which is used as training data corresponding to each online behavior data sample.
In step S420, an original label is obtained from the online behavior data sample as a training label corresponding to the online behavior data sample.
Wherein the original label can be determined according to the operation behavior information in each online behavior data sample. For example, the original tag with click behavior in each online behavior data sample may be set to 1; the original tag with no click behavior is set to 0. For the online behavior data sample, the original label in the online behavior data can be used as a training label.
In step S430, if the behavior data sample is a historical behavior data sample, training data corresponding to the historical behavior data sample is generated according to the historical behavior data sample.
Specifically, for the historical behavior data samples, the server processes each historical behavior data sample to obtain a vectorized representation of each historical behavior data sample. For example, each historical behavior data sample may be processed in one or more combination of MLP, one-hot coding, embedding Lookup, and the like, so as to obtain a vectorized representation of each historical behavior data sample, which is used as training data corresponding to each historical behavior data sample.
In step S440, the time difference between the time stamp in the historical behavior data sample and the current time is obtained.
In step S450, the original label in the historical behavior data sample is attenuated according to the time difference, so as to obtain a training label corresponding to the historical behavior data sample.
In particular, for historical behavior data samples, the user's interests may evolve over time; for example, if a short video the user clicked a year ago were recommended now, the user would not necessarily click it again. Therefore, the original label can be attenuated according to the time difference between the timestamp in the historical behavior data sample and the current time, and the attenuated original label is used as the training label of each historical behavior data sample.
In the embodiment, the training labels of the behavior data samples are obtained based on the generation time of the behavior data samples, so that the model parameters of the online recommendation model can be adjusted to the direction more conforming to the current interest of the user in the training process, the prediction precision of the model can be improved, and the recommendation effect can be improved.
In an exemplary embodiment, in step S450, attenuating the original label in the historical behavior data sample according to the time difference to obtain a training label of the historical behavior data sample, which may be implemented in any one of the following manners:
(1) Querying a second mapping table to obtain the training label corresponding to the time difference of each historical behavior data sample.
Specifically, the second mapping table contains the correspondence between time differences and training labels, and the server queries the preset table to obtain the training label for each historical behavior data sample's time difference. For example, suppose the second mapping table defines: for a time difference of 15 days to 1 month, the original label of a positive sample decays to 0.9; for 1 to 2 months, to 0.85; for 2 to 3 months, to 0.8; and so on until the decayed label reaches 0. If the time difference of a historical behavior data sample is 20 days, its training label is 0.9. Determining training labels through a mapping table simplifies the determination logic and can speed up model training.
(2) Attenuating the original label according to the time difference through a preset attenuation function to obtain the training label of the historical behavior data sample.
Specifically, the attenuation function may be a linear function, an exponential function, a Gaussian function, or the like, and may be chosen through repeated experimental analysis. After the time difference of each historical behavior data sample is obtained, the training label corresponding to that time difference is computed by the attenuation function. Using an attenuation function to obtain each sample's training label ensures the accuracy of the training labels.
(3) Predicting the training label of the historical behavior data sample from the sample through a second deep learning model.
The second deep learning model may be any model capable of predicting training labels, such as a linear model, a neural network model, or a support vector machine. It may be an offline or online model, and can be trained together with the recommendation model to learn the correlation between the time difference and the decay rate; during training, gradients are back-propagated using a cross-entropy loss function. After the historical behavior data samples are obtained, the relevant features, time difference, original label, and the like of each sample can be used as input data, and the second deep learning model predicts the training label. Obtaining training labels through a deep learning model leverages the prior knowledge the model has learned to produce accurate labels, improving model performance.
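Manners (1) and (2) can be sketched as follows. The table bands mirror the example values given for the second mapping table (15 days–1 month → 0.9, and so on), while the exponential decay rate is an illustrative assumption.

```python
import math

# Second mapping table for positive samples: (max age in days, decayed label).
SECOND_MAPPING_TABLE = [(30, 0.9), (60, 0.85), (90, 0.8)]

def decay_by_table(days):
    """Manner (1): look up the attenuated label by time-difference band."""
    for max_days, label in SECOND_MAPPING_TABLE:
        if days <= max_days:
            return label
    return 0.0  # old enough that the label has decayed to 0

def decay_by_function(original_label, days, rate=0.01):
    """Manner (2): preset exponential attenuation function."""
    return original_label * math.exp(-rate * days)

label_a = decay_by_table(20)            # 20-day-old positive sample
label_b = decay_by_function(1.0, 20)
```

The table trades smoothness for speed; the function gives a continuous label at a small computation cost, which matches the trade-off the two manners describe.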
Fig. 5 is a flowchart illustrating a data processing method for model training according to an exemplary embodiment, and as shown in fig. 5, the data processing method for model training may be used in a server, and includes the following steps.
In step S502, several behavioral data samples of the user account are obtained. Wherein the behavior data samples comprise historical behavior data samples and online behavior data samples. The specific manner of obtaining the historical behavior data samples and the online behavior data samples may refer to the above embodiments, and is not specifically described herein.
In step S504, for each online behavior data sample, the training data and the training label corresponding to each online behavior data sample may be generated with reference to the above-described embodiment.
In step S506, for each historical behavior data sample, the training data and the training label corresponding to each historical behavior data sample may be generated with reference to the above-described embodiment.
In step S508, for each historical behavior data sample, the weight of each historical behavior data sample may be obtained with reference to the above-described embodiment.
In step S510, the training data corresponding to each behavior data sample is input to the online recommendation model, so as to obtain a prediction result corresponding to each behavior data sample.
In step S512, a loss value corresponding to each behavior data sample is calculated based on the weight corresponding to each behavior data sample, the prediction result, and the training label through a loss function with a weight coefficient.
In step S514, the model parameters of the online recommendation model are adjusted in the direction of decreasing loss value until the training stop condition is reached.
It should be understood that although the steps in the above flowcharts are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which need not be completed at the same time but may be performed at different times, and which need not be performed sequentially but may alternate or interleave with other steps or with sub-steps or stages of other steps.
FIG. 6 is a block diagram illustrating a data processing apparatus 600 for model training in accordance with an exemplary embodiment. Referring to fig. 6, the apparatus includes an obtaining module 602, a training sample generating module 604, and a model training module 606.
The obtaining module 602 is configured to perform obtaining of a plurality of behavior data samples of the user account, where each behavior data sample is generated by the user account operating on each associated object, and the plurality of behavior data samples include historical behavior data samples and online behavior data samples, where the historical behavior data samples include all the historical behavior data samples of the user account, or a part of the historical behavior data samples extracted from all the historical behavior data samples based on sample extraction logic; a training sample generation module 604 configured to perform determining training data and training labels corresponding to each behavior data sample; and the model training module 606 is configured to perform online training on an online recommendation model through the training data and the training labels, the online recommendation model is a model which is trained to meet the online prediction requirement, and the trained online recommendation model is used for recommending objects to the user account online.
In one embodiment, the device further comprises a sample extraction module configured to perform extraction according to the importance of the object corresponding to the historical behavior data sample to obtain a part of the historical behavior data sample;
or acquiring a target object type of an object corresponding to the online behavior data sample, and extracting historical behavior data samples under the target object type from all historical behavior data samples to serve as part of the historical behavior data samples;
or acquiring a first similarity between the historical behavior data sample and the online behavior data sample, and extracting based on the first similarity to obtain a part of historical behavior data sample;
or acquiring a type diversity index of the object type, and extracting according to the type diversity index to obtain a part of historical behavior data samples.
In an exemplary embodiment, the number of extracted partial historical behavior data samples is determined according to the training speed of the online recommendation model.
In an exemplary embodiment, the model training module 606 includes: a weight obtaining unit configured to perform obtaining of a weight corresponding to each behavior data sample; the prediction unit is configured to input training data corresponding to each behavior data sample into the online recommendation model to obtain a prediction result corresponding to each behavior data sample; a loss value determination unit configured to perform determining a loss value according to the prediction result, the training label and the weight corresponding to each behavior data sample; and the parameter adjusting unit is configured to adjust the model parameters of the online recommendation model according to the loss values, and continue inputting training data corresponding to the next behavior data sample until a training stopping condition is reached.
In an exemplary embodiment, when the behavior data samples are historical behavior data samples, the weight obtaining unit is configured to obtain the time difference between the timestamp in each historical behavior data sample and the current time, and determine the weight corresponding to that sample according to the time difference, where the weight is negatively correlated with the time difference; or to determine the weight corresponding to each historical behavior data sample according to the importance of the corresponding object, where the weight is positively correlated with the importance; or to obtain a second similarity between each historical behavior data sample and the online behavior data samples, and determine the weight based on the second similarity, where the weight is positively correlated with the second similarity; or to obtain a type diversity index of the object types, and determine the weight corresponding to each historical behavior data sample according to the type diversity index; or to predict the corresponding weight from each historical behavior data sample through a first deep learning model.
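One simple way to realize the first alternative (weight negatively correlated with the time difference) is exponential decay over the sample's age. The decay rate is an assumed free parameter, not a value given in the patent:

```python
import math
import time

def time_decay_weight(sample_timestamp, now=None, decay_rate=1e-5):
    """Weight that shrinks as the sample gets older: newer samples get weights near 1."""
    now = time.time() if now is None else now
    dt = max(0.0, now - sample_timestamp)   # time difference in seconds
    return math.exp(-decay_rate * dt)       # strictly decreasing in dt
```

Any other strictly decreasing function of the time difference (e.g. `1 / (1 + dt)`) would satisfy the negative-correlation requirement equally well.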
In an exemplary embodiment, all the historical behavior data samples are obtained by querying a first mapping table; the first mapping table is built when the online recommendation model is trained offline, and is updated in real time as the online recommendation model is trained online.
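A first mapping table of this kind can be sketched as a per-account store that is bulk-loaded during offline training and appended to as online samples arrive. The class name, method names, and sample fields here are hypothetical; the patent only specifies the table's role, not its structure.

```python
from collections import defaultdict

class HistoricalSampleTable:
    """Hypothetical first mapping table: user account -> historical behavior data samples."""
    def __init__(self):
        self._table = defaultdict(list)

    def bulk_load(self, offline_samples):
        # Built once during offline training of the online recommendation model.
        for s in offline_samples:
            self._table[s["user_id"]].append(s)

    def append_online(self, sample):
        # Real-time update as online training consumes new behavior data samples.
        self._table[sample["user_id"]].append(sample)

    def query(self, user_id):
        # Returns all historical behavior data samples for the account.
        return list(self._table[user_id])
```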
In an exemplary embodiment, the training sample generation module 604 includes: a first training data generation unit configured to generate, when a behavior data sample is an online behavior data sample, the training data corresponding to that online behavior data sample; a first label determining unit configured to take the original label in the online behavior data sample as its training label; a second training data generation unit configured to generate, when a behavior data sample is a historical behavior data sample, the training data corresponding to that historical behavior data sample; an acquisition unit configured to obtain the time difference between the timestamp in the historical behavior data sample and the current time; and a second label determining unit configured to attenuate the original label in the historical behavior data sample according to the time difference to obtain the training label corresponding to that historical behavior data sample.
In an exemplary embodiment, the second label determining unit is configured to query a second mapping table for the training label corresponding to the time difference of the historical behavior data sample, where the second mapping table records the correspondence between time differences and training labels; or to attenuate the original label according to the time difference through a preset attenuation function to obtain the training label of the historical behavior data sample; or to predict the training label of the historical behavior data sample from the sample through a second deep learning model.
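The table-lookup and attenuation-function alternatives can be sketched as follows. The bucket boundaries, attenuation factors, and 24-hour half-life are invented for the example; the patent leaves the concrete second mapping table and decay function unspecified.

```python
import bisect

# Hypothetical second mapping table: time-difference buckets (hours) -> attenuation factor.
BOUNDS = [1, 24, 168]            # up to 1 hour, 1 day, 1 week, and beyond
FACTORS = [1.0, 0.8, 0.5, 0.2]   # one factor per bucket

def attenuate_by_table(original_label, dt_hours):
    """Look up the attenuation factor for this time difference and scale the label."""
    return original_label * FACTORS[bisect.bisect_left(BOUNDS, dt_hours)]

def attenuate_by_function(original_label, dt_hours, half_life=24.0):
    """Preset attenuation function: halve the label every `half_life` hours."""
    return original_label * 0.5 ** (dt_hours / half_life)
```

Both variants make older historical samples contribute weaker training labels, which is the stated purpose of the attenuation.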
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and will not be elaborated here.
FIG. 7 is a block diagram illustrating an electronic device S00 for model training according to an exemplary embodiment. For example, the electronic device S00 may be a server. Referring to FIG. 7, the electronic device S00 includes a processing component S20, which in turn includes one or more processors, and memory resources, represented by a memory S22, for storing instructions executable by the processing component S20, such as an application program. The application program stored in the memory S22 may include one or more modules, each corresponding to a set of instructions. The processing component S20 is configured to execute the instructions to perform the data processing method for model training described above.
The electronic device S00 may also include a power supply component S24 configured to perform power management of the electronic device S00, a wired or wireless network interface S26 configured to connect the electronic device S00 to a network, and an input/output (I/O) interface S28. The electronic device S00 may operate based on an operating system stored in the memory S22, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.
In an exemplary embodiment, there is also provided a computer-readable storage medium including instructions, such as the memory S22 including instructions, executable by the processor of the electronic device S00 to perform the above method. The storage medium may be, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, or an optical data storage device.
In an exemplary embodiment, a computer program product is also provided, which comprises a computer program, which when executed by a processor implements the data processing method of model training as described in any of the above embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A data processing method for model training is characterized by comprising the following steps:
acquiring a plurality of behavior data samples of a user account, wherein each behavior data sample is generated by operating each associated object by the user account, the behavior data samples comprise historical behavior data samples and online behavior data samples, and the historical behavior data samples comprise all the historical behavior data samples of the user account or part of the historical behavior data samples extracted from all the historical behavior data samples based on sample extraction logic;
determining training data and training labels corresponding to each behavior data sample;
and performing online training on an online recommendation model through the training data and the training labels, wherein the online recommendation model is a model which is trained to meet the online prediction requirement, and the trained online recommendation model is used for recommending objects to the user account online.
2. The data processing method for model training according to claim 1, wherein the part of the historical behavior data samples extracted from all the historical behavior data samples based on the sample extraction logic is obtained by performing any one of the following processes:
extracting according to the importance of the object corresponding to the historical behavior data sample to obtain the partial historical behavior data sample;
or acquiring a target object type of an object corresponding to the online behavior data sample, and extracting historical behavior data samples under the target object type from all the historical behavior data samples to serve as the part of the historical behavior data samples;
or acquiring a first similarity of the historical behavior data sample and the online behavior data sample, and extracting based on the first similarity to obtain the partial historical behavior data sample;
or acquiring a type diversity index of an object type, and extracting according to the type diversity index to obtain the partial historical behavior data sample.
3. The data processing method for model training according to claim 2, wherein the number of the extracted partial historical behavior data samples is determined according to the training speed of the online recommendation model.
4. The data processing method for model training according to any one of claims 1 to 3, wherein the performing online training on the online recommendation model through the training data and the training labels comprises:
acquiring the weight corresponding to each behavior data sample;
inputting training data corresponding to each behavior data sample into the online recommendation model to obtain a prediction result corresponding to each behavior data sample;
determining a loss value according to the prediction result, the training label and the weight corresponding to each behavior data sample;
and adjusting the model parameters of the online recommendation model according to the loss value, and continuously inputting training data corresponding to the next behavior data sample until a training stop condition is reached.
5. The data processing method for model training according to claim 4, wherein when the behavior data samples are historical behavior data samples, the obtaining the weight corresponding to each behavior data sample comprises:
acquiring the time difference between the timestamp in each historical behavior data sample and the current moment, and determining the weight corresponding to each historical behavior data sample according to the time difference, wherein the weight is negatively correlated with the time difference;
or determining the weight corresponding to each historical behavior data sample according to the importance of the object corresponding to each historical behavior data sample, wherein the weight is positively correlated with the importance;
or obtaining a second similarity of each historical behavior data sample and the online behavior data sample, and determining a weight corresponding to each historical behavior data sample based on the second similarity, wherein the weight is positively correlated with the second similarity;
or acquiring a type diversity index of an object type, and determining the weight corresponding to each historical behavior data sample according to the type diversity index;
or predicting according to each historical behavior data sample through a first deep learning model to obtain corresponding weight.
6. The data processing method for model training according to claim 1, wherein all the historical behavior data samples are obtained by querying a first mapping table, the first mapping table being built when the online recommendation model is trained offline and updated in real time along with the online training of the online recommendation model.
7. A data processing apparatus for model training, comprising:
the obtaining module is configured to obtain a plurality of behavior data samples of a user account, wherein each behavior data sample is generated by the user account operating on an associated object, the plurality of behavior data samples comprise historical behavior data samples and online behavior data samples, and the historical behavior data samples comprise all the historical behavior data samples of the user account or a part of the historical behavior data samples extracted from all the historical behavior data samples based on sample extraction logic;
a training sample generation module configured to determine training data and training labels corresponding to each behavior data sample;
and the model training module is configured to perform online training on an online recommendation model through the training data and the training labels, the online recommendation model is a model which is trained to meet online prediction requirements, and the trained online recommendation model is used for recommending objects to the user account online.
8. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the data processing method of model training of any one of claims 1 to 6.
9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the model-trained data processing method of any one of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the data processing method of model training of any one of claims 1 to 6.
CN202110511932.2A 2021-05-11 2021-05-11 Data processing method and device for model training, electronic equipment and storage medium Pending CN115345311A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110511932.2A CN115345311A (en) 2021-05-11 2021-05-11 Data processing method and device for model training, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN115345311A true CN115345311A (en) 2022-11-15

Family

ID=83946856

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110511932.2A Pending CN115345311A (en) 2021-05-11 2021-05-11 Data processing method and device for model training, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115345311A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115545572A (en) * 2022-11-29 2022-12-30 支付宝(杭州)信息技术有限公司 Method, device, equipment and storage medium for business wind control
CN115545572B (en) * 2022-11-29 2023-03-21 支付宝(杭州)信息技术有限公司 Method, device, equipment and storage medium for business wind control

Similar Documents

Publication Publication Date Title
US11416268B2 (en) Aggregate features for machine learning
TWI702844B (en) Method, device, apparatus, and storage medium of generating features of user
CN108717408B (en) Sensitive word real-time monitoring method, electronic equipment, storage medium and system
CN111966914B (en) Content recommendation method and device based on artificial intelligence and computer equipment
CN110795657B (en) Article pushing and model training method and device, storage medium and computer equipment
WO2014193399A1 (en) Influence score of a brand
US10606910B2 (en) Ranking search results using machine learning based models
US11443202B2 (en) Real-time on the fly generation of feature-based label embeddings via machine learning
CN111008335B (en) Information processing method, device, equipment and storage medium
WO2022148186A1 (en) Behavioral sequence data processing method and apparatus
CN111858969B (en) Multimedia data recommendation method, device, computer equipment and storage medium
CN112749330B (en) Information pushing method, device, computer equipment and storage medium
CN113869931A (en) Advertisement putting strategy determining method and device, computer equipment and storage medium
Mehta et al. Collaborative personalized web recommender system using entropy based similarity measure
CN113641835B (en) Multimedia resource recommendation method and device, electronic equipment and medium
CN115345311A (en) Data processing method and device for model training, electronic equipment and storage medium
CN115131052A (en) Data processing method, computer equipment and storage medium
CN113254513B (en) Sequencing model generation method, sequencing device and electronic equipment
CN113935251B (en) User behavior prediction model generation method and device and user behavior prediction method and device
CN114510627A (en) Object pushing method and device, electronic equipment and storage medium
CN115204436A (en) Method, device, equipment and medium for detecting abnormal reasons of business indexes
CN114564653A (en) Information recommendation method and device, server and storage medium
US20200382530A1 (en) Unequal probability sampling based on a likelihood model score to evaluate prevalence of inappropriate entities
Zhang et al. Fusing Fine-Grained Information of Sequential News for Personalized News Recommendation
Yin et al. Time-Aware Smart City Services based on QoS Prediction: A Contrastive Learning Approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination