CN109902849B

CN109902849B - User behavior prediction method and device, and behavior prediction model training method and device

Info

Publication number: CN109902849B
Application number: CN201810636443.8A
Authority: CN
Inventors: 唐睿明; 钮敏哲; 曲彦儒; 张伟楠; 俞勇
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2018-06-20
Filing date: 2018-06-20
Publication date: 2021-11-30
Anticipated expiration: 2038-06-20
Also published as: EP3690768A1; WO2019242331A1; CN109902849A; US20200242450A1; US11531867B2; EP3690768A4

Abstract

The application provides a user behavior prediction method and device and a behavior prediction model training method and device, and belongs to the field of big data processing. After behavior prediction information including a plurality of feature data is obtained, a first contribution value of each feature data to a specified behavior is obtained, and every N feature data in the plurality of feature data are processed by a corresponding feature interaction model to obtain a second contribution value of the N feature data to the specified behavior. Finally, the execution probability of the specified behavior is determined according to the obtained first contribution values and second contribution values. Because the interactive influence of a plurality of feature data on the specified behavior is considered when the specified behavior is predicted, the accuracy of behavior prediction is effectively improved.

Description

User behavior prediction method and device, and behavior prediction model training method and device

Technical Field

The application relates to the field of big data processing, in particular to a user behavior prediction method and device and a behavior prediction model training method and device.

Background

User behavior prediction is a technology for predicting user behavior according to behavior prediction information (such as user attribute data, current environment data, and attribute data of the execution object of a behavior). It is widely applied in fields such as personalized recommendation and accurate advertisement delivery.

In the related art, a linear regression (LR) model is generally used to predict user behavior. For the behavior prediction information of a specified behavior, the LR model calculates a contribution value of each feature data in the behavior prediction information to the specified behavior, and then accumulates these contribution values to obtain the probability that the user executes the specified behavior. The contribution value indicates the degree of influence of the feature data on whether the user executes the specified behavior, and the magnitude of the contribution value is positively correlated with the degree of influence.

However, the LR model in the related art considers the influence of each feature data on the specified behavior only individually and ignores the interaction between feature data, so the accuracy of the user behavior prediction method is low.

Disclosure of Invention

The application provides a user behavior prediction method and device and a behavior prediction model training method and device, and can solve the problem that a behavior prediction method in the related art is low in accuracy.

In one aspect, a user behavior prediction method is provided. The method may include: acquiring behavior prediction information for predicting a specified behavior, where the behavior prediction information may include a plurality of feature data, and any two feature data belong to different categories. Then, a first contribution value of each of the plurality of feature data to the specified behavior may be obtained, where the first contribution value indicates a degree of influence on the execution of the specified behavior, and the magnitude of the first contribution value is positively correlated with the degree of influence. Every N pieces of feature data in the plurality of feature data may be processed by a corresponding feature interaction model to obtain a second contribution value of the N pieces of feature data to the specified behavior, where N is an integer greater than 1, the feature interaction model corresponding to any N pieces of feature data is determined by the N categories to which the N pieces of feature data belong, the second contribution value indicates a degree of influence on the execution of the specified behavior, and the magnitude of the second contribution value is positively correlated with the degree of influence. Finally, the execution probability of the specified behavior may be determined according to the obtained first contribution value of each piece of feature data and the obtained second contribution value of every N pieces of feature data.

Because the user behavior prediction method provided by the application considers the interactive influence of a plurality of feature data on the specified behavior when predicting the specified behavior, the accuracy of behavior prediction can be effectively improved. Moreover, the feature interaction model corresponding to every N feature data is determined by the categories to which the N feature data belong; that is, each group of N categories corresponds to one feature interaction model. This avoids both the poor prediction results caused by processing all feature data with the same feature interaction model and the excessive computational complexity caused by processing every N feature data with an independent feature interaction model. In other words, the behavior prediction method provided by the application can obtain a good prediction effect with low computational complexity.
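The complexity trade-off described above can be made concrete with a small count. The sketch below assumes N = 2 and purely hypothetical category sizes (300 cities, 7 days, 24 hours, 1000 brands); it compares the number of feature interaction models needed when one model is kept per group of N categories with the number needed when every group of N feature values gets its own model. The function names are illustrative and do not come from the application.

```python
from itertools import combinations
from math import comb, prod
from typing import Mapping


def models_per_category_group(num_categories: int, n: int) -> int:
    """One interaction model for each group of n distinct categories."""
    return comb(num_categories, n)


def models_per_feature_group(cardinalities: Mapping[str, int], n: int) -> int:
    """One interaction model for each group of n feature values drawn from
    n distinct categories (the high-complexity alternative)."""
    return sum(
        prod(cardinalities[category] for category in group)
        for group in combinations(cardinalities, n)
    )


# Hypothetical numbers of distinct feature values per category.
CARDINALITIES = {"city": 300, "day": 7, "hour": 24, "brand": 1000}

shared = 1  # a single model shared by all feature data
per_category = models_per_category_group(len(CARDINALITIES), 2)
per_feature = models_per_feature_group(CARDINALITIES, 2)
print(shared, per_category, per_feature)
```

Under these assumed sizes, the per-category scheme needs only a handful of models, while the per-feature scheme needs several hundred thousand, which is the gap the paragraph above refers to.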

Optionally, determining the execution probability of the specified behavior according to the obtained first contribution value of each piece of feature data and the obtained second contribution value of every N pieces of feature data may include:

determining a first comprehensive contribution value according to the obtained first contribution value of each feature data, determining a second comprehensive contribution value according to the obtained second contribution value of every N feature data, and finally performing a weighted summation of the first comprehensive contribution value and the second comprehensive contribution value by using a preset weight value to obtain the execution probability.

The preset weight value may be obtained in advance by training on training sample data. Weighting and summing the two comprehensive contribution values with the preset weight value balances the influence of individual feature data on the specified behavior against the interactive influence of a plurality of feature data on the specified behavior, which ensures the prediction effect of behavior prediction.

Optionally, the process of determining the second comprehensive contribution value according to the second contribution value of each acquired N pieces of feature data may include:

directly summing the obtained second contribution values of every N feature data to obtain the second comprehensive contribution value. This way of obtaining the second comprehensive contribution value is simple and has low computational complexity.

Alternatively, the obtained second contribution values of every N feature data may be input to a neural network, and the output of the neural network may be used as the second comprehensive contribution value. The neural network may be a multilayer neural network, and the weights and biases between its neurons may be obtained in advance by training on training sample data. Using a pre-trained neural network to obtain the second comprehensive contribution value helps ensure the accuracy of that value and, in turn, the prediction effect of behavior prediction.
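As a rough illustration of the neural network option, the sketch below passes the second contribution values through a small fully connected network with one hidden layer. The layer sizes, the ReLU activation, and the example weights are assumptions made for illustration only; in the application the weights and biases would come from training.

```python
from typing import List, Sequence


def relu(x: float) -> float:
    return x if x > 0.0 else 0.0


def dense(inputs: Sequence[float],
          weights: Sequence[Sequence[float]],
          biases: Sequence[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def second_comprehensive_value(second_values: Sequence[float],
                               hidden_w: Sequence[Sequence[float]],
                               hidden_b: Sequence[float],
                               out_w: Sequence[float],
                               out_b: float) -> float:
    """Hidden layer with ReLU, then a single linear output neuron."""
    hidden = [relu(h) for h in dense(second_values, hidden_w, hidden_b)]
    return dense(hidden, [out_w], [out_b])[0]


# Hypothetical second contribution values and (normally trained) parameters.
second_values = [0.2, -0.1, 0.4]
hidden_w = [[1.0, 1.0, 1.0], [-1.0, 0.0, 1.0]]
hidden_b = [0.0, 0.1]
out_w = [2.0, -1.0]
out_b = 0.05

aggregated = second_comprehensive_value(second_values, hidden_w, hidden_b, out_w, out_b)
print(aggregated)
```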

Optionally, the process of determining the first comprehensive contribution value according to the first contribution value of each acquired feature data may include:

summing the obtained first contribution value of each feature data and a reference contribution value to obtain the first comprehensive contribution value.

The reference contribution value may be obtained in advance by training on training sample data.
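Taken together, the optional steps above amount to a few lines of arithmetic. The following sketch assumes the direct-summation variant for the second comprehensive contribution value and hypothetical weight values. The final sigmoid is an added assumption (it is not stated in the text above) that keeps the weighted sum within the interval (0, 1).

```python
import math
from typing import Sequence


def first_comprehensive(first_values: Sequence[float], reference_value: float) -> float:
    """Sum of the first contribution values plus the reference contribution value."""
    return sum(first_values) + reference_value


def second_comprehensive(second_values: Sequence[float]) -> float:
    """Direct summation of the second contribution values."""
    return sum(second_values)


def execution_probability(first_values: Sequence[float],
                          second_values: Sequence[float],
                          reference_value: float,
                          weight_first: float,
                          weight_second: float) -> float:
    weighted = (weight_first * first_comprehensive(first_values, reference_value)
                + weight_second * second_comprehensive(second_values))
    # Assumption: squash the weighted sum so that the result is a valid probability.
    return 1.0 / (1.0 + math.exp(-weighted))


# Hypothetical contribution values and preset weights.
probability = execution_probability(
    first_values=[0.3, -0.1, 0.2],
    second_values=[0.4, -0.2],
    reference_value=0.1,
    weight_first=1.0,
    weight_second=0.5,
)
print(probability)
```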

Optionally, before obtaining the first contribution value of each of the plurality of feature data to the specified behavior, the method may further include:

determining the feature identifier of each of the plurality of feature data according to a correspondence between feature data and feature identifiers, where the feature identifier may be a code word or a vector that meets a preset format requirement. Because the data formats of the feature data in the behavior prediction information may differ, obtaining the feature identifier of each feature data first converts every feature data into a feature identifier with a uniform format, which facilitates subsequent data processing and improves the efficiency of behavior prediction.

Accordingly, the process of obtaining the first contribution value of each of the plurality of feature data to the specified behavior may include:

determining, according to a correspondence between feature identifiers and contribution values, the first contribution value corresponding to the feature identifier of each of the plurality of feature data.

The correspondence between feature identifiers and contribution values may be obtained by training on training sample data. Obtaining the first contribution value of each feature data directly from this correspondence is efficient.
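A minimal sketch of these two lookups follows. The tables are hypothetical: the feature identifier is assumed here to be an integer index (a one-hot code word carries the same information), and the contribution values stand in for values that training would produce.

```python
from typing import Dict, List, Tuple

Feature = Tuple[str, str]  # (category, value)

# Hypothetical correspondence between feature data and feature identifiers.
FEATURE_TO_ID: Dict[Feature, int] = {
    ("city", "Beijing"): 0,
    ("city", "Shanghai"): 1,
    ("day", "Tuesday"): 2,
    ("day", "Sunday"): 3,
}

# Hypothetical correspondence between feature identifiers and contribution values.
ID_TO_CONTRIBUTION: Dict[int, float] = {0: 0.12, 1: -0.05, 2: 0.30, 3: 0.08}


def one_hot(feature_id: int, size: int) -> List[int]:
    """Code-word form of an identifier: a single 1 at position feature_id."""
    return [1 if i == feature_id else 0 for i in range(size)]


def first_contributions(features: List[Feature]) -> List[float]:
    """Map each feature to its identifier, then to its first contribution value."""
    ids = [FEATURE_TO_ID[feature] for feature in features]
    return [ID_TO_CONTRIBUTION[feature_id] for feature_id in ids]


values = first_contributions([("city", "Shanghai"), ("day", "Sunday")])
print(values, one_hot(FEATURE_TO_ID[("city", "Shanghai")], len(FEATURE_TO_ID)))
```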

Optionally, for every N pieces of feature data in the plurality of feature data, processing the N pieces of feature data by using a corresponding feature interaction model, and obtaining a second contribution value of every N pieces of feature data to the specified behavior may include:

obtaining, for every N pieces of feature data, the feature vector corresponding to the feature identifier of each of the N pieces of feature data, where the feature vectors corresponding to all feature identifiers have equal length; and then processing the N obtained feature vectors by using the feature interaction model corresponding to the N categories to which the N pieces of feature data belong, so as to obtain the second contribution value of the N pieces of feature data to the specified behavior.

The feature identifiers corresponding to different feature data may differ in length, and the feature identifiers of some categories of feature data (e.g., city, time, or temperature) may be very long while carrying little effective information. Therefore, converting the feature identifier of each feature data into a feature vector of uniform length before processing can improve data processing efficiency.

Optionally, before processing every N feature data in the plurality of feature data by using a corresponding feature interaction model, the method may further include:

for every N pieces of feature data in the plurality of feature data, determining the corresponding feature interaction model from a correspondence between feature interaction models and categories. The correspondence includes a plurality of feature interaction models, each feature interaction model corresponds to N categories, and the categories corresponding to any two feature interaction models are different.
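The vector lookup and model selection described above can be sketched as follows for N = 2. Everything named here is hypothetical: the feature vectors, the three example interaction models, and the use of an alphabetically sorted category pair as the key of the correspondence.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Feature = Tuple[str, str]  # (category, value)
InteractionModel = Callable[[Vector, Vector], float]

# Hypothetical equal-length feature vectors, one per feature.
FEATURE_VECTORS: Dict[Feature, Vector] = {
    ("city", "Shanghai"): [0.5, 1.0],
    ("day", "Sunday"): [1.0, -1.0],
    ("brand", "fast food B"): [2.0, 0.5],
}


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


# Hypothetical correspondence: one interaction model per pair of categories.
INTERACTION_MODELS: Dict[Tuple[str, str], InteractionModel] = {
    ("brand", "city"): lambda a, b: 0.5 * dot(a, b),
    ("brand", "day"): lambda a, b: dot(a, b),
    ("city", "day"): lambda a, b: -dot(a, b),
}


def second_contributions(features: List[Feature]) -> Dict[Tuple[str, str], float]:
    """Compute one second contribution value per pair of features."""
    result = {}
    # Sorting makes the category pair match the keys of INTERACTION_MODELS.
    for (cat_a, val_a), (cat_b, val_b) in combinations(sorted(features), 2):
        model = INTERACTION_MODELS[(cat_a, cat_b)]
        result[(cat_a, cat_b)] = model(
            FEATURE_VECTORS[(cat_a, val_a)], FEATURE_VECTORS[(cat_b, val_b)]
        )
    return result


pair_values = second_contributions(
    [("city", "Shanghai"), ("day", "Sunday"), ("brand", "fast food B")]
)
print(pair_values)
```

Because any two feature data belong to different categories, each category pair occurs at most once per record, so the category pair is enough to select the model.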

Optionally, the first contribution value, the second contribution value and the execution probability may all be obtained by a behavior prediction model, and the method may further include the following training process:

obtaining training sample data, where the training sample data may include a plurality of sample feature data and a behavior label of a sample behavior, any two sample feature data belong to different categories, and the behavior label indicates whether the user executed the sample behavior. Then, a first reference contribution value of each of the plurality of sample feature data to the sample behavior is obtained, and every N sample feature data in the plurality of sample feature data are processed by a corresponding feature interaction model to obtain a second reference contribution value of the N sample feature data to the sample behavior. Further, the execution probability of the sample behavior is determined according to the obtained first reference contribution value of each sample feature data and the obtained second reference contribution value of every N sample feature data. Finally, according to the difference between the execution probability of the sample behavior and the behavior label, the model parameters of the behavior prediction model are adjusted and training continues until a training stop condition is met, at which point training ends and the behavior prediction model with adjusted model parameters is obtained.

The adjusted model parameters of the behavior prediction model may include model parameters of each feature interaction model, and may further include a first reference contribution value corresponding to each sample feature data.

In this application, the behavior prediction model obtained by the training process may include a plurality of feature interaction models, and each feature interaction model may correspond to N categories. When the behavior prediction model is used to predict behavior, the interactive influence of N feature data on the specified behavior is considered, so the prediction accuracy can be effectively improved.
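The training procedure can be sketched in plain Python. This is a toy illustration under several assumptions that the text above does not state: N = 2, a sigmoid maps the summed contributions to a probability, the difference from the label is measured by log loss and reduced by stochastic gradient descent, each feature interaction model is a vector kernel, and the feature vectors are fixed rather than trained. All names (`TinyBehaviorModel`, `train`, the learning rate, the tolerance) are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Feature = Tuple[str, str]   # (category, value)
Sample = Dict[str, str]     # category -> value


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyBehaviorModel:
    """Toy model with N = 2; every name here is hypothetical."""

    def __init__(self, vectors: Dict[Feature, List[float]]):
        self.vectors = vectors                  # fixed feature vectors (not trained here)
        self.reference = 0.0                    # reference contribution value
        self.first: Dict[Feature, float] = {}   # first reference contribution values
        self.kernels: Dict[Tuple[str, str], List[float]] = {}  # one kernel per category pair

    def _pairs(self, sample: Sample):
        return list(combinations(sorted(sample.items()), 2))

    def _kernel(self, key: Tuple[str, str], size: int) -> List[float]:
        return self.kernels.setdefault(key, [0.0] * size)

    def predict(self, sample: Sample) -> float:
        score = self.reference + sum(self.first.get(f, 0.0) for f in sample.items())
        for fa, fb in self._pairs(sample):
            va, vb = self.vectors[fa], self.vectors[fb]
            kernel = self._kernel((fa[0], fb[0]), len(va))
            score += sum(k * a * b for k, a, b in zip(kernel, va, vb))
        return sigmoid(score)  # assumption: sigmoid turns the score into a probability

    def train_step(self, sample: Sample, label: int, lr: float) -> float:
        error = self.predict(sample) - label  # gradient of log loss w.r.t. the score
        self.reference -= lr * error
        for f in sample.items():
            self.first[f] = self.first.get(f, 0.0) - lr * error
        for fa, fb in self._pairs(sample):
            va, vb = self.vectors[fa], self.vectors[fb]
            kernel = self._kernel((fa[0], fb[0]), len(va))
            for d in range(len(kernel)):
                kernel[d] -= lr * error * va[d] * vb[d]
        return abs(error)


def train(model: TinyBehaviorModel, data, lr: float = 0.1,
          max_epochs: int = 500, tolerance: float = 0.05) -> int:
    """Stop when every sample's error is below tolerance, or after max_epochs."""
    for epoch in range(1, max_epochs + 1):
        worst = max(model.train_step(sample, label, lr) for sample, label in data)
        if worst < tolerance:
            return epoch
    return max_epochs


DATA = [
    ({"city": "Beijing", "day": "Tuesday", "time": "17:00", "brand": "fast food A"}, 0),
    ({"city": "Shanghai", "day": "Sunday", "time": "11:20", "brand": "fast food B"}, 1),
]
ALL_FEATURES = sorted({f for sample, _ in DATA for f in sample.items()})
VECTORS = {f: [1.0, 0.1 * i] for i, f in enumerate(ALL_FEATURES)}  # arbitrary fixed vectors

model = TinyBehaviorModel(VECTORS)
epochs_used = train(model, DATA)
print(epochs_used, model.predict(DATA[0][0]), model.predict(DATA[1][0]))
```

The stop condition used here (a maximum error threshold with an epoch cap) is only one possible choice; the text above leaves the training stop condition open.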

Optionally, the process of determining the execution probability of the sample behavior according to the obtained first reference contribution value of each sample feature data and the obtained second reference contribution value of every N sample feature data may include:

summing the obtained first reference contribution value of each sample feature data and a reference contribution value to obtain a first reference comprehensive contribution value; inputting the obtained second reference contribution value of every N sample feature data into a neural network, and using the output of the neural network as a second reference comprehensive contribution value; and finally performing a weighted summation of the first reference comprehensive contribution value and the second reference comprehensive contribution value according to a preset weight value to obtain the execution probability of the sample behavior.

Accordingly, the adjusted model parameters of the behavior prediction model may further include at least: the reference contribution value, the weights and biases between neurons in the neural network, and the preset weight value.

Optionally, the feature interaction model may include a kernel function, which may take the form of a vector, a matrix, or a function. Because the kernel function can take several forms, the behavior prediction model can be constructed flexibly.
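To make the kernel forms concrete, the sketch below shows one possible reading of the vector and matrix forms for N = 2: a vector kernel weights the element-wise product of two feature vectors, and a matrix kernel defines a bilinear form. These formulas are an assumption made for illustration; the text above does not fix them.

```python
from typing import Sequence


def vector_kernel(a: Sequence[float], b: Sequence[float],
                  kernel: Sequence[float]) -> float:
    """Weighted element-wise product: sum over d of kernel[d] * a[d] * b[d]."""
    return sum(k * x * y for k, x, y in zip(kernel, a, b))


def matrix_kernel(a: Sequence[float], b: Sequence[float],
                  kernel: Sequence[Sequence[float]]) -> float:
    """Bilinear form: sum over i, j of a[i] * kernel[i][j] * b[j]."""
    return sum(
        a[i] * kernel[i][j] * b[j]
        for i in range(len(a))
        for j in range(len(b))
    )


a, b = [1.0, 2.0], [3.0, 4.0]
by_vector = vector_kernel(a, b, [0.5, 1.0])
by_diagonal = matrix_kernel(a, b, [[0.5, 0.0], [0.0, 1.0]])
by_swap = matrix_kernel(a, b, [[0.0, 1.0], [1.0, 0.0]])
print(by_vector, by_diagonal, by_swap)
```

Under this reading, a diagonal matrix kernel reproduces the vector form, while off-diagonal entries let different dimensions of the two feature vectors interact.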

Optionally, N may be 2, that is, for every two feature data, a feature interaction model corresponding to two categories to which the two feature data belong may be adopted for processing, so that a prediction effect of behavior prediction may be effectively ensured.

Optionally, the behavior prediction information may include: user attribute data, current environment data, and attribute data of an execution object of the specified behavior.

The user attribute data is characteristic data for describing user attributes, and may include characteristic data of multiple categories such as gender, age, occupation, and the like. The current environment data is characteristic data for describing an environment state at the behavior prediction time, and may include a plurality of categories of characteristic data such as time, place, and weather. The attribute data of the execution object of the specified behavior is feature data for describing the attribute of the execution object, and may include feature data of a plurality of categories such as a brand and a type of the execution object.

In another aspect, a behavior prediction model training method is provided, where the behavior prediction model includes a plurality of feature interaction models, each of the feature interaction models corresponds to N categories, the categories corresponding to any two of the feature interaction models are different, and the category is a category of sample feature data in training sample data. The training method can comprise the following steps:

obtaining training sample data, where the training sample data includes a plurality of sample feature data and a behavior label of a sample behavior, any two sample feature data belong to different categories, and the behavior label indicates whether the user executed the sample behavior. A first reference contribution value of each of the plurality of sample feature data to the sample behavior may be obtained, where the first reference contribution value indicates a degree of influence on the execution of the sample behavior, and the magnitude of the first reference contribution value is positively correlated with the degree of influence. Every N sample feature data in the plurality of sample feature data are processed by a corresponding feature interaction model to obtain a second reference contribution value of the N sample feature data to the sample behavior, where the second reference contribution value indicates a degree of influence on the execution of the sample behavior, and the magnitude of the second reference contribution value is positively correlated with the degree of influence. Further, the execution probability of the sample behavior may be determined according to the obtained first reference contribution value of each sample feature data and the obtained second reference contribution value of every N sample feature data. Finally, according to the difference between the execution probability of the sample behavior and the behavior label, the model parameters of the behavior prediction model are adjusted and training continues until a training stop condition is met, and the behavior prediction model with adjusted model parameters is obtained.

The behavior prediction model obtained by the model training method provided by the application may include a plurality of feature interaction models, and each feature interaction model may correspond to N categories. When the behavior prediction model is used to predict behavior, the interactive influence of N feature data on the specified behavior is considered, so the prediction accuracy can be effectively improved.

In addition, during model training, each group of N categories corresponds to one feature interaction model. This avoids both the poor training results caused by processing all sample feature data with the same feature interaction model and the excessive training complexity caused by processing every N sample feature data with an independent feature interaction model. In other words, the training method provided by the embodiment of the invention can obtain a good training effect with low computational complexity.

Optionally, the determining, according to the obtained first reference contribution value of each sample feature data and the obtained second reference contribution value of each N sample feature data, the execution probability of the sample behavior may include:

summing the obtained first reference contribution value of each sample feature data and a reference contribution value to obtain a first reference comprehensive contribution value, inputting the obtained second reference contribution value of every N sample feature data into a neural network, and using the output of the neural network as a second reference comprehensive contribution value. Then, the first reference comprehensive contribution value and the second reference comprehensive contribution value may be weighted and summed according to a preset weight value to obtain the execution probability of the sample behavior.

Accordingly, the model parameters adjusted according to the difference may further include at least: the reference contribution value, the weights and biases between neurons in the neural network, and the preset weight value.

In yet another aspect, a user behavior prediction apparatus is provided, which may include at least one module, and the at least one module may be configured to implement the user behavior prediction method according to the above aspect.

In yet another aspect, an apparatus for training a behavior prediction model is provided, and the apparatus may include at least one module, and the at least one module may be configured to implement the behavior prediction model training method according to the above aspect.

In still another aspect, a server is provided, which may include: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for predicting user behavior as described in the above aspect or the method for training a behavior prediction model as described in the above aspect when executing the computer program.

In yet another aspect, a computer-readable storage medium is provided, having instructions stored therein, which, when run on a computer, cause the computer to perform a method of predicting user behavior as described in the above aspect, or a method of training a behavior prediction model as described in the above aspect.

In a further aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform a method of predicting user behaviour as described in the above aspect, or a method of training a behaviour prediction model as described in the above aspect.

The beneficial effects of the technical solutions provided by this application include at least the following:

When the execution probability of the specified behavior is predicted according to the obtained behavior prediction information, the first contribution value of each feature data to the specified behavior is calculated, and the second contribution value of every N feature data to the specified behavior is calculated by a feature interaction model. Because the interactive influence of a plurality of feature data on the specified behavior is considered, the accuracy of behavior prediction is effectively improved. Moreover, the feature interaction model corresponding to every N feature data is determined by the categories to which the N feature data belong; that is, each group of N categories corresponds to one feature interaction model. This avoids both the poor prediction results caused by processing all feature data with the same feature interaction model and the excessive computational complexity caused by processing every N feature data with an independent feature interaction model. In other words, the technical solutions provided by the application can obtain a good prediction effect with low computational complexity.

Drawings

FIG. 1 is a schematic structural diagram of an object pushing system according to an embodiment of the present invention;

FIG. 2 is a flowchart of a user behavior prediction method according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a behavior prediction model according to an embodiment of the present invention;

FIG. 4 is an architecture diagram of a behavior prediction model provided by an embodiment of the present invention;

FIG. 5 is a diagram illustrating a structure of a kernel function according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of another kernel function provided in the embodiment of the present invention;

FIG. 7 is a diagram illustrating a structure of another kernel function according to an embodiment of the present invention;

FIG. 8 is a schematic structural diagram of a neural network according to an embodiment of the present invention;

FIG. 9 is a flowchart of a method for training a behavior prediction model according to an embodiment of the present invention;

FIG. 10 is a flowchart of a method for determining execution probabilities of sample behaviors provided by an embodiment of the present invention;

FIG. 11 is a schematic structural diagram of a user behavior prediction apparatus according to an embodiment of the present invention;

FIG. 12 is a block diagram of a first determining module according to an embodiment of the present invention;

FIG. 13 is a schematic structural diagram of another user behavior prediction apparatus according to an embodiment of the present invention;

FIG. 14 is a schematic structural diagram of a behavior prediction model training apparatus according to an embodiment of the present invention;

FIG. 15 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

The user behavior prediction method provided by the embodiment of the invention can be applied to an object pushing system, and the object pushing system can be deployed in a server. The server communicates with a plurality of terminals through a wired network or a wireless network, and can push objects such as application programs, news, or advertisements to the user of each terminal. The server may be a single server, a server cluster composed of a plurality of servers, or a cloud computing service center. The terminal may be a smart phone, a desktop computer, a notebook computer, a tablet computer, a wearable device, or the like.

Optionally, the object pushing system may include a personalized recommendation system, an accurate advertisement delivery system, and the like. The personalized recommendation system can recommend objects such as application programs and media content (such as videos, news, or music) to the user of the terminal. The accurate advertisement delivery system can push advertisements to the users of the terminals. The quality of an object pushing system affects not only the user experience but also, more directly, the revenue of object providers (e.g., application developers and content providers).

As shown in FIG. 1, an object pushing system provided by the embodiment of the present invention may include a learning module 10, a behavior prediction model 20, and a push model 30. The learning module 10, the behavior prediction model 20, and the push model 30 may be deployed in the same server or in different servers. For example, the learning module 10 and the behavior prediction model 20 may be deployed in a training server, and the push model 30 may be deployed in a background server of an object provider. The learning module 10 is configured to obtain the historical behavior information of users recorded in the log file 40, use the historical behavior information as training sample data, and perform training on the training sample data by using a machine learning algorithm. After the training is completed, the learning module 10 may update the model parameters of the behavior prediction model 20 based on the training result. The behavior prediction model 20 may determine, according to the obtained behavior prediction information for predicting a specified behavior, the probability that the user executes the specified behavior, and send the determined probability to the push model 30. The push model 30 may rank the probabilities, determined by the behavior prediction model 20, that the user executes the respective specified behaviors, and push the execution objects of the specified behaviors with higher probabilities to the user.

The log file 40 may be a network (Web) log, a log obtained by a packet sniffer, a log obtained by event-tracking (instrumentation) technology, or a log obtained in other manners, which is not limited in the embodiment of the present invention. The behavior prediction information for predicting the specified behavior may include: user attribute data (e.g., the gender, age, occupation, and education level of the user), attribute data of the execution object of the specified behavior (e.g., the type, release time, and brand of an application), and current environment data (e.g., time, weather, temperature, and location).

For example, assume that the object pushing system is a personalized recommendation system for pushing application programs and that it is deployed in a background server of an application market. When the user opens the application market application installed on a mobile phone, the application market may send a recommendation request to the background server. After receiving the recommendation request, the background server may obtain, for each of a plurality of candidate application programs, behavior prediction information for predicting the user's behavior of downloading that candidate application program, and may predict the probability that the user downloads each candidate application program according to the obtained behavior prediction information. The background server may then recommend several application programs with higher probabilities to the application market for display, thereby improving the download rate of application programs. Meanwhile, the user's actual download behavior data is stored in the log file 40 as new training sample data, so that the learning module 10 can keep updating the model parameters of the behavior prediction model 20 by training on the new training sample data, which improves the prediction effect of the behavior prediction model 20.

In the accurate advertisement delivery system, the behavior prediction model 20 is mainly used for predicting the advertisement click behavior of users. Accordingly, the historical behavior information obtained by the learning module 10 may be the advertisement click history data of users. The behavior prediction model 20 may predict the probability that the user clicks each advertisement according to the obtained user attribute data, the current environment data, and the attribute data of each candidate advertisement provided by the advertiser, and the advertisement with the highest probability is displayed to the user. Meanwhile, the user's actual click behavior data is stored in the log file 40 as new training sample data, so that the learning module 10 can continue to train on the new training sample data.
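The ranking step performed by the background server can be sketched as follows. The candidate names and predicted probabilities are invented for illustration, and `top_candidates` is a hypothetical helper rather than part of the described system.

```python
from typing import Callable, List, Sequence


def top_candidates(candidates: Sequence[str],
                   predict: Callable[[str], float],
                   k: int) -> List[str]:
    """Rank candidates by predicted probability (highest first) and keep the top k."""
    ranked = sorted(candidates, key=predict, reverse=True)
    return ranked[:k]


# Hypothetical download probabilities produced by a behavior prediction model.
PREDICTED = {"app_news": 0.12, "app_game": 0.47, "app_music": 0.31, "app_maps": 0.05}

recommended = top_candidates(list(PREDICTED), PREDICTED.__getitem__, k=2)
print(recommended)
```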

The advertisement click history data obtained by the learning module 10 usually includes a plurality of feature data belonging to different categories. For example, each advertisement click history data may include a plurality of categories of feature data such as the user's age, gender, city, time of clicking on an advertisement, brand of advertisement, and Internet Protocol (IP) address. The following two advertisement click history data are taken as examples:

Beijing, Tuesday, 17:00, fast food A, 0;

Shanghai, Sunday, 11:20, fast food B, 1;

In the two advertisement click history data, the first four items are feature data, and the last item is a behavior tag. The behavior tag may be used to indicate whether the user clicked on the advertisement, e.g., 1 indicates that the user clicked on the advertisement and 0 indicates that the user did not click on the advertisement. The categories to which the feature data included in each advertisement click history data belong are, in order: city, day of the week, time of day, and brand of advertisement. Thus, the second advertisement click history data may indicate that a user in Shanghai clicked on an advertisement for fast food B at 11:20 on a Sunday.
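The record layout described above can be sketched as follows. This is only an illustration: the comma-separated text format, the category names in `CATEGORIES`, and the helper `parse_click_record` are assumptions made for the example, not part of the described embodiment.

```python
from typing import Dict, NamedTuple

# Category order used by the two example records in the text (names are illustrative).
CATEGORIES = ["city", "day_of_week", "time_of_day", "ad_brand"]


class ClickRecord(NamedTuple):
    features: Dict[str, str]  # category name -> feature data
    label: int                # behavior tag: 1 = clicked, 0 = not clicked


def parse_click_record(line: str) -> ClickRecord:
    """Split one comma-separated record into feature data and a behavior tag."""
    fields = [field.strip() for field in line.strip().rstrip(";").split(",")]
    *values, tag = fields  # the last item is the behavior tag
    if len(values) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} feature data, got {len(values)}")
    return ClickRecord(dict(zip(CATEGORIES, values)), int(tag))


records = [
    parse_click_record("Beijing, Tuesday, 17:00, fast food A, 0;"),
    parse_click_record("Shanghai, Sunday, 11:20, fast food B, 1;"),
]
```

Each parsed record keeps the feature data keyed by category, which is the form needed later when feature identifiers are looked up per category.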

As the above example shows, the advertisement click history data includes a plurality of feature data of different categories, the feature data are strongly discrete (i.e., each feature data can be represented by a discrete numerical value), and complex interaction relationships exist between the feature data of different categories: the interaction between feature data of different categories is sometimes positively correlated with the probability of a user clicking on an advertisement, and sometimes negatively correlated. For example, food-related advertisements are more likely to be clicked during meal times, while cold-related advertisements are less likely to be clicked during winter. Therefore, mining the relationships between feature data has an important influence on improving the push accuracy of the object push system.

The embodiment of the invention provides a user behavior prediction method, which can improve the accuracy of behavior prediction by analyzing the interaction influence of different types of feature data on specified behaviors. The method may be applied to the behavior prediction model 20 in the object push system shown in fig. 1, which may be deployed in a server. Referring to fig. 2, the method may include:

Step 101: Acquire behavior prediction information for predicting a specified behavior.

In the embodiment of the present invention, the behavior prediction model may obtain behavior prediction information for predicting the specified behavior after receiving the prediction request sent by the terminal. The specified behavior may be one of a plurality of candidate behaviors pre-configured in the object push system. The behavior prediction information for predicting the specified behavior may include a plurality of feature data, and categories to which any two feature data belong may be different.

The prediction request may be a request for the terminal to obtain a candidate object, for example, the prediction request may be an object obtaining request sent by the terminal to the backend server after the user instructs the terminal to start a certain application (such as an application market, news, or video), and the object obtaining request is used to request to obtain an object such as a candidate application, news, or video. Alternatively, the prediction request may be a request for the terminal to obtain a specified object, for example, the prediction request may be a video object obtaining request sent by the terminal to the background server after the user instructs the terminal to play a certain video online. After receiving the video object acquisition request, the background server can predict the probability of clicking each candidate advertisement by the user of the terminal according to the acquired behavior prediction information before sending the video object to the terminal, and pushes the advertisement with the highest probability to the terminal.

Optionally, the behavior prediction information for predicting the specified behavior may include: user attribute data, current environment data, and attribute data of an execution object of the specified behavior. The user attribute data may be feature data for describing user attributes, and may include feature data of a plurality of categories such as gender, age, occupation, education level, and native place. The current environment data is feature data for describing the environment state at the behavior prediction time, and may include, for example, feature data of a plurality of categories such as time, place, temperature, and weather. The attribute data of the execution object of the specified behavior may be feature data for describing the attributes of the execution object, and may include, for example, feature data of a plurality of categories such as the brand, type, and release time of the execution object. The user attribute data in the behavior prediction information may be obtained from a log file, or may be obtained from a background server of the object provider (e.g., a user database of a video server). The attribute data of the execution object may be obtained from a log file, or may be obtained from a background server of the object provider (e.g., a video database of a video server).

For example, assume that the object pushing system is an accurate advertisement delivery system for pushing an advertisement before video playing, the user behavior is the user's behavior of clicking an advertisement, and the prediction request may be a video object acquisition request sent by the terminal. If five candidate advertisements, namely advertisement A to advertisement E, are pre-stored in the push model 30 of the accurate advertisement delivery system, then after the behavior prediction model 20 of the accurate advertisement delivery system detects the prediction request, it can respectively obtain the behavior prediction information for predicting the user's behavior of clicking each candidate advertisement. For example, the behavior prediction information obtained by the behavior prediction model 20 for predicting the behavior of the user clicking on advertisement A may include: woman, 30 years old, 12:00, advertisement A, and food. The categories to which the feature data included in the behavior prediction information belong are, in order: gender, age, time, brand of advertisement, and type of advertisement. Gender and age are user attribute data, time is current environment data, and the brand and type of the advertisement are attribute data of the execution object.

Step 102: Determine the feature identifier of each feature data in the plurality of feature data according to the correspondence between feature data and feature identifiers. Then perform step 103 and step 104.

Because the data formats of the feature data in the behavior prediction information may be different, in order to improve the efficiency of data processing, the behavior prediction model may determine the feature identifier of each feature data, and the feature identifier may be a codeword or a vector meeting the requirements of a preset format, so that each feature data may be converted into a feature identifier in a unified format, which is convenient for subsequent data processing and improves the efficiency of behavior prediction. Each feature data corresponds to a unique feature identifier in the category to which the feature data belongs, and the feature identifiers corresponding to the feature data of different categories can be the same.

Optionally, the feature identifier of each feature data may be a vector obtained by one-hot encoding (one-hot code). Only one bit of a feature identifier obtained by one-hot encoding is 1 and all other bits are 0, and the length of the feature identifier is equal to the total number of feature data included in the category to which the feature data belongs. For example, for the gender category, since the category includes only two feature data, male and female, the feature identifier obtained by one-hot encoding may be a codeword consisting of a two-bit binary number: the codeword for female may be 01 and the codeword for male may be 10. For the category of the week (i.e., the day of the week), since the category includes 7 feature data from Monday to Sunday, the feature identifier obtained by one-hot encoding may be a vector of length 7; for example, the feature identifier corresponding to Monday may be the vector [1,0,0,0,0,0,0], and the feature identifier corresponding to Wednesday may be the vector [0,0,1,0,0,0,0].
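The one-hot encoding described above can be illustrated with a minimal sketch. The helper `one_hot` and the vocabulary lists `GENDERS` and `WEEKDAYS` are hypothetical names chosen for this example; the ordering of each vocabulary is an assumption picked so that the results match the codewords given in the text.

```python
from typing import List, Sequence


def one_hot(value: str, vocabulary: Sequence[str]) -> List[int]:
    """Return a vector with a single 1 whose length equals the category size."""
    index = list(vocabulary).index(value)  # raises ValueError for unknown feature data
    return [1 if position == index else 0 for position in range(len(vocabulary))]


# Assumed orderings; with these, male -> 10 and female -> 01 as in the text.
GENDERS = ["male", "female"]
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

monday_id = one_hot("Monday", WEEKDAYS)
wednesday_id = one_hot("Wednesday", WEEKDAYS)
female_id = one_hot("female", GENDERS)
```

Because each category has its own vocabulary, two feature data from different categories may receive the same identifier, which is why later lookups are always performed per category.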

In the embodiment of the present invention, the feature identifier of each feature data may also be a codeword or vector obtained by another encoding manner, as long as each feature data has a unique feature identifier within the category to which it belongs. For example, for the behavior prediction information: woman, 30 years old, 12:00, advertisement A, and food, the feature identifiers of the five feature data obtained by the data processing module 201 in the behavior prediction model may be, in order: 1, 30, 12, 1, and 4.

Fig. 3 is a schematic structural diagram of a behavior prediction model according to an embodiment of the present invention, and referring to fig. 3, the behavior prediction model may include: a data processing module 201, a width model module 202, a feature interaction model module 203, and a result integration module 204. The step of obtaining the behavior prediction information shown in step 101 and the step of determining the feature identifier in step 102 may be implemented by the data processing module 201.

Step 103: Acquire a first contribution value of each feature data in the plurality of feature data to the specified behavior. Then perform step 105.

In the embodiment of the present invention, a corresponding relationship between the feature identifier and the contribution value may be stored in the behavior prediction model in advance, the corresponding relationship is obtained by training the training sample data in advance, and the feature identifier recorded in the corresponding relationship may include the feature identifier of each feature data in all feature data trained by the behavior prediction model. When user behavior prediction is performed, after the behavior prediction model obtains the feature identifier of each feature data in the behavior prediction information, a first contribution value of each feature data to the specified behavior can be directly obtained according to the corresponding relation.

The first contribution value can be used to indicate the degree of influence of the feature data on the execution of the specified behavior, and the magnitude of the first contribution value is positively correlated with the degree of influence. That is, the larger the first contribution value of a feature data, the greater the influence of that feature data on the execution of the specified behavior, and the more likely the user is to execute the specified behavior. Optionally, the first contribution value may be a positive number not greater than 1.

Optionally, the step of determining the first contribution value in step 103 may be implemented by the width model module 202. As mentioned above, since the feature identifiers of different classes of feature data may be the same, the correspondence between the feature identifiers and the contribution values stored in the width model module 202 may include a plurality of correspondences, each correspondence corresponds to a class, and each correspondence is used for recording the first contribution value corresponding to each feature data in the corresponding class.

In order to ensure that the width model module 202 can accurately identify each feature data, the data processing module 201 may arrange the feature identifiers of the feature data according to a pre-agreed category order (for example, the category of the first feature data is gender, the category of the second feature data is age, etc.) to obtain an identification sequence, and then input the identification sequence to the width model module 202. After obtaining the identification sequence, the width model module 202 can determine, based on the pre-agreed category order, the category of the feature data indicated by each feature identifier according to the position of that feature identifier in the identification sequence, then obtain the correspondence for that category, and obtain the first contribution value of the feature data from that correspondence.

For example, assume that the pre-agreed category order is: gender, age, time, brand of advertisement, type of advertisement. If the identification sequence obtained by the width model module 202 is 1, 30, 12, 1, 4, the width model module 202 may determine that the category to which the feature data indicated by the first feature identifier 1 in the identification sequence belongs is gender, and can therefore obtain the first contribution value $c_{0,1}$ corresponding to the feature identifier 1 from the correspondence between the feature identifiers of the gender category and the first contribution values. Similarly, the width model module 202 may sequentially obtain the first contribution values corresponding to the other four feature identifiers by the same method: $c_{1,30}$, $c_{2,12}$, $c_{3,1}$, and $c_{4,4}$. The first number in the subscript of each first contribution value represents the category of the feature data, that is, the position of the feature identifier in the identification sequence, and the second number is the feature identifier of the feature data within the category to which it belongs.
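The position-based lookup in this example can be sketched as below. The per-category tables and all numeric values are placeholders invented for illustration (they are not trained values), and the function name `first_contributions` is hypothetical.

```python
from typing import Dict, List, Sequence

# Pre-agreed category order; position p in the identification sequence
# selects the lookup table of category p.
CATEGORY_ORDER = ["gender", "age", "time", "ad_brand", "ad_type"]

# One table per category: feature identifier -> first contribution value c_{p,id}.
# The numbers are placeholders, not trained values.
contribution_tables: List[Dict[int, float]] = [
    {1: 0.10, 2: 0.05},  # gender
    {30: 0.20},          # age
    {12: 0.30},          # time
    {1: 0.15},           # ad_brand (identifier 1 also occurs under gender)
    {4: 0.25},           # ad_type
]


def first_contributions(identification_sequence: Sequence[int],
                        tables: Sequence[Dict[int, float]]) -> List[float]:
    """Look up c_{p,id} for each identifier, using its position p as the category."""
    if len(identification_sequence) != len(tables):
        raise ValueError("identification sequence does not match the category order")
    return [tables[position][identifier]
            for position, identifier in enumerate(identification_sequence)]


values = first_contributions([1, 30, 12, 1, 4], contribution_tables)
```

Note that identifier 1 appears twice in the sequence but resolves to different contribution values, because the gender table and the advertisement brand table are separate.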

Step 104: Process every N feature data in the plurality of feature data by using a corresponding feature interaction model, to obtain a second contribution value of every N feature data to the specified behavior. Then perform step 106.

N is an integer greater than 1. The feature interaction model corresponding to any N feature data is determined by the N categories to which those N feature data belong. The second contribution value is used to indicate the degree of influence of the N feature data on the execution of the specified behavior, and the magnitude of the second contribution value is positively correlated with the degree of influence.

In the embodiment of the present invention, the behavior prediction model may include a plurality of feature interaction models obtained through pre-training, where each feature interaction model corresponds to N categories, and the categories corresponding to any two feature interaction models are different. After the behavior prediction information is obtained by the behavior prediction model, a feature interaction model corresponding to the N categories to which each N feature data belongs can be determined according to the correspondence between the feature interaction models and the categories. Then, the determined feature interaction model can be used to process the corresponding N feature data, so as to obtain a second contribution value of the N feature data to the specified behavior.

If the number of feature data included in the behavior prediction information is M (i.e., the number of categories is M), then since a corresponding feature interaction model needs to be determined for every N of the M feature data, the behavior prediction model needs to use a total of

$$C_M^N = \frac{M!}{N!\,(M-N)!}$$

feature interaction models to process the corresponding feature data in the behavior prediction information. Accordingly, the behavior prediction model finally obtains $C_M^N$ second contribution values. Optionally, in the embodiment of the present invention, N may be 2; that is, every two feature data may be processed by the feature interaction model corresponding to the two categories to which the two feature data belong, so that the prediction effect of the behavior prediction model can be effectively ensured.

For example, assuming that, when model training is performed, the number M of categories to which the sample feature data belong is 5, and the number N of categories corresponding to each feature interaction model is 2, the behavior prediction model may include 10 feature interaction models. The correspondence between the 10 feature interaction models and the categories can be as shown in Table 1. The feature interaction model corresponding to gender and age is $\sigma_{0,1}$, and the feature interaction model corresponding to age and time is $\sigma_{1,2}$. As can be seen from Table 1, in the behavior prediction model, the feature interaction model corresponding to the i-th category and the j-th category in the plurality of categories may be represented as $\sigma_{i,j}$.

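TABLE 1

| Categories | Feature interaction model |
| --- | --- |
| gender, age | $\sigma_{0,1}$ |
| gender, time | $\sigma_{0,2}$ |
| gender, brand of advertisement | $\sigma_{0,3}$ |
| gender, type of advertisement | $\sigma_{0,4}$ |
| age, time | $\sigma_{1,2}$ |
| age, brand of advertisement | $\sigma_{1,3}$ |
| age, type of advertisement | $\sigma_{1,4}$ |
| time, brand of advertisement | $\sigma_{2,3}$ |
| time, type of advertisement | $\sigma_{2,4}$ |
| brand of advertisement, type of advertisement | $\sigma_{3,4}$ |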

Further, assume that the categories to which the feature data included in the behavior prediction information acquired by the behavior prediction model belong are, in order: gender, age, time, brand of advertisement, and type of advertisement (i.e., M = 5), and that N = 2. Combining the feature data of the 5 categories pairwise then yields a total of $C_5^2 = 10$ category combinations. For every two categories, the behavior prediction model may determine the feature interaction model corresponding to the two categories based on the correspondence shown in Table 1 above. For example, for gender and age, the behavior prediction model may determine that the feature interaction model for the two categories is $\sigma_{0,1}$, and can use the feature interaction model $\sigma_{0,1}$ to process the feature data whose category is gender and the feature data whose category is age, to obtain a second contribution value of the two feature data to the specified behavior.
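As a quick check of the count used above, the following sketch enumerates the category combinations for M = 5 and N = 2 with the Python standard library. The category names are illustrative labels only.

```python
from itertools import combinations
from math import comb

categories = ["gender", "age", "time", "ad_brand", "ad_type"]  # M = 5
N = 2

# Each combination of category positions (i, j) with i < j corresponds to
# one feature interaction model sigma_{i,j}.
category_pairs = list(combinations(range(len(categories)), N))

for i, j in category_pairs:
    print(f"sigma_{i},{j}: {categories[i]} x {categories[j]}")

# The number of combinations equals the binomial coefficient C(M, N).
num_models = comb(len(categories), N)
```

`itertools.combinations` emits the pairs in lexicographic order, starting with (0, 1) and ending with (3, 4), which is the same order in which the second contribution values are listed in the examples.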

In an optional implementation manner of the embodiment of the present invention, because the behavior prediction model has obtained the feature identifier of each feature data in step 102, when obtaining the second contribution value of every N feature data, the behavior prediction model may first obtain the feature vector corresponding to the feature identifier of each of the N feature data, and then process the obtained N feature vectors by using the feature interaction model corresponding to the N categories to which the N feature data belong, so as to obtain the second contribution value of the N feature data to the specified behavior.

The behavior prediction model may store a correspondence between feature identifiers and feature vectors in advance. The feature vectors corresponding to all feature identifiers have the same length, which may be determined empirically, for example 16 or 32. The lengths of the feature identifiers corresponding to different feature data may differ, and the feature identifiers of some categories (e.g., city, time, temperature, etc.) may be very long while carrying little effective information. Therefore, in order to improve the data processing efficiency, before the second contribution value of every N feature data is calculated, the feature identifier of each feature data may be converted into a feature vector of uniform length, and every N feature vectors are then processed to obtain the corresponding second contribution value.

Optionally, as shown in fig. 3, the data processing module 201 may send the generated identification sequence to a feature interaction submodule 2031 in the feature interaction model module 203, and then the feature interaction submodule 2031 may obtain a second contribution value of every N feature data. Fig. 4 is an architecture diagram of a behavior prediction model according to an embodiment of the present invention, and as can be seen from fig. 4, an embedding layer may be included in the behavior prediction model, and the embedding layer may be a processing layer in the data processing module 201 for converting feature identifiers into feature vectors. As shown in fig. 4, the embedding layer may convert each of the M feature data into a corresponding feature vector, and then input the feature vector into the feature interaction model.

As mentioned above, since the feature identifiers of the feature data of different categories may be the same, the correspondence between the feature identifiers and the feature vectors stored in the width model module 202 may include a plurality of correspondences, each correspondence corresponds to a category, and each correspondence is used for recording the feature vector corresponding to each feature data in the corresponding category.

For example, as shown in fig. 3, assume that the identification sequence sent by the data processing module 201 to the feature interaction submodule 2031 is 1, 30, 12, 1, 4. The feature interaction submodule 2031 may determine, according to the pre-agreed category order (gender, age, time, brand of advertisement, type of advertisement), the category to which the feature data indicated by each feature identifier in the identification sequence belongs. For the first feature identifier 1 in the identification sequence, the feature interaction submodule 2031 may determine that the category to which the indicated feature data belongs is gender, and can thus determine, from the correspondence between the feature identifiers of the gender category and the feature vectors, that the feature vector corresponding to the feature identifier 1 is $v_{0,1}$. Similarly, the feature interaction submodule 2031 may sequentially obtain the feature vectors corresponding to the other four feature identifiers by the same method: $v_{1,30}$, $v_{2,12}$, $v_{3,1}$, and $v_{4,4}$. The first number in the subscript of each feature vector represents the category of the feature data, that is, the position of the feature identifier in the identification sequence, and the second number is the feature identifier of the feature data within the category to which it belongs.

Further, for every two of the five feature vectors, the feature interaction submodule 2031 may process the two feature vectors with the corresponding feature interaction model, according to the categories to which the feature data indicated by the two feature vectors belong. For example, for the feature vectors $v_{0,1}$ and $v_{1,30}$, referring to Table 1, the feature interaction submodule 2031 may use the feature interaction model $\sigma_{0,1}$ to obtain the second contribution value $f_{0,1}$ of the feature data indicated by the two feature vectors. Similarly, for every other pair of feature vectors, the feature interaction submodule 2031 may use the corresponding feature interaction model for processing. The second contribution values finally obtained by the feature interaction submodule 2031 may be, in order: $f_{0,1}$, $f_{0,2}$, $f_{0,3}$, $f_{0,4}$, $f_{1,2}$, $f_{1,3}$, $f_{1,4}$, $f_{2,3}$, $f_{2,4}$, $f_{3,4}$. The two numbers in the subscript of each second contribution value indicate the categories to which the two corresponding feature data belong.
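The flow in this example (look up one feature vector per identifier, then apply the model of each category pair) might be sketched as follows. Everything concrete here is assumed for illustration: the vectors have length 2 instead of 16 or 32, their values are made up, and a plain inner product stands in for every $\sigma_{i,j}$, whereas in the described embodiment each pair has its own trained parameters.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]

# Hypothetical per-category tables: feature identifier -> feature vector v_{p,id}.
embedding_tables: List[Dict[int, Vector]] = [
    {1: [1.0, 0.0]},   # gender
    {30: [0.5, 0.5]},  # age
    {12: [0.0, 1.0]},  # time
    {1: [1.0, 1.0]},   # ad_brand
    {4: [2.0, 0.0]},   # ad_type
]


def inner_product(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


# One model per category pair (i, j). A shared inner product is a stand-in;
# real models would differ in their trained parameters.
interaction_models: Dict[Tuple[int, int], Callable[[Vector, Vector], float]] = {
    pair: inner_product for pair in combinations(range(len(embedding_tables)), 2)
}


def second_contributions(identification_sequence: Sequence[int]) -> Dict[Tuple[int, int], float]:
    """Return f_{i,j} for every category pair, keyed by (i, j)."""
    vectors = [embedding_tables[p][fid] for p, fid in enumerate(identification_sequence)]
    return {(i, j): interaction_models[(i, j)](vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}


f = second_contributions([1, 30, 12, 1, 4])
```

Keying both the models and the results by the pair (i, j) mirrors the subscripts of $\sigma_{i,j}$ and $f_{i,j}$ in the text.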

In the embodiment of the invention, by calculating the second contribution values of feature data of different categories to the specified behavior, the interactive influence of the feature data on the user's execution of the specified behavior is taken into account, so that the prediction effect of the behavior prediction model can be effectively improved.

Optionally, in the embodiment of the present invention, the feature interaction model in the behavior prediction model may be a kernel function σ, and the kernel function σ may take the form of a vector, a matrix, or a functional. The kernel functions of different feature interaction models may have the same structure (e.g., all may be matrices), but their parameters are different. The parameters of each kernel function are obtained in advance by training on the training sample data.

For example, as shown in fig. 5, the kernel function σ in the feature interaction model may be a kernel vector; or, as shown in fig. 6, the kernel function σ may be a kernel matrix; alternatively, as shown in fig. 7, the kernel function σ may be a functional (functional kernel) expressed in the form of a neural network. The embodiment of the invention extends the implementation of the feature interaction model from computing a single vector inner product to a kernel function, and calculates the influence of the feature vectors on the execution of the specified behavior by mapping the feature vectors to different spaces, thereby effectively improving the flexibility of implementing the feature interaction model. Moreover, because the kernel function can take various structures, the accuracy of the second contribution value calculated by the feature interaction model can be further improved.
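The three kernel forms can be illustrated with a minimal sketch. The concrete formulas are assumptions, since the text does not spell them out here: the kernel vector is taken to be a weighted element-wise product, the kernel matrix a bilinear form, and the functional kernel a one-hidden-layer network over the concatenated vectors. The function names and parameter shapes are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def vector_kernel(a: Vector, b: Vector, w: Vector) -> float:
    """Assumed kernel-vector form: sum_k w[k] * a[k] * b[k]."""
    return sum(wk * ak * bk for wk, ak, bk in zip(w, a, b))


def matrix_kernel(a: Vector, b: Vector, W: Matrix) -> float:
    """Assumed kernel-matrix form: the bilinear product a^T W b."""
    return sum(a[i] * W[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


def network_kernel(a: Vector, b: Vector,
                   W1: Matrix, b1: Vector, w2: Vector, b2: float) -> float:
    """Assumed functional form: one tanh hidden layer over the concatenation [a; b]."""
    x = list(a) + list(b)
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias)
              for row, bias in zip(W1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


a, b = [1.0, 2.0], [3.0, 4.0]
plain_inner_product = sum(x * y for x, y in zip(a, b))
```

Under these assumptions, an all-ones kernel vector or an identity kernel matrix reduces to the plain inner product, which shows how the kernel forms generalize it; other parameter values weight or mix the dimensions differently for each category pair.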

Step 105: Determine a first comprehensive contribution value according to the acquired first contribution value of each feature data.

The first comprehensive contribution value may be positively correlated with the first contribution value of each feature data; that is, the larger the first contribution value of each feature data, the larger the first comprehensive contribution value. In an embodiment of the present invention, the process of determining the first comprehensive contribution value may also be implemented by the width model module 202 in the behavior prediction model.

As an optional implementation manner, the behavior prediction model may sum the acquired first contribution value of each feature data and a reference contribution value to obtain the first comprehensive contribution value. The reference contribution value may be obtained by the behavior prediction model through training on the sample feature data in advance, and may also be a positive number not greater than 1. For example, the reference contribution value may be the output of the width model module 202 when no feature data is input.

For example, assume that the first contribution values of the feature data acquired by the behavior prediction model are $c_{0,1}$, $c_{1,30}$, $c_{2,12}$, $c_{3,1}$, and $c_{4,4}$, and that the reference contribution value obtained by pre-training is $c_g$. The first comprehensive contribution value $out_1$ can then satisfy:

$$out_1 = c_{0,1} + c_{1,30} + c_{2,12} + c_{3,1} + c_{4,4} + c_g.$$

As another alternative implementation, the behavior prediction model may also directly sum the acquired first contribution values of the feature data to obtain the first comprehensive contribution value. That is, the behavior prediction model may also work without training and storing the reference contribution value.
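Both variants of the first comprehensive contribution value can be expressed in one small function. This is a sketch only: the function name, the optional `reference` parameter, and the numeric values are assumptions for illustration.

```python
from typing import Optional, Sequence


def first_comprehensive_contribution(first_values: Sequence[float],
                                     reference: Optional[float] = None) -> float:
    """Sum the first contribution values, adding c_g only when one is stored."""
    total = sum(first_values)
    return total if reference is None else total + reference


# Placeholder values standing in for c_{0,1}, c_{1,30}, c_{2,12}, c_{3,1}, c_{4,4}.
c = [0.10, 0.20, 0.30, 0.15, 0.25]

out1_with_reference = first_comprehensive_contribution(c, reference=0.05)
out1_without_reference = first_comprehensive_contribution(c)
```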

Step 106: Determine a second comprehensive contribution value according to the acquired second contribution value of every N feature data.

In an embodiment of the present invention, the process of determining the second comprehensive contribution value may be implemented by the feature interaction model module 203 in the behavior prediction model, for example by the comprehensive processing sub-module 2032 in the feature interaction model module 203.

As an optional implementation manner, the behavior prediction model may directly sum the second contribution values of every N acquired feature data to obtain the second comprehensive contribution value, and the method for acquiring the second comprehensive contribution value is simple and has low computational complexity.

For example, assuming that N is 2, the second contribution values output by the 10 feature interaction models and obtained by the comprehensive processing sub-module 2032 are, in order: $f_{0,1}$, $f_{0,2}$, $f_{0,3}$, $f_{0,4}$, $f_{1,2}$, $f_{1,3}$, $f_{1,4}$, $f_{2,3}$, $f_{2,4}$, $f_{3,4}$. The second comprehensive contribution value $out_2$ obtained by the comprehensive processing sub-module 2032 by summing the second contribution values can satisfy:

$$out_2 = f_{0,1} + f_{0,2} + f_{0,3} + f_{0,4} + f_{1,2} + f_{1,3} + f_{1,4} + f_{2,3} + f_{2,4} + f_{3,4}.$$

As another optional implementation manner, the behavior prediction model may instead input the acquired second contribution values of every N feature data into a neural network, and use the output of the neural network as the second comprehensive contribution value. Using a pre-trained neural network to obtain the second comprehensive contribution value helps ensure the accuracy of the obtained value, and thus the prediction effect of the behavior prediction.

Optionally, the comprehensive processing sub-module 2032 may be a neural network module. With reference to fig. 3 and fig. 4, after the feature interaction sub-module 2031 obtains the second contribution values of every N feature data, it may generate a feature interaction vector based on the obtained second contribution values, and input the feature interaction vector to the neural network (e.g., each second contribution value in the feature interaction vector may be input to one neuron of the input layer). The length of the feature interaction vector is $C_M^N$, i.e., the number of combinations of N feature data chosen from the M feature data.

For example, the feature interaction vector $V_f$ generated by the feature interaction submodule 2031 based on the obtained second contribution values may be:

$$V_f = [f_{0,1},\ f_{0,2},\ f_{0,3},\ f_{0,4},\ f_{1,2},\ f_{1,3},\ f_{1,4},\ f_{2,3},\ f_{2,4},\ f_{3,4}].$$

The second contribution values in the feature interaction vector may be arranged according to the order, in the identification sequence, of the categories to which the N feature data corresponding to each second contribution value belong. For example, the categories to which the two feature data corresponding to the second contribution value $f_{0,1}$ belong occupy the first two positions in the identification sequence, so $f_{0,1}$ can be used as the first element of the feature interaction vector. The categories to which the two feature data corresponding to the second contribution value $f_{3,4}$ belong occupy the last two positions in the identification sequence, so $f_{3,4}$ can be used as the last element of the feature interaction vector.
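The ordering rule above can be sketched as follows, assuming the second contribution values are held in a dictionary keyed by category positions. The function name `feature_interaction_vector` and the numeric values are made up so that each element visibly encodes its subscript.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def feature_interaction_vector(second_values: Dict[Tuple[int, ...], float],
                               num_categories: int, n: int = 2) -> List[float]:
    """Arrange f_{i,j} by the order of the categories in the identification sequence."""
    return [second_values[combo] for combo in combinations(range(num_categories), n)]


# Made-up values stored in arbitrary order: f_{i,j} = 10 * i + j.
unordered = {(3, 4): 34.0, (0, 1): 1.0, (1, 2): 12.0, (0, 2): 2.0, (2, 4): 24.0,
             (0, 3): 3.0, (1, 4): 14.0, (0, 4): 4.0, (2, 3): 23.0, (1, 3): 13.0}

V_f = feature_interaction_vector(unordered, num_categories=5)

# Simple-sum variant of the second comprehensive contribution value.
out2_sum = sum(V_f)
```

In the neural-network variant, `V_f` rather than its sum would be passed to the input layer, one element per input neuron.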

Fig. 8 is a schematic structural diagram of a neural network according to an embodiment of the present invention. Referring to fig. 8, the neural network may be a multilayer neural network in which each layer includes a plurality of neurons, and the weights and biases between neurons in adjacent layers may also be obtained through training. After the neural network obtains the feature interaction vector, the value of each neuron can be calculated layer by layer based on the trained weights and biases between the neurons, until the value of the neuron of the output layer is finally calculated.

As shown in fig. 8, the input layer of the neural network is the lowermost layer of the entire network, and the output layer is the uppermost layer. Each neuron in a layer is connected with one or more neurons in the adjacent layer, and the connecting edges between neurons are associated with weights and biases. When processing the input feature interaction vector, the neural network computes from the bottom layer upward, and the value of each neuron is determined by the values of the neurons in the layer below that are connected to it. The contribution of each lower-layer neuron to an upper-layer neuron connected to it is obtained by multiplying the value of the lower-layer neuron by the weight of the corresponding connecting edge. For each neuron, the contributions of the connected lower-layer neurons are summed together with the bias, and an activation function (usually a nonlinear function that maps all real numbers to a fixed interval, to ensure that the value of each neuron stays within a fixed range) is then applied to the result to obtain the value of the neuron. The neural network may repeat the above process until the value of the neuron of the output layer is calculated, which is the output of the entire neural network, i.e., the second comprehensive contribution value.

For example, let the neurons of the (h+1)-th layer be $r^{h+1}$, and let them be connected to the neurons $r^{h}$ of the h-th layer, where the weight of the connecting edges is $W^{h}$, the bias is $b^{h}$, and A is the activation function. The value of the neurons $r^{h+1}$ of the (h+1)-th layer in the neural network is then calculated as follows:

$$r^{h+1} = A(W^{h} r^{h} + b^{h});$$

In the above formula, $r^{h+1}$ and $b^{h}$ may each be a K × 1 matrix, $r^{h}$ may be an L × 1 matrix, and $W^{h}$ may be a K × L matrix, where K is the number of neurons included in the (h+1)-th layer and L is the number of neurons included in the h-th layer. The element in the k-th row and l-th column of $W^{h}$ (k is a positive integer not greater than K, and l is a positive integer not greater than L) is the weight of the connecting edge between the k-th neuron in the (h+1)-th layer and the l-th neuron in the h-th layer, and the element in the k-th row of $b^{h}$ is the bias of the k-th neuron in the (h+1)-th layer.

For example, it is assumed that the first layer and the second layer in the neural network each include 3 neurons (i.e., K = L = 3), where the 3 neurons in the first layer are $x_1$, $x_2$, and $x_3$, and the 3 neurons in the second layer are $y_1$, $y_2$, and $y_3$. With $w_{kl}$ representing the weight of the connecting edge between the k-th neuron in the second layer and the l-th neuron in the first layer, and $b_k$ representing the bias of the k-th neuron in the second layer, the values of the 3 neurons of the second layer are respectively:

$$y_1 = A(x_1 w_{11} + x_2 w_{12} + x_3 w_{13} + b_1);$$

$$y_2 = A(x_1 w_{21} + x_2 w_{22} + x_3 w_{23} + b_2);$$

$$y_3 = A(x_1 w_{31} + x_2 w_{32} + x_3 w_{33} + b_3);$$

The values of the 3 neurons of the second layer can be expressed in the form of matrix multiplication as follows:

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = A\left( \begin{bmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \\ w_{31} & w_{32} & w_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} \right)$$

Assuming that the neural network has H layers in total, the value $out_2$ of the neuron of the final output layer (i.e., the second comprehensive contribution value) may satisfy:

$$out_2 = r^{H} = W^{H-1} r^{H-1} + b^{H-1} = W^{H-1}\left[A(W^{H-2} r^{H-2} + b^{H-2})\right] + b^{H-1};$$

where $r^{H-1}$ denotes the neurons of the (H-1)-th layer, and $W^{H-1}$ and $b^{H-1}$ are respectively the weights and biases of the connecting edges between the neurons of the (H-1)-th layer and the neuron of the output layer. As the above formula shows, the activation function A is not applied when calculating the value of the neuron of the output layer; it is applied only when calculating the neurons of the layers preceding the output layer.
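The layer-by-layer calculation described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the network shape, the parameter values, and the choice of tanh as the activation function A are all assumptions made for the example, and the function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def affine(weights: Matrix, biases: Vector, inputs: Vector) -> Vector:
    """Compute W r + b for one layer (W is K x L, r has length L)."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(layers: List[Tuple[Matrix, Vector]], inputs: Vector,
            activation: Callable[[float], float] = math.tanh) -> Vector:
    """Propagate bottom-up; the activation is skipped on the output layer."""
    values = inputs
    for index, (weights, biases) in enumerate(layers):
        values = affine(weights, biases, values)
        if index < len(layers) - 1:  # hidden layers only
            values = [activation(v) for v in values]
    return values


# Hypothetical 3-3-1 network with made-up parameters.
hidden = ([[0.1, 0.2, 0.3], [0.0, -0.1, 0.2], [0.5, 0.5, 0.5]], [0.0, 0.1, -0.1])
output = ([[1.0, -1.0, 0.5]], [0.2])
out_2 = forward([hidden, output], [1.0, 2.0, 3.0])[0]
```

Because the last layer is left linear, `out_2` is not confined to a fixed interval.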

Step 107, performing weighted summation on the first comprehensive contribution value and the second comprehensive contribution value by using preset weight values, to obtain the execution probability of the specified behavior.

The preset weight values are obtained in advance by training the behavior prediction model on sample feature data, and include the weight of the first comprehensive contribution value and the weight of the second comprehensive contribution value. Obtaining the execution probability through weighted summation balances the influence of single feature data on the specified behavior against the interactive influence of a plurality of feature data on the specified behavior, which safeguards the prediction effect of behavior prediction.

By way of example, assume that the weight of the first comprehensive contribution value is $k_1$ and the weight of the second comprehensive contribution value is $k_2$. Then the execution probability P of the user executing the specified behavior, as finally calculated by the behavior prediction model, satisfies:

$$P = k_1 \times out_1 + k_2 \times out_2.$$

The execution probability P may also be referred to as a prediction score of the specified behavior. The higher the execution probability P of the specified behavior, the more likely the user is to execute the specified behavior.

Optionally, because the result of the weighted summation of the first and second comprehensive contribution values may fall outside the range [0, 1], if the summation result falls outside [0, 1], it may be further processed by a mapping function that maps it into the interval [0, 1]. The mapping function may be a sigmoid function or another function with a similar effect, which is not limited in the embodiment of the present invention.
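Step 107 and the optional mapping can be sketched as follows. The function names and the input values are hypothetical; the sketch applies the sigmoid only when the weighted sum leaves [0, 1], as the paragraph above describes.

```python
import math


def sigmoid(x: float) -> float:
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def execution_probability(out_1: float, out_2: float,
                          k_1: float, k_2: float) -> float:
    """Weighted sum of the two comprehensive contribution values."""
    score = k_1 * out_1 + k_2 * out_2
    if 0.0 <= score <= 1.0:
        return score
    return sigmoid(score)  # mapping applied only when the sum is out of range


p_in_range = execution_probability(0.4, 0.8, 0.5, 0.5)   # sum already in [0, 1]
p_mapped = execution_probability(3.0, 1.0, 0.5, 0.5)     # sum is 2.0, mapped
```

An implementation could equally apply the mapping function unconditionally; the conditional form here simply mirrors the wording of the text.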

In the embodiment of the present invention, for a plurality of candidate behaviors configured in advance in the object pushing system, the behavior prediction model may respectively calculate the execution probability of each candidate behavior executed by the user through the methods shown in the above steps 101 to 107, and may send the calculation result to the pushing model 30. The push model 30 may rank the execution probabilities of the candidate behaviors in an order from high to low, and push the execution object of the candidate behavior with the highest execution probability to the terminal, or may push the execution objects of several candidate behaviors with the highest execution probabilities to the terminal.

In an optional application scenario, if the object pushing system is an accurate advertisement delivery system, the candidate behavior is the behavior of clicking an advertisement, and the execution object of the candidate behavior is the advertisement. Suppose five candidate advertisements, advertisement A to advertisement E, are stored in advance in the accurate advertisement delivery system, and the probabilities that the user clicks the five advertisements, as calculated by the behavior prediction model, are 0.8, 0.5, 0.3, 0.6 and 0.4 in sequence. The push model 30 may determine that the user has the highest probability of clicking on advertisement A and may therefore push advertisement A to the terminal.
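The ranking performed by the push model can be illustrated with the five advertisements from this scenario. The function name `top_candidates` is hypothetical.

```python
from typing import Dict, List


def top_candidates(probabilities: Dict[str, float], count: int = 1) -> List[str]:
    """Return the `count` candidates with the highest execution probability."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    return ranked[:count]


click_probabilities = {"A": 0.8, "B": 0.5, "C": 0.3, "D": 0.6, "E": 0.4}
print(top_candidates(click_probabilities))     # ['A']
print(top_candidates(click_probabilities, 3))  # ['A', 'D', 'B']
```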

In another optional application scenario, it is assumed that the object pushing system is a personalized recommendation system deployed in a background server of an application market. When the user opens the application market, an application installed on the mobile phone, the application market may send a recommendation request to the background server. After the personalized recommendation system deployed in the background server receives the recommendation request, the behavior prediction model of the personalized recommendation system may calculate, for a plurality of candidate applications (for example, application 1 to application 10), the probability that the user downloads each candidate application. The background server may then recommend several applications with higher probabilities to the application market for display.

Optionally, in the embodiment of the present invention, the object pushed by the object pushing system may be, in addition to an advertisement and an application program, a video, music, news, and the like, which is not limited in the embodiment of the present invention.

It should be noted that the order of the steps of the behavior prediction method provided in the embodiment of the present invention may be appropriately adjusted, and the steps may also be increased or decreased according to the situation. For example, step 104 and step 103 may be executed synchronously, or step 102 may be deleted as the case may be, that is, the behavior prediction model may obtain the first contribution value and the second contribution value directly based on the received feature data. Any method that can be easily conceived by a person skilled in the art within the technical scope disclosed in the present application is covered by the protection scope of the present application, and thus the detailed description thereof is omitted.

In summary, the embodiments of the present invention provide a behavior prediction method, where when the execution probability of a specified behavior is predicted according to acquired behavior prediction information, a first contribution value of each feature data to the specified behavior may be respectively calculated, and a second contribution value of N feature data to the specified behavior may be calculated according to a feature interaction model, where interaction influence of multiple feature data on the specified behavior is considered, so that accuracy of behavior prediction is effectively improved. Moreover, because the feature interaction model corresponding to each N feature data is determined based on the category to which the N feature data belong, namely each N category corresponds to one feature interaction model, the problem that the prediction result is poor due to the fact that all feature data are processed by the same feature interaction model can be solved, and the problem that the calculation complexity is too high due to the fact that each N feature data are processed by independent feature interaction models can be solved. Namely, the behavior prediction method provided by the embodiment of the invention can obtain a better prediction effect with lower calculation complexity.

The embodiment of the invention also provides a training method of the behavior prediction model, and the training method can be used for training the behavior prediction model adopted in the embodiment of the method. The training method can be applied to the learning module 10 in the object pushing system shown in fig. 1. Referring to fig. 9, the method may include:

step 301, obtaining training sample data, where the training sample data includes multiple sample feature data and behavior labels of sample behaviors.

As shown in fig. 1, the training sample data may be the historical behavior information of the user acquired from the log file 40. In the training sample data, the categories to which any two sample feature data belong are different. The behavior tag may be used to indicate whether the user performs the sample behavior, and the value of the behavior tag may be 0 or 1. Where 0 is used to indicate that no sample behavior is performed and 1 is used to indicate that a sample behavior is performed. Similar to the behavior prediction information, the sample feature data in the training sample data may also include user attribute data, environment data, and attribute data of an execution object of the sample behavior.

For example, one piece of training sample data acquired by the learning module 10 may be: Beijing, Tuesday, 17:00, fast food A, 0. This training sample data indicates that a user in Beijing did not click on an advertisement for fast food A after receiving it at 17:00 on a Tuesday afternoon. Here, Beijing belongs to the category of the city where the user is located and is user attribute data; Tuesday belongs to the category of week and 17:00 belongs to the category of time, and both are environment data; fast food A belongs to the category of advertisement brand and is attribute data of the execution object.
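One possible in-memory representation of such a record is sketched below. The category names, the comma-separated record format, and the `TrainingSample` type are assumptions for illustration; the source does not specify how the log file 40 is laid out.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical category names, one per field of the example record.
CATEGORIES = ("city", "week", "time", "ad_brand")


@dataclass
class TrainingSample:
    features: Dict[str, str]  # category -> feature value, one value per category
    label: int                # 1: sample behavior performed, 0: not performed


def parse_sample(record: str) -> TrainingSample:
    """Split a record into feature values plus a trailing 0/1 behavior label."""
    *values, label = [field.strip() for field in record.split(",")]
    if len(values) != len(CATEGORIES):
        raise ValueError("unexpected number of fields")
    if label not in ("0", "1"):
        raise ValueError("behavior label must be 0 or 1")
    return TrainingSample(dict(zip(CATEGORIES, values)), int(label))


sample = parse_sample("Beijing, Tuesday, 17:00, fast food A, 0")
```

Keying the features by category reflects the constraint that any two sample feature data in one training sample belong to different categories.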

Step 302, obtaining a first reference contribution value of each sample feature data in the plurality of sample feature data to the sample behavior.

In the embodiment of the present invention, during the initial training, the learning module 10 may store an initial reference contribution value corresponding to each sample feature data, where the initial reference contribution value may be obtained by random initialization. The learning module 10 may obtain a first reference contribution value of each sample feature data to the sample behavior based on the correspondence.

Step 303, for every N sample feature data in the plurality of sample feature data, processing by using a corresponding feature interaction model to obtain a second reference contribution value of every N sample feature data to the sample behavior.

Optionally, the learning module 10 may store initial model parameters of a plurality of feature interaction models, where each feature interaction model may correspond to N categories, and the categories corresponding to the feature interaction models are different from each other. Moreover, the initial model parameters of each feature interaction model may be the same or different, and this is not limited in the embodiment of the present invention.

For every N sample feature data in the plurality of sample feature data, the learning module 10 may determine, according to the N categories to which the N sample feature data belong, the one feature interaction model corresponding to those N categories, and process the N sample feature data by using the determined feature interaction model. If the training sample data includes M sample feature data, the learning module 10 can finally obtain $C_M^N = \frac{M!}{N!\,(M-N)!}$ second reference contribution values.
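The grouping in step 303 can be sketched as follows: every combination of N sample feature data is paired with the feature interaction model registered for its N categories. The feature values and the placeholder model names are hypothetical, and the models themselves are represented only by strings.

```python
from itertools import combinations
from math import comb

# Hypothetical sample: one feature value per category (M = 3).
sample_features = {"gender": "female", "week": "Tuesday", "ad_brand": "fast food A"}
N = 2

# One interaction model per combination of N categories (placeholder names).
interaction_models = {
    categories: "model_" + "_".join(categories)
    for categories in combinations(sorted(sample_features), N)
}

groups = []
for categories in combinations(sorted(sample_features), N):
    values = tuple(sample_features[c] for c in categories)
    groups.append((values, interaction_models[categories]))

# M sample feature data taken N at a time give C(M, N) groups.
assert len(groups) == comb(len(sample_features), N)
```

Sorting the category names gives each combination a canonical key, so the lookup does not depend on the order in which the features arrive.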

Step 304, determining the execution probability of the sample behavior according to the acquired first reference contribution value of each sample feature data and the acquired second reference contribution value of every N sample feature data.

Optionally, the learning module 10 may accumulate the first reference contribution value and the second reference contribution value of each acquired sample feature data to obtain the execution probability of the sample behavior.

Step 305, adjusting the model parameters of the behavior prediction model according to the difference between the execution probability of the sample behavior and the behavior label, and continuing training until a training stop condition is met, so as to obtain the behavior prediction model with adjusted model parameters.

In the embodiment of the present invention, after determining the execution probability of the sample behavior, the learning module may determine the difference between the execution probability and the behavior label recorded in the training sample data, and adjust the model parameters of the behavior prediction model based on the difference. The model parameters may include the model parameters of the feature interaction models and the first reference contribution value corresponding to each sample feature data. The model parameters may also include the parameters of the kernel function and the feature vector corresponding to the feature identifier of each feature data.

The training stop condition may include: the number of times of iterative training reaches a specified number of times, or the difference between the execution probability and the behavior label is smaller than a specified difference threshold value. After the training is finished, the finally obtained behavior prediction model may include a plurality of feature interaction models, where each feature interaction model may correspond to N categories, and the categories corresponding to any two feature interaction models are different.
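The two training stop conditions can be illustrated with a deliberately simplified loop. The sketch keeps only the first reference contribution values as parameters, and it assumes a sigmoid output and a gradient step on the logarithmic loss; the source does not name the optimizer or the loss, so these choices, the zero initialization (the source mentions random initialization), and all names are assumptions.

```python
import math
from typing import Dict, List, Tuple

Sample = Tuple[List[str], int]  # (feature identifiers, behavior label)


def predict(contrib: Dict[str, float], features: List[str]) -> float:
    """Execution probability from first-order contributions only."""
    score = sum(contrib.get(f, 0.0) for f in features)
    return 1.0 / (1.0 + math.exp(-score))


def train(samples: List[Sample], max_iterations: int = 1000,
          diff_threshold: float = 0.05,
          learning_rate: float = 0.5) -> Dict[str, float]:
    contrib: Dict[str, float] = {}
    for _ in range(max_iterations):        # stop condition 1: iteration count
        worst = 0.0
        for features, label in samples:
            diff = predict(contrib, features) - label
            worst = max(worst, abs(diff))
            for f in features:             # gradient step for the log loss
                contrib[f] = contrib.get(f, 0.0) - learning_rate * diff
        if worst < diff_threshold:         # stop condition 2: small difference
            break
    return contrib


samples = [(["beijing", "fast_food_a"], 0), (["shanghai", "fast_food_b"], 1)]
model = train(samples)
```

In the full model, the same difference would also drive updates to the feature vectors and to the parameters of the feature interaction models.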

Optionally, fig. 10 is a flowchart of a method for determining an execution probability of a sample behavior according to an embodiment of the present invention, and referring to fig. 10, the method may include:

Step 3041, summing the obtained first reference contribution values of the sample feature data and the benchmark contribution value to obtain a first reference comprehensive contribution value.

In the initial training, the benchmark contribution value may be a preset value smaller than 1, for example, 0.

Step 3042, inputting the acquired second reference contribution value of every N sample feature data into the neural network, and taking the output of the neural network as a second reference comprehensive contribution value.

The neural network may be a multi-layer neural network, each layer including a plurality of neurons. In the initial training, the weights and the bias between the neurons of the adjacent layers may be preset initial values.

Step 3043, according to a preset weight value, performing weighted summation on the first reference comprehensive contribution value and the second reference comprehensive contribution value to obtain an execution probability of the sample behavior.

Similarly, the preset weight value may also be a preset fixed value during the initial training. For example, in the initial training, the first reference integrated contribution value and the second reference integrated contribution value may both be weighted 0.5.

Accordingly, in step 305, the model parameters adjusted by the learning module 10 may further include at least: the benchmark contribution value, the weights and biases between neurons in the neural network, and the preset weight values.

Optionally, before the step 302, the method may further include:

and determining the feature identifier of each sample feature data in the plurality of sample feature data according to the corresponding relation between the sample feature data and the feature identifier.

For the process of determining the feature identifier of the sample feature data, reference may be made to step 102, and details are not repeated here.

Accordingly, the step 302 may include:

and respectively determining a first reference contribution value corresponding to the feature identifier of each sample feature data in the plurality of sample feature data according to the corresponding relation between the feature identifier and the reference contribution value.

Accordingly, the step 303 may include:

Step 3031, respectively obtaining the feature vector corresponding to the feature identifier of each sample feature data in every N sample feature data, wherein the feature vectors corresponding to the feature identifiers are equal in length.

Step 3032, processing the obtained N feature vectors by using the feature interaction model corresponding to the N categories to which the N sample feature data belong, to obtain a second reference contribution value of the N sample feature data to the sample behavior.
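Steps 3031 and 3032 can be sketched for N = 2 as a table lookup followed by a kernel computation. The bilinear form used here is only one assumed shape for the kernel function, and the feature identifiers, vectors, and kernel values are hypothetical.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical table: feature identifier -> feature vector (all of equal length).
feature_vectors: Dict[int, Vector] = {
    1: [0.5, -1.0],  # e.g. "female"
    2: [2.0, 0.5],   # e.g. "Tuesday"
}

# Hypothetical kernel matrices, one per pair of categories.
kernels: Dict[Tuple[str, str], Matrix] = {
    ("gender", "week"): [[1.0, 0.0], [0.5, 2.0]],
}


def second_contribution(id_a: int, id_b: int,
                        categories: Tuple[str, str]) -> float:
    """Bilinear interaction u^T K v with the kernel of the category pair."""
    u, v = feature_vectors[id_a], feature_vectors[id_b]
    kernel = kernels[categories]
    return sum(
        u[i] * kernel[i][j] * v[j]
        for i in range(len(u))
        for j in range(len(v))
    )


value = second_contribution(1, 2, ("gender", "week"))
```

The kernel is shared by every feature pair drawn from the same two categories, which is what keeps the number of interaction models tied to the number of category combinations.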

Therefore, in step 305, the model parameters of the behavior prediction model adjusted based on the difference may further include a feature vector corresponding to each feature identifier.

It should be noted that, to ensure the performance of the trained behavior prediction model, the learning module needs to train on a large amount of training sample data. For the process of training on each piece of training sample data, reference may be made to steps 301 to 305 above, and details are not repeated here. In addition, for the specific implementation of steps 301 to 305 and steps 3041 to 3043, reference may be made to the corresponding steps in the embodiment shown in fig. 2, which are not described herein again.

When the method provided by the embodiment of the invention is used for training the behavior prediction model, a corresponding feature interaction model can be established for every N categories in feature data. If the same feature interaction model is established for any multiple categories of feature data, although the efficiency of model training can be effectively improved, the effect of the feature interaction model is poor, and the interaction characteristics among different categories of feature data cannot be effectively expressed. If an independent feature interaction model is established for every N feature data, although the interaction characteristics among the feature data can be fully expressed, the number of parameters in the feature interaction model is greatly increased, and the complexity of the model and the training difficulty are greatly increased.

For example, assuming that the sample feature data fall into M categories in total, if a corresponding feature interaction model is established for every two categories (i.e., N = 2), the number of feature interaction models to be established is $C_M^2 = \frac{M(M-1)}{2}$.

If, instead, a feature interaction model is established for every two feature data, and the total numbers of sample feature data included in the M categories are $n_1, n_2, \dots, n_M$ in sequence, then the number of feature interaction models to be established is:

$$\sum_{i=1}^{M-1} \sum_{j=i+1}^{M} n_i \times n_j$$

where $n_m$ is the total number of sample feature data included in the m-th category (m is a positive integer not greater than M). Therefore, establishing a feature interaction model for every two feature data would significantly increase the complexity of model training and reduce its efficiency.

For example, assume that the categories of the training sample data are gender, advertisement brand, and week, i.e., M = 3. If the method provided by the embodiment of the present invention is adopted with N = 2, only one feature interaction model needs to be established for each of the following pairs of categories: gender and advertisement brand, gender and week, and advertisement brand and week. The number of feature interaction models is therefore 3. However, suppose a feature interaction model has to be established for every two feature data. In the category of gender, the feature data include male and female, i.e., the total number of feature data is 2. In the category of week, the feature data include Monday to Sunday, i.e., the total number of feature data is 7. In the category of advertisement brand, the total number of feature data is assumed to be 5. When modeling, a feature interaction model needs to be established for female and Monday, another for male and Monday, another for female and Tuesday, and so on, so the number of feature interaction models to be established is 2 × 7 + 2 × 5 + 7 × 5 = 59.
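The two counts in this example can be checked with a few lines. The sketch assumes, as stated earlier for the training sample data, that any two feature data in one sample belong to different categories, so only cross-category pairs are counted.

```python
from itertools import combinations
from math import comb

# Number of distinct feature data in each category of the example.
feature_counts = {"gender": 2, "week": 7, "ad_brand": 5}

# One model per pair of categories (the approach of the embodiment, N = 2).
per_category_pair = comb(len(feature_counts), 2)

# One model per pair of feature data drawn from two different categories.
per_feature_pair = sum(a * b for a, b in combinations(feature_counts.values(), 2))

print(per_category_pair)  # 3
print(per_feature_pair)   # 59
```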

According to the analysis, the method provided by the embodiment of the invention establishes the corresponding feature interaction models for the feature data of different types, and can better solve the contradiction between the prediction effect and the complexity of the behavior prediction model. In addition, the training method provided by the embodiment of the invention can establish a corresponding feature interaction model for every N categories, so that the explicit modeling of feature interaction among a plurality of feature data is realized, and compared with the implicit modeling, the explicit modeling can better explain and embody the interaction conditions among different feature data.

The embodiment of the invention also compares the prediction effect of the behavior prediction model provided by the application with that of prediction models in the related art. Four data sets commonly used for behavior prediction are adopted in the comparison, and the compared models in the related art include the LR model, the factorization machine (FM) model, the field-aware factorization machine (FFM) model, the deep neural network (DNN) model, the attentional factorization machine (AFM) model, and the factorization-machine-based deep learning (DeepFM) model.

Of the four data sets used in the comparison, the first data set includes a large number of users' advertisement click records over one month. In addition, the negative samples in its training set (i.e., samples whose behavior labels indicate that the advertisement was not clicked) are down-sampled, so that the ratio of positive samples to negative samples in the final training set is 1:1. For the second data set, the embodiment of the present invention randomly selects 80% of the data as the training set and the remaining 20% as the test set, and deletes the categories that occur fewer than 20 times in the training set. The third data set already contains a training set and a test set and can therefore be used directly in the comparison. The fourth data set is a click and download data set of game applications in an application market.

The behavior prediction model provided by the embodiment of the invention may be called a product-network in network (PIN) model. In the comparison, the PIN model and each of the LR, FM, FFM, DNN, AFM, and DeepFM models in the related art are trained on the training set provided by each of the four data sets, and behavior prediction is then performed on the test set provided by each data set. The prediction results are measured by the area under the receiver operating characteristic (ROC) curve (AUC) and by the logarithmic loss (Logloss). The AUC is positively correlated with the accuracy of the prediction result, i.e., the larger the AUC, the better the prediction effect. The Logloss is negatively correlated with the accuracy of the prediction result, i.e., the smaller the Logloss, the better the prediction effect.
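For reference, the two metrics can be computed as in the following sketch. It uses the standard definitions (AUC as the fraction of positive-negative pairs ranked correctly, with ties counted as half), not code from the embodiment, and the pairwise AUC is quadratic in the number of samples, so it suits small illustrations only.

```python
import math
from typing import List


def log_loss(labels: List[int], probs: List[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the 0/1 labels; smaller is better."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def auc(labels: List[int], probs: List[float]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; larger is better."""
    positives = [p for y, p in zip(labels, probs) if y == 1]
    negatives = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


print(auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))  # 0.75
```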

The prediction results of the behavior prediction model provided by the embodiment of the present invention and of each model in the related art on the four data sets are shown in table 2 below. Referring to table 2, it can be seen that, on the first data set, the AUC of the DeepFM model is 79.91% and its Logloss is 0.5423, whereas the AUC of the PIN model is 80.21% and its Logloss is 0.5390. As can be seen from table 2, the behavior prediction model provided in the embodiment of the present invention achieves a better prediction effect than the other models on each data set.

TABLE 2

Moreover, the comparison of the prediction results also shows that, in terms of AUC, the behavior prediction model provided by the application improves by 0.15% to 0.3% over the second-ranked DeepFM model. In general, even a small AUC improvement may bring a large click-through rate (CTR) improvement. For example, an AUC improvement of 0.275% may bring a CTR improvement of 3.9%, which may in turn bring higher revenue to the object provider.

Further, the complexity of each model is analyzed. Assume that the size of the embedding layer in each model is L (that is, the embedding layer converts a feature identifier into a feature vector of length L), the number of categories of the sample feature data is M, and the total number of feature data included in each category is n. Then the parameter scale of the FM model in the related art is O(Ln), the parameter scale of the FFM model is O(LnM), and the parameter scale of the PIN model provided in the embodiment of the present invention is O(Ln + M(M-1)/2 × q). Here, O() can also be understood as the space complexity, i.e., the order of magnitude of the memory consumed during model training, and q is the number of parameters included in each feature interaction model (e.g., kernel function) of the behavior prediction model provided in the embodiment of the present invention. In practical applications, n is usually large and q is relatively small, so the PIN model provided by the embodiment of the invention has far fewer parameters than the FFM model.
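The orders of magnitude above can be compared numerically. The sketch takes the stated expressions at face value and drops constant factors; the values chosen for L, n, M, and q are hypothetical and are not the settings behind Table 3.

```python
def fm_params(L: int, n: int) -> int:
    """FM: one length-L feature vector per feature, O(Ln)."""
    return L * n


def ffm_params(L: int, n: int, M: int) -> int:
    """FFM: one length-L vector per feature and per category, O(LnM)."""
    return L * n * M


def pin_params(L: int, n: int, M: int, q: int) -> int:
    """PIN: FM-sized vectors plus q parameters per pair of categories."""
    return L * n + M * (M - 1) // 2 * q


# Hypothetical sizes: n is large, q is comparatively small.
L, n, M, q = 10, 1_000_000, 20, 500
print(fm_params(L, n))         # 10000000
print(ffm_params(L, n, M))     # 200000000
print(pin_params(L, n, M, q))  # 10095000
```

With these values the interaction models add under 1% to the FM-sized embedding table, while FFM multiplies the table by M.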

Table 3 compares the parameter scale required by each model when trained on the same training set, using the memory occupied by the parameters as the measure of parameter scale. Referring to Table 3, the parameters required by the LR model occupy 1 × 10⁶ bytes of memory, those required by the FFM model occupy at least 40 × 10⁶ bytes, and those required by the PIN model provided by the embodiment of the invention occupy 26.48 × 10⁶ bytes, which is much smaller than the FFM model.

TABLE 3

Model	LR	DNN	FM	FFM	PIN
Parameter scale (10⁶ bytes)	1	22.51	21	≥40	26.48

It should be noted that the behavior prediction model training method provided in the embodiment of the present invention may be executed before step 101 in the embodiment shown in fig. 2. Alternatively, after step 107 is executed, that is, after the behavior prediction model determines the execution probability, the learning module may determine the behavior label of the specified behavior according to the behavior actually executed by the user, generate training sample data based on the behavior prediction information and the behavior label, and continue training with the newly generated training sample data by using the method shown in steps 301 to 305.

In summary, an embodiment of the present invention provides a method for training a behavior prediction model, where in training sample data, for every N sample feature data, a corresponding feature interaction model may be used to calculate a second contribution value of the N sample feature data according to a category to which the N sample feature data belongs, and thus a trained behavior prediction model may include multiple feature interaction models, and each feature interaction model may correspond to N categories. When the behavior prediction model is adopted to predict the behavior, the interactive influence of the N characteristic data on the specified behavior can be considered, so that the prediction accuracy can be effectively improved. In addition, when model training is carried out, each N categories correspond to one feature interaction model, so that the problem that the training result is poor due to the fact that all sample feature data are processed by the same feature interaction model can be solved, and the problem that the training calculation complexity is too high due to the fact that each N sample feature data are processed by the independent feature interaction models can be solved. Namely, the training method provided by the embodiment of the invention can obtain a better training effect with lower computation complexity and can meet the actual deployment requirement.

Fig. 11 is a schematic structural diagram of a user behavior prediction apparatus according to an embodiment of the present invention, where the apparatus may be applied to the object pushing system shown in fig. 1. Referring to fig. 11, the apparatus may include:

the first obtaining module 401 may be configured to implement the method shown in step 101 in the foregoing method embodiment.

The second obtaining module 402 may be configured to implement the method shown in step 103 in the foregoing method embodiment.

The first processing module 403 may be configured to implement the method shown in step 104 in the foregoing method embodiment.

The first determining module 404 is configured to determine an execution probability of the specified behavior according to the obtained first contribution value of each piece of feature data and the obtained second contribution value of each N pieces of feature data.

Fig. 12 is a schematic structural diagram of a first determining module 404 according to an embodiment of the present invention, and as shown in fig. 12, the first determining module 404 may include:

the first determining sub-module 4041 may be configured to implement the method shown in step 105 of the above method embodiment.

The second determining sub-module 4042 may be configured to implement the method shown in step 106 in the above-described method embodiment.

The first summing submodule 4043 may be configured to implement the method shown in step 107 in the above-described method embodiment.

Optionally, the second determining sub-module 4042 may be configured to:

summing the second contribution values of every N acquired feature data to obtain a second comprehensive contribution value;

or inputting the second contribution values of every N acquired feature data into the neural network, and taking the output of the neural network as the second comprehensive contribution value.

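Optionally, the first determining sub-module 4041 may be configured to:

sum the first contribution value of each acquired piece of feature data and a reference contribution value to obtain the first comprehensive contribution value.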

Fig. 13 is a schematic structural diagram of another user behavior prediction apparatus according to an embodiment of the present invention, and as shown in fig. 13, the apparatus may further include:

the second determining module 405 may be configured to implement the method shown in step 102 in the foregoing method embodiment.

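Accordingly, the second obtaining module 402 may be configured to:

respectively determine a first contribution value of each feature data in the plurality of feature data to the specified behavior according to the correspondence between feature identifiers and contribution values.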

Accordingly, the first processing module 403 may be configured to:

respectively acquiring a feature vector corresponding to the feature identifier of each feature data in each N feature data;

and processing the obtained N characteristic vectors by adopting a characteristic interaction model corresponding to the N categories to which the N characteristic data belong to obtain a second contribution value of the N characteristic data to the specified behavior.
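For N = 2, the processing performed by the first processing module 403 can be sketched as below. The bilinear form used for the kernel (a d x d matrix applied between two feature vectors of equal length d), the dictionary layouts, and the example identifiers are assumptions for illustration; the text only requires that the feature interaction model be selected by the categories to which the feature data belong.

```python
from itertools import combinations
from typing import Dict, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def kernel_product(v_a: Vector, v_b: Vector, kernel: Matrix) -> float:
    """Bilinear form v_a^T * kernel * v_b (assumed kernel shape: d x d)."""
    return sum(
        v_a[r] * kernel[r][c] * v_b[c]
        for r in range(len(v_a))
        for c in range(len(v_b))
    )


def second_contributions(
    features: Dict[str, str],                # category -> feature identifier
    embeddings: Dict[str, Vector],           # feature identifier -> feature vector
    kernels: Dict[Tuple[str, str], Matrix],  # sorted category pair -> kernel
) -> Dict[Tuple[str, str], float]:
    """Compute one second contribution value per pair of feature data.

    Each pair is processed by the kernel that belongs to its category pair,
    not by a kernel shared by all pairs or one private to the two features.
    """
    result = {}
    for cat_a, cat_b in combinations(sorted(features), 2):
        v_a = embeddings[features[cat_a]]
        v_b = embeddings[features[cat_b]]
        result[(cat_a, cat_b)] = kernel_product(v_a, v_b, kernels[(cat_a, cat_b)])
    return result


# Hypothetical identifiers, feature vectors, and kernel.
features = {"age": "age=25", "city": "city=SH"}
embeddings = {"age=25": [1.0, 2.0], "city=SH": [3.0, 4.0]}
identity_kernel = {("age", "city"): [[1.0, 0.0], [0.0, 1.0]]}
cross_kernel = {("age", "city"): [[0.0, 1.0], [0.0, 0.0]]}
with_identity = second_contributions(features, embeddings, identity_kernel)
with_cross = second_contributions(features, embeddings, cross_kernel)
```

With the identity kernel the result reduces to the inner product of the two feature vectors; a different kernel for another category pair would weight the same vectors differently.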

With continued reference to fig. 13, the apparatus may further include:

a third determining module 406, configured to: before the first processing module 403 processes every N pieces of feature data in the plurality of feature data by using the corresponding feature interaction model, determine, for every N pieces of feature data in the plurality of feature data, the corresponding feature interaction model from a correspondence between feature interaction models and categories;

wherein the correspondence may include a plurality of feature interaction models, each feature interaction model may correspond to N categories, and any two feature interaction models correspond to different categories.
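One possible way to represent such a correspondence is a lookup table keyed by the set of N categories. The sketch below is an assumption for illustration: the category names and the string placeholders standing in for feature interaction models are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable


def build_correspondence(categories: Iterable[str], n: int = 2) -> Dict[FrozenSet[str], str]:
    """Assign one (placeholder) feature interaction model to every group of n categories."""
    return {
        frozenset(group): f"interaction_model_{index}"
        for index, group in enumerate(combinations(sorted(categories), n))
    }


def find_model(correspondence: Dict[FrozenSet[str], str],
               feature_categories: Iterable[str]) -> str:
    """Look up the model for the categories to which N pieces of feature data belong."""
    return correspondence[frozenset(feature_categories)]


table = build_correspondence(["user_age", "city", "app_type"])
model_a = find_model(table, ["city", "user_age"])
model_b = find_model(table, ["user_age", "city"])
```

Because the key is an unordered set, the order in which the N categories are presented does not change which model is selected, and no two models share the same key.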

Optionally, the first contribution value, the second contribution value and the execution probability may all be obtained by a behavior prediction model, as shown in fig. 13, and the apparatus may further include:

the third obtaining module 407 may be configured to implement the method shown in step 301 in the foregoing method embodiment.

The fourth obtaining module 408 may be configured to implement the method shown in step 302 in the foregoing method embodiment.

The second processing module 409 may be configured to implement the method shown in step 303 in the foregoing method embodiment.

The fourth determining module 410 may be configured to implement the method shown in step 304 in the above method embodiment.

The adjusting module 411 may be configured to implement the method shown in step 305 in the foregoing method embodiment.

Optionally, the fourth determining module 410 may be configured to implement the methods shown in steps 3041 to 3043 in the above method embodiments.

Optionally, the feature interaction model may include: a kernel function; the N may be 2; the behavior prediction information may include: user attribute data, current environment data, and attribute data of an execution object of the specified behavior.

In summary, embodiments of the present invention provide a behavior prediction apparatus. When predicting the execution probability of a specified behavior according to obtained behavior prediction information, the apparatus may calculate a first contribution value of each feature data to the specified behavior, and may calculate, by using a feature interaction model, a second contribution value of every N feature data to the specified behavior. Because the interactive influence of multiple feature data on the specified behavior is taken into account, the accuracy of behavior prediction is effectively improved. Moreover, because the feature interaction model corresponding to every N feature data is determined by the categories to which the N feature data belong, that is, every group of N categories corresponds to one feature interaction model, the apparatus avoids both the poor prediction result caused by processing all feature data with the same feature interaction model and the excessive computational complexity caused by processing every N feature data with an independent feature interaction model. That is, the behavior prediction apparatus provided by the embodiment of the invention can obtain a better prediction effect with lower computational complexity.
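The complexity argument can be made concrete by counting how many feature interaction models each option needs for N = 2. The category names and sizes below are hypothetical numbers chosen only for illustration.

```python
from itertools import combinations
from math import comb
from typing import Dict


def shared_model_count() -> int:
    """One feature interaction model shared by all feature pairs."""
    return 1


def per_category_group_count(num_categories: int, n: int = 2) -> int:
    """One feature interaction model per group of n categories."""
    return comb(num_categories, n)


def per_feature_pair_count(category_sizes: Dict[str, int]) -> int:
    """One independent model per pair of feature values from two different categories."""
    return sum(
        category_sizes[a] * category_sizes[b]
        for a, b in combinations(sorted(category_sizes), 2)
    )


# Hypothetical number of distinct feature values in each category.
sizes = {"user": 100, "context": 50, "item": 20}
counts = (
    shared_model_count(),
    per_category_group_count(len(sizes)),
    per_feature_pair_count(sizes),
)
```

In this hypothetical setting the per-category option needs 3 models, between the single shared model and the 8000 independent per-pair models.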

Fig. 14 is a schematic structural diagram of a behavior prediction model training apparatus according to an embodiment of the present invention, which may be applied to the object pushing system shown in fig. 1. The behavior prediction model may include a plurality of feature interaction models, where each feature interaction model corresponds to N categories, and the categories corresponding to any two feature interaction models are different, where the category is a category of sample feature data in training sample data. Referring to fig. 14, the apparatus may include:

the first obtaining module 501 may be configured to implement the method shown in step 301 in the foregoing method embodiment.

The second obtaining module 502 may be configured to implement the method shown in step 302 in the foregoing method embodiment.

The processing module 503 may be configured to implement the method shown in step 303 in the foregoing method embodiment.

The determining module 504 may be configured to implement the method shown in step 304 in the above method embodiment.

The adjusting module 505 may be configured to implement the method shown in step 305 in the foregoing method embodiment.

Optionally, the determining module 504 may be configured to implement the methods shown in steps 3041 to 3043 in the above method embodiments.

In summary, an embodiment of the present invention provides a behavior prediction model training device. During training, for every N pieces of sample feature data, the feature interaction model corresponding to the categories to which those N pieces of sample feature data belong may be used to calculate their second contribution value, so that the trained behavior prediction model may include multiple feature interaction models, and each feature interaction model may correspond to N categories. When the behavior prediction model is used to predict a behavior, the interactive influence of N pieces of feature data on the specified behavior can be taken into account, so that the prediction accuracy can be effectively improved. In addition, because every group of N categories corresponds to one feature interaction model during training, the device avoids both the poor training result caused by processing all sample feature data with the same feature interaction model and the excessive training complexity caused by processing every N pieces of sample feature data with an independent feature interaction model. That is, the training device provided by the embodiment of the invention can obtain a better training effect with lower computational complexity and can meet actual deployment requirements.
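The training flow carried out by modules 501 to 505 can be illustrated with a deliberately small sketch for N = 2. Everything specific in it is an assumption: the scalar kernel per category pair, the fixed (untrained) feature vectors, the logistic function, the log-loss gradient, the learning rate, and the stopping tolerance are hypothetical choices, not the parameters of the described model.

```python
import math
from itertools import combinations
from typing import Dict, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


class TinyBehaviorModel:
    """Toy model: reference value + first contributions + per-category-pair kernels."""

    def __init__(self, embeddings: Dict[str, Sequence[float]]):
        self.embeddings = embeddings  # feature vectors, kept fixed here for brevity
        self.reference = 0.0          # reference contribution value
        self.first = {}               # feature identifier -> first contribution value
        self.kernel = {}              # category pair -> scalar kernel (assumed form)

    def _pairs(self, features: Dict[str, str]):
        # One interaction term per pair of categories present in the sample.
        for cat_a, cat_b in combinations(sorted(features), 2):
            inner = dot(self.embeddings[features[cat_a]], self.embeddings[features[cat_b]])
            yield (cat_a, cat_b), inner

    def predict(self, features: Dict[str, str]) -> float:
        score = self.reference
        score += sum(self.first.get(fid, 0.0) for fid in features.values())
        score += sum(self.kernel.get(pair, 0.0) * inner for pair, inner in self._pairs(features))
        return 1.0 / (1.0 + math.exp(-score))  # logistic squashing is an assumption

    def train_step(self, features: Dict[str, str], label: int, learning_rate: float = 0.1) -> float:
        # Difference between the predicted execution probability and the behavior label.
        error = self.predict(features) - label
        self.reference -= learning_rate * error
        for fid in features.values():
            self.first[fid] = self.first.get(fid, 0.0) - learning_rate * error
        for pair, inner in self._pairs(features):
            self.kernel[pair] = self.kernel.get(pair, 0.0) - learning_rate * error * inner
        return abs(error)


def train(model: TinyBehaviorModel, samples, max_epochs: int = 100, tolerance: float = 0.2) -> int:
    """Repeat parameter adjustment until the (assumed) stopping condition is met."""
    for epoch in range(max_epochs):
        worst = max(model.train_step(features, label) for features, label in samples)
        if worst < tolerance:
            return epoch + 1
    return max_epochs


# Hypothetical training sample data: (category -> feature identifier, behavior label).
embeddings = {"u1": [1.0, 0.0], "i1": [1.0, 0.0], "u2": [0.0, 1.0], "i2": [0.0, 1.0]}
samples = [({"user": "u1", "item": "i1"}, 1), ({"user": "u2", "item": "i2"}, 0)]
model = TinyBehaviorModel(embeddings)
epochs = train(model, samples)
p_positive = model.predict(samples[0][0])
p_negative = model.predict(samples[1][0])
```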

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

Fig. 15 is a schematic structural diagram of a server according to an embodiment of the present invention. As shown in fig. 15, the server may include: a processor 1201 (e.g., a CPU), a memory 1202, a network interface 1203, and a bus 1204. The bus 1204 is used for connecting the processor 1201, the memory 1202, and the network interface 1203. The memory 1202 may include a random access memory (RAM) or a non-volatile memory, such as at least one disk memory. The communication connection between the server and a communication device is established through the network interface 1203 (which may be wired or wireless). The memory 1202 stores a computer program 12021 used to implement various application functions, and the processor 1201 is configured to execute the computer program 12021 stored in the memory 1202 to implement the user behavior prediction method or the behavior prediction model training method provided by the above method embodiments.

The embodiment of the present invention further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a computer, the computer is enabled to execute the user behavior prediction method or the behavior prediction model training method provided in the foregoing method embodiment.

The embodiment of the present invention further provides a computer program product containing instructions, and when the computer program product runs on a computer, the computer is enabled to execute the user behavior prediction method or the behavior prediction model training method provided by the above method embodiment.

The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product comprising one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium, or a semiconductor medium (e.g., solid state disk), among others.

The above description is merely exemplary of the present application and is not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the protection scope of the present application.

Claims

1. A method for predicting user behavior, the method comprising:

acquiring behavior prediction information for predicting a specified behavior, wherein the behavior prediction information comprises a plurality of feature data, and the categories of any two feature data are different;

determining a feature identifier of each feature data in the plurality of feature data according to the corresponding relation between the feature data and the feature identifier, wherein the feature identifier is a code word or a vector meeting the requirements of a preset format;

respectively determining a first contribution value of each feature data in the plurality of feature data to the specified behavior according to the corresponding relation between the feature identification and the contribution value, wherein the first contribution value is used for indicating the degree of influence on the execution of the specified behavior, and the magnitude of the first contribution value is positively correlated with the magnitude of the degree of influence;

for every N pieces of feature data in the plurality of feature data, determining a corresponding feature interaction model from the corresponding relation between the feature interaction models and the categories, wherein the corresponding relation comprises a plurality of feature interaction models, each feature interaction model corresponds to N categories, and the categories corresponding to any two feature interaction models are different;

respectively obtaining a feature vector corresponding to a feature identifier of each feature data in each N feature data, wherein the length of the feature vector corresponding to each feature identifier is equal;

for every N pieces of feature data in the plurality of feature data, processing the obtained N pieces of feature vectors by adopting a feature interaction model corresponding to N categories to which the N pieces of feature data belong to obtain a second contribution value of each N pieces of feature data to the specified behavior, wherein N is an integer greater than 1, one feature interaction model corresponding to any N pieces of feature data is determined by the N categories to which the any N pieces of feature data belong, the second contribution value is used for indicating the degree of influence on the execution of the specified behavior, and the magnitude of the second contribution value is positively correlated with the magnitude of the degree of influence;

and determining the execution probability of the specified behavior according to the acquired first contribution value of each piece of feature data and the acquired second contribution value of every N pieces of feature data.

2. The method according to claim 1, wherein the determining, according to the obtained first contribution value of each piece of feature data and the obtained second contribution value of every N pieces of feature data, the execution probability of the specified behavior includes:

determining a first comprehensive contribution value according to the acquired first contribution value of each feature data;

determining a second comprehensive contribution value according to the obtained second contribution value of each N pieces of feature data;

and weighting and summing the first comprehensive contribution value and the second comprehensive contribution value by adopting a preset weight value to obtain the execution probability.

3. The method according to claim 2, wherein the determining a second comprehensive contribution value according to the obtained second contribution value of every N pieces of feature data includes:

summing the obtained second contribution values of every N pieces of feature data to obtain the second comprehensive contribution value;

or inputting the obtained second contribution values of every N pieces of feature data into a neural network, and taking the output of the neural network as the second comprehensive contribution value.

4. The method according to claim 2, wherein the determining a first comprehensive contribution value according to the first contribution value of each acquired feature data includes:

and summing the first contribution value and the reference contribution value of each acquired feature data to obtain the first comprehensive contribution value.

5. The method of any of claims 1 to 4, wherein the first contribution value, the second contribution value and the execution probability are obtained by a behavior prediction model, the method further comprising:

acquiring training sample data, wherein the training sample data comprises a plurality of sample characteristic data and behavior labels of sample behaviors, the types of any two sample characteristic data are different, and the behavior labels are used for indicating whether a user executes the sample behaviors;

obtaining a first reference contribution value of each sample feature data in the plurality of sample feature data to the sample behavior;

processing every N sample characteristic data in the plurality of sample characteristic data by adopting a corresponding characteristic interaction model to obtain a second reference contribution value of every N sample characteristic data to the sample behavior;

determining the execution probability of the sample behavior according to the obtained first reference contribution value of each sample characteristic data and the obtained second reference contribution value of each N sample characteristic data;

and adjusting model parameters of the behavior prediction model and continuing training according to the difference between the execution probability of the sample behavior and the behavior label until the training stopping condition is met, and finishing training to obtain the behavior prediction model after the model parameters are adjusted.

6. The method according to claim 5, wherein the determining the execution probability of the sample behavior according to the obtained first reference contribution value of each sample feature data and the obtained second reference contribution value of every N sample feature data comprises:

summing the obtained first reference contribution value and the reference contribution value of each sample feature data to obtain a first reference comprehensive contribution value;

inputting the second reference contribution value of each acquired sample feature data into a neural network, and taking the output of the neural network as a second reference comprehensive contribution value;

according to a preset weight value, carrying out weighted summation on the first reference comprehensive contribution value and the second reference comprehensive contribution value to obtain the execution probability of the sample behavior;

the model parameters include at least: the reference contribution value, weights and biases between neurons in the neural network, and the preset weight value.

7. The method of any of claims 1 to 4, wherein the feature interaction model comprises: a kernel function.

8. The method of any one of claims 1 to 4, wherein N is 2.

9. The method of any of claims 1 to 4, wherein the behavior prediction information comprises: user attribute data, current environment data, and attribute data of an execution object of the specified behavior.

10. A behavior prediction model training method is characterized in that the behavior prediction model comprises a plurality of feature interaction models, wherein each feature interaction model corresponds to N categories, the categories corresponding to any two feature interaction models are different, the categories are the categories of sample feature data in training sample data, and N is an integer greater than 1; the method comprises the following steps:

determining a feature identifier of each sample feature data in the plurality of sample feature data according to the corresponding relation between the sample feature data and the feature identifier, wherein the feature identifier is a code word or a vector meeting the requirements of a preset format;

respectively determining a first reference contribution value of each sample feature data in the plurality of sample feature data to the sample behavior according to the corresponding relation between the feature identification and the reference contribution value, wherein the first reference contribution value is used for indicating the degree of influence on executing the sample behavior, and the magnitude of the first reference contribution value is positively correlated with the magnitude of the degree of influence;

for every N sample feature data in the plurality of sample feature data, determining a corresponding feature interaction model from the plurality of feature interaction models;

respectively obtaining a feature vector corresponding to a feature identifier of each sample feature data in every N sample feature data, wherein the length of the feature vector corresponding to each feature identifier is equal;

for every N sample feature data in the plurality of sample feature data, processing the obtained N feature vectors by adopting a feature interaction model corresponding to N categories to which the N sample feature data belong to obtain a second reference contribution value of every N sample feature data to the sample behavior, wherein the second reference contribution value is used for indicating the degree of influence on executing the sample behavior, and the magnitude of the second reference contribution value is positively correlated with the degree of influence;

11. The method according to claim 10, wherein the determining the execution probability of the sample behavior according to the obtained first reference contribution value of each sample feature data and the obtained second reference contribution value of every N sample feature data comprises:

12. A user behavior prediction apparatus, the apparatus comprising:

a first obtaining module, configured to obtain behavior prediction information used for predicting a specified behavior, wherein the behavior prediction information comprises a plurality of feature data, and the categories of any two feature data are different;

the second determining module is used for determining the feature identifier of each feature data in the plurality of feature data according to the corresponding relation between the feature data and the feature identifier, wherein the feature identifier is a code word or a vector meeting the requirements of a preset format;

a second obtaining module, configured to respectively determine, according to a corresponding relationship between feature identifiers and contribution values, a first contribution value of each feature data in the plurality of feature data to the specified behavior, where the first contribution value is used to indicate a degree of influence on execution of the specified behavior, and a magnitude of the first contribution value is positively correlated with a magnitude of the degree of influence;

a third determining module, configured to determine, for every N pieces of feature data in the plurality of pieces of feature data, a corresponding feature interaction model from a correspondence between feature interaction models and categories, where the correspondence includes a plurality of feature interaction models, each feature interaction model corresponds to N categories, and categories corresponding to any two feature interaction models are different;

a first processing module, configured to: respectively obtain a feature vector corresponding to the feature identifier of each feature data in every N feature data, wherein the feature vectors corresponding to the feature identifiers have the same length; and, for every N feature data in the plurality of feature data, process the obtained N feature vectors by using the feature interaction model corresponding to the N categories to which the N feature data belong, to obtain a second contribution value of every N feature data to the specified behavior, wherein N is an integer greater than 1, the feature interaction model corresponding to any N feature data is determined by the N categories to which the any N feature data belong, the second contribution value is used for indicating the degree of influence on the execution of the specified behavior, and the magnitude of the second contribution value is positively correlated with the magnitude of the degree of influence;

and a first determining module, configured to determine the execution probability of the specified behavior according to the acquired first contribution value of each piece of feature data and the acquired second contribution value of every N pieces of feature data.

13. The apparatus of claim 12, wherein the first determining module comprises:

the first determining submodule is used for determining a first comprehensive contribution value according to the acquired first contribution value of each feature data;

the second determining submodule is used for determining a second comprehensive contribution value according to the obtained second contribution value of each N pieces of feature data;

and the first summation submodule is used for weighting and summing the first comprehensive contribution value and the second comprehensive contribution value by adopting a preset weight value to obtain the execution probability.

14. The apparatus of claim 13, wherein the second determining submodule is configured to:

15. The apparatus of claim 13, wherein the first determining submodule is configured to:

16. The apparatus according to any one of claims 12 to 15, wherein the first contribution value, the second contribution value and the execution probability are obtained by a behavior prediction model, the apparatus further comprising:

the third acquisition module is used for acquiring training sample data, wherein the training sample data comprises a plurality of sample characteristic data and behavior labels of sample behaviors, the types of any two sample characteristic data are different, and the behavior labels are used for indicating whether a user executes the sample behaviors;

a fourth obtaining module, configured to obtain a first reference contribution value of each sample feature data in the plurality of sample feature data to the sample behavior;

the second processing module is used for processing every N sample characteristic data in the plurality of sample characteristic data by adopting a corresponding characteristic interaction model to obtain a second reference contribution value of every N sample characteristic data to the sample behavior;

a fourth determining module, configured to determine, according to the obtained first reference contribution value of each sample feature data and the obtained second reference contribution value of each N sample feature data, an execution probability of the sample behavior;

and the adjusting module is used for adjusting the model parameters of the behavior prediction model according to the difference between the execution probability of the sample behaviors and the behavior labels, and continuing training until the training is finished when the training stopping condition is met, so that the behavior prediction model after the model parameters are adjusted is obtained.

17. The apparatus of claim 16, wherein the fourth determining module is configured to:

18. The apparatus of any of claims 12 to 15, wherein the feature interaction model comprises: a kernel function.

19. The apparatus of any one of claims 12 to 15, wherein N is 2.

20. The apparatus of any of claims 12 to 15, wherein the behavior prediction information comprises: user attribute data, current environment data, and attribute data of an execution object of the specified behavior.

21. A behavior prediction model training device is characterized in that the behavior prediction model comprises a plurality of feature interaction models, wherein each feature interaction model corresponds to N categories, the categories corresponding to any two feature interaction models are different, the categories are the categories of sample feature data in training sample data, and N is an integer greater than 1; the device comprises:

a first obtaining module, configured to obtain training sample data, wherein the training sample data comprises a plurality of sample feature data and a behavior label of a sample behavior, the categories of any two sample feature data are different, and the behavior label is used for indicating whether a user executes the sample behavior;

a module, configured to determine a feature identifier of each sample feature data in the multiple sample feature data according to a corresponding relationship between the sample feature data and the feature identifier, where the feature identifier is a codeword or a vector that meets a preset format requirement;

a second obtaining module, configured to respectively determine, according to a corresponding relationship between feature identifiers and reference contribution values, a first reference contribution value of each sample feature data in the plurality of sample feature data for the sample behavior, where the first reference contribution value is used to indicate a degree of influence on execution of the sample behavior, and a magnitude of the first reference contribution value is positively correlated with a level of the degree of influence;

a processing module, configured to: respectively obtain a feature vector corresponding to the feature identifier of each sample feature data in every N sample feature data, wherein the feature vectors corresponding to the feature identifiers have the same length; determine, for every N sample feature data in the plurality of sample feature data, a corresponding feature interaction model from the plurality of feature interaction models; and, for every N sample feature data in the plurality of sample feature data, process the obtained N feature vectors by using the feature interaction model corresponding to the N categories to which the N sample feature data belong, to obtain a second reference contribution value of every N sample feature data to the sample behavior, wherein the second reference contribution value is used for indicating the degree of influence on executing the sample behavior, and the magnitude of the second reference contribution value is positively correlated with the degree of influence;

the determining module is used for determining the execution probability of the sample behavior according to the acquired first reference contribution value of each sample characteristic data and the acquired second reference contribution value of each N sample characteristic data;

22. The apparatus of claim 21, wherein the determining module is configured to:

23. A server, characterized in that the server comprises: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method for predicting user behavior according to any one of claims 1 to 9 or the method for training a behavior prediction model according to claim 10 or 11 when executing the computer program.

24. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform a method of predicting user behavior according to any one of claims 1 to 9, or a method of training a behavior prediction model according to claim 10 or 11.