CN114049529A - User behavior prediction method, model training method, electronic device, and storage medium - Google Patents


Info

Publication number
CN114049529A
Authority
CN
China
Prior art keywords
user
class
target product
probability
features
Prior art date
Legal status
Pending
Application number
CN202111107783.XA
Other languages
Chinese (zh)
Inventor
张霄
孟二利
王斌
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd and Beijing Xiaomi Pinecone Electronic Co Ltd
Priority to CN202111107783.XA
Publication of CN114049529A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/25 Fusion techniques
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The disclosure relates to a user behavior prediction method, a model training method, an electronic device, and a storage medium. The method includes: obtaining target data of a user for a target product; inputting the target data into a pre-trained prediction model to obtain a first class of features and a second class of features that at least characterize the usage states of different classes of users for the target product, as well as features common to the different classes of users with respect to those usage states; and predicting, based on the first class of features and the common features, a first probability that the user is a first class of user, i.e., a user who changes to the target product having a first predetermined identification. Because the probability that the user is the first class of user is determined from a first fused feature that incorporates the common features, rather than from the first class of features alone, the prediction accuracy is higher.

Description

User behavior prediction method, model training method, electronic device, and storage medium
Technical Field
The present disclosure relates to the field of information technologies, and in particular, to a user behavior prediction method, a model training method, an electronic device, and a storage medium.
Background
Neural networks are dynamic systems with a directed-graph topology that process information by responding to continuous or intermittent inputs. A neural network can simulate or substitute for functions related to human thinking, enabling automatic diagnosis and problem solving, and can address problems that traditional methods cannot solve or solve only with difficulty. Neural network theory has been widely successful in research fields such as pattern recognition, automatic control, signal processing, decision support and artificial intelligence, including applications in data processing for predicting the occurrence probability of events.
Disclosure of Invention
The present disclosure provides a user behavior prediction method, a model training method, an electronic device, and a storage medium.
In a first aspect of the embodiments of the present disclosure, a method for predicting user behavior is provided, where the method includes:
acquiring target data of a user for a target product;
inputting the target data into a pre-trained prediction model to obtain a first class of characteristics and a second class of characteristics at least used for representing the use states of different classes of users aiming at the target product, and common characteristics of the different classes of users aiming at the use states of the target product; wherein the use state comprises at least: the user changes the use state of the target product;
predicting, according to the first class of features and the common features, a first probability that the user is a first class of user; wherein the first class of user is a user who changes the target product to the target product with a first predetermined identification.
In some embodiments, the method further comprises:
predicting a second probability that the user is a second class of user according to the second class of features and the common features; and/or,
predicting a third probability that the user is a third type of user according to the common characteristics;
wherein the second type of user is a user who changes the target product into the target product without the first predetermined identifier; the third type of user is a user who changes the target product, and changes the target product with the first predetermined identification and the target product without the first predetermined identification.
In some embodiments, the method further comprises:
and sending promotion information of the target product with the first preset identification to the user according to the relation between the first probability and a first probability threshold.
In some embodiments, the second type of user is a user who changes the target product to the target product having a second predetermined identification;
the method further comprises the following steps:
and sending, to the user, promotion information of the target product with the first preset identification or promotion information of the target product with the second preset identification according to the first probability and the second probability.
In some embodiments, the method further comprises:
comparing the magnitudes of the first probability and the second probability when the third probability is greater than a second probability threshold;
sending promotional information for the target product with the first predetermined identification to the user when the first probability is greater than the second probability.
In some embodiments, said predicting a first probability that the user is a first class of user based on the first class of features and the common features comprises:
fusing the first class features and the common features based on a first fusion weight to obtain first fusion features, wherein the first fusion weight is used for distributing a first fusion proportion of the first class features and the common features;
and predicting a first probability that the user is the first class of user according to the first fusion characteristic.
In some embodiments, said predicting a second probability that the user is a second class of user based on the second class of features and the common features comprises:
fusing the second type of features and the common features based on a second fusion weight to obtain second fusion features, wherein the second fusion weight is used for distributing a second fusion proportion of the second type of features and the common features;
and predicting a second probability that the user is the second type of user according to the second fusion characteristic.
According to a second aspect of the present disclosure, there is provided a model training method, the method comprising:
inputting the sample data of the user aiming at the target product into a prediction model to obtain a first class of characteristics and a second class of characteristics at least used for representing the use state of different classes of users aiming at the target product, and common characteristics of the different classes of the users aiming at the use state of the target product; wherein the use state comprises at least: the user changes the use state of the target product;
and training the prediction model according to the first class of features, the second class of features, the common features and the sample data.
In some embodiments, the inputting the sample data into a prediction model to obtain at least a first class of features, a second class of features, and common features for characterizing a use state of a user for the target product includes:
and respectively inputting sample data into a first module, a second module and a third module of the prediction model, and respectively extracting a first class of characteristics corresponding to the first module, a second class of characteristics corresponding to the second module and common characteristics corresponding to the third module.
In some embodiments, said training said predictive model based on said first class of features, said second class of features, said common features, and said sample data comprises:
fusing the first class of features and the common features based on a first fusion weight to obtain a first fused feature,
fusing the second type of features and the common features based on a second fusion weight to obtain second fusion features;
according to the first fused feature, the second fused feature and the common feature, respectively determining a first predicted value corresponding to the first fused feature, a second predicted value corresponding to the second fused feature and a third predicted value corresponding to the common feature;
and training the prediction model according to the first predicted value, the second predicted value, the third predicted value and the label of the sample data.
In some embodiments, said training said predictive model based on said first predictive value, said second predictive value, said third predictive value, and said label of said sample data comprises:
obtaining a first loss value according to the first predicted value and the label of the sample data;
obtaining a second loss value according to the second predicted value and the label of the sample data;
obtaining a third loss value according to the third predicted value and the label of the sample data;
obtaining a training loss value according to the first loss value, the second loss value and the third loss value;
and updating the network parameters of the prediction model according to the training loss value.
In a third aspect of the embodiments of the present disclosure, there is provided a user behavior prediction apparatus, including:
the first acquisition module is used for acquiring target data of a user aiming at a target product;
the extraction module is used for inputting the target data into a pre-trained prediction model to obtain a first class of characteristics and a second class of characteristics which are at least used for representing the use states of different classes of users for the target product, and common characteristics of the different classes of users for the use states of the target product; wherein the use state comprises at least: the user changes the use state of the target product;
the first prediction module is used for predicting a first probability that the user is a first class user according to the first class characteristics and the common characteristics, wherein the first class user is a user who changes the target product into the target product with a first preset identification.
In some embodiments, the apparatus further comprises:
a second prediction module, configured to predict, according to the second class of features and the common features, a second probability that the user is a second class of users;
the third prediction module is used for predicting a third probability that the user is a third type of user according to the common characteristics;
wherein the second type of user is a user who changes the target product into the target product without the first predetermined identifier; the third type of user is a user who changes the target product, and changes the target product with the first predetermined identification and the target product without the first predetermined identification.
In some embodiments, the apparatus further comprises:
and the first recommending module is used for sending promotion information of the target product with the first preset identification to the user according to the relation between the first probability and the first probability threshold.
In some embodiments, the second type of user is a user who changes the target product to the target product having a second predetermined identification;
the device further comprises:
and the second recommending module is used for sending the promotion information of the target product with the first preset identification or sending the promotion information of the target product with the second preset identification to the user according to the first probability and the second probability.
In some embodiments, the apparatus further comprises:
a comparing module, configured to compare the first probability and the second probability when the third probability is greater than a second probability threshold;
a third recommending module for sending promotion information of the target product with the first predetermined identification to the user when the first probability is greater than the second probability.
In some embodiments, the first prediction module is further configured to:
fusing the first class features and the common features based on a first fusion weight to obtain the first fusion features, wherein the first fusion weight is used for distributing a first fusion proportion of the first class features and the common features;
and predicting a first probability that the user is the first class of user according to the first fusion characteristic.
In some embodiments, the second prediction module is further configured to:
fusing the second type of features and the common features based on a second fusion weight to obtain second fusion features, wherein the second fusion weight is used for distributing a second fusion proportion of the second type of features and the common features;
and predicting a second probability that the user is the second type of user according to the second fusion characteristic.
In a fourth aspect of the embodiments of the present disclosure, there is provided a model training apparatus, the apparatus including:
the first training module is used for inputting the sample data of the user aiming at the target product into a prediction model to obtain a first class of characteristics and a second class of characteristics which are at least used for representing the use states of different classes of users aiming at the target product, and common characteristics of the different classes of users aiming at the use states of the target product; wherein the use state comprises at least: the user changes the use state of the target product;
and the second training module is used for training the prediction model according to the first class of features, the second class of features, the common features and the sample data.
In some embodiments, the first training module is to:
and respectively inputting sample data into the feature extraction network branches of the first module, the second module and the third module of the prediction model, and respectively extracting a first class of features corresponding to the first module, a second class of features corresponding to the second module and a common feature corresponding to the third module.
In some embodiments, the second training module is to:
fusing the first class features and the common features based on first fusion weights to obtain first fusion features, and fusing the second class features and the common features based on second fusion weights to obtain second fusion features;
according to the first fused feature, the second fused feature and the common feature, respectively determining a first predicted value corresponding to the first fused feature, a second predicted value corresponding to the second fused feature and a third predicted value corresponding to the common feature;
and training the prediction model according to the first predicted value, the second predicted value, the third predicted value and the label of the sample data.
In some embodiments, the second training module is to:
obtaining a first loss value according to the first predicted value and the label of the sample data;
obtaining a second loss value according to the second predicted value and the label of the sample data;
obtaining a third loss value according to the third predicted value and the label of the sample data; obtaining a training loss value according to the first loss value, the second loss value and the third loss value;
and updating the network parameters of the prediction model according to the training loss value.
In a fifth aspect of the embodiments of the present disclosure, there is provided an electronic device, including: a processor and a memory for storing a computer program operable on the processor, wherein the processor is operable to perform the steps of the method of the first or second aspect when executing the computer program.
In a sixth aspect of the embodiments of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which when executed by a processor implements the steps of the method of the first or second aspect.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the user behavior prediction method in the embodiment of the disclosure, target data of a user for a target product is input into a pre-trained prediction model, and a first class feature, a second class feature and a common feature are extracted; and predicting a first probability that the user is a first type of user according to the first type of characteristics and the common characteristics, wherein the first type of user is a user who is replaced by a target product with a first preset identification. Since there are two different situations in which the user may change to the target product having the first predetermined identifier and the target product not having the first predetermined identifier at the same time when predicting whether the user changes to the target product having the first predetermined identifier. When the user features are extracted from the target data and used for probability prediction, the common features belonging to different types of users are ignored, for example, when the first type of features are extracted, the common features are judged to be the second features, which causes an error in predicting the probability that the user is the first type of user based on the first features.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow diagram illustrating a method for user behavior prediction, according to an example embodiment.
FIG. 2 is a diagram illustrating a distribution of user change scenarios, according to an exemplary embodiment.
FIG. 3 is a block diagram illustrating a predictive model probabilistic predictive structure according to an exemplary embodiment.
FIG. 4 is a diagram illustrating a task-unique feature (first class of features/second class of features) and common feature fusion process in a predictive model according to an exemplary embodiment.
FIG. 5 is a block diagram illustrating a structure in which the auxiliary task monitoring module monitors training of the third module during training of the prediction model, according to an exemplary embodiment.
FIG. 6 is a block diagram illustrating a structure in which the auxiliary task monitoring module monitors training of the third module and the first module during training of the prediction model, according to an exemplary embodiment.
FIG. 7 is a block diagram illustrating a structure in which the auxiliary task monitoring module monitors training of the third module and the second module during training of the prediction model, according to an exemplary embodiment.
FIG. 8 is a block diagram illustrating a structure in which the auxiliary task monitoring module monitors training of the third module, the second module and the first module during training of the prediction model, according to an exemplary embodiment.
Fig. 9 is a schematic structural diagram of a CGC, according to an exemplary embodiment.
Fig. 10 is a schematic diagram illustrating a structure of a user behavior prediction apparatus according to an exemplary embodiment.
FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Neural networks are dynamic systems with a directed-graph topology that process information by responding to continuous or intermittent inputs. A neural network can simulate or substitute for functions related to human thinking, enabling automatic diagnosis and problem solving, and can address problems that traditional methods cannot solve or solve only with difficulty. Neural network theory has been widely successful in research fields such as pattern recognition, automatic control, signal processing, decision support and artificial intelligence, including applications in data processing for predicting the occurrence probability of events.
In the embodiments of the present disclosure, the user behavior prediction method is applied to an electronic device. The electronic device may be a mobile device, for example a mobile phone, tablet computer, notebook computer, drone or wearable electronic device, or a fixed device, for example a desktop computer or a television. In some possible implementations, the electronic device may include a server, which may include a cloud server and/or a local server.
FIG. 1 is a flow diagram illustrating a method for user behavior prediction, according to an example embodiment. As shown in fig. 1, the user behavior prediction method includes:
step 10, acquiring target data of a user for a target product;
step 11, inputting the target data into a pre-trained prediction model to obtain a first class of characteristics and a second class of characteristics at least used for representing the use states of different classes of users aiming at the target product, and common characteristics of the different classes of users aiming at the use states of the target product; wherein the use state comprises at least: the user changes the use state of the target product;
step 12, predicting a first probability that the user is a first class user according to the first class feature and the common feature; wherein the first class of users are users who change the target product to the target product with a first predetermined identification.
In the embodiments of the disclosure, the user behavior prediction method can be applied to predicting which behavior a user will perform, and information is pushed to the user accordingly, for example by recommending information to the user through multi-task learning. Systems to which the user behavior prediction method applies include the recommendation (push) field, where a multi-task learning model based on an MMoE (Multi-gate Mixture-of-Experts) structure is trained and optimized against a user engagement objective and a user satisfaction objective; the user engagement objective covers behaviors such as clicks and watch time, and the user satisfaction objective covers behaviors such as likes and comments;
Or, the method is applied to the advertising field, where a multi-task learning model converts the problem of directly predicting the CVR (conversion rate) into a multi-task learning problem of learning the CTR (click-through rate) and the CTCVR, and trains on CTR and CTCVR data, thereby addressing the challenges that the CVR is difficult to predict directly and that direct CVR data are extremely sparse (a minimal sketch of this decomposition follows this list);
Or, the method is applied to the NLP (Natural Language Processing) field, where the inputs of tasks such as Chinese word segmentation, part-of-speech (POS) tagging, named entity recognition (NER) and syntactic parsing depend on one another; for example, POS tagging requires the segmentation result, and NER requires the POS tagging result. In industrial practice these tasks are combined into one model and learned in a multi-task fashion, which improves the effect of each single task while saving model overhead.
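As a concrete illustration of the CTR/CTCVR decomposition mentioned above, a minimal PyTorch-style sketch is given below. It is not the prediction model of this disclosure; the function and tensor names are assumptions made purely for illustration.

```python
import torch
import torch.nn.functional as F

def esmm_style_loss(ctr_logits, cvr_logits, click_label, conversion_label):
    """Sketch: supervise CTR and CTCVR so that CVR is learned implicitly."""
    p_ctr = torch.sigmoid(ctr_logits)     # P(click | impression)
    p_cvr = torch.sigmoid(cvr_logits)     # P(conversion | click), never supervised directly
    p_ctcvr = p_ctr * p_cvr               # P(click and conversion | impression)
    loss_ctr = F.binary_cross_entropy(p_ctr, click_label)
    loss_ctcvr = F.binary_cross_entropy(p_ctcvr, conversion_label)
    return loss_ctr + loss_ctcvr

# Dummy usage
clicks = torch.randint(0, 2, (8,)).float()
conversions = clicks * torch.randint(0, 2, (8,)).float()   # conversion implies click
print(esmm_style_loss(torch.randn(8), torch.randn(8), clicks, conversions))
```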
In the embodiments of the present disclosure, when the user behavior prediction method is applied to predicting the probability that a user replaces an article or a service, the user data of the user for the target product may include basic information and action information of the user; the basic information indicates the user's current inherent identity information, and the action information indicates the user's operations on the goods or services;
wherein the basic information at least includes information describing user characteristics such as the user's age, gender, educational background and/or income, and the action information at least includes the frequency of using the goods or services, and the like. The first predetermined identification includes, but is not limited to, at least one of: a name, the name of a manufacturing enterprise, a trademark, a specification, or a product type.
In some embodiments, the user behavior prediction method comprises the steps of:
acquiring target data of a user for a target product;
inputting the target data into a feature extraction layer of a pre-trained prediction model to obtain a first class feature, a second class feature and a common feature; wherein the feature extraction layer comprises: the system comprises a first module for extracting first class characteristics of a first class of users, a second module for extracting second class characteristics of a second class of users and a third module for extracting common characteristics belonging to the first class of users and the second class of users simultaneously; the first class of users are users who change to the target product with a first predetermined identification; the second type of users are users who are not replaced by the target product with the first preset identification, and the third type of users are: intersection users of the first class of users and the second class of users;
and fusing the first class characteristics and the common characteristics to obtain a first fusion characteristic, and obtaining a first probability that the user is a first class user based on the first fusion characteristic.
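A minimal functional sketch of the steps above is given below; the fusion proportion of 0.6/0.4 and all tensor shapes are arbitrary illustrative assumptions, standing in for the pre-trained model's learned components.

```python
import torch

# Pretend these were produced by the pre-trained prediction model's
# first module (first-class features) and third module (common features).
first_class_features = torch.randn(4, 32)
common_features = torch.randn(4, 32)
prediction_head = torch.nn.Linear(32, 1)

# Fuse the first-class features with the common features
# (fixed illustrative proportion here; in the model the proportion
# is given by the learned first fusion weight).
first_fused_feature = 0.6 * first_class_features + 0.4 * common_features

# Predict the first probability from the first fused feature.
first_probability = torch.sigmoid(prediction_head(first_fused_feature)).squeeze(-1)
print(first_probability)   # probability that each user is a first-class user
```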
In the embodiments of the disclosure, the first class of features, the second class of features and the common features are extracted by inputting the target data of a user for a target product into a pre-trained prediction model; a first probability that the user is a first class of user is then predicted according to the first class of features and the common features, where the first class of user is a user who changes to the target product with a first predetermined identification. When predicting whether the user changes to the target product with the first predetermined identification, two different situations coexist: the user may change to the target product with the first predetermined identification or to the target product without it. If the extracted user features ignore the features shared by the different classes of users, for example if the common features are attributed to the second class of features during extraction of the first class of features, an error is introduced when the probability that the user is the first class of user is predicted based on the first class of features alone. Fusing the common features with the first class of features therefore yields higher prediction accuracy.
In the embodiment of the disclosure, the user features may be extracted through a feature extraction layer in a prediction model, where the feature extraction layer includes a first module for extracting a first class feature of a first class of users, a second module for extracting a second class feature of a second class of users, and a third module for extracting common features belonging to both the first class of users and the second class of users. The feature extraction layer is a ring in the whole neural network model and is used for extracting features which are useful for predicting action probability of the user in the user data. Wherein, according to the different use status of the user for the target product, for example: according to different actions executed by users, different types of users at least comprise three types, which respectively comprise: the first class users, the second class users and the third class users.
In the present disclosure, a third type of user may be determined as a user who changes a target product, for example, a user who performs a change or purchases an article or service; the first class of users are determined as users who perform preset operations on target products with first preset identifications, such as users who replace or purchase first goods or services in goods or services; the second type of user is determined to be a user who performs a preset operation on a target product without the first predetermined identification, for example, a user who replaces or purchases a second one of the items or services.
Taking the replacement of the article or the service as an example, the user characteristics include at least three types, which are respectively a first type of characteristic, a second type of characteristic, and a common characteristic, wherein the first type of characteristic is a user characteristic corresponding to a first type of user, and the user can be determined to be a user who performs a preset operation on the target product object of the first predetermined identifier through the first type of characteristic.
The first type of feature may be a feature common to users who replace the target product with a first predetermined identification. Similarly, the second type of feature is a user feature corresponding to the second type of user, and the first probability that the user is the user who is changed to the target product with the first predetermined identifier can be determined through the first type of feature.
The common features are the features of users who belong to both the first class of users and the second class of users, that is, the features shared by the first class of users and the second class of users. When the usage state of the user for the target product is determined, the user may at the same time be replacing target products with different predetermined identifications, for example changing to a target product with the first predetermined identification and to a target product without the first predetermined identification. Accordingly, a user feature shared by the different classes of users, whose usage states cover both the target product with the first predetermined identification and the target product without it, can be determined as a common feature.
In some embodiments, the method further comprises:
predicting a second probability that the user is a second type of user according to the second type of feature and the common feature;
determining a third probability that the user is the third type of user according to the common characteristics, wherein the second type of user is a user who changes the target product into the target product without the first predetermined identifier; the third type of user is a user who changes the target product, and changes the target product with the first predetermined identification and the target product without the first predetermined identification.
The third class of users comprises the first class of users and the second class of users.
In the embodiment of the present disclosure, when it is determined that a user is a second-class user, there is a case where the user is both a first-class user and a second-class user, and when a user feature is extracted from user data and used for probability prediction, there is a case where a common feature that belongs to both the first-class user and the second-class user is ignored, for example, when the second-class feature is extracted, the common feature is determined to be the first-class feature, which causes an error in predicting a probability that the user is the second-class user based on the second-class feature.
Thus, the disclosed embodiments have not only higher probability prediction accuracy for the first probability prediction, but also higher probability prediction accuracy for the second probability prediction.
In the embodiments of the present disclosure, a third probability that the user is a third class of user may also be determined according to the common features. The third class of users includes the first class of users and the second class of users; the magnitude of the third probability that the user belongs to this class is determined through the common features.
In the embodiments of the present disclosure, the target product at least includes an article and/or a service: the article at least includes electronic devices such as a mobile phone, a computer and a tablet, and the service at least includes information services that such electronic devices can provide, such as video playing, music playing and book reading.
In some embodiments, the method further comprises:
and sending promotion information of the target product with the first preset identification to the user according to the relation between the first probability and a first probability threshold.
If the first probability is greater than or equal to the first probability threshold, the user has a stronger tendency to purchase the target product with the first preset identification, so the promotion information sent to the user better matches the user's selection preference and the quality of information pushing is improved.
In some embodiments, the second type of user is a user who changes the target product to the target product having a second predetermined identification;
the method further comprises the following steps:
and sending promotion information of the target product with the first preset identification to the user or sending promotion information of the target product with the second preset identification according to the first probability and the second probability.
If the second probability is greater than or equal to the second probability threshold, it indicates that the user has a greater propensity to purchase the target product having the second predetermined identification.
The target product without the first predetermined identification may be a target product carrying any of a plurality of other predetermined identifications, or a target product carrying one particular predetermined identification, for example a target product with a second predetermined identification; in that case the prediction model can predict a second probability that the user is a second class of user, so that promotion information can be sent to the user. Thus, by exploiting the multi-task capability of the prediction model, different promotion information can be sent to different classes of users using the first probability and the second probability at the same time, which improves the efficiency of information pushing. If the target product without the first predetermined identification is the target product with a second predetermined identification, the second predetermined identification is different from the first predetermined identification: the target product with the first predetermined identification is one brand of article or service, and the target product with the second predetermined identification is an article or service of a brand other than the first. For example, a first class of user changes to a mobile phone of a first brand, a second class of user changes to a mobile phone of a second brand, and so on.
In the embodiments of the present disclosure, after the probability that the user is the first class of user or the second class of user is determined, promotion information of the target product with the first predetermined identification or of the target product with the second predetermined identification may be sent to the user according to the magnitudes of the first probability and the second probability, including:
sending the promotion information of the target product with the first predetermined identification to the user when the first probability is greater than a first probability threshold; or sending the promotion information of the target product with the second predetermined identification to the user when the second probability is greater than the first probability threshold; or sending the promotion information of the target product with the first predetermined identification to the user when the first probability is greater than the second probability; or,
sending the promotion information of the target product with the second predetermined identification to the user when the second probability is greater than the first probability, so that the promotion information sent to the user better matches the user's selection preference.
In the embodiments of the disclosure, when the first probability is greater than the second probability, the user is indicated to be more likely to change to the target product with the first predetermined identification, and promotion information of that target product may be sent to the user. Conversely, the user is indicated to be more likely to change to the target product with the second predetermined identification, and promotion information of that target product can be sent to the user, so that product promotion is carried out better and the quality of information pushing is improved.
In the embodiments of the present disclosure, the first probability threshold is used to determine the user's propensity to change to target products with different predetermined identifications. In a specific application, the target product with the first predetermined identification may be a mobile phone of a first brand; when the probabilities indicate that the user tends to purchase the first brand of mobile phone, promotion information about the first brand of mobile phone, including its model, price and performance information, may be sent to the user, so as to better promote the product and improve the quality of information pushing. Similarly, when the second probability is greater than the first probability threshold, the user is indicated to be more inclined to change to a mobile phone of the second brand, and promotion information for the second brand of mobile phone is sent to the user.
In some embodiments, the method further comprises:
comparing the magnitudes of the first probability and the second probability when the third probability is greater than a second probability threshold;
sending promotional information for the target product with the first predetermined identification to the user when the first probability is greater than the second probability.
In the embodiments of the disclosure, when promotion information is sent to the user according to the first probability and the second probability, the third probability may first be evaluated: since a third class of user may be either the first class of user or the second class of user, the third probability measures how likely the user is to change the target product at all. When the third probability is greater than the second probability threshold, the magnitudes of the first probability and the second probability are compared, which helps improve the efficiency of pushing information to the user.
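A plain-Python sketch of the push decision described in the above embodiments follows; the threshold values and the way the alternative conditions are combined into a single policy are illustrative assumptions.

```python
def choose_promotion(first_probability, second_probability, third_probability,
                     first_probability_threshold=0.5, second_probability_threshold=0.5):
    """Sketch: gate on the overall change probability, then compare brand-specific probabilities."""
    if third_probability <= second_probability_threshold:
        return None   # the user is unlikely to change the target product at all
    if first_probability >= first_probability_threshold and first_probability > second_probability:
        return "promote the target product with the first predetermined identification"
    if second_probability >= first_probability_threshold and second_probability > first_probability:
        return "promote the target product with the second predetermined identification"
    return None

print(choose_promotion(0.72, 0.41, 0.88))
```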
In some embodiments, said predicting, based on the first class of features and the common features, a first probability that the user is a first class of user who changes to the target product having a first predetermined identification comprises: fusing the first class of features and the common features to obtain a first fused feature, and predicting, according to the first fused feature, the first probability that the user is a first class of user who changes to the target product with the first predetermined identification.
In some embodiments, said fusing said first class of features and said common features to obtain a first fused feature comprises:
and fusing the first class features and the common features based on a first fusion weight to obtain the first fusion features, wherein the first fusion weight is used for distributing a first fusion proportion of the first class features and the common features.
In some embodiments, said predicting a second probability that the user is a second class of user based on the second class of features and the common features comprises:
fusing the second class of features and the common features to obtain a second fused feature;
and predicting a second probability that the user is a second type of user according to the second fusion characteristic.
In some embodiments, said fusing said second class of features and said common features to obtain a second fused feature comprises:
and fusing the second type of features and the common features based on a second fusion weight to obtain the second fusion features, wherein the second fusion weight is used for distributing a second fusion proportion of the second type of features and the common features.
In the embodiments of the present disclosure, when determining whether a user is the first class of user or the second class of user, the probability that the user is the first class of user is determined based on the first fused feature, and the probability that the user is the second class of user is determined based on the second fused feature. The first fused feature is obtained by fusing the first class of features and the common features, the fusion allocating a first fusion proportion to the first class of features and the common features based on the first fusion weight.
The first fusion proportion is determined based on the respective influence of the first class of features and the common features on the predicted first probability. For example, when the first class of features influences the predicted first probability more than the common features do, the first fusion proportion is biased toward the first class of features; when the first class of features influences the predicted first probability less than the common features do, the first fusion proportion is biased toward the common features.
Similarly, the second fusion proportion is determined based on the respective influence of the second class of features and the common features on the predicted second probability. For example, when the second class of features influences the predicted second probability more than the common features do, the second fusion proportion is biased toward the second class of features; when the second class of features influences the predicted second probability less than the common features do, the second fusion proportion is biased toward the common features. Adjusting the fusion proportions among the user features in this way improves the accuracy of the probability prediction.
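One way to realize such a learned fusion proportion is sketched below with a small softmax gate; this is an assumption made for illustration, since the disclosure only requires that the fusion weight allocate the proportion between the task-specific features and the common features.

```python
import torch
import torch.nn as nn

class FusionGate(nn.Module):
    """Allocates a fusion proportion between a task-specific feature and the common feature."""
    def __init__(self, in_dim):
        super().__init__()
        self.gate = nn.Linear(in_dim, 2)   # two logits: [task-specific, common]

    def forward(self, x, task_feature, common_feature):
        proportion = torch.softmax(self.gate(x), dim=-1)   # sums to 1
        # During training, if the task-specific feature influences the predicted
        # probability more than the common feature, gradient descent pushes
        # proportion[:, 0] up; otherwise the weight shifts toward the common feature.
        return proportion[:, 0:1] * task_feature + proportion[:, 1:2] * common_feature

gate1 = FusionGate(in_dim=64)                      # plays the role of a first fusion weight
x = torch.randn(4, 64)                             # raw user input
first_fused = gate1(x, torch.randn(4, 32), torch.randn(4, 32))
print(first_fused.shape)
```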
An embodiment of a second aspect of the present disclosure provides a model training method, where the method includes:
inputting the sample data of the user aiming at the target product into a prediction model to obtain a first class of characteristics and a second class of characteristics at least used for representing the use state of different classes of users aiming at the target product, and common characteristics of the different classes of the users aiming at the use state of the target product; wherein the use state comprises at least: the user changes the use state of the target product;
and training the prediction model according to the first class of features, the second class of features, the common features and the sample data.
The predictive model in the embodiment of the second aspect refers to the predictive model in the embodiment of the first aspect.
In some embodiments, the inputting the sample data into a prediction model to obtain at least a first class of features, a second class of features, and common features for characterizing a use state of a user for the target product includes:
and respectively inputting sample data into the feature extraction network branches of the first module, the second module and the third module of the prediction model, and respectively extracting a first class of features corresponding to the first module, a second class of features corresponding to the second module and a common feature corresponding to the third module.
As shown in Fig. 5, the first module, the second module and the third module are all feature extraction layers, each of which can be represented as an Expert.
In some embodiments, said training said predictive model based on said first class of features, said second class of features, said common features, and said sample data comprises:
fusing the first class of features and the common features based on a first fusion weight to obtain a first fused feature,
fusing the second type of features and the common features based on a second fusion weight to obtain second fusion features;
according to the first fused feature, the second fused feature and the common feature, respectively determining a first predicted value corresponding to the first fused feature, a second predicted value corresponding to the second fused feature and a third predicted value corresponding to the common feature;
and training the prediction model according to the first predicted value, the second predicted value, the third predicted value and the label of the sample data.
As shown in fig. 5, the prediction model further includes a first fusion layer (Gate1) fusing the first class of features and the common features, and a second fusion layer (Gate2) fusing the second class of features and the common features.
In some embodiments, said training said predictive model based on said first predictive value, said second predictive value, said third predictive value, and said label of said sample data comprises:
obtaining a first loss value according to the first predicted value and the label of the sample data;
obtaining a second loss value according to the second predicted value and the label of the sample data;
obtaining a third loss value according to the third predicted value and the label of the sample data; obtaining a training loss value according to the first loss value, the second loss value and the third loss value;
and updating the network parameters of the prediction model according to the training loss value.
As shown in Fig. 5, the prediction model further includes three prediction layers: retained-user prediction, auxiliary-task prediction, and churned-user prediction. The retained-user prediction and the churned-user prediction can also be referred to as the main tasks. The outputs of these three prediction layers are the retained-user prediction probability (corresponding to the first predicted value, and to the first probability when the model is used for prediction), the churned-user prediction probability (corresponding to the second predicted value, and to the second probability when used for prediction), and the auxiliary-task prediction probability (corresponding to the third predicted value, and to the third probability when used for prediction).
In some embodiments, as shown in fig. 5, the predictive model of an embodiment of the disclosure includes:
an input layer, three feature extraction layers, two feature fusion layers and three prediction layers. The input layer accepts the input data (including sample data and target data); the output of the input layer is fed into the feature extraction layers, namely the first module, the second module and the third module; the output of the feature extraction layers is fed into the feature fusion layers, namely the first fusion layer and the second fusion layer; and the output of the feature fusion layers is fed into the prediction layers, namely the retained-user prediction, the auxiliary-task (phone-change) prediction and the churned-user prediction.
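A compact sketch of this topology (input layer, three expert-style feature extraction modules, two fusion gates, three prediction layers) is given below, assuming simple feed-forward experts and softmax gates; the layer sizes, activations and class name are illustrative, not the disclosed implementation.

```python
import torch
import torch.nn as nn

class ChangePredictionModel(nn.Module):
    """Sketch of the Fig. 5 topology: 3 experts, 2 gates, 3 prediction layers."""
    def __init__(self, in_dim=64, feat_dim=32):
        super().__init__()
        expert = lambda: nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.first_module = expert()        # first-class (retained-user) features
        self.second_module = expert()       # second-class (churned-user) features
        self.third_module = expert()        # common features (auxiliary task)
        self.gate1 = nn.Linear(in_dim, 2)   # first fusion weight
        self.gate2 = nn.Linear(in_dim, 2)   # second fusion weight
        self.retain_head = nn.Linear(feat_dim, 1)   # first predicted value
        self.churn_head = nn.Linear(feat_dim, 1)    # second predicted value
        self.aux_head = nn.Linear(feat_dim, 1)      # third predicted value

    def forward(self, x):
        f1, f2, fc = self.first_module(x), self.second_module(x), self.third_module(x)
        w1 = torch.softmax(self.gate1(x), dim=-1)
        w2 = torch.softmax(self.gate2(x), dim=-1)
        fused1 = w1[:, 0:1] * f1 + w1[:, 1:2] * fc   # first fused feature
        fused2 = w2[:, 0:1] * f2 + w2[:, 1:2] * fc   # second fused feature
        p1 = torch.sigmoid(self.retain_head(fused1)).squeeze(-1)   # first probability
        p2 = torch.sigmoid(self.churn_head(fused2)).squeeze(-1)    # second probability
        p3 = torch.sigmoid(self.aux_head(fc)).squeeze(-1)          # third probability
        return p1, p2, p3

model = ChangePredictionModel()
print([t.shape for t in model(torch.randn(4, 64))])
```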
In some embodiments, the predictive model comprises: the first module, the second module and the third module;
the model training method comprises the following steps:
inputting sample data into the first module, the second module and the third module to respectively obtain a first predicted value corresponding to the first module, a second predicted value corresponding to the second module and a third predicted value corresponding to the third module;
obtaining a first loss value according to the first predicted value and the label of the sample data;
obtaining a second loss value according to the second predicted value and the label of the sample data;
obtaining a third loss value according to the third predicted value and the label of the sample data; obtaining a training loss value according to the first loss value, the second loss value and the third loss value;
and updating the network parameters of the prediction model according to the training loss value.
In the disclosed embodiment, the prediction model comprises at least a first module, a second module and a third module for feature extraction. The training sample data is input to network branches where the first module, the second module and the third module are respectively located, so that a first predicted value corresponding to the first module, a second predicted value corresponding to the second module and a third predicted value corresponding to the third module are respectively obtained.
In practical applications, the sample data includes user data and target product data. The sample data feature attributes are divided into five types, namely String (first type attribute), StringList (second type attribute), Int (third type attribute), IntList (fourth type attribute) and Double (fifth type attribute). The String features include two kinds of relatively fixed information, namely the attributes of the user who changes phones and the attributes of the electronic device (including but not limited to a mobile phone, a tablet computer and the like); the user attributes include the user's gender, age, province, city, educational background and income, and the phone attributes are its model, storage space and maintenance status. The StringList features include push (notification) information, user purchase data and purchase logs, each with its own typical feature attributes. The Double features include statistics of the usage duration and usage count of 16 first-level APP categories such as "chat and social" and "office efficiency", statistics of the usage duration and usage count of 24 mobile phone APPs such as Taobao, Xiaomi Mall, JD (Jingdong), Douyin, Kuangou and the like, and the number of searches made by the user. The Int features include the number of communication failures, the usage duration, the number of purchase records, the number of purchases, IMSI information and other data of the mobile phone within one month. The IntList feature is the percentage of battery consumed by the electronic device per day over 30 days.
In practical applications, before the sample data is input into the prediction model for training, the training data can first be organized: the sample data features of the five attribute types are concatenated to obtain multiple classes of features with different attributes, and a user feature vector matrix is constructed from these features to obtain the input data of the training set; the output data correspond to the binary labels of the phone-change task, the retention task and the churn task, respectively.
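A simplified sketch of how the five attribute types might be assembled into one user feature vector before training is shown below; the concrete fields, encodings and dimensions are illustrative assumptions, not the actual feature schema.

```python
import numpy as np

def build_user_vector(sample: dict) -> np.ndarray:
    """Concatenate String / StringList / Int / IntList / Double attributes into one vector."""
    string_part = np.array([hash(sample["gender"]) % 100, hash(sample["phone_model"]) % 100],
                           dtype=np.float32)                                  # hashed categorical fields
    stringlist_part = np.array([len(sample["purchase_log"])], dtype=np.float32)
    int_part = np.array([sample["comm_failures"], sample["usage_minutes"]], dtype=np.float32)
    intlist_part = np.array(sample["daily_battery_pct"], dtype=np.float32)    # 30 daily values
    double_part = np.array(sample["app_usage_stats"], dtype=np.float32)
    return np.concatenate([string_part, stringlist_part, int_part, intlist_part, double_part])

sample = {"gender": "F", "phone_model": "M-1", "purchase_log": ["p1", "p2"],
          "comm_failures": 1, "usage_minutes": 5400,
          "daily_battery_pct": [80] * 30, "app_usage_stats": [0.5] * 40}
x = build_user_vector(sample)
labels = {"change": 1, "retain": 1, "churn": 0}   # binary labels of the three tasks
print(x.shape, labels)
```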
During network training, a first loss value is obtained from the first predicted value and the label of the sample data, a second loss value is obtained from the second predicted value and the label of the sample data, and a third loss value is obtained from the third predicted value and the label of the sample data; a training loss value is then obtained from the first, second and third loss values. Each loss value is obtained by fitting the corresponding predicted value to the data label through a loss function. The label of the sample data may include data indicating whether the user in the sample data is a confirmed first-class user, second-class user or third-class user. For example, if the user is confirmed to be a first-class user, a second-class user or a third-class user, the corresponding label is 1; otherwise, the label is 0.
In an embodiment of the disclosure, the loss function may include a first loss function L_mi for fitting the first predicted value to its label, a second loss function L_ot for fitting the second predicted value to its label, and a third loss function L_change for fitting the third predicted value to its label, where L_total = L_mi + L_ot + L_change. That is, the first, second and third loss values are accumulated to obtain the training loss value, and the network parameters of the prediction model are updated according to the obtained training loss value.
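A minimal sketch of one training step built around L_total = L_mi + L_ot + L_change is shown below, assuming a PyTorch-style model that returns the three predicted values and binary cross-entropy as the fitting loss for each of them; the names and the choice of loss are illustrative, not the disclosed implementation.

```python
import torch
import torch.nn as nn

bce = nn.BCELoss()  # assumed fitting (loss) function for each predicted value

def training_step(model, optimizer, x, y_retain, y_churn, y_change):
    # Forward pass: the model is assumed to return the retention, attrition
    # and phone-replacement probabilities for the batch x.
    p_retain, p_churn, p_change = model(x)

    loss_mi = bce(p_retain, y_retain)       # L_mi: retention (first) task
    loss_ot = bce(p_churn, y_churn)         # L_ot: attrition (second) task
    loss_change = bce(p_change, y_change)   # L_change: auxiliary replacement task

    # Training loss value: L_total = L_mi + L_ot + L_change.
    loss_total = loss_mi + loss_ot + loss_change

    optimizer.zero_grad()
    loss_total.backward()   # gradients flow back into all three modules
    optimizer.step()        # update the network parameters
    return loss_total.item()
```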
In some embodiments, the network parameters of the predictive model include at least one of:
a first weight of a network node within the first module;
a second weight of a network node within the second module;
a third weight of a network node within the third module;
a first fusion weight for the first fusion feature and a second fusion weight for the second fusion feature.
In the embodiment of the present disclosure, the sample data is input into the prediction model, and the multi-order features are extracted from the sample data in the prediction model, where the multi-order features include: low-order features e and high-order features d above the second order. Wherein the low-order features include first-order features and second-order features.
The first class of features f1 is obtained by fusing the low-order features and the high-order features through a first weight:

f1 = W1 · x(input) + b1

where x(input) = [e1, e2, ..., ek, d1, d2, ..., dn], k ∈ [1, 2], n ∈ N+; W1 is the first weight and b1 is the first bias corresponding to the first class of features. The low-order features e express a single feature or the association of two features, and the high-order features d express the association of multiple features. During feature extraction, some of the user's features are highly correlated; multi-order feature extraction can then be performed on these highly correlated features to obtain a comprehensive feature that combines the associated features. For example, dn denotes the comprehensive feature extracted after associating n highly correlated features, with n ≥ 3; when n = 3, it describes feature extraction after 3 features are associated. For instance, if the user's education, income and occupation features are highly correlated, d3 can denote the feature extracted after associating these 3 features. The association can be implemented by weighting and proportion allocation to achieve comprehensive feature extraction.
The second class of features f2 is obtained by fusing the low-order features and the high-order features through a second weight:

f2 = W2 · x(input) + b2

where x(input) = [e1, e2, ..., ek, d1, d2, ..., dn], k ∈ [1, 2], n ∈ N+; W2 is the second weight, b2 is the second bias corresponding to the second class of features, and ek and dn are as in the previous embodiment.
The common features f3 are obtained by fusing the low-order features and the high-order features through a third weight:

f3 = W3 · x(input) + b3

where x(input) = [e1, e2, ..., ek, d1, d2, ..., dn], k ∈ [1, 2], n ∈ N+; W3 is the third weight and b3 is the third bias corresponding to the common features.
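Read as affine maps over the concatenated multi-order feature vector, the three extractions above can be sketched as follows (NumPy). The output dimension, the random weights and the absence of a non-linearity are assumptions made for illustration, since the original formulas are reproduced only as images in the publication.

```python
import numpy as np

rng = np.random.default_rng(0)

# Multi-order input: low-order features e1..ek (k <= 2) followed by
# high-order features d1..dn (sizes here are illustrative).
e = rng.normal(size=2)                # e1, e2
d = rng.normal(size=5)                # d1..d5
x_input = np.concatenate([e, d])      # x(input) = [e1, e2, d1, ..., dn]

dim_in, dim_out = x_input.shape[0], 32   # dim_out is an assumed hyper-parameter
W1, W2, W3 = (rng.normal(size=(dim_out, dim_in)) for _ in range(3))
b1, b2, b3 = (np.zeros(dim_out) for _ in range(3))

f1 = W1 @ x_input + b1   # first class of features (retention task)
f2 = W2 @ x_input + b2   # second class of features (attrition task)
f3 = W3 @ x_input + b3   # common (shared) features
```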
The first fusion feature is g1(x) = W1(X) * S1 with S1 = [f1, f3], where W1(X) is the first fusion weight.

The second fusion feature is g2(x) = W2(X) * S2 with S2 = [f2, f3], where W2(X) is the second fusion weight. Both W1(X) and W2(X) can be obtained through the activation function Softmax:

W1(X) = Softmax(Wg1 · X)

W2(X) = Softmax(Wg2 · X)

where Wg1 and Wg2 are weight coefficients of a fully connected layer with dimension (dk + dc) × dx; dx denotes the dimension of the input feature X (the sample data), and dk and dc denote the numbers of Experts in the main-task feature layer and the shared layer, respectively, both of which can be set to 1.
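Under the same assumptions, the gated fusion can be sketched as below: each gate is a single fully connected layer followed by Softmax over dk + dc = 2 expert outputs (dk = dc = 1), and the fusion is read here as a weighted sum of the task-specific and shared feature vectors; a proportional concatenation would be an alternative reading of the splicing described for FIG. 4. All names and sizes are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    ez = np.exp(z)
    return ez / ez.sum()

def gated_fusion(x, f_task, f_common, W_gate):
    # W_gate has shape (dk + dc, dx) = (2, dx): one gate score per expert.
    w = softmax(W_gate @ x)                      # W(X) = Softmax(W_g * X)
    # S = [f_task, f_common]; weight the two expert outputs and combine them.
    return w[0] * f_task + w[1] * f_common

rng = np.random.default_rng(1)
dx, dim_f = 7, 32                                # illustrative dimensions
x = rng.normal(size=dx)                          # input feature X
f1, f2, f3 = (rng.normal(size=dim_f) for _ in range(3))
Wg1, Wg2 = rng.normal(size=(2, dx)), rng.normal(size=(2, dx))

g1 = gated_fusion(x, f1, f3, Wg1)   # first fusion feature, fed to TOWER1
g2 = gated_fusion(x, f2, f3, Wg2)   # second fusion feature, fed to TOWER2
```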
In some embodiments, said updating network parameters of said predictive model based on said training loss values comprises:
when the training result of the prediction model is determined not to meet the training stopping condition, updating the network parameters of the prediction model according to the training loss value;
and when the training result of the prediction model meets the training stopping condition, stopping updating the network parameters of the prediction model.
In the embodiment of the present disclosure, after the training loss value is obtained, the network parameters of the prediction model are updated according to the training loss value, including updating the network parameters in the above embodiments, such as the first weight, the second weight, the third weight, the first bias, the second bias, the third bias, the first fusion weight, and the second fusion weight. And when the training result of the prediction model meets the training stopping condition, the updating of the network parameters can be stopped, and if the training result of the prediction model does not meet the training stopping condition, the network parameters are continuously updated until the training result meets the training stopping condition.
The training stop condition indicates that the training result meets the application requirements of the prediction model. For example, the training stop condition may include that the training metrics of the trained prediction model meet the application requirements, or that the data processing capability of the trained prediction model meets the application requirements, and the like.
In some embodiments, the updating the network parameters of the predictive model according to the training loss values includes at least one of:
when the training result of the prediction model is determined not to meet the training stopping condition, updating the network parameters of the first module and the network parameters of the third module according to the back propagation of the first loss value;
when the training result of the prediction model is determined not to meet the training stopping condition, updating the network parameters of the second module and the network parameters of the third module according to the back propagation of the second loss value;
and when the training result of the prediction model is determined not to meet the training stopping condition, updating the network parameters of the third module according to the back propagation of the third loss value.
In the embodiment of the present disclosure, when the network parameter is updated according to the training result, the network parameter associated with the loss value may be updated according to the back propagation of the loss value. For example, the first loss value is associated with a first predicted value, which corresponds to a first probability, associated with the first fused feature. Thus, the network parameters of the first module and the network parameters of the third module may be updated based on the back propagation of the first penalty value.
The second loss value is associated with a second predicted value, the second predicted value corresponding to the second probability, associated with the second fused feature, such that the network parameters of the second module and the network parameters of the third module can be updated based on back propagation of the second loss value.
The third loss value is associated with the third predicted value, which corresponds to the third probability and is associated with the common features; therefore, the network parameters of the third module can be updated according to the back propagation of the third loss value. The network parameters are updated in this way until the training result of the prediction model meets the training stop condition, at which point the updating stops.
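The per-loss parameter routing described above can be made explicit with torch.autograd.grad, as in the sketch below; in practice, summing the three losses and calling backward() achieves the same routing automatically, because each loss only depends on the modules that produced its predicted value. The module names are placeholders.

```python
import torch

def per_loss_gradients(loss1, loss2, loss3, module1, module2, module3):
    # Illustrative only: shows which parameter groups each loss value reaches.
    p1, p2, p3 = (list(m.parameters()) for m in (module1, module2, module3))

    # First loss -> first module + third (shared) module.
    g1 = torch.autograd.grad(loss1, p1 + p3, retain_graph=True, allow_unused=True)
    # Second loss -> second module + third (shared) module.
    g2 = torch.autograd.grad(loss2, p2 + p3, retain_graph=True, allow_unused=True)
    # Third (auxiliary) loss -> third (shared) module only.
    g3 = torch.autograd.grad(loss3, p3, allow_unused=True)
    return g1, g2, g3
```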
In some embodiments, the training result of the predictive model satisfies a training stop condition, which includes at least:
the degree of matching between the prediction result obtained by the prediction model on the sample data and the label of the sample data reaches a preset threshold.
In the embodiment of the present disclosure, the training result of the model may be judged according to the degree of matching between the prediction result and the sample data labels. For example, recall may be used as a performance evaluation index for the model training result; the recall rate indicates the degree of matching between the training result on the sample data and the sample data labels. For example, with sample data covering 113 million users, the 30 million users most likely to become first-class users are predicted through training analysis; these users are then checked against the confirmed real first-class users, and the proportion of correctly predicted users among the real first-class users is the recall rate. The larger the recall rate, the better the model training result. The sample data label indicates the specific classification of the user in the sample data.
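A minimal sketch of this recall computation, assuming user identifiers are available for both the predicted and the confirmed first-class users:

```python
def recall(predicted_first_class, true_first_class):
    """Fraction of the confirmed first-class users that were also predicted
    as first-class users (the evaluation index described above)."""
    predicted, true = set(predicted_first_class), set(true_first_class)
    return len(predicted & true) / len(true) if true else 0.0

# e.g. recall(["u1", "u2", "u3"], ["u2", "u3", "u4"]) == 2 / 3
```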
As a specific application scenario of the user behavior prediction method of the present disclosure, prediction of the probability that a user replaces a mobile phone may be taken as an example; this scenario is merely exemplary and not limiting, and the method may also be applied to other scenarios.
FIG. 2 is a diagram illustrating the user distribution in a phone-replacement scenario, according to an exemplary embodiment. As shown in fig. 2, in this scenario the preset operation is replacing a mobile phone, the first operation object is a target-brand mobile phone, and the second operation object is a non-target-brand mobile phone; the first class of users are users who switch to the target-brand phone when replacing their phone, i.e., retained users; the second class of users are users who switch to a non-target-brand phone, i.e., lost users; and the third class of users includes the first and second classes, i.e., all users who replace their phone. The intersection of the first and second classes consists of users who purchase both a target-brand and a non-target-brand phone when replacing. The user behavior prediction method of the disclosure can be applied to predicting the probability that a user shown in fig. 2 replaces a phone, and the probability that the replacement phone is a target-brand phone or a non-target-brand phone.
FIG. 3 is a block diagram illustrating the probability prediction structure of the prediction model according to an exemplary embodiment. As shown in fig. 3, the first module extracts the first class of features from the input layer, the second module extracts the second class of features from the input layer, and the third module extracts the common features from the input layer. The first class of features and the common features are fused in Gate1 to obtain the first fusion feature, and the second class of features and the common features are fused in Gate2 to obtain the second fusion feature. The first fusion feature is input to the fully connected layer TOWER1, which outputs the first probability, i.e., the retention prediction probability; the second fusion feature is input to the fully connected layer TOWER2, which outputs the second probability, i.e., the attrition prediction probability.
FIG. 4 is a diagram illustrating the fusion process of the task-specific features (the first or second class of features) and the common features in the prediction model, according to an exemplary embodiment. As shown in fig. 4, when the common features and the task-specific features are fused, a feature fusion proportion, i.e., the first or second fusion weight, is determined. To determine the fusion weight, the feature X is input to an FC layer for weighting, the result is processed by the activation function Softmax to obtain the fusion weight, the fusion proportion between the common features and the task-specific features is allocated by the fusion weight at the feature splicing layer, the features are spliced in proportion, and the fused first or second fusion feature is finally output.
Fig. 5 is a schematic structural diagram illustrating supervision of the training of the third module by adding an auxiliary task (a task that predicts the third probability that the user belongs to the third class of users) during training of the prediction model, according to an exemplary embodiment. The auxiliary phone-replacement prediction task supervises the training of the common features in the third module, so as to improve the model training quality.

FIG. 6 is a block diagram illustrating supervision of the training of the third module and the first module by the auxiliary task during training of the prediction model, according to an exemplary embodiment. The auxiliary phone-replacement prediction task supervises the training of the common features in the third module and of the first class of features in the first module, so as to improve the model training quality.

FIG. 7 is a block diagram illustrating supervision of the training of the third module and the second module by the auxiliary task during training of the prediction model, according to an exemplary embodiment. The auxiliary phone-replacement prediction task supervises the training of the common features in the third module and of the second class of features in the second module, so as to improve the model training quality.

FIG. 8 is a block diagram illustrating supervision of the training of the third module, the second module and the first module by the auxiliary task during training of the prediction model, according to an exemplary embodiment. The auxiliary phone-replacement prediction task supervises the training of the common features in the third module, the first class of features in the first module and the second class of features in the second module, so as to improve the model training quality.
Table 1 compares the recall rates under different supervision configurations of the auxiliary task (corresponding to the probabilities in the embodiment of the present disclosure: the phone-replacement user recall corresponds to the third probability, the retained-user recall to the first probability, and the lost-user recall to the second probability). The recall rate indicates the degree of matching between the training result on the sample data and the sample data labels. For example, with sample data covering 30 million users, the users determined to be first-class users among these 30 million users are obtained through training analysis; they are then checked against whether they are really first-class users (i.e., whether the preset operation was performed on the first operation object), and the proportion of real first-class users among the predicted first-class users is taken as the recall rate. The larger the recall rate, the better the model training result. As shown in Table 1, the table lists the phone-replacement, retained-user and lost-user recall rates when the auxiliary task is not connected to any main task (i.e., the auxiliary task supervises only the training of the third module); when the auxiliary task is connected to both main tasks (i.e., the auxiliary task simultaneously supervises the training of the first, second and third modules); when the auxiliary task is connected to the retention prediction task (i.e., the auxiliary task simultaneously supervises the training of the first and third modules); and when the auxiliary task is connected to the attrition prediction task (i.e., the auxiliary task simultaneously supervises the training of the second and third modules).
TABLE 1 Recall rate comparison for different supervision configurations of the auxiliary task

| Model | Phone-replacement user recall | Retained-user recall | Lost-user recall |
| Structure 1 (FIG. 4) | 57.02% | 72.95% | 53.45% |
| Structure 2 - auxiliary task connected to both main tasks (FIG. 8) | 56.83% | 72.14% | 53.49% |
| Structure 3 - retention prediction task connected to the auxiliary task (FIG. 6) | 56.88% | 72.91% | 53.34% |
| Structure 4 - attrition prediction task connected to the auxiliary task (FIG. 7) | 57.11% | 72.22% | 53.70% |
| Structure 5 - main tasks not connected to the auxiliary task (FIG. 5) | 57.17% | 73.22% | 53.39% |
While keeping the task inputs of the original CGC structure, figs. 8, 6 and 7 respectively show three other input configurations for the auxiliary task. In the structure shown in fig. 8, the auxiliary task supervises deep feature extraction for retention, attrition and phone replacement; specifically, the feature vectors extracted by the task-specific Experts of the main tasks are spliced with the shared Expert feature vector and used as the input features of the auxiliary task. To illustrate the influence of the feature information extracted for the auxiliary task on the performance of the main tasks, the configurations shown in figs. 6 and 7 are designed, in which only the Experts of one main task are spliced with the shared Expert feature vector and input to the auxiliary task.
As can be seen from the comparison in Table 1, comparing Structure 1 with Structure 5 shows that the auxiliary task structure introduced in the embodiment of the present disclosure makes the largest contribution to the technical effect: the recall rates of the prediction model on the phone-replacement and retention prediction tasks increase by 0.15% and 0.27%, respectively. Comparing Structure 2 with Structure 5 shows that when the Experts information of the main tasks is input into the auxiliary task, the retention task is disturbed and its recall index decreases, while the attrition task improves; that is, a certain seesaw phenomenon exists, which is caused by the large proportion of lost users in the data set. The experimental results of Structures 3, 4 and 5 likewise show that when the Experts information of one main task is input to the auxiliary task, the performance of the other main task decreases, which verifies that a certain interference exists between the main tasks and demonstrates that the embodiment of the present disclosure can alleviate the seesaw phenomenon between tasks to a certain extent.
Table 2 compares the recall rates of models with and without the added auxiliary task. As shown in Table 2, it lists the retained-user and lost-user recall rates of the models without the auxiliary task and of the model with the auxiliary task added in the present disclosure. MMoE (Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts) refers to the bottom-shared multi-task learning model. Fig. 9 is a schematic structural view of a CGC, shown according to an exemplary embodiment. As shown in fig. 9, the CGC is a single-layer version of PLE; its bottom layer consists of task-specific expert networks and a shared expert network, for each task a gating unit weights the outputs of the task-specific and shared expert modules to obtain a weighted output, and the output of each single task is finally obtained through a simple MLP.
The bottom network of the CGC mainly comprises shared Experts and task-specific Experts; each Expert module consists of several sub-networks, and both the number of sub-networks and their network structure (dimensions) are hyper-parameters. The upper layer consists of the multi-task networks; the input of each task tower (tower A and tower B) is weighted and controlled by a gating network. The input of the gating network of each subtask consists of two parts, namely the Experts of the task-specific part and the Experts of the shared part (i.e., vector1 … vector m in the gating network structure), which serve as the selector of the gating network. The structure of the gating network itself is simple, just a single-layer forward FC; using the input as the selector, it obtains the weights of the different sub-networks and thus the weighted sum of the Expert outputs under each task. That is, the CGC network structure ensures that each subtask performs a weighted summation of the task-specific and shared Expert vectors according to its input, so that each subtask network obtains an embedding, and the output of the corresponding subtask is then obtained through that subtask's tower.
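A compact sketch of a single CGC layer as described above, with one task-specific Expert per task plus one shared Expert, a single-layer FC gate per task and a simple MLP tower per task; the layer sizes, the use of linear Experts and the sigmoid outputs are assumptions for illustration, not the disclosed network.

```python
import torch
import torch.nn as nn

class CGCLayer(nn.Module):
    """Single-layer CGC: task-specific Experts, shared Experts, one gating
    network per task, and one tower per task."""
    def __init__(self, dx, d_expert, d_tower, n_task_experts=1, n_shared=1):
        super().__init__()
        self.experts_a = nn.ModuleList([nn.Linear(dx, d_expert) for _ in range(n_task_experts)])
        self.experts_b = nn.ModuleList([nn.Linear(dx, d_expert) for _ in range(n_task_experts)])
        self.shared    = nn.ModuleList([nn.Linear(dx, d_expert) for _ in range(n_shared)])
        # Each gate is a single-layer FC whose softmax output weights the
        # task-specific and shared Expert vectors for that task.
        self.gate_a = nn.Linear(dx, n_task_experts + n_shared)
        self.gate_b = nn.Linear(dx, n_task_experts + n_shared)
        self.tower_a = nn.Sequential(nn.Linear(d_expert, d_tower), nn.ReLU(), nn.Linear(d_tower, 1))
        self.tower_b = nn.Sequential(nn.Linear(d_expert, d_tower), nn.ReLU(), nn.Linear(d_tower, 1))

    def _mix(self, x, experts, gate):
        vecs = torch.stack([e(x) for e in experts], dim=1)   # (batch, n_experts, d_expert)
        w = torch.softmax(gate(x), dim=-1).unsqueeze(-1)     # (batch, n_experts, 1)
        return (w * vecs).sum(dim=1)                         # weighted sum per task

    def forward(self, x):
        emb_a = self._mix(x, list(self.experts_a) + list(self.shared), self.gate_a)
        emb_b = self._mix(x, list(self.experts_b) + list(self.shared), self.gate_b)
        out_a = torch.sigmoid(self.tower_a(emb_a))   # task-A probability
        out_b = torch.sigmoid(self.tower_b(emb_b))   # task-B probability
        return out_a, out_b

# Usage sketch: probabilities for a batch of 4 samples with dx = 16 input features.
model = CGCLayer(dx=16, d_expert=8, d_tower=8)
p_a, p_b = model(torch.randn(4, 16))
```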
PLE is a stacked multi-layer version of CGC. On the basis of CGC, Progressive Layered Extraction (PLE) considers the interaction between different Experts and can be regarded as a combination of Customized Sharing and ML-MMoE.
TABLE 2 recall ratio comparison of models with and without added auxiliary tasks
As can be seen from Table 2, all the Experts information is shared among the main tasks in the MMoE model, making it difficult to effectively extract the task-specific information of each task, so the main-task performance is mediocre. The PLE structure adds several layers of feature-extraction modules on top of the CGC structure and considers feature fusion between different Experts; the experimental results show that the PLE model achieves a relative improvement in the recall rate on lost users, at the cost of more parameters and computation.
In summary, the prediction method provided by the embodiment of the present disclosure is a mobile-phone-replacement prediction method based on multi-task learning. Based on the inclusion relationship between the replacement task and the retention and attrition tasks, the auxiliary task is carefully designed and the correlation between the main tasks is modeled implicitly, so as to improve the population coverage of the main tasks; the model can also be flexibly applied to business scenarios with similar relationships.
Compared with the CGC structure, the method attaches an auxiliary task to the shared Experts and adds the auxiliary task to the model's loss optimization function. The phone-replacement auxiliary task introduced in the present disclosure supports the learning of the main tasks on the one hand and learns the features of phone-replacing users on the other, thereby strengthening the features shared by the main tasks; the independent Experts modules avoid mutual interference between the main tasks, mine the task-specific features, and selectively fuse the shared features, enriching the feature information of the main tasks and making the prediction of the user's phone-replacement tendency more accurate.
The embodiment of the disclosure also provides a user behavior prediction device. Fig. 10 is a schematic diagram illustrating a structure of a user behavior prediction apparatus according to an exemplary embodiment. As shown in fig. 10, the apparatus includes:
a first obtaining module 71, configured to obtain target data of a user for a target product;
an extracting module 72, configured to input the target data into a pre-trained prediction model, so as to obtain a first class of features and a second class of features at least used for representing the use states of different classes of users for the target product, and a common feature of the use states of the different classes of users for the target product; wherein the use state comprises at least: the user changes the use state of the target product;
a first prediction module 73, configured to predict a first probability that the user is a first class of user according to the first class of features and the common features; wherein the first type of users are users who change the target product to the target product with a first predetermined identifier.
In the embodiment of the present disclosure, the user behavior prediction apparatus may be applied to predicting whether a user will perform a behavior, and information is pushed to the user according to the predicted behavior, for example by recommending information to the user through multi-task learning. Systems to which the user behavior prediction method is applicable include the push/recommendation field, where a multi-task learning model based on the MMoE (Multi-gate Mixture-of-Experts) structure is trained and optimized for a user-engagement objective and a user-satisfaction objective; the user-engagement objective covers behaviors such as clicks and watch time, and the user-satisfaction objective covers behaviors such as likes and comments;
or, it is applied to the advertising field, where a multi-task learning model converts the problem of directly predicting the CVR (conversion rate) into a multi-task learning task that learns the CTR (click-through rate) and the CTCVR, and trains on CTR and CTCVR data, thereby addressing the challenges that the CVR is difficult to predict directly and that direct CVR data is extremely sparse (the decomposition this relies on is sketched after this list);
or, it is applied to the NLP (Natural Language Processing) field, where the inputs of tasks such as Chinese word segmentation, part-of-speech tagging (POS), named entity recognition (NER) and syntactic parsing depend on one another: part-of-speech tagging requires the segmentation result, and named entity recognition requires the part-of-speech tagging result. In industrial practice, these tasks are combined into one model and learned in a multi-task manner, which improves the effect of each single task and also reduces model overhead.
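For the advertising example above, the decomposition relied on can be written as pCTCVR = pCTR × pCVR, so the hard-to-supervise CVR is recovered from the two directly supervised quantities; the one-line sketch below only illustrates this relation and is not part of the disclosed method.

```python
def derived_cvr(p_ctr: float, p_ctcvr: float) -> float:
    # Assumes P(click & convert) = P(click) * P(convert | click).
    return p_ctcvr / p_ctr if p_ctr > 0 else 0.0
```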
In this disclosure, when the user behavior prediction apparatus is applied to predicting the probability that a user replaces goods or services, the user data may include basic information and action information of the user; the basic information indicates the user's current inherent identity information, and the action information indicates the user's operations on the goods or services;
wherein the basic information at least includes information such as the user's age, gender, education background and income, and the action information at least includes the frequency of using the goods or services, and the like.
In the embodiment of the disclosure, the feature extraction layer is a key link in the whole neural network model and is used to extract, from the user data, the user features that are useful for predicting the probability of the user's action, which improves the accuracy of the probability predictions, for example the prediction of the probability that the user becomes a user who switches to the target product with the first predetermined identification.
In some embodiments, as shown in fig. 10, the apparatus further comprises:
a second prediction module 74, configured to predict a second probability that the user is a second class of user according to the second class of features and the common features; and/or,
a third prediction module 75, configured to predict, according to the common feature, a third probability that the user is a third class of user;
wherein the second class of users are users who change the target product to the target product without the first predetermined identification; the third class of users are users who replace the target product, covering both those who change to the target product with the first predetermined identification and those who change to the target product without it.
In some embodiments, the apparatus further comprises:
and the first recommending module is used for sending promotion information of the target product with the first preset identification to the user according to the relation between the first probability and the first probability threshold.
In some embodiments, the second type of user is a user who changes the target product to the target product having a second predetermined identification;
the device further comprises:
and the second recommending module is used for sending the promotion information of the target product with the first preset identification or sending the promotion information of the target product with the second preset identification to the user according to the first probability and the second probability.
In some embodiments, the apparatus further comprises:
a comparing module, configured to compare the first probability and the second probability when the third probability is greater than a second probability threshold;
a third recommending module for sending promotion information of the target product with the first predetermined identification to the user when the first probability is greater than the second probability.
In some embodiments, the first prediction module is further configured to:
fusing the first class of features and the common features based on a first fusion weight to obtain the first fusion feature, wherein the first fusion weight is used for allocating a first fusion proportion between the first class of features and the common features;
and predicting the first probability that the user is the first class of user according to the first fusion feature.
In some embodiments, the second prediction module is further configured to:
fusing the second class of features and the common features based on a second fusion weight to obtain the second fusion feature, wherein the second fusion weight is used for allocating a second fusion proportion between the second class of features and the common features;
and predicting a second probability that the user is the second type of user according to the second fusion characteristic.
An embodiment of the third aspect of the present disclosure further provides a model training apparatus, where the apparatus includes:
the first training module is used for inputting the sample data of the user aiming at the target product into a prediction model to obtain a first class of characteristics and a second class of characteristics which are at least used for representing the use states of different classes of users aiming at the target product, and common characteristics of the different classes of users aiming at the use states of the target product; wherein the use state comprises at least: the user changes the use state of the target product;
and the second training module is used for training the prediction model according to the first class of features, the second class of features, the common features and the sample data.
In some embodiments, the first training module is to:
and respectively inputting sample data into the feature extraction network branches of the first module, the second module and the third module of the prediction model, and respectively extracting a first class of features corresponding to the first module, a second class of features corresponding to the second module and a common feature corresponding to the third module.
In some embodiments, the second training module is to:
fusing the first class of features and the common features based on a first fusion weight to obtain a first fused feature,
fusing the second type of features and the common features based on a second fusion weight to obtain second fusion features;
according to the first fused feature, the second fused feature and the common feature, respectively determining a first predicted value corresponding to the first fused feature, a second predicted value corresponding to the second fused feature and a third predicted value corresponding to the common feature;
and training the prediction model according to the first predicted value, the second predicted value, the third predicted value and the label of the sample data.
In some embodiments, the second training module is to:
obtaining a first loss value according to the first predicted value and a first label of the sample data;
obtaining a second loss value according to the second predicted value and a second label of the sample data;
obtaining a third loss value according to the third predicted value and a third label of the sample data; obtaining a training loss value according to the first loss value, the second loss value and the third loss value;
and updating the network parameters of the prediction model according to the training loss value.
In some embodiments, the training module (including the first training module and the second training module) of the predictive model is configured to:
inputting sample data into network branches where the first module, the second module and the third module are respectively located, and respectively obtaining a first predicted value corresponding to the first module, a second predicted value corresponding to the second module and a third predicted value corresponding to the third module;
obtaining a first loss value according to the first predicted value and the label of the sample data;
obtaining a second loss value according to the second predicted value and the label of the sample data;
obtaining a third loss value according to the third predicted value and the label of the sample data; obtaining a training loss value according to the first loss value, the second loss value and the third loss value;
and updating the network parameters of the prediction model according to the training loss value.
In the embodiment of the disclosure, training of a prediction model is further included, and the prediction model at least includes a first module, a second module and a third module for feature extraction. The training sample data is input to network branches where the first module, the second module and the third module are respectively located, so that a first predicted value corresponding to the first module, a second predicted value corresponding to the second module and a third predicted value corresponding to the third module are respectively obtained.
During network training, a first loss value is obtained from the first predicted value and the label of the sample data, a second loss value is obtained from the second predicted value and the label of the sample data, and a third loss value is obtained from the third predicted value and the label of the sample data; a training loss value is then obtained from the first, second and third loss values. Each loss value is obtained by fitting the corresponding predicted value to the data label through a loss function. The label of the sample data may include data indicating whether the user in the sample data is a confirmed first-class user, second-class user or third-class user. For example, if the user is confirmed to be a first-class user, a second-class user or a third-class user, the corresponding label is 1; otherwise, the label is 0.
In an embodiment of the disclosure, the loss function may include a first loss function L_mi for fitting the first predicted value to its label, a second loss function L_ot for fitting the second predicted value to its label, and a third loss function L_change for fitting the third predicted value to its label, where L_total = L_mi + L_ot + L_change. That is, the first, second and third loss values are accumulated to obtain the training loss value, and the network parameters of the prediction model are updated according to the obtained training loss value.
In some embodiments, the network parameters of the predictive model include at least one of:
a first weight of a network node within the first module;
a second weight of a network node within the second module;
a third weight of a network node within the third module;
a first fusion weight for the first fusion feature and a second fusion weight for the second fusion feature.
In the embodiment of the present disclosure, the sample data is input into the prediction model, and the multi-order features are extracted from the sample data in the prediction model, where the multi-order features include: low-order features e and high-order features d above the second order. Wherein the low-order features include first-order features and second-order features.
The first class of features f1 is obtained by fusing the low-order features and the high-order features through a first weight:

f1 = W1 · x(input) + b1

where x(input) = [e1, e2, ..., ek, d1, d2, ..., dn], k ∈ [1, 2], n ∈ N+; W1 is the first weight and b1 is the first bias corresponding to the first class of features.
The second class of features f2 is obtained by fusing the low-order features and the high-order features through a second weight:

f2 = W2 · x(input) + b2

where x(input) = [e1, e2, ..., ek, d1, d2, ..., dn], k ∈ [1, 2], n ∈ N+; W2 is the second weight and b2 is the second bias corresponding to the second class of features.
The common features f3 are obtained by fusing the low-order features and the high-order features through a third weight:

f3 = W3 · x(input) + b3

where x(input) = [e1, e2, ..., ek, d1, d2, ..., dn], k ∈ [1, 2], n ∈ N+; W3 is the third weight and b3 is the third bias corresponding to the common features.
The first fusion feature is g1(x) = W1(X) * S1 with S1 = [f1, f3], where W1(X) is the first fusion weight.

The second fusion feature is g2(x) = W2(X) * S2 with S2 = [f2, f3], where W2(X) is the second fusion weight. Both W1(X) and W2(X) can be obtained through the activation function Softmax:

W1(X) = Softmax(Wg1 · X)

W2(X) = Softmax(Wg2 · X)

where Wg1 and Wg2 are weight coefficients of a fully connected layer with dimension (dk + dc) × dx; dx denotes the dimension of the input feature X (the sample data), and dk and dc denote the numbers of Experts in the main-task feature layer and the shared layer, respectively, both of which can be set to 1.
In some embodiments, the second training module is specifically configured to:
when the training result of the prediction model is determined not to meet the training stopping condition, updating the network parameters of the prediction model according to the training loss value;
and when the training result of the prediction model meets the training stopping condition, stopping updating the network parameters of the prediction model.
In the embodiment of the present disclosure, after the training loss value is obtained, the network parameters of the prediction model are updated according to the training loss value, including updating the network parameters in the above embodiments, such as the first weight, the second weight, the third weight, the first bias, the second bias, the third bias, the first fusion weight, and the second fusion weight. And when the training result of the prediction model meets the training stopping condition, the updating of the network parameters can be stopped, and if the training result of the prediction model does not meet the training stopping condition, the network parameters are continuously updated until the training result meets the training stopping condition.
In some embodiments, the second training module is specific to at least one of:
when the training result of the prediction model is determined not to meet the training stopping condition, updating the network parameters of the first module and the network parameters of the third module according to the back propagation of the first loss value;
when the training result of the prediction model is determined not to meet the training stopping condition, updating the network parameters of the second module and the network parameters of the third module according to the back propagation of the second loss value;
and when the training result of the prediction model is determined not to meet the training stopping condition, updating the network parameters of the third module according to the back propagation of the third loss value.
In the embodiment of the present disclosure, when the network parameters are updated according to the training result, the network parameters associated with each loss value may be updated according to the back propagation of that loss value. For example, since the first loss value is associated with the first predicted value, which corresponds to the first probability and is associated with the first fusion feature, the network parameters of the first module and of the third module may be updated according to the back propagation of the first loss value; since the second loss value is associated with the second predicted value, which corresponds to the second probability and is associated with the second fusion feature, the network parameters of the second module and of the third module may be updated according to the back propagation of the second loss value. Since the third loss value is associated with the third predicted value, which corresponds to the third probability and is associated with the common features, the network parameters of the third module can be updated according to the back propagation of the third loss value. The network parameters are updated in this way until the training result of the prediction model meets the training stop condition, at which point the updating stops.
In some embodiments, the training result of the predictive model satisfies a training stop condition, which includes at least:
the degree of matching between the prediction result obtained by the prediction model on the sample data and the label of the sample data reaches a preset threshold.
In the embodiment of the present disclosure, the training result of the model may be judged according to the degree of matching between the prediction result and the sample data labels. For example, recall may be used as a performance evaluation index for the model training result; the recall rate indicates the degree of matching between the training result on the sample data and the sample data labels. For example, with sample data covering 30 million users, the users determined to be first-class users among these 30 million users are obtained through training analysis; they are then checked against whether they are really first-class users (i.e., whether the preset operation was performed on the first operation object), and the proportion of real first-class users among the predicted first-class users is taken as the recall rate. The larger the recall rate, the better the model training result.
An embodiment of the present disclosure further provides an electronic device, including: a processor and a memory for storing a computer program operable on the processor, wherein the processor is operable to perform the steps of the method of the embodiments when executing the computer program.
The disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the method of the embodiments.
FIG. 11 is a block diagram illustrating an electronic device in accordance with an example embodiment. For example, the electronic device can be a mobile phone, a computer, a digital broadcast electronic device, a messaging device, a gaming console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 11, the electronic device may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 generally controls overall operation of the electronic device, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the electronic device. Examples of such data include instructions for any application or method operating on the electronic device, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power components 806 provide power to various components of the electronic device. The power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for electronic devices.
The multimedia component 808 includes a screen that provides an output interface between the electronic device and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the electronic device is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the electronic device is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the electronic device. For example, the sensor assembly 814 may detect an open/closed state of the electronic device, the relative positioning of components, such as a display and keypad of the electronic device, the sensor assembly 814 may also detect a change in the position of the electronic device or a component of the electronic device, the presence or absence of user contact with the electronic device, orientation or acceleration/deceleration of the electronic device, and a change in the temperature of the electronic device. Sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the electronic device and other devices. The electronic device may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, communications component 816 further includes a Near Field Communications (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the electronic device may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. A method for predicting user behavior, the method comprising:
acquiring target data of a user for a target product;
inputting the target data into a pre-trained prediction model to obtain a first class of characteristics and a second class of characteristics at least used for representing the use states of different classes of users aiming at the target product, and common characteristics of the different classes of users aiming at the use states of the target product; wherein the use state comprises at least: the user changes the use state of the target product;
predicting a first probability that the user is a first class of user according to the first class of features and the common features; wherein the first class of users are users who change the target product to the target product with a first predetermined identification.
2. The method of claim 1, further comprising:
predicting a second probability that the user is a second type of user according to the second type of feature and the common feature; and/or,
predicting a third probability that the user is a third type of user according to the common characteristics;
wherein the second type of user is a user who changes the target product into the target product without the first predetermined identifier; the third type of user is a user who changes the target product, and changes the target product with the first predetermined identification and the target product without the first predetermined identification.
3. The method of claim 1, further comprising:
and sending promotion information of the target product with the first preset identification to the user according to the relation between the first probability and a first probability threshold.
4. The method of claim 2, wherein the second type of user is a user who changes the target product to the target product having a second predetermined identification;
the method further comprises the following steps:
and sending promotion information of the target product with the first preset identification to the user or sending promotion information of the target product with the second preset identification according to the first probability and the second probability.
5. The method of claim 2, further comprising:
comparing the magnitudes of the first probability and the second probability when the third probability is greater than a second probability threshold;
sending promotional information for the target product with the first predetermined identification to the user when the first probability is greater than the second probability.
6. The method of claim 1, wherein predicting a first probability that the user is a first class of user based on the first class of features and the common features comprises:
fusing the first class features and the common features based on a first fusion weight to obtain first fusion features, wherein the first fusion weight is used for distributing a first fusion proportion of the first class features and the common features;
and predicting a first probability that the user is the first class of user according to the first fusion characteristic.
7. The method of claim 2, wherein predicting the second probability that the user is the second class of user according to the second class of features and the common features comprises:
fusing the second class of features and the common features based on a second fusion weight to obtain a second fusion feature, wherein the second fusion weight is used for allocating a second fusion proportion between the second class of features and the common features; and
predicting the second probability that the user is the second class of user according to the second fusion feature.
8. A method of model training, the method comprising:
inputting sample data of a user for a target product into a prediction model to obtain a first class of features and a second class of features at least used for representing use states of different classes of users with respect to the target product, and common features of the different classes of users with respect to the use states of the target product; wherein the use states at least comprise: a use state in which the user changes the target product; and
training the prediction model according to the first class of features, the second class of features, the common features, and the sample data.
9. A user behavior prediction apparatus, the apparatus comprising:
a first acquisition module, configured to acquire target data of a user for a target product;
an extraction module, configured to input the target data into a pre-trained prediction model to obtain a first class of features and a second class of features at least used for representing use states of different classes of users with respect to the target product, and common features of the different classes of users with respect to the use states of the target product; wherein the use states at least comprise: a use state in which the user changes the target product; and
a first prediction module, configured to predict a first probability that the user is a first class of user according to the first class of features and the common features; wherein the first class of users are users who change the target product to a target product having a first predetermined identification.
10. A model training apparatus, the apparatus comprising:
the system comprises a first training module, a second training module and a third training module, wherein the first training module is used for inputting sample data of a user aiming at a target product into a prediction model to obtain a first class of characteristics and a second class of characteristics which are at least used for representing the use state of different classes of users aiming at the target product, and common characteristics of the different classes of users aiming at the use state of the target product; wherein the use state comprises at least: the user changes the use state of the target product;
and the second training module is used for training the prediction model according to the first class of features, the second class of features, the common features and the sample data.
11. An electronic device, comprising: a processor and a memory for storing a computer program executable on the processor, wherein the processor, when executing the computer program, is configured to perform the steps of the method of any one of claims 1 to 7 or the steps of the method of claim 8.
12. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 7 or the steps of the method of claim 8.
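
The fusion-based prediction described in claims 1, 2, 6 and 7 can be illustrated with a minimal sketch. The claims do not disclose a concrete network architecture, so the encoder layers, feature dimensions, learnable fusion weights and the BehaviorPredictor name below are assumptions chosen only to show how class-specific features are combined with the common features before each probability head.

import torch
import torch.nn as nn

class BehaviorPredictor(nn.Module):
    """Minimal sketch of the prediction model in claims 1, 2, 6 and 7.

    The encoders, feature sizes and learnable fusion weights are
    illustrative assumptions; the claims do not specify them.
    """

    def __init__(self, in_dim: int = 64, feat_dim: int = 32):
        super().__init__()
        # Extract first-class, second-class and common features (claim 1).
        self.first_encoder = nn.Linear(in_dim, feat_dim)
        self.second_encoder = nn.Linear(in_dim, feat_dim)
        self.common_encoder = nn.Linear(in_dim, feat_dim)
        # Fusion weights that allocate the fusion proportion between the
        # class-specific features and the common features (claims 6 and 7).
        self.first_fusion_weight = nn.Parameter(torch.tensor(0.5))
        self.second_fusion_weight = nn.Parameter(torch.tensor(0.5))
        # One probability head per user class (claims 1 and 2).
        self.first_head = nn.Linear(feat_dim, 1)
        self.second_head = nn.Linear(feat_dim, 1)
        self.third_head = nn.Linear(feat_dim, 1)

    def forward(self, target_data: torch.Tensor):
        first_feat = self.first_encoder(target_data)
        second_feat = self.second_encoder(target_data)
        common_feat = self.common_encoder(target_data)

        # Weighted fusion of class-specific and common features.
        w1 = torch.sigmoid(self.first_fusion_weight)
        w2 = torch.sigmoid(self.second_fusion_weight)
        first_fused = w1 * first_feat + (1.0 - w1) * common_feat
        second_fused = w2 * second_feat + (1.0 - w2) * common_feat

        # First, second and third probabilities.
        p1 = torch.sigmoid(self.first_head(first_fused))
        p2 = torch.sigmoid(self.second_head(second_fused))
        p3 = torch.sigmoid(self.third_head(common_feat))
        return p1, p2, p3

For example, p1, p2, p3 = BehaviorPredictor()(torch.randn(8, 64)) would return three probability tensors of shape (8, 1), one per user class.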
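
Claims 3 to 5 tie the predicted probabilities to the promotion information that is sent. The sketch below makes that branching explicit; the threshold values, the choose_promotion name and the returned labels are hypothetical, since the claims only state that promotion information is sent according to the relation between the probabilities and the thresholds.

def choose_promotion(p1: float, p2: float, p3: float,
                     first_threshold: float = 0.5,
                     second_threshold: float = 0.5) -> str:
    """Illustrative decision logic for claims 3 to 5.

    The thresholds and the returned labels are assumptions; the claims
    only relate the probabilities to thresholds and to each other.
    """
    # Claim 5: when the third probability exceeds the second probability
    # threshold, compare the first and second probabilities.
    if p3 > second_threshold:
        if p1 > p2:
            return "promote the target product with the first identification"
        return "promote the target product with the second identification"
    # Claim 3: send promotion information for the product with the first
    # identification when the first probability clears its threshold.
    if p1 > first_threshold:
        return "promote the target product with the first identification"
    return "send no promotion information"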
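
Claim 8 states only that the prediction model is trained from the extracted features and the sample data, without naming a loss function or an optimiser. The sketch below assumes one binary label per user class and a summed binary cross-entropy loss purely for illustration, reusing the hypothetical BehaviorPredictor defined above.

import torch
import torch.nn as nn

def train_step(model, sample_data: torch.Tensor, labels: torch.Tensor,
               optimizer: torch.optim.Optimizer) -> float:
    """One illustrative training step for claim 8.

    `labels` is assumed to hold three float columns in {0, 1} marking
    first-, second- and third-class membership; the loss is an
    assumption, not part of the claim.
    """
    criterion = nn.BCELoss()
    p1, p2, p3 = model(sample_data)
    loss = (criterion(p1.squeeze(-1), labels[:, 0])
            + criterion(p2.squeeze(-1), labels[:, 1])
            + criterion(p3.squeeze(-1), labels[:, 2]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

A call such as train_step(model, batch, batch_labels, torch.optim.Adam(model.parameters())) would perform one gradient update under these assumptions.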
CN202111107783.XA 2021-09-22 2021-09-22 User behavior prediction method, model training method, electronic device, and storage medium Pending CN114049529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111107783.XA CN114049529A (en) 2021-09-22 2021-09-22 User behavior prediction method, model training method, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111107783.XA CN114049529A (en) 2021-09-22 2021-09-22 User behavior prediction method, model training method, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN114049529A true CN114049529A (en) 2022-02-15

Family

ID=80204572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111107783.XA Pending CN114049529A (en) 2021-09-22 2021-09-22 User behavior prediction method, model training method, electronic device, and storage medium

Country Status (1)

Country Link
CN (1) CN114049529A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116205376A (en) * 2023-04-27 2023-06-02 北京阿帕科蓝科技有限公司 Behavior prediction method, training method and device of behavior prediction model
CN116205376B (en) * 2023-04-27 2023-10-17 北京阿帕科蓝科技有限公司 Behavior prediction method, training method and device of behavior prediction model
CN116662814A (en) * 2023-07-28 2023-08-29 腾讯科技(深圳)有限公司 Object intention prediction method, device, computer equipment and storage medium
CN116662814B (en) * 2023-07-28 2023-10-31 腾讯科技(深圳)有限公司 Object intention prediction method, device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111444428B (en) Information recommendation method and device based on artificial intelligence, electronic equipment and storage medium
US9934515B1 (en) Content recommendation system using a neural network language model
CN111460150A (en) Training method, classification method and device of classification model and storage medium
EP3547155A1 (en) Entity representation learning for improving digital content recommendations
US10354184B1 (en) Joint modeling of user behavior
US11983646B2 (en) Bias scoring of machine learning project data
US20180012237A1 (en) Inferring user demographics through categorization of social media data
CN114049529A (en) User behavior prediction method, model training method, electronic device, and storage medium
US10769227B2 (en) Incenting online content creation using machine learning
CN112163676B (en) Method, device, equipment and storage medium for training multitasking service prediction model
CN114036398B (en) Content recommendation and ranking model training method, device, equipment and storage medium
KR20140113436A (en) Computing system with relationship model mechanism and method of operation thereof
CN110008394B (en) Public opinion information identification method, device and equipment
Lee et al. Smartphone user segmentation based on app usage sequence with neural networks
CN114328838A (en) Event extraction method and device, electronic equipment and readable storage medium
WO2023278746A1 (en) Methods, systems, and apparatuses for improved content recommendations
CN113032676B (en) Recommendation method and system based on micro-feedback
CN113553448A (en) Recommendation model training method and device, electronic equipment and storage medium
CN111143608A (en) Information pushing method and device, electronic equipment and storage medium
Mirbargkar et al. ANT and mobile network service adoption in banking industry
US11962857B2 (en) Methods, systems, and apparatuses for content recommendations based on user activity
US20230007344A1 (en) Methods, Systems, And Apparatuses For User Engagement Analysis
US20220284319A1 (en) Intelligent guidance using machine learning for user navigation of multiple web pages
CN114265948A (en) Image pushing method and device
US11010935B2 (en) Context aware dynamic image augmentation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination