CN111695042A

CN111695042A - User behavior prediction method and system based on deep walking and ensemble learning

Info

Publication number: CN111695042A
Application number: CN202010524285.4A
Authority: CN
Inventors: 陈佐; 吴志良; 杨胜刚; 朱桑之; 谷浩然; 杨捷琳
Original assignee: Hunan Huda Jinke Technology Development Co ltd
Current assignee: Hunan Huda Jinke Technology Development Co ltd
Priority date: 2020-06-10
Filing date: 2020-06-10
Publication date: 2020-09-22
Anticipated expiration: 2040-06-10
Also published as: CN111695042B

Abstract

The invention discloses a user behavior prediction method and system based on DeepWalk and ensemble learning. The method first preprocesses the original data set to resolve problems such as duplicate, anomalous and redundant records. It then extracts, from the preprocessed data set, statistical information and activity information that reflect consumers' behavior habits and preferences, and uses them to construct a user portrait for each user. Next, it performs random walks (Random Walk) on the social network graph structure of the commodities purchased by users to obtain new behavior sequences. Finally, the context information of each user behavior, obtained with a Word2vec model, is fed into a machine learning model for training, which improves the reliability and accuracy of the model's predictions.

Description

User behavior prediction method and system based on deep walking and ensemble learning

Technical Field

The invention relates to the technical field of machine identification, and in particular to a user behavior prediction method and system based on DeepWalk and ensemble learning.

Background

With the rapid development of internet technology and e-commerce, more and more people shop online to meet their daily needs for goods. Every day, thousands of users purchase commodities on e-commerce platforms, and using artificial intelligence algorithms to analyze users' historical behavior and predict whether they will purchase commodities is of real value. For example, researchers have found that analyzing users' historical shopping data on an e-commerce platform can reveal their preferences and behavioral characteristics, which greatly benefits personalized recommendation, customer relationship management and the control of advertising costs. In view of this, using artificial intelligence algorithms to predict from historical behavior whether a user will purchase a commodity is of great research significance.

Machine learning algorithms are a common method for determining whether a user will purchase or add merchandise to favorites. Research shows that user behavior prediction models are generally built and optimized from two angles: one improves the generalization capability from the perspective of the model algorithm itself; the other analyzes users' behavior sequences to build the model and thereby improves its generalization capability. In addition, with the rise of ensemble learning, the prior art improves the generalization capability of algorithms by fusing single models. Each of these approaches has its own advantages, but the following shortcomings remain:

(1) when user behavior sequences are studied, the contextual semantic information of each user behavior is not effectively taken into account, so the trained model has weak learning capability and low prediction accuracy;

(2) in ensemble learning, most studies generate training subsets by random sampling to construct several single classifiers; however, the diversity of the training subsets is not guaranteed, which may reduce the overall classification performance.

Disclosure of Invention

In order to solve at least one of the above problems, the present invention provides a user behavior prediction method based on DeepWalk and ensemble learning.

The invention is realized by the following technical scheme:

The user behavior prediction method based on DeepWalk and ensemble learning comprises the following steps:

step S1, acquiring an original data set and preprocessing the original data set;

step S2, constructing a user portrait based on the preprocessed data set, and forming a commodity social network graph structure;

step S3, performing random walks on the commodity social network graph structure to obtain new behavior sequence data, and then training on the new behavior sequence data with a Word2vec model to generate embedding vectors;

and step S4, inputting the embedding vector into a machine learning model for training to obtain a single user behavior prediction model.

By constructing user portraits, the invention forms a commodity social network graph structure, and applying DeepWalk to this graph structure improves the reliability and precision of user behavior prediction.

Furthermore, to improve the accuracy and reliability of user behavior prediction further, the invention integrates (fuses) the single user behavior prediction models into a fusion model with higher prediction accuracy. The method further comprises step S5: fusing the two models that differ most among the plurality of single user behavior prediction models to obtain the user behavior prediction model.

Preferably, the model fusion step of the invention uses a model difference measure based on MIC and the confusion matrix, which improves the learning capability of the model and yields better generalization capability. Step S5 specifically includes:

step S51, repeating steps S3 and S4 with different random walk lengths and embedding vector dimensionalities to construct a plurality of single user behavior prediction models;

step S52, selecting n models from a plurality of single user behavior prediction models according to generalization capability; wherein n is a positive integer greater than or equal to 3;

step S53, calculating the maximum information coefficient MIC between every pair of the n models, and constructing and visualizing the confusion matrix;

and step S54, finding out two single models with the minimum similarity on the obtained confusion matrix for fusion to obtain a user behavior prediction model.

Preferably, in step S2 of the present invention, the user portrait is constructed from three angles: the basic information of the user, the activity information of the user, and the statistical information of the user's operation behavior.

Preferably, the random walk process in step S3 specifically includes: starting from any node of the network graph structure, at each step randomly selecting one of the nodes connected to the current node, and repeating this process until the set walk length is reached, at which point the walk stops, thereby obtaining a new piece of user behavior sequence data.
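The walk procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the adjacency-list dict, the helper names `random_walk` and `build_walk_corpus`, the convention that `walk_length` counts nodes (including the start node), and the `num_walks` shuffled passes over every node (the corpus-building scheme of the original DeepWalk paper) are all assumptions.

```python
import random
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]  # hypothetical adjacency list: node -> neighbor nodes


def random_walk(graph: Graph, start: Hashable, walk_length: int,
                rng: random.Random) -> List[Hashable]:
    """Unbiased random walk; walk_length counts nodes, including `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))  # uniform choice among adjacent nodes
    return walk


def build_walk_corpus(graph: Graph, num_walks: int, walk_length: int,
                      seed: int = 0) -> List[List[Hashable]]:
    """Start one walk from every node, num_walks times, in shuffled node order."""
    rng = random.Random(seed)
    nodes = list(graph)
    corpus = []
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(graph, node, walk_length, rng))
    return corpus


# Usage with hypothetical commodity IDs: each walk becomes one new behavior sequence.
items = {"i1": ["i2", "i3"], "i2": ["i1", "i3"], "i3": ["i1", "i2", "i4"], "i4": ["i3"]}
corpus = build_walk_corpus(items, num_walks=2, walk_length=5)
```

Because nodes may be revisited, a walk on a small graph can repeat commodities; this is expected and is what distinguishes a random walk from a plain traversal.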

In another aspect, the invention also provides a user behavior prediction system based on DeepWalk and ensemble learning, comprising a data acquisition module, a preprocessing module, a user portrait module, a random walk module and a training module;

the data acquisition module is used for acquiring original behavior data of a user, constructing an original data set and sending the original data set to the preprocessing module;

the preprocessing module is used for preprocessing the original data set and sending the preprocessed data to the user portrait module;

the user portrait module is used for constructing a user portrait based on the preprocessed data set, forming a commodity social network graph structure and sending the commodity social network graph structure to the random walk module;

the random walk module is used for performing random walks on the commodity social network graph structure to obtain new behavior sequence data, then training on the new behavior sequence data with a Word2vec model to generate embedding vectors, and sending the embedding vectors to the training module;

the training module is used for inputting the embedding vector into the machine learning model for training to obtain the single user behavior prediction model.

Preferably, the system of the present invention further comprises: a fusion module;

the fusion module is used for receiving the single user behavior prediction models output by the training module and fusing the two models with the maximum difference to obtain the user behavior prediction model.

The fusion module comprises a selection unit, a calculation unit and a fusion unit;

the selection unit selects n models from a plurality of single user behavior prediction models according to generalization capability; wherein n is a positive integer greater than or equal to 3;

the calculation unit computes the maximum information coefficient MIC between every pair of the n models, and constructs and visualizes the confusion matrix;

and the fusion unit finds out two single models with the minimum similarity on the obtained confusion matrix for fusion to obtain a user behavior prediction model.

The user portrait module of the invention constructs the user portrait from three angles, namely the basic information of the user, the activity information of the user and the statistical information of the operation behavior of the user.

The random walk module of the present invention is configured to perform the following process: starting from any node of the network graph structure, at each step randomly selecting one of the nodes connected to the current node, and repeating this process until the set walk length is reached, at which point the walk stops, thereby obtaining a new piece of user behavior sequence data.

The invention has the following advantages and beneficial effects:

1. The original data set is preprocessed to resolve problems such as duplication, anomalies and redundancy; statistical information and activity information reflecting consumers' behavior habits and preferences are extracted from the preprocessed data set to construct user portraits; random walks (Random Walk) are performed on the social network graph structure of the commodities purchased by users to obtain new behavior sequences; and the context information of each user behavior, obtained with a Word2vec model, is added to a machine learning model for training, which improves the reliability and accuracy of the model's predictions.

2. The method further performs selective integration (fusion) of the obtained single models using the MIC (maximum information coefficient) and confusion matrix method, which further enhances the prediction performance and reliability of the model.

Drawings

The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:

fig. 1 is a schematic diagram of a user behavior prediction model building process according to a first embodiment of the present invention.

FIG. 2 is a schematic diagram of a random search process according to the present invention.

Fig. 3 is a schematic diagram of a user behavior prediction model building process according to a second embodiment of the present invention.

FIG. 4 is a flow chart of selective model fusion based on MIC and confusion matrix.

FIG. 5 is a ROC graph.

FIG. 6 is a schematic diagram of the model building process during testing and verification according to the present invention.

FIG. 7 compares the validation-set AUC of the original models and of the models with the user portrait according to the present invention.

FIG. 8 compares the test-set AUC of the original models and of the models with the user portrait according to the present invention.

FIG. 9 is a confusion matrix visualization of the present invention.

FIG. 10 is a comparison graph of AUC in the model fusion validation set of the present invention.

FIG. 11 is a comparison of AUC for the model fusion test set of the present invention.

FIG. 12 is a comparison graph of AUC in AUC ranking fusion validation sets of the present invention.

FIG. 13 is a comparison graph of AUC in the AUC ranking fusion test set of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.

Example 1

This example provides a user behavior prediction method based on DeepWalk and ensemble learning.

In this example, each commodity in a user's purchase behavior sequence is treated as a word and the collection of all sequences as a document, so that word vectors can be trained with natural language processing (NLP) techniques. Moreover, user purchase behavior sequences carry a large amount of graph-structure information between data items, and this information is very important, so this example applies DeepWalk to the purchase behavior network. DeepWalk uses Random Walk to walk randomly over the nodes of a graph and form behavior sequences; each user behavior sequence is treated as a sentence, and the document of all behavior sequences is pre-trained with the Word2vec model. Adding DeepWalk on top of the original model yields a DeepWalk-based classifier algorithm.

As shown in fig. 1, the method of the present embodiment mainly includes the following steps:

1. acquiring an original data set and preprocessing the original data set;

2. constructing user portraits based on the preprocessed data set, forming a graph structure of the commodities related to the user behavior sequences;

3. randomly selecting a starting point in the graph structure and regenerating the commodity behavior sequences by random walk, specifically as follows:

the process of visiting the remaining nodes of a graph starting from a given vertex is called graph traversal. There are generally two traversal methods, breadth-first search (BFS) and depth-first search (DFS), and they are the basis for solving problems related to graph topology. BFS starts from the starting point, visits its adjacent nodes and keeps spreading outward, giving priority to the information carried by near-end connections. DFS starts from a vertex v, marks v as visited, and then selects an unvisited vertex u adjacent to v; if such a u exists, DFS restarts from u, otherwise the search from v ends and backtracks, and the process repeats until no unvisited vertex can be reached. DFS thus exploits the information implied by remote connections.

Random Walk can be regarded as a depth-first traversal that may revisit nodes that have already been visited. It walks randomly and repeatedly through the network graph: starting from a specific vertex, at each step it randomly selects one of the nodes connected to the current node, and it repeats this until the set walk length is reached, which yields one piece of sequence data.
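To contrast the two classical traversals with a random walk, here is a minimal Python sketch of BFS and DFS on an adjacency-list dict. The graph representation and function names are illustrative assumptions; DFS is written iteratively with an explicit stack, which performs the backtracking step implicitly.

```python
from collections import deque
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]  # hypothetical adjacency list


def bfs_order(graph: Graph, start: Hashable) -> List[Hashable]:
    """Breadth-first: visit all near neighbors before moving outward."""
    visited, order = {start}, []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for u in graph.get(v, []):
            if u not in visited:
                visited.add(u)
                queue.append(u)
    return order


def dfs_order(graph: Graph, start: Hashable) -> List[Hashable]:
    """Depth-first: follow one unvisited neighbor as far as possible, then backtrack."""
    visited, order = set(), []
    stack = [start]
    while stack:
        v = stack.pop()
        if v in visited:
            continue
        visited.add(v)
        order.append(v)
        # push neighbors in reverse so the first-listed neighbor is explored first
        for u in reversed(graph.get(v, [])):
            if u not in visited:
                stack.append(u)
    return order


g = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
```

On `g`, BFS visits a, b, c, d (near nodes first) while DFS visits a, b, d, c (goes deep along b first). Both visit each node at most once, whereas a random walk may return to a node many times.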

4. Inputting the new behavior sequence into a Word2vec model, and training to generate an embedding vector of the commodity. The method specifically comprises the following steps:

the Word2vec algorithm learns from text and represents the semantic information of words with word vectors: it maps words from their original space into a new space in which semantically similar words lie close to each other. Word2vec includes two language models, CBOW and Skip-gram, each consisting of an input layer, a hidden layer and an output layer. CBOW predicts the current word w_t from its context w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2}; Skip-gram does the opposite and predicts the context w_{t-2}, w_{t-1}, w_{t+1}, w_{t+2} given the current word w_t. To reduce the training time of word vectors, Word2vec provides two optimization methods, Hierarchical Softmax and Negative Sampling, both of which train a neural-network classifier with back-propagation (BP). Each word is initially represented by a randomly generated N-dimensional vector, and after Word2vec training the optimized word vector, i.e., the embedding vector of each word, is obtained.
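To make the CBOW/Skip-gram distinction concrete, the sketch below builds training examples from one behavior sequence with a configurable window (two words on each side by default, matching w_{t-2}…w_{t+2} above). It only generates examples and does not train any vectors; the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def context_of(seq: Sequence[str], t: int, window: int) -> List[str]:
    """Words within `window` positions of index t, excluding w_t itself."""
    lo, hi = max(0, t - window), min(len(seq), t + window + 1)
    return [seq[j] for j in range(lo, hi) if j != t]


def cbow_examples(seq: Sequence[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: (context words -> current word w_t)."""
    return [(context_of(seq, t, window), w) for t, w in enumerate(seq)]


def skipgram_pairs(seq: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: (current word w_t -> one context word) pairs."""
    return [(w, c) for t, w in enumerate(seq) for c in context_of(seq, t, window)]


walk = ["i1", "i2", "i3"]   # a hypothetical commodity sequence from a random walk
```

With `window=1`, CBOW produces one example per position (context list to center word), while Skip-gram produces one pair per (center, context) combination, which is why Skip-gram yields more training pairs from the same sequence.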

A word-vector learning model predicts the next word given its context words. In this framework, every word is mapped into a vector space: each word corresponds to a unique column vector of the matrix W, indexed by the word's position in the vocabulary. The concatenation or sum of the context word vectors is then used as the feature vector for predicting the next word.

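Suppose a sentence contains T words w_1, w_2, …, w_i, …, w_T. With a context window of k words on each side, the goal is to maximize the likelihood L, i.e.

$$L=\prod_{t=k+1}^{T-k} p(w_t \mid w_{t-k},\dots,w_{t+k})$$

Taking the logarithm and averaging gives

$$\frac{1}{T}\sum_{t=k+1}^{T-k}\log p(w_t \mid w_{t-k},\dots,w_{t+k})$$

The prediction task uses softmax multi-class classification, so the posterior probability is

$$p(w_t \mid w_{t-k},\dots,w_{t+k})=\frac{e^{y_{w_t}}}{\sum_{i} e^{y_i}}$$

Each y_i in the above equation is the unnormalized log-probability of output word i, computed as

$$y=b+U\,h(w_{t-k},\dots,w_{t+k};W)$$

where U and b are the softmax parameters and h is constructed from the concatenation or average of the word vectors taken from W.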
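As a worked illustration of the softmax posterior, the NumPy sketch below computes p(w | context) for every vocabulary word from toy parameters. The shapes (W as d × V with one column per word, U as V × d, b as length V) and the choice of averaging the context columns for h are assumptions; concatenation is the other option the text mentions.

```python
import numpy as np


def context_posterior(context_ids, W, U, b):
    """p(w | context) for every vocabulary word w.

    W: (d, V) word-vector matrix, one column per vocabulary word.
    U: (V, d) and b: (V,) softmax parameters.
    h is the average of the context columns (an assumption).
    """
    h = W[:, context_ids].mean(axis=1)   # (d,) feature vector from the context
    y = b + U @ h                        # unnormalized log-probabilities y_i
    y = y - y.max()                      # shift for numerical stability; softmax unchanged
    p = np.exp(y)
    return p / p.sum()


rng = np.random.default_rng(0)
V, d = 6, 4                              # toy vocabulary size and vector dimension
W, U, b = rng.normal(size=(d, V)), rng.normal(size=(V, d)), np.zeros(V)
p = context_posterior([0, 1, 3, 4], W, U, b)   # predict w_t from 4 context words
```

The full softmax costs O(V) per prediction, which is the motivation for Hierarchical Softmax and Negative Sampling mentioned earlier.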

The word vectors obtained by training the Word2vec model contain context information. The advantages of Word2vec are that it captures context information and, compared with one-hot encoding, compresses the data dimensionality, which greatly reduces the model's running time.

5. Inputting the embedding vectors into a machine learning model for training and learning, thereby obtaining a single user behavior prediction model.

In this example, Random Walk is performed on the commodity graph structure formed after data preprocessing and user portrait construction, walking randomly over each node to obtain new behavior sequences; the new behavior sequences are then trained and optimized with the Word2vec model to obtain embedding vectors, and finally a machine learning model is trained and optimized to produce the final output. Two hyperparameters must be controlled during training: the dimensionality of the embedding vector and the length of the random walk. A higher embedding dimensionality is not necessarily better: too few dimensions may impair the generalization capability of the model, while too many may cause the curse of dimensionality, so a grid search is needed to determine the optimal embedding dimensionality. The grid search first trains the model with a specified word vector dimensionality, then increases the dimensionality and retrains and evaluates the model; if the new model performs better than the previous one, the dimensionality is enlarged further, otherwise the optimal embedding dimensionality has been reached, as shown in Fig. 2. Similarly, the optimal random walk length can be obtained by random search.
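The incremental dimension search described above can be sketched as follows. This is a hedged illustration: `evaluate(dim)` is a hypothetical callback that would train the whole pipeline (walks, Word2vec, classifier) with embedding dimension `dim` and return a validation score such as AUC; the start value, step and upper limit are assumptions.

```python
from typing import Callable, Tuple


def incremental_dimension_search(evaluate: Callable[[int], float], start: int = 5,
                                 step: int = 1, max_dim: int = 300) -> Tuple[int, float]:
    """Grow the embedding dimension while the validation score keeps improving."""
    best_dim, best_score = start, evaluate(start)
    dim = start + step
    while dim <= max_dim:
        score = evaluate(dim)
        if score <= best_score:   # no improvement: keep the previous dimension
            break
        best_dim, best_score = dim, score
        dim += step
    return best_dim, best_score


# Toy stand-in for a validation score that peaks at dimension 8.
best_dim, best_score = incremental_dimension_search(lambda d: -(d - 8) ** 2)
```

Because the search stops at the first non-improving step, it finds a local optimum; this trades thoroughness for far fewer expensive training runs than an exhaustive sweep.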

Example 2

This example builds on Example 1 and further fuses the single models, as shown in Fig. 3. The fusion method first measures the difference between single learners with the maximum information coefficient (MIC) and expresses it in the form of a confusion matrix, and then selects the two single learners with the largest difference for model fusion to obtain better generalization capability.

Wherein:

1. The maximum information coefficient (MIC) measures the degree of association between two variables, whether the relationship is linear or nonlinear. Its computation relies mainly on mutual information (MI) and grid partitioning. MI measures the dependence between two variables. Given variables A = {a_1, a_2, …, a_n} and B = {b_1, b_2, …, b_n}, where n is the number of samples, the mutual information can be defined as

$$MI(A,B)=\sum_{a\in A}\sum_{b\in B}p(a,b)\log\frac{p(a,b)}{p(a)\,p(b)}$$

where p(a, b) is the joint probability density of variables A and B, and p(a) and p(b) are the marginal probability densities of A and B, respectively. The joint probability is generally difficult to compute. The idea of MIC is to discretize the relationship between the two variables in a two-dimensional space: represent it as a scatter plot, divide the plane into a number of intervals along the x and y directions, and count the points falling into each cell. This yields the joint probability and thus avoids the difficulty of estimating it for mutual information. Given a finite set of ordered pairs D = {(a_1, b_1), (a_2, b_2), …, (a_n, b_n)}, define a partition G that divides the ranges of A and B into x and y segments, respectively, so that G is an x × y grid. MI(A, B) is computed for each such grid; since an x × y grid can be placed in many ways, the largest MI over all x × y grids is taken, and the maximum mutual information of D is defined as:

$$MI^{*}(D,x,y)=\max_{G} MI(D|G)$$

where D|G denotes the distribution of the data set D on grid G and the maximum runs over all x × y grids G. Normalizing the maximum mutual information obtained for different (x, y) gives the characteristic matrix M(D)_{x,y}:

$$M(D)_{x,y}=\frac{MI^{*}(D,x,y)}{\log\min(x,y)}$$

The MIC is defined as

$$MIC(D)=\max_{x\,y<B(n)} M(D)_{x,y}$$

where B(n) is the upper bound on the grid size x × y; the literature reports that B(n) = n^{0.6} works best.
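The following NumPy sketch approximates the MIC definition above. It is deliberately simplified and is not the exact algorithm: exact MIC maximizes MI over all placements of an x × y grid (the MINE procedure), whereas this sketch only evaluates equal-width grids, so its result is a lower bound on the true MIC. The function names are hypothetical; B(n) = n^0.6 follows the text.

```python
import numpy as np


def grid_mi(a: np.ndarray, b: np.ndarray, x: int, y: int) -> float:
    """Mutual information (nats) of a, b on an equal-width x-by-y grid."""
    counts, _, _ = np.histogram2d(a, b, bins=[x, y])
    p = counts / counts.sum()                 # joint probabilities per cell
    pa = p.sum(axis=1, keepdims=True)         # marginal of a, shape (x, 1)
    pb = p.sum(axis=0, keepdims=True)         # marginal of b, shape (1, y)
    nz = p > 0                                # empty cells contribute 0
    return float(np.sum(p[nz] * np.log(p[nz] / (pa @ pb)[nz])))


def approx_mic(a, b, alpha: float = 0.6) -> float:
    """Max over x*y < B(n) = n**alpha of MI / log(min(x, y)), equal-width grids only."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    bound = len(a) ** alpha
    best = 0.0
    x = 2
    while 2 * x < bound:                      # need room for at least y = 2
        y = 2
        while x * y < bound:
            best = max(best, grid_mi(a, b, x, y) / np.log(min(x, y)))
            y += 1
        x += 1
    return float(best)


a = np.linspace(0.0, 1.0, 1000)
noise = np.random.default_rng(0).random(1000)
```

A perfectly dependent pair such as `approx_mic(a, a)` scores about 1, while independent data such as `approx_mic(a, noise)` scores close to 0, mirroring the normalized range of MIC.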

2. The confusion matrix expresses the relationship between a model's predictions on data samples and their true labels, and is a common way to evaluate the generalization capability of a classifier. Suppose there is an N-class classification task and the data set D contains T_0 records in total, of which T_i belong to class i (1 ≤ i ≤ N). Build a classifier C with a machine learning or deep learning model, and let cm_{ij} = T_{ij}/T_i denote the fraction of class-i records that classifier C assigns to class j, where T_{ij} is the number of such records. This gives the N × N confusion matrix CM(C, D):

$$CM(C,D)=\begin{pmatrix}cm_{11}&cm_{12}&\cdots&cm_{1N}\\cm_{21}&cm_{22}&\cdots&cm_{2N}\\\vdots&\vdots&\ddots&\vdots\\cm_{N1}&cm_{N2}&\cdots&cm_{NN}\end{pmatrix}$$

the column indices of the elements in the confusion matrix represent the predicted results of the classifier C model on the data samples, and the row indices represent the true label values of the data samples. The diagonal elements of the confusion matrix represent the probability that each class can be correctly predicted by classifier C, while the non-diagonal portions represent the probability that classifier C is misjudged.
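The row-normalized confusion matrix defined above can be computed as in this minimal NumPy sketch. It assumes class labels are integers 0 … N−1; the function name is hypothetical.

```python
import numpy as np


def normalized_confusion_matrix(y_true, y_pred, n_classes: int) -> np.ndarray:
    """cm[i, j] = share of true-class-i records that the classifier assigns to class j."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1                            # row = true label, column = prediction
    row_totals = counts.sum(axis=1, keepdims=True)   # T_i for each class
    return np.divide(counts, row_totals, out=np.zeros_like(counts),
                     where=row_totals > 0)           # rows of absent classes stay 0


cm = normalized_confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
```

Each non-empty row sums to 1, so the diagonal entries are per-class recall and the off-diagonal entries are the misclassification rates described in the text.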

In machine learning, if two classes are highly similar, their data samples are likely to be mistaken for each other by the classifier. The row vector CM_i (1 ≤ i ≤ N) of the confusion matrix describes how the samples of class i are distributed over the predicted classes. Based on the confusion matrix, this example defines a correlation matrix between single classifiers. Assuming there are M single classifiers, the confusion matrix of each single classifier is converted into a row vector by concatenating its rows in order:

$$CM^{(i)}=\left(cm^{(i)}_{11},\dots,cm^{(i)}_{1N},\,cm^{(i)}_{21},\dots,cm^{(i)}_{NN}\right)$$

where CM^{(i)} (1 ≤ i ≤ M) is the row vector of the i-th single classifier. All row vectors CM^{(i)} are then stacked into a matrix, the confusion matrix of all single classifiers, defined as CMS(C, D):

$$CMS(C,D)=\begin{pmatrix}CM^{(1)}\\CM^{(2)}\\\vdots\\CM^{(M)}\end{pmatrix}$$

Based on the resulting confusion matrix and the maximum information coefficient (MIC), a similarity measurement matrix Q for ensemble learning can be obtained. The Q matrix reflects the correlation between single classifiers: the smaller Q_ij is, the weaker the correlation between classifiers i and j; the larger it is, the stronger the correlation. The Q matrix therefore measures the similarity between classifiers well and provides a way to find two single classifiers that differ greatly in ensemble learning. The formula for the Q matrix is shown below.

As shown in fig. 3, the method of this embodiment further includes the following steps based on the above embodiment 1:

(1) select n user behavior prediction models with relatively strong generalization capability from the models constructed in Example 1; (2) calculate the maximum information coefficient MIC between every pair of these models; (3) construct and visualize the confusion matrix; (4) on the resulting matrix, find the two single models with the lightest color, i.e., the lowest similarity, and fuse them by Bagging. The detailed flow chart is shown in Fig. 4.
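The selection procedure can be sketched as below, under loudly stated assumptions. The patent's Q formula is not reproduced in this text, so the sketch assumes Q[i, j] is a pairwise similarity of the flattened confusion-matrix rows of CMS. Since the patent uses MIC for this similarity, any MIC implementation can be passed as `similarity`; the default `abs_pearson` is only a stand-in so the example runs without extra code. All function names and the three toy confusion matrices are hypothetical.

```python
import itertools

import numpy as np


def stack_confusion_matrices(cms) -> np.ndarray:
    """CMS: row m is the m-th classifier's N x N confusion matrix flattened row by row."""
    return np.vstack([np.asarray(cm, dtype=float).ravel() for cm in cms])


def abs_pearson(u: np.ndarray, v: np.ndarray) -> float:
    """Stand-in similarity (absolute Pearson correlation); the patent uses MIC here."""
    return abs(float(np.corrcoef(u, v)[0, 1]))


def similarity_matrix(cms_rows: np.ndarray, similarity=abs_pearson) -> np.ndarray:
    """Q[i, j] = similarity of classifiers i and j (assumed form of the Q matrix)."""
    m = cms_rows.shape[0]
    q = np.eye(m)
    for i, j in itertools.combinations(range(m), 2):
        q[i, j] = q[j, i] = similarity(cms_rows[i], cms_rows[j])
    return q


def least_similar_pair(q: np.ndarray):
    """Indices of the two classifiers with the smallest Q value: the fusion candidates."""
    return min(itertools.combinations(range(q.shape[0]), 2), key=lambda ij: q[ij])


cms = [
    [[0.9, 0.1], [0.2, 0.8]],       # hypothetical confusion matrices of 3 single models
    [[0.85, 0.15], [0.25, 0.75]],
    [[0.5, 0.5], [0.9, 0.1]],
]
q = similarity_matrix(stack_confusion_matrices(cms))
pair = least_similar_pair(q)
```

The first two toy models make nearly identical errors, so the selected pair involves the third model, which is the "most different" candidate the fusion step looks for.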

For model fusion by simple weighting, determining the optimal coefficient of each model is usually the difficult part. The simplest option is equal-weight averaging: in theory, if model fusion can improve the results at all, averaging the single models will generally bring some improvement, but equal weights are not necessarily the optimal coefficients for the models being fused. Instead of simple mean-weighted fusion, this example therefore proposes a fusion method based on AUC ranking, exploiting the fact that AUC is a ranking measure. The AUC is calculated as follows:

$$AUC=\frac{\sum_{i\in\text{positive}}rank_{ins_i}-\frac{M(M+1)}{2}}{M\times N}$$

where M is the number of positive samples, N is the number of negative samples, and rank_{ins_i} is the rank of positive sample i when all samples are sorted by predicted score in ascending order. The formula shows that AUC is essentially a ranking; this property can be used to compute the coefficient of each model during fusion before the models are finally fused. The coefficient of each model is calculated as follows. Specifically, the results of the models are sorted in descending order of AUC, the reciprocal of each sample's position in the sorted order is multiplied by the AUC obtained from the model's prediction, and the products are summed to obtain the final fusion result.

The final fused value is the sum, over all models, of the AUC value of each sample multiplied by the reciprocal of that sample's position in the single model's relative ordering.
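The rank form of AUC used above (M positives, N negatives, ranks taken in ascending order of predicted score) can be checked with this pure-Python sketch. Averaging ranks over tied scores is an assumption the text does not address; the function name is hypothetical, and the sketch covers only AUC, not the fusion coefficients.

```python
from typing import Sequence


def rank_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = (sum of positive ranks - M(M+1)/2) / (M*N), ranks ascending by score.

    Tied scores receive the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1          # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    m = sum(1 for y in labels if y == 1)    # M: positive samples
    n = len(labels) - m                     # N: negative samples
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - m * (m + 1) / 2) / (m * n)
```

For example, `rank_auc([1, 0, 1, 0], [0.1, 0.9, 0.8, 0.4])` gives 0.25: only one of the four positive/negative pairs is ordered correctly. Because only ranks matter, any monotonic rescaling of a model's scores leaves its AUC unchanged.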

Example 3

In this example, the method provided in the above examples is tested using the backend logs of a bank's shopping app.

1. Raw data set

The backend log of the bank shopping app covers a time span of one month and mainly contains the consumption behavior data of 4 thousand users. Each row corresponds to one operation record of a user, and the rows are sorted by the user's operation time. The fields contained in the data set are shown in Table 1.

Table 1 Basic information of the original data set

The raw data set is preprocessed.

2. User representation construction

Each row of the unprocessed data set records a single operation by a user, i.e., its granularity is one operation, whereas predicting whether a user purchases a product requires information at the finer granularity of the individual user. Processing the original data set and constructing user portraits are therefore critical for obtaining per-user features of operation behavior. This example constructs the user portrait mainly by computing user behavior statistics, which reveal users' behavior habits.

The unprocessed data set is grouped by the User-id field and sorted, and statistics of each user's behavior are computed. These statistics describe the user's behavior habits from several aspects, mainly the following:

(1) basic information of user

This part of the data consists of the User-id field and the user-related variables Userinfo_X in the original data. These fields need no additional reorganization or calculation; the corresponding values only need to be updated as each record of the user is read.

(2) User liveness information

A user's activity information reflects the user's preference for the app and can be characterized from several angles. The main user activity indicators used here are shown in Table 2.

Table 2 User activity information

(3) User operation behavior statistical information

The user operation behavior information reflects how the user interacts with the app and the user's preference for particular app functions. The fields mainly include the count and proportion of each type of operation; the detailed feature fields are shown in Table 3.

Table 3 User operation behavior statistics
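The regrouping from one-row-per-operation logs to one-row-per-user features can be sketched in pure Python as follows. Because Tables 1 to 3 are not reproduced here, the field names `user_id`, `day` and `op_type` and the specific features (active days, operations per day, per-operation counts and proportions) are hypothetical stand-ins that follow the three angles described above.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def build_user_portraits(records: Iterable[Mapping]) -> Dict[object, dict]:
    """Per-user activity and operation statistics from one-row-per-operation logs.

    Field names 'user_id', 'day' and 'op_type' are hypothetical stand-ins.
    """
    op_counts = defaultdict(Counter)   # user -> Counter of operation types
    active_days = defaultdict(set)     # user -> set of days with any operation
    for r in records:
        op_counts[r["user_id"]][r["op_type"]] += 1
        active_days[r["user_id"]].add(r["day"])

    portraits = {}
    for user, counter in op_counts.items():
        total = sum(counter.values())
        n_days = len(active_days[user])
        features = {"n_ops": total, "active_days": n_days, "ops_per_day": total / n_days}
        for op, c in counter.items():
            features[f"count_{op}"] = c          # number of times of each operation
            features[f"ratio_{op}"] = c / total  # proportion of each operation
        portraits[user] = features
    return portraits


logs = [
    {"user_id": 1, "day": 1, "op_type": "view"},
    {"user_id": 1, "day": 1, "op_type": "buy"},
    {"user_id": 1, "day": 2, "op_type": "view"},
    {"user_id": 2, "day": 3, "op_type": "view"},
]
portraits = build_user_portraits(logs)
```

In practice the same aggregation is usually done with a pandas `groupby` on the user ID column; the dictionary version is used here only to keep the sketch dependency-free.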

3. Evaluation index

The prediction results of the model are represented by a confusion matrix, as shown in table 4.

Table 4 Confusion matrix

The accuracy (ACC), precision (P), recall (R), false positive rate (FPR) and F1-score can be defined from the confusion matrix and are calculated as follows:

ACC = (TP+TN)/(TP+FN+FP+TN)

P = TP/(TP+FP)

R = TP/(TP+FN)

FPR = FP/(TN+FP)

F1-score = (2×P×R)/(P+R)
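These definitions translate directly into code. The minimal sketch below takes the four cells of the binary confusion matrix (Table 4) as counts; the function name is hypothetical, and it does not guard against zero denominators (for example, a model that never predicts the positive class).

```python
def confusion_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """ACC, P, R, FPR and F1-score from binary confusion-matrix counts."""
    acc = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)              # also the true positive rate (ROC ordinate)
    fpr = fp / (tn + fp)                 # ROC abscissa
    f1 = 2 * precision * recall / (precision + recall)
    return {"ACC": acc, "P": precision, "R": recall, "FPR": fpr, "F1": f1}


m = confusion_metrics(tp=40, tn=30, fp=10, fn=20)
```

For these toy counts, ACC = 0.7, P = 0.8, R ≈ 0.667, FPR = 0.25 and F1 ≈ 0.727.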

The ROC curve plots recall (the true positive rate) on the ordinate against the false positive rate (FPR) on the abscissa, and the area under the ROC curve is the AUC value, as shown in Fig. 5; clearly this area is not greater than 1. Since ROC curves generally lie above the line y = x, the AUC generally ranges between 0.5 and 1. The original data set contains 17410 records of users who purchased and 15590 records of users who did not, so the data samples are unbalanced and accuracy is not a suitable evaluation index. AUC is generally used to evaluate user behavior prediction problems, so the evaluation index selected in this example is AUC. The larger the AUC, the better the model.

To verify the effectiveness of the user portrait and of the DeepWalk-based user behavior prediction model, this embodiment selects two basic machine learning models, XGBoost (xgb for short) and LightGBM (lgb for short), as base classifiers, and compares each of them experimentally with its extended version that adds the user portrait and the DeepWalk technique. In the DeepWalk technique, the random walk length (walklength) is randomly searched over the integers in [5, 10) with a step of 1, and the word vector dimension (size) is randomly searched over the same range with a step of 1. In this embodiment, data preprocessing and user portrait construction are implemented with pandas, numpy and sklearn; xgb and lgb are implemented with the Python xgboost and lightgbm packages, respectively; and the Word2vec model inside the DeepWalk model is implemented with the Python gensim package. The five-fold cross-validation used for model validation is implemented with sklearn, and the xgb and lgb model parameters are tuned by grid search to determine their optimal values. The structure of the model constructed in this embodiment is shown in FIG. 6, where size is the dimension of the trained Word2Vec embedding vectors and walklength is the length of the random walk.
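The random walk and the search over size and walklength can be sketched in plain Python. This is a minimal sketch under stated assumptions: the item graph is a hypothetical adjacency dict, walk length is counted in nodes, a walk that reaches a node without neighbors stops early, and two walks are started from every node. None of these details are specified in the text.

```python
import random
from itertools import product


def random_walk(graph, start, walk_length, rng):
    """Uniform random walk of up to walk_length nodes, starting at `start`."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = graph.get(walk[-1], [])
        if not neighbors:  # dead end: stop the walk early
            break
        walk.append(rng.choice(neighbors))
    return walk


def build_corpus(graph, walk_length, walks_per_node=2, seed=0):
    """Start `walks_per_node` walks from every node, shuffling start order each pass."""
    rng = random.Random(seed)
    corpus = []
    for _ in range(walks_per_node):
        nodes = list(graph)
        rng.shuffle(nodes)
        for node in nodes:
            corpus.append(random_walk(graph, node, walk_length, rng))
    return corpus


# Search space from the text: size and walklength both range over [5, 10) in steps of 1.
search_space = list(product(range(5, 10), range(5, 10)))

# Hypothetical item graph: each item maps to the items it is connected to.
graph = {
    "item_a": ["item_b", "item_c"],
    "item_b": ["item_a"],
    "item_c": ["item_a", "item_b"],
}

size, walklength = random.Random(42).choice(search_space)  # one random-search draw
corpus = build_corpus(graph, walklength)
```

The resulting `corpus` is a list of node sequences, which is the format gensim's `Word2Vec` accepts as `sentences`; in gensim 4.x the embedding dimension is passed as `vector_size` (it was called `size` in gensim 3.x).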

Each record in the original data set describes a single operation, i.e. its granularity is one operation of one user, whereas predicting whether a user purchases a certain product requires information at the granularity of a single user. The original data set therefore has to be re-integrated and recomputed into a user portrait that captures finer-grained features of each user's operation behavior. This embodiment builds the user portrait mainly by computing statistics of user behavior, thereby revealing the user's behavior habits.
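The re-aggregation from operation-level records to one row per user can be sketched with a pandas `groupby`. The column names below (`user_id`, `Userinfo_1`, `day`, `purchased`), the choice of liveness features and the way the purchase label is derived are all illustrative assumptions; the basic-information column simply keeps the last value seen, mirroring the update-on-read rule described for Userinfo_X.

```python
import pandas as pd

# Hypothetical operation-level records (column names are illustrative assumptions).
events = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 2],
    "Userinfo_1": [30, 30, 30, 25, 25],  # static user attribute repeated on every row
    "day":        [1, 1, 3, 2, 2],       # day index of the operation
    "purchased":  [0, 0, 1, 0, 0],       # whether this record shows a purchase
})

# Collapse to one row per user with named aggregation.
portrait = events.groupby("user_id").agg(
    Userinfo_1=("Userinfo_1", "last"),   # basic info: keep the latest value read
    n_ops=("day", "count"),              # liveness: total number of operations
    active_days=("day", "nunique"),      # liveness: number of distinct active days
    first_day=("day", "min"),
    last_day=("day", "max"),
    label=("purchased", "max"),          # 1 if any record shows a purchase
)
portrait["active_span"] = portrait["last_day"] - portrait["first_day"] + 1
```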

FIG. 7 shows the AUC on the validation set of the lgb and xgb models before and after the user portrait is added. The AUC of the xgb base model (model 2) reaches 0.5219, which is 0.0027 higher than that of the lgb base model (model 1). Of the models obtained by adding the user portrait to lgb and xgb (model 3 and model 4), model 3 performs better, reaching 0.7131. The AUC of model 3 is 0.1939 higher than that of model 1, and the AUC of model 4 is 0.1905 higher than that of model 2. The experiments show that building the user portrait substantially improves the performance of both the xgb model and the lgb model.

FIG. 8 shows the performance of the models with the user portrait and of the original models on the test set, i.e. the new data set. Because the test set shares no data with the training set, the predictions are less predictable and the results fluctuate somewhat. Nevertheless, the AUC values of the lgb and xgb models with the user portrait are both higher than those of the other models: the AUC of model 3 reaches 0.7292 and that of model 4 reaches 0.7238. As on the validation set, the lgb and xgb base models have low AUC values, whereas with the user portrait the AUC values on the test set are higher than on the validation set; this is attributed to uncertainty in the new data, and models 3 and 4 carry a risk of under-fitting. Overall, on both the test set and the validation set, the models with the user portrait learn better than the base models.

Here, category 1 corresponds to models 5, 6, 7, 8, 9; category 2 to models 10, 11, 12, 13, 14; category 3 to models 15, 16, 17, 18, 19; category 4 to models 20, 21, 22, 23, 24; category 5 to models 25, 26, 27, 28, 29; category 6 to models 30, 31, 32, 33, 34; category 7 to models 35, 36, 37, 38, 39; category 8 to models 40, 41, 42, 43, 44; category 9 to models 45, 46, 47, 48, 49; and category 10 to models 50, 51, 52, 53, 54. The experiments show that adding the DeepWalk technique on top of the user-portrait models 3 and 4 improves the generalization ability of the resulting models regardless of the Word2vec embedding dimension and the random walk length. Among the models derived from model 3, the user behavior prediction model obtained with size 7 and walklength 9 (model 29) reaches an AUC of 0.7431, which is 0.03 higher than that of model 3; the model obtained with size 7 and walklength 8 (model 28) has the lowest AUC among the models derived from model 3, but its AUC is still 0.026 higher than that of model 3. The AUC values of the models derived from model 4 are all lower than those of the models derived from model 3. Among them, the model obtained with size 9 and walklength 6 (model 51) has the highest AUC, 0.7374, which is 0.025 higher than that of model 4, while the weakest is model 53, with an AUC of 0.7342, 0.021 higher than that of model 4. In general, the user behavior prediction models with the DeepWalk technique perform better than the other basic models. The detailed results of each model on the validation set are shown in Table 5.

TABLE 5 AUC values of each model on the validation set

On the test set, the DeepWalk-extended lgb and xgb models have almost the same learning capacity, with AUC values of about 0.74. Among the lgb-based models, the model obtained with size 5 and walklength 8 (model 8) performs best, with an AUC of 0.7479, which is 0.0187 higher than that of model 3 on the test set; the model obtained with size 6 and walklength 7 (model 17) has the lowest AUC, 0.7451, only 0.0028 lower than model 8. Among the xgb-based models, the model obtained with size 7 and walklength 6 (model 31) performs best, with an AUC of 0.7466, which is 0.0228 higher than that of model 4 on the test set; the model obtained with size 6 and walklength 7 (model 23) has the lowest AUC, 0.7437, only 0.0029 lower than model 31. The detailed results of each model on the test set are shown in Table 6. In summary, the DeepWalk-based user behavior prediction models outperform the other models on both the validation set and the test set.

TABLE 6 AUC values of each model on the test set

In this embodiment, the calculation of the maximal information coefficient (MIC) and the construction of the confusion matrix are implemented with pandas and numpy, and the confusion matrix is visualized with matplotlib. Model fusion first requires each single learner to have relatively strong learning ability. This embodiment therefore selects 6 single models with strong generalization ability for ensemble learning, namely model 29, model 47, model 16, model 51, model 30, model 54, model 38 and model 20. The MIC is computed between these 6 models, and the resulting confusion-matrix visualization is shown in FIG. 9. The MIC values between each pair of models are listed in Table 7.

TABLE 7 Pairwise MIC values between models

As shown in FIG. 9, a lighter color indicates a greater difference between two models. In this embodiment, model fusion can be performed on three pairs of models whose colors are lighter and whose MIC is below 0.5: model 20 and model 30, model 30 and model 29, and model 47 and model 20. Of these, model 30 and model 29 have the lightest color in the confusion matrix and the lowest MIC. As a control experiment, two pairs with darker colors and MIC above 0.5, model 16 with model 51 and model 16 with model 20, are also fused. The fusion mode is Bagging fusion, with the weight of each single model set to 0.5.
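The pair selection and equal-weight fusion described above can be sketched as follows. The MIC values in the dict are placeholders invented for illustration, not the values of Table 7; in practice they would be computed from the models' predictions (for example with the minepy package). The fusion shown is a plain 0.5/0.5 weighted average of predicted probabilities.

```python
def select_pairs(mic, threshold=0.5):
    """Return model pairs whose MIC is below `threshold`, most dissimilar (lowest MIC) first."""
    low = [(pair, value) for pair, value in mic.items() if value < threshold]
    return [pair for pair, _ in sorted(low, key=lambda item: item[1])]


def average_fusion(pred_a, pred_b, weight=0.5):
    """Weighted average of two models' predicted probabilities (0.5/0.5 by default)."""
    return [weight * a + (1 - weight) * b for a, b in zip(pred_a, pred_b)]


# Placeholder MIC values for illustration only; these are NOT the Table 7 results.
mic = {(20, 30): 0.42, (29, 30): 0.31, (20, 47): 0.45, (16, 51): 0.71, (16, 20): 0.66}

pairs = select_pairs(mic)  # [(29, 30), (20, 30), (20, 47)]
fused = average_fusion([0.2, 0.8, 0.6], [0.4, 0.6, 0.1])
```

Sorting by MIC puts the most dissimilar pair first, which matches the text's observation that the pair with the lowest MIC is the most promising candidate for fusion.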

FIG. 10 compares the effects of model fusion on the validation set. The figure shows that ensemble learning is effective when two single models with lighter confusion-matrix colors, i.e. larger differences, are fused. Fusing model 30 and model 29 gives the best result, an AUC of 0.7561, because these two models differ the most; this is 0.0192 and 0.013 higher than the AUC of model 30 and model 29, respectively. Likewise, fusing model 20 with model 30 and model 47 with model 20 both yield AUC values higher than those of the single models: the fusion of model 20 and model 30 improves the AUC by 0.008 and 0.0073 over model 20 and model 30, respectively, and the fusion of model 47 and model 20 improves it by 0.0072 and 0.00139 over model 47 and model 20, respectively. Conversely, fusing two models with small differences works poorly: the AUC values after fusing model 16 with model 51, or model 16 with model 20, are both lower than those of the single models. The experiments show that, on the validation set, model fusion improves learning capacity only when single learners with low similarity are fused; the two most different models bring the largest gain, and fusing two similar models weakens rather than improves the model's expressive ability. The detailed results are shown in Table 8.

TABLE 8 AUC comparison of model fusion on the validation set

FIG. 11 shows the results of model fusion on the test set, i.e. the new data set, and indicates that model fusion remains effective on new data. Consistent with the validation set, fusing model 30 and model 29 gives the highest AUC, 0.7612, which is 0.0265 and 0.0151 higher than that of model 30 and model 29, respectively. The AUC after fusing model 20 and model 30 reaches 0.7586, which is 0.0145 and 0.0139 higher than that of model 20 and model 30, respectively, and the AUC after fusing model 47 and model 20 reaches 0.7566, which is 0.0098 and 0.0209 higher than that of model 47 and model 20, respectively. The AUC values obtained by fusing model 16 with model 51, or model 16 with model 20, are lower than those of the single models. The results on the validation and test sets show that the selective model fusion method based on MIC and the confusion matrix can identify single learners that differ substantially and fuse them, yielding relatively good generalization, whereas fusing two single models with small differences reduces rather than improves the model's expressive ability. The detailed results are shown in Table 9.

TABLE 9 AUC comparison of model fusion on the test set

FIG. 12 compares, on the validation set, the fusion method based on AUC ranking with the simple fusion method. The figure shows that the AUC-ranking-based fusion method achieves higher AUC values than the ordinary average-weighted fusion method, i.e. the resulting models have stronger expressive ability. With AUC-ranking-based fusion, the fusion of model 20 and model 30 shows the largest improvement over ordinary weighted fusion, reaching an AUC of 0.7611, an increase of 0.0025. The fusion of model 30 and model 29 has the highest AUC of the three fused models, 0.7622, an increase of 0.001 over the ordinary fusion method, and the AUC after fusing model 47 and model 20 increases by 0.002, although this fused model is weaker than the first two. The experiments show that on the validation set the AUC-ranking-based fusion method learns better than the ordinary weighted fusion method. The detailed results are shown in Table 10.

TABLE 10 Comparison of AUC-ranking fusion on the validation set
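The text does not spell out how the AUC ranking is turned into fusion weights. One plausible reading, shown here purely as an assumption, is to sort the single models by validation AUC and weight each model by its rank, so that the better model receives the larger weight; the AUC values in the example are hypothetical.

```python
def auc_rank_weights(aucs):
    """Hypothetical scheme: weight each model by its rank in ascending validation AUC."""
    order = sorted(range(len(aucs)), key=lambda i: aucs[i])
    ranks = [0] * len(aucs)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank  # lowest AUC gets rank 1, highest gets rank len(aucs)
    total = sum(ranks)
    return [r / total for r in ranks]


def weighted_fusion(preds, weights):
    """Combine per-model probability lists with the given weights."""
    return [sum(w * p[j] for w, p in zip(weights, preds)) for j in range(len(preds[0]))]


# Hypothetical validation AUCs for two single models (not values from the tables).
weights = auc_rank_weights([0.75, 0.74])  # first model ranks higher -> [2/3, 1/3]
fused = weighted_fusion([[0.6, 0.9], [0.3, 0.6]], weights)
```

With two models this scheme assigns weights of 2/3 and 1/3 instead of the 0.5/0.5 of the simple fusion; ties in AUC are broken by input order in this sketch.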

FIG. 13 compares the fusion method based on AUC ranking with the simple fusion method on the test set, i.e. the new data set. The figure shows that on the test set the AUC-ranking-based fusion method again achieves higher AUC values than the ordinary average-weighted fusion method, i.e. the resulting models have stronger expressive ability. With AUC-ranking-based fusion, the fusion of model 20 and model 30 shows the largest improvement over ordinary weighted fusion, reaching an AUC of 0.7662, an increase of 0.0076; the fusion of model 30 and model 29 improves by 0.0031 over the ordinary fusion method, reaching 0.7622; and the fusion of model 47 and model 20 improves by 0.006, although this fused model is weaker than the first two. The experiments show that the AUC-ranking-based fusion method has stronger expressive ability than the ordinary weighted fusion method on both the validation set and the test set. The detailed results are shown in Table 11.

TABLE 11 Comparison of AUC-ranking fusion on the test set

The test results and analysis above show that fusing single learners with low similarity performs better than any single learner, and that ensemble learning with the two most different single learners performs best. Moreover, the user behavior prediction model obtained with the AUC-ranking-based fusion method performs better than one obtained with simple weighted fusion.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims

1. The user behavior prediction method based on deep walking and ensemble learning is characterized by comprising the following steps of:

2. The method for predicting user behavior based on deep walking and ensemble learning of claim 1, further comprising:

and step S5: fusing the two most different models among the plurality of single user behavior prediction models to obtain a user behavior prediction model.

3. The method for predicting user behavior based on deep walking and ensemble learning according to claim 2, wherein the step S5 specifically includes:

4. The method for predicting user behavior based on deep walking and ensemble learning according to claim 1, 2 or 3, wherein step S2 constructs the user profile from three angles: the basic information of the user, the activity information of the user and the statistical information of the user operation behavior.

5. The method for predicting user behavior based on deep walking and ensemble learning according to claim 1, 2 or 3, wherein the random walk in step S3 is specifically: starting from any node of the network graph structure, randomly selecting, at each step of the walk, one of the nodes connected to the current node, and repeating this process until the set walk length is reached, at which point the walk stops, thereby obtaining a new piece of user behavior sequence data.

6. The user behavior prediction system based on deep walking and ensemble learning is characterized by comprising a data acquisition module, a preprocessing module, a user portrait module, a random walking module and a training module;

7. The deep walking and ensemble learning based user behavior prediction system according to claim 6, further comprising: a fusion module;

8. The deep walking and ensemble learning based user behavior prediction system according to claim 7, wherein the fusion module comprises a selection unit, a calculation unit and a fusion unit;

9. The user behavior prediction system based on deep walking and ensemble learning according to claim 6, 7 or 8, wherein the user profile module constructs the user profile from three angles: the basic information of the user, the activity information of the user and the statistical information of the user operation behavior.

10. The deep walking and ensemble learning based user behavior prediction system according to claim 6, 7 or 8, wherein the random walk module is configured to perform the following process: starting from any node of the network graph structure, randomly selecting, at each step of the walk, one of the nodes connected to the current node, and repeating this process until the set walk length is reached, at which point the walk stops, thereby obtaining a new piece of user behavior sequence data.