CN111177579A - Integrated diversity enhanced ultra-deep factorization machine model and construction method and application thereof - Google Patents


Info

Publication number: CN111177579A (granted as CN111177579B)
Application number: CN201911304556.9A
Authority: CN (China)
Inventors: 陈岭 (Chen Ling), 施鸿裕 (Shi Hongyu)
Assignee (original and current): Zhejiang University (ZJU)
Legal status: Granted, active
Prior art keywords: diversity, integrated, cross, vector, deep
Other languages: Chinese (zh)

Classifications

    • G06F16/9536 — Information retrieval; querying by web search engines; search customisation based on social or collaborative filtering
    • G06F18/2411 — Pattern recognition; classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06Q30/0241 — Commerce; marketing; advertisements


Abstract

The invention discloses an integrated diversity enhanced ultra-deep factorization machine model, a construction method thereof, and applications thereof. The construction method comprises the following steps: 1) construct a training data set; 2) obtain the low-dimensional embedded vector representation corresponding to each original feature vector using a fully connected network, and construct the initial feature matrix; 3) obtain the output matrix and output vector of each cross layer using the integrated diversity-enhanced cross network, and compute the diversity index; 4) compute weight values for the diversity indices of the different cross layers using a self-attention mechanism; 5) predict based on the output vector of the integrated diversity-enhanced cross network and output the predicted value; 6) train the model with an overall loss composed of an accuracy loss, a diversity loss, and a regularization term, obtaining the integrated diversity enhanced ultra-deep factorization machine model with optimized parameters. The model has broad application prospects in fields such as online advertising and recommendation systems.

Description

Integrated diversity enhanced ultra-deep factorization machine model and construction method and application thereof
Technical Field
The invention relates to the field of feature learning, in particular to an integrated diversity enhanced ultra-deep factorization machine model and application thereof.
Background
Feature learning is an important foundation of machine learning, and the extraction and construction of effective features play an important role in prediction tasks. Feature crossing is a widely used way of constructing features: two or more original features are crossed and combined to obtain a new feature. For example, in a house price prediction task, houses with a superior "geographic location" and a large "floor plan" are clearly more expensive. Crossing "geographic location" with "floor plan" therefore yields a new feature that plays a key role in predicting house prices. How to construct and select effective cross features in feature learning has thus become one of the research hotspots in machine learning, with broad application prospects.
Conventional methods for constructing cross features can be divided into feature-engineering-based methods and factorization-based methods. Feature-engineering-based approaches typically rely on engineers using domain knowledge to construct cross features manually. However, the feature space is often large, so this consumes considerable time and labor; moreover, manually constructed cross features are usually designed for specific tasks and are difficult to generalize to other application scenarios. Factorization-based methods use the idea of matrix factorization to model cross features as inner products of the latent vectors obtained by factorizing the feature weight matrix, thereby greatly reducing the number of model parameters. However, owing to computational complexity, factorization-based methods can exploit only low-order cross features, which limits model performance to some extent. To address this problem, researchers have proposed deep-learning-based methods to learn cross features.
Deep-learning-based methods typically feed the original features directly into a deep neural network to obtain high-order cross information, but they ignore the importance of low-order cross features. The ultra-deep factorization machine model is a state-of-the-art deep-learning-based method for constructing cross features. It combines a purpose-built Compressed Interaction Network with a deep neural network to learn cross features. Compared with traditional methods for constructing cross features, the ultra-deep factorization machine model considers low-order and high-order cross features simultaneously, achieves vector-wise crossing, and is more interpretable. However, its learning of multiple cross feature vectors can be regarded as an ensemble learning process: the model ignores the diversity information among different cross feature vectors and is driven solely by a single accuracy objective, which easily causes overfitting and limits the generalization ability of the model to some extent.
Disclosure of Invention
The invention aims to solve the technical problem of how to effectively utilize diversity information and design a diversity metric to obtain more diverse cross features, and provides an integrated diversity enhanced ultra-deep factorization machine model, a construction method thereof, and applications thereof.
The technical scheme of the invention is as follows:
a method of constructing an integrated diversity enhanced extremely deep factorisation model, the method comprising the steps of:
(1) dividing original data into category type characteristics and numerical type characteristics, and respectively coding the category type characteristics and the numerical type characteristics to obtain a training set;
(2) sending each high-dimensional sparse feature vector after coding into a single-layer full-connection network to obtain corresponding low-dimensional embedded vector representation, and constructing an initial feature matrix;
(3) inputting the initial characteristic matrix into the integrated diversity enhancement cross network, obtaining an output matrix of each cross layer of the integrated diversity enhancement cross network according to the initial characteristic matrix, summing and pooling row vectors in the output matrix of each cross layer respectively to obtain an output vector of each cross layer, splicing the output vectors of all cross layers to obtain an output vector of the integrated diversity enhancement cross network, and calculating a predicted value of the output vector of the integrated diversity enhancement cross network by using a sigmoid activation function;
(4) calculating the diversity index of the output matrix of each cross layer, and calculating the weight values of the diversity indexes of different cross layers by adopting a self-attention mechanism;
(5) constructing an overall loss according to the diversity index of the output matrix of each cross layer, the corresponding weight value, and the difference between the predicted value and the label value of the sample;
(6) and according to the overall loss, utilizing all samples in the training set to iteratively optimize parameters of a full-connection network, an integrated diversity enhancement cross network, an attention mechanism and a sigmoid activation function, and obtaining an integrated diversity enhancement ultra-deep factorization model when the parameters are determined.
The integrated diversity enhanced ultra-deep factorization machine model is constructed by the method for constructing the integrated diversity enhanced ultra-deep factorization machine model.
In the application of the integrated diversity enhanced ultra-deep factorization machine model to advertisement click-through rate prediction, user advertisement click data and the corresponding click labels are used as samples, and the above construction method is used to build an integrated diversity enhanced ultra-deep factorization machine model for predicting the advertisement click-through rate. In application, user advertisement click data is input into the model to predict whether the user will click the advertisement.
In the application of the integrated diversity enhanced ultra-deep factorization machine model to predicting user commodity purchases, user purchasing behavior data and the corresponding purchase labels are used as samples, and the above construction method is used to build an integrated diversity enhanced ultra-deep factorization machine model for purchase prediction. In application, user purchasing behavior data is input into the model to predict whether the user will purchase the commodity.
The invention considers diversity and accuracy simultaneously during the learning of cross features. Compared with the prior art, it has the following advantages:
1) An integrated diversity enhanced ultra-deep factorization machine model is provided, which introduces diversity indices and accounts for both diversity and accuracy in the objective function, alleviating the overfitting problem and improving generalization ability and model performance.
2) A self-attention mechanism is introduced to distinguish the importance of the diversity indices of the cross features of different cross layers, so as to fully mine and utilize the diversity information in the output vectors of the different cross layers.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a frame diagram of an ultra-deep factorization model with integrated diversity enhancement provided by an embodiment of the invention.
Fig. 2 is a diagram of an integrated diversity-enhanced cross network structure according to an embodiment of the present invention.
Fig. 3 is a flow chart illustrating the construction of an ultra-deep factorization model with integrated diversity enhancement according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the scope of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Fig. 1 is a frame diagram of an ultra-deep factorization model with integrated diversity enhancement provided by an embodiment of the invention. Fig. 2 is a diagram of an integrated diversity-enhanced cross network structure according to an embodiment of the present invention. Fig. 3 is a flow chart illustrating the construction of an ultra-deep factorization model with integrated diversity enhancement according to an embodiment of the present invention. Referring to fig. 1,2 and 3, the method for constructing the integrated diversity enhanced ultra-deep factorization model provided by the embodiment includes the following steps:
Step 1: divide the raw data into categorical features and numerical features, and encode each to obtain the complete training set.
Raw data can be divided into categorical and numerical features. Categorical features are encoded with one-hot encoding for single-valued features and multi-hot encoding for multi-valued features. Numerical features are first discretized by binning and then one-hot encoded.
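As a concrete illustration of the encoding in step 1, the sketch below one-hot encodes a single-valued categorical feature, multi-hot encodes a multi-valued one, and bins then one-hot encodes a numerical one. The field vocabularies and bin edges are hypothetical, not part of the invention:

```python
import numpy as np

def one_hot(value, vocabulary):
    """One-hot encode a single-valued categorical feature."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(value)] = 1.0
    return vec

def multi_hot(values, vocabulary):
    """Multi-hot encode a multi-valued categorical feature."""
    vec = np.zeros(len(vocabulary))
    for v in values:
        vec[vocabulary.index(v)] = 1.0
    return vec

def bin_then_one_hot(value, bin_edges):
    """Discretize a numerical feature by binning, then one-hot encode the bin index."""
    idx = int(np.digitize(value, bin_edges))  # bin index in 0 .. len(bin_edges)
    vec = np.zeros(len(bin_edges) + 1)
    vec[idx] = 1.0
    return vec

# hypothetical toy fields
genders = ["male", "female"]
interests = ["sports", "music", "news"]
age_edges = [18, 30, 50]                # bins: <18, 18-29, 30-49, >=50

x_gender = one_hot("female", genders)                  # [0, 1]
x_interest = multi_hot(["music", "news"], interests)   # [0, 1, 1]
x_age = bin_then_one_hot(35, age_edges)                # bin 2 -> [0, 0, 1, 0]
```

The concatenation of such sparse vectors forms the high-dimensional input that step 2 embeds.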
Step 2: split the training set into batches of a fixed size; the total number of batches is N.
The training data set is split into batches according to an empirically chosen batch size S_bat. The total number of batches is computed as:

N = \lceil N_{sam} / S_{bat} \rceil    (1)

where N_{sam} is the total number of samples in the training data set.
Step 3: sequentially select the batch of training samples with index p from the training set, p = 1, 2, …, N. Steps 4-14 are repeated for each training sample in the batch.
Step 4: feed each encoded high-dimensional sparse feature vector x_{fea_i} into a single-layer fully connected network to obtain the corresponding low-dimensional embedded vector representation x_{emb_i}, and construct the initial feature matrix X^0.
First, each encoded high-dimensional sparse feature vector x_{fea_i} is fed into a single-layer fully connected network to obtain the corresponding d-dimensional dense embedded representation x_{emb_i} = W_{emb_i} x_{fea_i} \in R^d, where i = 1, 2, …, m, m is the number of feature vectors, and W_{emb_i} is the embedding weight matrix of the i-th feature.
Second, the embedded representations x_{emb_i} of all features are concatenated to obtain the initial feature vector representation x^0 = [x_{emb_1}, x_{emb_2}, …, x_{emb_m}] and the initial feature matrix X^0 \in R^{m \times d}, where row i of X^0 corresponds to x_{emb_i} in x^0.
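The embedding step can be sketched as follows; the field dimensions, the embedding size d = 4, and the random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

d = 4                      # embedding dimension (illustrative)
field_dims = [3, 5, 2]     # sizes of the m = 3 encoded sparse vectors (hypothetical)
m = len(field_dims)

# one single-layer fully connected (embedding) matrix per field: W_i in R^{d x dim_i}
W_emb = [rng.normal(0.0, 0.1, size=(d, n)) for n in field_dims]

# toy one-hot encoded sparse feature vectors x_fea_i
x_fea = [np.eye(n)[0] for n in field_dims]

# low-dimensional embeddings x_emb_i = W_i @ x_fea_i, stacked into X0 in R^{m x d}
x_emb = [W_emb[i] @ x_fea[i] for i in range(m)]
X0 = np.stack(x_emb)

print(X0.shape)  # (3, 4): one d-dimensional row per original feature field
```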
Step 5: compute the output matrix X^1 of the 1st cross layer from two copies of the initial feature matrix X^0.
The output matrix X^1 \in R^{e_1 \times d} of the 1st cross layer is computed from two copies of the initial feature matrix X^0 as follows:

X^1_{l,*} = \sum_{i=1}^{m} \sum_{j=1}^{m} W^{1,l}_{i,j} (X^0_{i,*} \circ X^0_{j,*})    (2)

where X^1_{l,*} \in R^d denotes the l-th row vector of the output matrix of the 1st cross layer, 1 ≤ l ≤ e_1, and e_1 is the number of rows of X^1; W^{1,l} \in R^{m \times m} is the weight parameter of the l-th row of X^1; X^0_{i,*} and X^0_{j,*} denote the i-th and j-th row vectors of the initial feature matrix X^0, respectively; and \circ denotes the Hadamard product between vectors.
Step 6: compute the output matrix X^k of the k-th cross layer from the initial feature matrix X^0 and the output matrix X^{k-1} of the (k-1)-th cross layer, where k = 2, 3, …, K.
The output matrix of the k-th cross layer is computed from X^0 and the output matrix X^{k-1} of the previous cross layer as follows:

X^k_{l,*} = \sum_{i=1}^{e_{k-1}} \sum_{j=1}^{m} W^{k,l}_{i,j} (X^{k-1}_{i,*} \circ X^0_{j,*})    (3)

where X^k_{l,*} \in R^d denotes the l-th row vector of the output matrix of the k-th cross layer, 1 ≤ l ≤ e_k, and e_k is the number of rows of X^k; k = 2, 3, …, K, where K is the number of cross layers; X^{k-1}_{i,*} denotes the i-th row vector of X^{k-1}; and W^{k,l} \in R^{e_{k-1} \times m} is the weight parameter of the l-th row of X^k.
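A minimal sketch of the cross-layer recurrence above, written with explicit loops for clarity; the row counts e_k and the random weights are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

m, d = 3, 4               # number of fields and embedding dimension (illustrative)
e = [m, 5, 5]             # e_0 = m, then row counts e_1, e_2 for K = 2 cross layers

X0 = rng.normal(size=(m, d))

def cross_layer(X_prev, X0, W):
    """Row l of the output is sum over i, j of W[l, i, j] * (X_prev[i] ∘ X0[j]),
    where ∘ is the element-wise (Hadamard) product."""
    out = np.zeros((W.shape[0], X0.shape[1]))
    for l in range(W.shape[0]):
        for i in range(X_prev.shape[0]):
            for j in range(X0.shape[0]):
                out[l] += W[l, i, j] * (X_prev[i] * X0[j])
    return out

# weight tensor W^k for layer k has shape (e_k, e_{k-1}, m)
Ws = [rng.normal(0.0, 0.1, size=(e[k], e[k - 1], m)) for k in (1, 2)]

X_layers = []
X_prev = X0
for W in Ws:
    X_prev = cross_layer(X_prev, X0, W)
    X_layers.append(X_prev)

print([X.shape for X in X_layers])  # [(5, 4), (5, 4)]
```

In practice the triple loop would be vectorized (e.g. as a single einsum), but the loops mirror the summation in the formulas directly.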
Step 7: apply sum pooling to the row vectors X^k_{l,*} of the output matrix X^k of each cross layer, where 1 ≤ l ≤ e_k and k = 1, 2, …, K.
Sum pooling accumulates the elements of each row vector of X^k:

s^k_l = \sum_{j=1}^{d} X^k_{l,j}    (4)

In this way, the output matrix X^k of each cross layer is converted into an output vector s^k = [s^k_1, s^k_2, …, s^k_{e_k}], k = 1, 2, …, K.
Step 8: concatenate the output vectors s^k of all cross layers to obtain the output of the integrated diversity-enhanced cross network, x_{dcin} = [s^1, s^2, …, s^K].
The output x_{dcin} of the integrated diversity-enhanced cross network is composed of the output vectors s^k of the K cross layers; x_{dcin} = [s^1, s^2, …, s^K] contains the output vectors of the 1st to K-th cross layers.
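Steps 7 and 8 can be sketched as follows, with illustrative layer shapes:

```python
import numpy as np

rng = np.random.default_rng(2)

# output matrices of K = 3 cross layers, with row counts e_k and embedding dim d = 4
X_layers = [rng.normal(size=(e_k, 4)) for e_k in (3, 5, 5)]

# sum pooling: each row vector collapses to the scalar s^k_l = sum_j X^k[l, j]
s = [X.sum(axis=1) for X in X_layers]

# concatenation gives the cross-network output x_dcin = [s^1, s^2, ..., s^K]
x_dcin = np.concatenate(s)

print(x_dcin.shape)  # (13,) = e_1 + e_2 + e_3
```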
Step 9, output matrix X of each cross layerkThe diversity index Div is calculated.
Based on Negative Correlation Learning (Negative Correlation Learning) theory in ensemble Learning, the output matrix X of different cross layerskAnd (3) calculating the diversity index Div in the following specific calculation mode:
Figure BDA0002322735280000077
Figure BDA0002322735280000078
Figure BDA0002322735280000079
wherein the content of the first and second substances,
Figure BDA0002322735280000081
measure the row vector in each cross layer
Figure BDA0002322735280000082
And all row vector means of the cross layer
Figure BDA0002322735280000083
The euclidean distance between.
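A sketch of the diversity index under the mean-distance reading described above; the exact normalization (averaging over rows) is an assumption:

```python
import numpy as np

rng = np.random.default_rng(3)

def diversity_index(Xk):
    """Mean Euclidean distance of each row vector from the layer's row-vector mean
    (a negative-correlation-style diversity measure)."""
    mean_row = Xk.mean(axis=0)
    dists = np.linalg.norm(Xk - mean_row, axis=1)
    return dists.mean()

X_identical = np.ones((5, 4))         # all rows equal -> zero diversity
X_spread = rng.normal(size=(5, 4))    # spread-out rows -> positive diversity

print(diversity_index(X_identical))   # 0.0
print(diversity_index(X_spread) > 0)  # True
```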
Step 10: compute a weight value a_k for the diversity index Div^k of each cross layer using a self-attention mechanism.
A self-attention mechanism is introduced: the output vector s^k of each cross layer is fed into a multilayer perceptron to obtain the weight value a_k of the diversity index of that layer. The specific computation is:

a'_k = h^T ReLU(W s^k + b)    (8)

a_k = \frac{\exp(a'_k)}{\sum_{k'=1}^{K} \exp(a'_{k'})}    (9)

where h, W, and b are learnable parameters and ReLU is a nonlinear activation function.
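The attention weighting can be sketched with randomly initialized parameters standing in for the learned h, W, and b (the hidden size 8 and e_k = 5 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

K = 3                                          # number of cross layers
s = [rng.normal(size=5) for _ in range(K)]     # toy per-layer output vectors s^k

# parameters of the small attention MLP (randomly initialized stand-ins)
W = rng.normal(size=(8, 5))
b = rng.normal(size=8)
h = rng.normal(size=8)

relu = lambda z: np.maximum(z, 0.0)

a_raw = np.array([h @ relu(W @ sk + b) for sk in s])  # a'_k = h^T ReLU(W s^k + b)
a = np.exp(a_raw) / np.exp(a_raw).sum()               # softmax over the K layers

print(np.round(a.sum(), 6))  # 1.0 — the weights form a distribution over layers
```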
Step 11: feed the output x_{dcin} of the integrated diversity-enhanced cross network into a sigmoid activation function to obtain the predicted value \hat{y}.
The output x_{dcin} of the integrated diversity-enhanced cross network is fed into the sigmoid activation function to obtain the final predicted value of the model:

\hat{y} = \sigma(W_{dcin} x_{dcin})    (10)

where W_{dcin} is a weight parameter and σ(·) is the sigmoid activation function.
Step 12: compute the diversity loss L_div, i.e., the weighted sum of the diversity indices of the different cross layers.
From the diversity indices and weight values a_k obtained in steps 9 and 10, the diversity loss is computed as:

L_div = -\frac{1}{N} \sum_{(x,y) \in D} \sum_{k=1}^{K} a_k Div^k    (11)

where D is the set of all training samples in the batch and N here denotes the number of samples in the batch. The negative sign means that greater weighted diversity lowers the loss.
Step 13: compute the accuracy loss L_acc, i.e., the average logarithmic loss between all sample label values y and model predicted values \hat{y} in the batch.
The accuracy loss between the label values y and the predicted values \hat{y} of all samples in the batch is computed with the log-loss function:

L_acc = -\frac{1}{N} \sum_{(x,y) \in D} \left[ y \log \hat{y} + (1-y) \log(1-\hat{y}) \right]    (12)
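The average logarithmic loss can be sketched as below; the clipping constant eps is a numerical-stability assumption, not part of the patent:

```python
import numpy as np

def log_loss(y, y_hat, eps=1e-12):
    """Average logarithmic (binary cross-entropy) loss over a batch."""
    y = np.asarray(y, dtype=float)
    y_hat = np.clip(np.asarray(y_hat, dtype=float), eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = [1, 0, 1, 0]            # toy labels
y_hat = [0.9, 0.1, 0.8, 0.3]  # toy predicted click probabilities

print(round(log_loss(y, y_hat), 4))  # 0.1976
```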
Step 14: compute the overall loss L from the diversity loss L_div and the accuracy loss L_acc.
The overall loss L consists of three parts: the accuracy loss L_acc, the diversity loss L_div, and an L2 regularization term:

L = L_acc + \lambda_d L_div + \lambda_n \| \Theta \|_2^2    (13)

where λ_d is a parameter controlling the balance between diversity loss and accuracy loss, λ_n is the L2 regularization coefficient, and Θ denotes all model parameters, i.e., the parameters of the fully connected network, the integrated diversity-enhanced cross network, the attention mechanism, and the sigmoid activation function.
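The composition of the overall loss can be sketched as follows; the coefficient values and the sign convention for the diversity term (a negative L_div lowering the loss) are assumptions for illustration:

```python
import numpy as np

def overall_loss(l_acc, l_div, params, lambda_d=0.1, lambda_n=1e-4):
    """L = L_acc + lambda_d * L_div + lambda_n * ||Theta||_2^2.
    lambda_d and lambda_n are illustrative values, not the patent's."""
    l2 = sum(float((p ** 2).sum()) for p in params)
    return l_acc + lambda_d * l_div + lambda_n * l2

params = [np.ones((2, 2)), np.ones(3)]  # toy stand-ins for the model parameters Θ
# diversity loss of -0.2: the diversity term enters negatively, lowering the loss
loss = overall_loss(l_acc=0.5, l_div=-0.2, params=params)

print(round(loss, 4))  # 0.4807 = 0.5 + 0.1*(-0.2) + 1e-4*7
```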
Step 15: adjust the parameters of the entire model according to the overall loss L.
Step 16: repeat steps 3-15 until all batches of the training data set have participated in model training.
Step 17: repeat steps 3-16 until the specified number of iterations is reached, yielding the integrated diversity enhanced ultra-deep factorization machine model.
The integrated diversity enhanced ultra-deep factorization model can be applied to the field of online advertising and recommendation systems.
An application of the integrated diversity enhanced ultra-deep factorization machine model in advertisement click-through rate prediction: in the online advertising field, the raw data are user advertisement click data (including user demographic features, basic attribute features of the advertisements, and contextual features of the click behavior). After preprocessing and model training, the trained model can be used to predict the click-through rate of user advertisements, i.e., the probability that a user clicks a given advertisement.
An application of the integrated diversity enhanced ultra-deep factorization machine model in predicting user commodity purchases: in a recommendation system, the raw data are user purchasing behavior data (including user demographic features, basic attribute features of the commodities, and contextual features of the purchasing behavior), and the trained model can be used to predict whether a user will purchase a new commodity.
The above-mentioned embodiments are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only the most preferred embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions, equivalents, etc. made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A construction method of an integrated diversity enhanced ultra-deep factorization machine model, which is characterized by comprising the following steps:
(1) dividing original data into category type characteristics and numerical type characteristics, and respectively coding the category type characteristics and the numerical type characteristics to obtain a training set;
(2) sending each high-dimensional sparse feature vector after coding into a single-layer full-connection network to obtain corresponding low-dimensional embedded vector representation, and constructing an initial feature matrix;
(3) inputting the initial characteristic matrix into the integrated diversity enhancement cross network, obtaining an output matrix of each cross layer of the integrated diversity enhancement cross network according to the initial characteristic matrix, summing and pooling row vectors in the output matrix of each cross layer respectively to obtain an output vector of each cross layer, splicing the output vectors of all cross layers to obtain an output vector of the integrated diversity enhancement cross network, and calculating a predicted value of the output vector of the integrated diversity enhancement cross network by using a sigmoid activation function;
(4) calculating the diversity index of the output matrix of each cross layer, and calculating the weight values of the diversity indexes of different cross layers by adopting a self-attention mechanism;
(5) constructing an overall loss according to the diversity index of the output matrix of each cross layer, the corresponding weight value, and the difference between the predicted value and the label value of the sample;
(6) and according to the overall loss, utilizing all samples in the training set to iteratively optimize parameters of a full-connection network, an integrated diversity enhancement cross network, an attention mechanism and a sigmoid activation function, and obtaining an integrated diversity enhancement ultra-deep factorization model when the parameters are determined.
2. The method for constructing the integrated diversity enhanced ultra-deep factorization machine model according to claim 1, wherein the concrete process of the step (2) is as follows:
firstly, each encoded high-dimensional sparse feature vector x_{fea_i} is fed into a single-layer fully connected network to obtain the corresponding d-dimensional dense embedded representation x_{emb_i} \in R^d, wherein i = 1, 2, …, m and m is the number of feature vectors;
secondly, the embedded representations x_{emb_i} of all features are concatenated to obtain the initial feature vector representation x^0 = [x_{emb_1}, x_{emb_2}, …, x_{emb_m}] and the initial feature matrix X^0 \in R^{m \times d}, wherein row i of X^0 corresponds to x_{emb_i} in x^0.
3. The method for constructing the integrated diversity enhanced ultra-deep factorization machine model according to claim 1, wherein in the step (3),
the output matrix X^1 \in R^{e_1 \times d} of the 1st cross layer is computed from two copies of the initial feature matrix X^0 as follows:

X^1_{l,*} = \sum_{i=1}^{m} \sum_{j=1}^{m} W^{1,l}_{i,j} (X^0_{i,*} \circ X^0_{j,*})

wherein X^1_{l,*} denotes the l-th row vector of the output matrix of the 1st cross layer, 1 ≤ l ≤ e_1, e_1 is the number of rows of X^1, W^{1,l} is the weight parameter of the l-th row of X^1, X^0_{i,*} and X^0_{j,*} denote the i-th and j-th row vectors of X^0, respectively, and \circ denotes the Hadamard product between vectors;
the output matrix X^k of the k-th cross layer is computed from the initial feature matrix X^0 and the output matrix X^{k-1} of the previous cross layer as follows:

X^k_{l,*} = \sum_{i=1}^{e_{k-1}} \sum_{j=1}^{m} W^{k,l}_{i,j} (X^{k-1}_{i,*} \circ X^0_{j,*})

wherein X^k_{l,*} denotes the l-th row vector of the output matrix of the k-th cross layer, 1 ≤ l ≤ e_k, e_k is the number of rows of X^k, k = 2, 3, …, K, K is the number of cross layers, X^{k-1}_{i,*} denotes the i-th row vector of X^{k-1}, and W^{k,l} is the weight parameter of the l-th row of X^k.
4. The method for constructing an integrated diversity enhanced ultra-deep factorization machine model as claimed in claim 1, wherein in step (3), sum pooling is applied separately to the row vectors $X^k_{l,*}$ of the output matrix $X^k$ of each cross layer, i.e. the elements of each row vector are accumulated:

$$s^k_l = \sum_{j=1}^{d} X^k_{l,j}$$

where $d$ is the number of columns of $X^k$; the output matrix $X^k$ of each cross layer is thereby converted into an output vector $s_k = \left[ s^k_1, s^k_2, \ldots, s^k_{e_k} \right]$.
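The sum pooling of claim 4 reduces each $(e_k, d)$ cross-layer matrix to an $e_k$-dimensional vector. A one-line NumPy sketch (the helper name `sum_pool` is ours, not the patent's):

```python
import numpy as np

def sum_pool(xk: np.ndarray) -> np.ndarray:
    # s^k_l = sum over columns j of X^k_{l,j}: one scalar per row vector
    return xk.sum(axis=1)

xk = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0]])
print(sum_pool(xk))  # [ 6. 15.]
```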
5. The method for constructing an integrated diversity enhanced ultra-deep factorization machine model as claimed in claim 1, wherein in step (4), based on the negative correlation learning theory in ensemble learning, the diversity index $\mathrm{Div}_k$ of the output matrix $X^k$ of each cross layer is computed as follows:

$$\bar{X}^k = \frac{1}{e_k} \sum_{l=1}^{e_k} X^k_{l,*}$$

$$d^k_l = \left\lVert X^k_{l,*} - \bar{X}^k \right\rVert_2$$

$$\mathrm{Div}_k = \sum_{l=1}^{e_k} d^k_l$$

where $d^k_l$ measures the Euclidean distance between each row vector $X^k_{l,*}$ of the cross layer and the mean $\bar{X}^k$ of all row vectors of that cross layer.
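A small NumPy sketch of the diversity index in claim 5. The per-row Euclidean distance to the mean row follows the claim; aggregating those distances by summation is our assumption, since the original aggregation formula is not legible in this text.

```python
import numpy as np

def diversity_index(xk: np.ndarray) -> float:
    # Mean of all row vectors of the cross layer
    mean = xk.mean(axis=0)
    # Euclidean distance of each row vector from that mean
    d = np.linalg.norm(xk - mean, axis=1)
    # Aggregate per-row distances (summation assumed)
    return float(d.sum())

# Identical rows carry no diversity; spread-out rows do
print(diversity_index(np.ones((3, 4))))                    # 0.0
print(diversity_index(np.array([[0.0, 0.0], [2.0, 0.0]])))  # 2.0
```

Larger values mean the row vectors of the layer capture more dissimilar feature interactions, which is what the negative correlation term later rewards.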
6. The method for constructing an integrated diversity enhanced ultra-deep factorization machine model as claimed in claim 1, wherein in step (4), a self-attention mechanism is introduced: the output vector $s_k$ of each cross layer is fed into a multilayer perceptron to obtain the weight value $a_k$ of the diversity index $\mathrm{Div}_k$ of each cross layer, computed as follows:

$$a'_k = h^T \mathrm{ReLU}(W s_k + b)$$

$$a_k = \frac{\exp(a'_k)}{\sum_{k'=1}^{K} \exp(a'_{k'})}$$

where $h$, $W$, and $b$ are learnable parameters and $\mathrm{ReLU}(\cdot)$ is a nonlinear activation function.
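The attention scoring in claim 6 can be sketched as follows. The first equation is taken from the claim; normalizing the scores with a softmax across layers is our reconstruction of the second (illegible) equation, and the function name `attention_weights` is an assumption.

```python
import numpy as np

def attention_weights(s_list, W, b, h):
    # a'_k = h^T ReLU(W s_k + b): one scalar score per cross layer
    scores = np.array([h @ np.maximum(W @ s + b, 0.0) for s in s_list])
    # Softmax across layers (assumed normalization) -> weights a_k
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(1)
e_k, hidden = 5, 8
s_list = [rng.standard_normal(e_k) for _ in range(3)]   # one s_k per layer
W = rng.standard_normal((hidden, e_k))
b = rng.standard_normal(hidden)
h = rng.standard_normal(hidden)
a = attention_weights(s_list, W, b, h)
print(a.sum())  # weights are normalized to 1
```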
7. The method for constructing an integrated diversity enhanced ultra-deep factorization machine model as claimed in claim 1, wherein in step (5), the diversity loss $\mathcal{L}_{div}$ is computed from the diversity indexes and the weight values $a_k$ of the different cross layers, i.e. as the weighted sum of the diversity indexes:

$$\mathcal{L}_{div} = \frac{1}{N} \sum_{x \in D} \sum_{k=1}^{K} a_k \, \mathrm{Div}_k$$

where $D$ is the set of all training samples in the batch and $N$ is the total number of samples;

the accuracy loss $\mathcal{L}_{acc}$ is computed from the sample label value $y$ and the predicted value $\hat{y}$ as follows:

$$\mathcal{L}_{acc} = -\frac{1}{N} \sum_{(x, y) \in D} \left( y \log \hat{y} + (1 - y) \log(1 - \hat{y}) \right)$$

the total loss $\mathcal{L}$ consists of three parts: the accuracy loss $\mathcal{L}_{acc}$, the diversity loss $\mathcal{L}_{div}$, and an L2 regularization term weighted by $\lambda_n$:

$$\mathcal{L} = \mathcal{L}_{acc} - \lambda_d \, \mathcal{L}_{div} + \lambda_n \lVert \Theta \rVert_2^2$$

where $\lambda_d$ is a parameter that controls the balance between the diversity loss and the accuracy loss, and $\Theta$ denotes all model parameters.
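The loss computation of claim 7 can be sketched in NumPy. The binary cross-entropy form of the accuracy loss follows the claim; subtracting the weighted diversity term (so that training rewards diverse cross layers, as in negative correlation learning) is our assumption about the illegible combination formula, as is the function name `total_loss`.

```python
import numpy as np

def total_loss(y, y_hat, div, a, l2_theta, lam_d, lam_n):
    eps = 1e-12  # guard against log(0)
    # Accuracy loss: binary cross-entropy over the batch
    l_acc = -np.mean(y * np.log(y_hat + eps) + (1 - y) * np.log(1 - y_hat + eps))
    # Diversity loss: weighted sum of per-layer diversity indexes
    l_div = float(np.dot(a, div))
    # Diversity is subtracted (sign assumed); L2 term added with weight lam_n
    return l_acc - lam_d * l_div + lam_n * l2_theta

y = np.array([1.0, 0.0, 1.0])
y_hat = np.array([0.9, 0.2, 0.8])
loss = total_loss(y, y_hat,
                  div=np.array([0.5, 0.3]),   # Div_k per cross layer
                  a=np.array([0.6, 0.4]),     # attention weights a_k
                  l2_theta=0.1, lam_d=0.01, lam_n=0.001)
print(loss)
```

With $\lambda_d = 0$ this reduces to plain log loss plus L2 regularization, which is a useful sanity check when tuning the balance parameter.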
8. An integrated diversity enhanced ultra-deep factorization machine model, constructed by the method for constructing an integrated diversity enhanced ultra-deep factorization machine model according to any one of claims 1 to 7.
9. Application of an integrated diversity enhanced ultra-deep factorization machine model to advertisement click-through rate prediction, characterized in that user advertisement click data and the corresponding click labels are used as samples, and an integrated diversity enhanced ultra-deep factorization machine model for predicting advertisement click-through rate is constructed by the construction method according to any one of claims 1 to 7; in application, the user advertisement click data are input into the integrated diversity enhanced ultra-deep factorization machine model to predict whether a user will click an advertisement.
10. Application of an integrated diversity enhanced ultra-deep factorization machine model to user purchase prediction, characterized in that user purchase behavior data and the corresponding purchase labels are used as samples, and an integrated diversity enhanced ultra-deep factorization machine model for predicting commodity purchases is constructed by the construction method according to any one of claims 1 to 7; in application, the user purchase behavior data are input into the integrated diversity enhanced ultra-deep factorization machine model to predict whether a user will purchase a commodity.
CN201911304556.9A 2019-12-17 2019-12-17 Application method of integrated diversity enhanced ultra-deep factorization machine model Active CN111177579B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911304556.9A CN111177579B (en) 2019-12-17 2019-12-17 Application method of integrated diversity enhanced ultra-deep factorization machine model


Publications (2)

Publication Number Publication Date
CN111177579A true CN111177579A (en) 2020-05-19
CN111177579B CN111177579B (en) 2022-04-05

Family

ID=70650197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911304556.9A Active CN111177579B (en) 2019-12-17 2019-12-17 Application method of integrated diversity enhanced ultra-deep factorization machine model

Country Status (1)

Country Link
CN (1) CN111177579B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180095967A1 (en) * 2016-10-04 2018-04-05 Yahoo Holdings, Inc. Online ranking of queries for sponsored search
CN109711883A (en) * 2018-12-26 2019-05-03 西安电子科技大学 Internet advertising clicking rate predictor method based on U-Net network
CN110263243A (en) * 2019-01-23 2019-09-20 腾讯科技(深圳)有限公司 Media information recommending method, apparatus, storage medium and computer equipment


Non-Patent Citations (1)

Title
WAN MAN: "Research on an Advertisement Click-Through Rate Prediction Model Based on Convolutional Neural Networks", CNKI Outstanding Master's Theses Database *

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN111737578A (en) * 2020-06-22 2020-10-02 陕西师范大学 Recommendation method and system
CN111737578B (en) * 2020-06-22 2024-04-02 陕西师范大学 Recommendation method and system
CN112115371A (en) * 2020-09-30 2020-12-22 山东建筑大学 Neural attention mechanism mobile phone application recommendation model based on factorization machine
CN112884513A (en) * 2021-02-19 2021-06-01 上海数鸣人工智能科技有限公司 Marketing activity prediction model structure and prediction method based on depth factorization machine
CN113076944A (en) * 2021-03-11 2021-07-06 国家电网有限公司 Document detection and identification method based on artificial intelligence
CN113889217A (en) * 2021-10-19 2022-01-04 天津大学 Medicine recommendation method based on twin neural network and depth factorization machine
CN113889217B (en) * 2021-10-19 2024-06-04 天津大学 Drug recommendation method based on twin neural network and depth factor decomposition machine
CN114282687A (en) * 2021-12-31 2022-04-05 复旦大学 Multi-task time sequence recommendation method based on factorization machine
CN114282687B (en) * 2021-12-31 2023-03-07 复旦大学 Multi-task time sequence recommendation method based on factorization machine

Also Published As

Publication number Publication date
CN111177579B (en) 2022-04-05

Similar Documents

Publication Publication Date Title
CN111177579B (en) Application method of integrated diversity enhanced ultra-deep factorization machine model
Wu et al. Evolving RBF neural networks for rainfall prediction using hybrid particle swarm optimization and genetic algorithm
Donate et al. Time series forecasting by evolving artificial neural networks with genetic algorithms, differential evolution and estimation of distribution algorithm
Chau Application of a PSO-based neural network in analysis of outcomes of construction claims
Hong et al. SVR with hybrid chaotic genetic algorithms for tourism demand forecasting
Chou et al. Shear strength prediction of reinforced concrete beams by baseline, ensemble, and hybrid machine learning models
CN111538761A (en) Click rate prediction method based on attention mechanism
CN111737578B (en) Recommendation method and system
CN109272332B (en) Client loss prediction method based on recurrent neural network
Gad et al. A robust deep learning model for missing value imputation in big NCDC dataset
CN110619540A (en) Click stream estimation method of neural network
CN110955826A (en) Recommendation system based on improved recurrent neural network unit
CN110110372B (en) Automatic segmentation prediction method for user time sequence behavior
CN110796499A (en) Advertisement conversion rate estimation model and training method thereof
CN110175689A (en) A kind of method of probabilistic forecasting, the method and device of model training
Peng et al. An automatic hyperparameter optimization DNN model for precipitation prediction
CN110738314A (en) click rate prediction method and device based on deep migration network
Sugasawa Grouped heterogeneous mixture modeling for clustered data
CN115510322A (en) Multi-objective optimization recommendation method based on deep learning
CN115080868A (en) Product pushing method, product pushing device, computer equipment, storage medium and program product
CN114511387A (en) Product recommendation method and device, electronic equipment and storage medium
Xue et al. Machine learning embedded semiparametric mixtures of regressions with covariate-varying mixing proportions
CN113869943A (en) Article recommendation method, device, equipment and storage medium
Kuo et al. An application of differential evolution algorithm-based restricted Boltzmann machine to recommendation systems
CN115564532A (en) Training method and device of sequence recommendation model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant