CN113326433A - Personalized recommendation method based on ensemble learning - Google Patents


Info

Publication number
CN113326433A
CN113326433A (application CN202110629501.6A)
Authority
CN
China
Prior art keywords
data
user
score
personalized recommendation
test
Prior art date
Legal status
Granted
Application number
CN202110629501.6A
Other languages
Chinese (zh)
Other versions
CN113326433B (en)
Inventor
段勇
杨堃
Current Assignee
Shenyang University of Technology
Original Assignee
Shenyang University of Technology
Priority date
Filing date
Publication date
Application filed by Shenyang University of Technology filed Critical Shenyang University of Technology
Publication of CN113326433A publication Critical patent/CN113326433A/en
Application granted granted Critical
Publication of CN113326433B publication Critical patent/CN113326433B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24323Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning

Abstract

The invention relates to the fields of machine learning and recommendation systems, in particular to an ensemble-learning-based personalized recommendation method. The data preprocessing module is mainly responsible for re-integrating the data features, solving the problem of difficult extraction of complex features by constructing new features and by manifold learning dimensionality reduction; the model establishing and optimizing module is mainly responsible for establishing a personalized ensemble learning prediction model on the fused data, and for applying Bayesian optimization on top of the prediction model to improve the accuracy of personalized recommendation; and the personalized recommendation module is mainly responsible for obtaining the prediction model's results, and for obtaining and verifying the personalized recommendation results with a Top N recommendation method. The method improves the accuracy of personalized recommendation through ensemble learning; in addition, it integrates manifold learning dimensionality reduction to fuse the data features, thereby solving the problem of difficult extraction of complex features.

Description

Personalized recommendation method based on ensemble learning
Technical Field
The invention relates to the fields of machine learning and recommendation systems, in particular to a personalized recommendation method based on the manifold learning algorithm LPP (Locality Preserving Projection) and the ensemble learning algorithm GBDT (Gradient Boosting Decision Tree).
Background
In recent years, with the continuous evolution of internet and computer technology, the internet has brought an enormous volume of information and has also aggravated the phenomenon of information overload. Although the range of information resources available to users has expanded, how to quickly and effectively screen out information useful to a user from such massive data has become a major problem in the development of the modern internet. Many existing web applications (e.g., web portals, search engines) are essentially ways of helping users filter information. However, these methods only satisfy users' mainstream needs; they do not consider personalization, and the problem of information overload remains poorly solved. Personalized recommendation is an important means of information filtering and an effective method for addressing the problem of information overload.
With the development of machine learning, applying machine learning methods to recommendation algorithms has become a trend. Personalized recommendation draws on many machine learning methods, such as support vector machines, decision trees, neural networks, deep learning, clustering, dimensionality reduction, regression prediction, and ensemble learning. Machine-learning-based personalized recommendation can effectively address problems such as monotonous similarity measures, high similarity-computation complexity, difficulty in mining users' latent interests, difficulty in exploiting user tag and demographic information, and difficulty in extracting item features. User tag information, demographic information, and item feature information are of limited help for the cold-start problem, yet they are necessary for capturing users' latent interests.
Disclosure of Invention
Object of the Invention
The invention provides a personalized recommendation method based on the Locality Preserving Projection algorithm and ensemble learning, aiming to alleviate the problem of information overload in recommendation systems and to improve the efficiency and precision of personalized recommendation.
Technical scheme
A personalized recommendation method based on ensemble learning is characterized by comprising the following steps:
step 1: analyzing the dimension attributes of the personalized recommendation data and dividing them into "user-item-score" data; performing data association on the related "user-item-score" dimensions;
step 2: after the processing is finished, analyzing the data type of each "user-item-score" dimension attribute and converting it into the data type required by ensemble learning;
step 3: generating feature attributes from the "score" attribute among the "user-item-score" dimension attributes;
step 4: normalizing all the obtained data as follows:
v' = (vv - min) / (max - min)   (1)
wherein vv represents the original value of the data, v' represents the value after normalization, min represents the minimum value of the column where vv is located, and max represents the maximum value of that column;
step 5: letting the "user-item-score" data set A in the original space have m sample points x_1, x_2, ..., x_m, each sample point x_i being an l-dimensional vector with i an integer from 1 to m, and X being the matrix whose columns are the m samples; reducing the dimension of the data set A with the manifold learning LPP method, the reduced data set B consisting of sample points y_1, y_2, ..., y_m, each sample point y_i being an n-dimensional vector with l > n, and Y being the matrix whose columns are the m reduced samples;
step 6: dividing the reduced data set B into a training set Train and a test set Test in the ratio 8:2, wherein Y' is the data matrix corresponding to the training set Train;
step 7: establishing a personalized recommendation model with the ensemble learning GBDT method;
step 8: optimizing the GBDT model parameters with a Bayesian method;
step 9: retraining the GBDT personalized recommendation model with the optimal hyper-parameter combination obtained through Bayesian optimization;
step 10: performing Top N recommendation and effect verification according to the prediction results of the final personalized recommendation model on the test set.
In step 3, the number of times each user has scored items is counted with the formula:
CountRating(b) = |R(b)|   (2)
wherein b represents the b-th user in the "user-item-score" data set A, which contains d users in total, R(b) is the set of scores user b has given to the items, and CountRating(b) denotes "the total number of times user b has reviewed items".
The step 5 specifically includes the following steps:
step 5.1: constructing a graph: for samples x_i and x_j in the "user-item-score" data set A, calculating the Euclidean distance between them and testing
||x_i - x_j|| < ε   (3)
wherein ε is a manually set threshold, generally taken as the mean pairwise distance over the m samples of the data set; if the Euclidean distance satisfies (3), the two samples are considered very close, and an edge is established between node i and node j of the graph;
step 5.2: determining the weights: if node i is connected with node j, the weight of the edge between them is calculated with the heat kernel function
ω_ij = exp(-||x_i - x_j||² / t)   (4)
wherein ω_ij represents the weight between nodes i and j, x_i and x_j are samples in the "user-item-score" data set A, and t is a manually set real number greater than 0;
step 5.3: calculating the projection matrix from
X L Xᵀ a = λ X D Xᵀ a   (5)
Let the solutions be a_0, a_1, ..., a_{l-1}, ordered by their corresponding eigenvalues λ from smallest to largest; the projection transformation matrix C = (a_0, a_1, ..., a_{n-1}) is formed from the eigenvectors of the n smallest eigenvalues, and the reduced sample point is y_i = Cᵀ x_i.
Here X is the matrix mentioned in step 5; the adjacency matrix W is formed from the weights ω_ij of step 5.2; the main diagonal of the diagonal matrix D holds the weighted degree of each vertex of the graph constructed in step 5.1, the weighted degree of node i being the sum of the weights of all edges incident to it, i.e. the sum of row i of the adjacency matrix W; and the Laplacian matrix L is defined as L = D - W.
The step 7 comprises the following steps:
step 7.1: the GBDT model is defined by the formula
f_K(Y') = Σ_{k=1}^{K} h_k(Y')   (6)
wherein Y' is the matrix mentioned in step 6, k is the round of the score prediction learner, and K is the total number of rounds; f_k(Y') is the score prediction learner of the k-th round, and h_k(Y') represents the k-th CART (Classification And Regression Tree) decision regression tree;
step 7.2: constructing a CART decision regression tree, namely h(Y') in step 7.1;
step 7.3: the score prediction learner adopts a forward stagewise algorithm; the model of step k is built from the model of step k-1, i.e. the score prediction learner of step k is closely related to that of the previous k-1 steps, with the formula
f_k(Y') = f_{k-1}(Y') + β_k   (7)
wherein f_k(Y') is the score prediction learner of the k-th round, f_{k-1}(Y') is that of the (k-1)-th round, and β_k represents the residual fitted in the k-th round;
step 7.4: continuing the iteration until it is completed, which completes the model building.
The step 7.2 comprises the following steps:
step 7.21: partitioning the preprocessed data set B into regions H_1, H_2, ..., H_o, whose output values are respectively p_1, p_2, ..., p_o;
step 7.22: recursively dividing each region into two sub-regions and determining the output value on each sub-region; selecting the optimal splitting variable q and split point s according to
min_{q,s} [ min_{p_1} Σ_{u_v ∈ H_1} (w_v - p_1)² + min_{p_2} Σ_{u_v ∈ H_2} (w_v - p_2)² ]   (8)
wherein p_1 and p_2 are the outputs of the regions H_1 and H_2 divided in step 7.21, u_v and w_v respectively represent the feature attributes and the score of the data in the corresponding region, and the maximum value of v is the number of samples in the divided region; traversing the variables q, scanning the split points s for each fixed splitting variable q, and selecting the pair (q, s) that minimizes the above formula; dividing the region with the selected pair (q, s) and determining the corresponding output values;
step 7.23: continuing to invoke steps 7.21 and 7.22 on the two sub-regions until a stop condition is met;
step 7.24: repartitioning the input space into o regions H'_1, H'_2, ..., H'_o and generating the score prediction CART decision regression tree with the formula
h(u) = Σ_{o=1}^{O} p_o · I(u ∈ H'_o)   (9)
wherein h(u) is the prediction CART decision regression tree, H'_o are the divided regions, o is the region index, and O is the total number of divided regions; p_o is the fixed output value of the region divided in step 7.21, and q' and s' are the optimal solutions iterated through steps 7.21 and 7.22.
The step 8 comprises the following steps:
step 8.1: initializing the data set D' = {(x'_1, y'_1), ..., (x'_n, y'_n)}, wherein y'_i = f'(x'_i); the objective function f'(x') is the mapping from the dimension attributes in the data to the scores;
step 8.2: the GBDT model is trained with the selected hyper-parameter combination x'_i to calculate f'(x'_i);
step 8.3: calculating the next hyper-parameter combination x'_{i+1} with an acquisition function;
step 8.4: repeating step 8.2 and step 8.3 for T' iterations;
step 8.5: outputting the hyper-parameter combination that optimizes the objective function f'(x').
The step 10 includes the steps of:
step 10.1: setting the value of N, i.e. the number of items recommended to each user, and defining the number of users as count;
step 10.2: for each user, recording the real recommendation list generated on the test set Test as T(all); performing score prediction on the test set Test with the Bayesian-optimized GBDT recommendation model, the obtained result being defined as the test score set;
step 10.3: sorting the test score set by score, recommending the first N items to each user, and recording the Top N recommendation list obtained for each user as T(test);
step 10.4: verifying the precision and recall results on the test score set;
step 10.5: calculating the length of T(test);
step 10.6: calculating the length of T(all);
step 10.7: calculating T(U), the intersection of each user's Top N recommendation list T(test) with T(all);
step 10.8: calculating the precision:
Precision = |T(U)| / |T(test)|   (10)
accumulating the precision generated for each user and dividing the sum by count to obtain the average precision;
step 10.9: calculating the recall:
Recall = |T(U)| / |T(all)|   (11)
accumulating the recall generated for each user and dividing the sum by count to obtain the average recall.
Advantages and effects
1. The invention uses related techniques from the field of machine learning to address the problem of information overload in today's society: manifold learning solves the problem of difficult extraction of complex features, reduces the dimensionality of the data feature attributes, shortens model training time, improves the learning ability of the model, and greatly improves recommendation efficiency.
2. Personalized recommendation is performed through ensemble learning, and the recommendation model is tuned through Bayesian optimization, which improves recommendation precision, allows useful information to be screened quickly and effectively from massive data, and improves the utilization efficiency of that information.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a flow chart of data feature preprocessing;
fig. 3 is a flow chart of personalized recommendation.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention.
The personalized recommendation method based on manifold learning LPP and ensemble learning GBDT can improve the accuracy of personalized recommendation through ensemble learning; in addition, the method integrates manifold learning dimensionality reduction to fuse the data features, thereby solving the problem of difficult extraction of complex features.
FIG. 1 is a general flow chart of the present invention, which includes the following 10 steps, wherein steps 1-6 are the recommended data preprocessing portion of FIG. 1; step 7 is constructing a personalized recommendation model part in the attached figure 1; step 8 and step 9 are the optimization model part in fig. 1; step 10 is the personalized recommendation part in fig. 1.
The data preprocessing module is mainly responsible for re-integrating the data features, solving the problem of difficult extraction of complex features by constructing new features and by manifold learning dimensionality reduction; the model establishing and optimizing module is mainly responsible for establishing a personalized ensemble learning prediction model on the fused data, and for applying Bayesian optimization on top of the prediction model to improve the accuracy of personalized recommendation; and the personalized recommendation module is mainly responsible for obtaining the prediction model's results, and for obtaining and verifying the personalized recommendation results with a Top N recommendation method.
The detailed steps are as follows:
a recommended data preprocessing part:
FIG. 2 is a flow chart of the characteristic data preprocessing of the present invention, and the specific implementation steps are as follows:
Step 1: analyzing the dimension attributes of the personalized recommendation data and dividing them into "user-item-score" data; and performing data association on the related "user-item-score" dimensions.
Step 2: after the processing is finished, analyzing the data type of each "user-item-score" dimension attribute and converting it into the data type required by ensemble learning.
Step 3: generating feature attributes from the "score" attribute among the "user-item-score" dimension attributes, with the formula:
CountRating(b) = |R(b)|   (2)
wherein b represents the b-th user in the "user-item-score" data set A, which contains d users in total, and R(b) is the set of scores user b has given to the items.
Step 4: normalizing all the obtained data as follows:
v' = (vv - min) / (max - min)   (1)
wherein vv represents the original value of the data, v' represents the value after normalization, min represents the minimum value of the column where vv is located, and max represents the maximum value of that column.
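As a minimal illustration of the min-max normalization of step 4 (the function and variable names are ours, not the patent's; NumPy is assumed):

```python
import numpy as np

def min_max_normalize(column):
    """Min-max normalization: v' = (v - min) / (max - min), applied to one column."""
    vmin, vmax = column.min(), column.max()
    return (column - vmin) / (vmax - vmin)

scores = np.array([1.0, 3.0, 5.0, 4.0])   # illustrative score column
normalized = min_max_normalize(scores)     # all values now lie in [0, 1]
```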
Step 5: let the "user-item-score" data set A in the original space have m sample points x_1, x_2, ..., x_m, each sample point x_i being an l-dimensional vector with i an integer from 1 to m, and let X be the matrix whose columns are the m samples. The data set A is reduced in dimension with the manifold learning LPP method; the reduced data set B consists of sample points y_1, y_2, ..., y_m, each y_i being an n-dimensional vector with l > n, and Y is the matrix whose columns are the m reduced samples. The specific steps are as follows:
Step 5.1: constructing a graph: for samples x_i and x_j in the "user-item-score" data set A, the Euclidean distance between them is calculated and tested against
||x_i - x_j|| < ε   (3)
wherein ε is a manually set threshold, generally taken as the mean pairwise distance over the m samples of the data set; if the distance satisfies (3), the two samples are considered very close, and an edge is established between node i and node j of the graph.
Step 5.2: determining the weight, if the node i is connected with the node j, the weight of the edge between the node i and the node j is calculated by the following formula of the nuclear thermal function:
Figure RE-GDA0003155284190000091
ωijrepresenting the weight, x, between i nodes and j nodesiAnd xjFor the samples in the "user-item-score" dataset a, t is an artificially set real number greater than 0.
Step 5.3: and calculating a projection matrix, wherein the formula for calculating the projection matrix is as follows.
XLXTa=λXDXTa
Suppose the solution in the formula is a0,a1,...,al-1And their corresponding eigenvalues λ are ordered from small to large, the projective transformation matrix is C ═ a0,a1,...,al-1) And then the reduced sample point yi=CTxi
Wherein the adjacency matrix W is determined by the weight ω in step twoijAnd (4) forming. The main diagonal of the diagonal matrix D is the weighted degree of each vertex of the graph constructed in step one, where the weighted degree of the node i is the sum of the weights of all the edges associated with the node, i.e. the sum of each row element of the adjacency matrix W. The placian matrix L is defined as L ═ D-W.
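Steps 5.1 through 5.3 can be sketched as a single routine, treating samples as the columns of X as in step 5. The default ε (mean pairwise distance), the tiny ridge term added for numerical stability, and all names are our assumptions, not part of the patent:

```python
import numpy as np
from scipy.linalg import eigh

def lpp(X, n_components=2, t=1.0, eps=None):
    """LPP per steps 5.1-5.3; X is the l x m matrix whose columns are samples."""
    m = X.shape[1]
    # step 5.1: pairwise squared Euclidean distances and the epsilon-neighbourhood graph
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    dist = np.sqrt(sq)
    if eps is None:
        eps = dist[np.triu_indices(m, 1)].mean()  # mean pairwise distance
    # step 5.2: heat-kernel weights exp(-||x_i - x_j||^2 / t) on connected pairs
    W = np.exp(-sq / t) * (dist < eps)
    np.fill_diagonal(W, 0.0)
    D = np.diag(W.sum(axis=1))        # weighted degrees on the diagonal
    L = D - W                         # graph Laplacian L = D - W
    # step 5.3: generalized eigenproblem X L X^T a = lambda X D X^T a
    A = X @ L @ X.T
    B = X @ D @ X.T + 1e-9 * np.eye(X.shape[0])  # tiny ridge for stability (our addition)
    _, vecs = eigh(A, B)              # eigenvalues come back in ascending order
    C = vecs[:, :n_components]        # eigenvectors of the n smallest eigenvalues
    return C.T @ X                    # Y = C^T X, the reduced samples

X = np.random.default_rng(0).normal(size=(5, 40))  # l = 5 dimensions, m = 40 samples
Y = lpp(X, n_components=2)
```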
Step 6: the reduced data set B is divided into a training set Train and a test set Test in the ratio 8:2, where Y' is the data matrix corresponding to the training set Train.
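The 8:2 split of step 6 might look like this with scikit-learn (the row layout of B, with samples as rows, is our assumption):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# B: reduced data set as rows; here filled with random values for illustration
B = np.random.default_rng(0).random((100, 4))
train, test = train_test_split(B, test_size=0.2, random_state=42)  # 8:2 split
```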
Constructing a personalized recommendation model part:
Step 7: a personalized recommendation model is established with the ensemble learning GBDT method; a schematic flow chart is shown in fig. 3. The specific steps are as follows:
Step 7.1: the GBDT model is defined by the formula
f_K(Y') = Σ_{k=1}^{K} h_k(Y')   (6)
wherein Y' is the matrix mentioned in step 6, k is the round of the score prediction learner, and K is the total number of iterations of the score prediction learner; f_k(Y') is the score prediction learner of the k-th round, and h_k(Y') represents the k-th CART decision regression tree.
Step 7.2: constructing a CART decision regression tree, namely h (Y') in the step 7.1, and specifically comprising the following steps:
step 7.21: partitioning the preprocessed data set B into H1,H2,...HoThe output value of each region is respectively as follows: p is a radical of1,p2,...,po
Step 7.22: recursively divides each region into two sub-regions and determines an output value on each sub-region. And selecting an optimal segmentation variable q and a segmentation point s according to the following formula.
Figure RE-GDA0003155284190000101
p1For the region H divided in step 7.211Output of p2For the region H divided in step 7.212Output of uvAnd wvRespectively expressed as the characteristic attribute and the score of the data in the corresponding region, wherein the maximum value of v is the number of samples in the divided region. The variable q is traversed, the fixed segmentation variable q is scanned for segmentation points s, and the pair (q, s) that makes the above formula reach the minimum value is selected. The selected pair (q, s) is used to divide the region and determine the corresponding output value.
Step 7.23: the steps 7.21 and 7.22 are continued to be invoked for both sub-areas until the stop condition is fulfilled.
Step 7.24: repartitioning of input space into o regions H'1,H′2,...H′oGenerating a score prediction CART decision regression tree, wherein the formula is as follows:
Figure RE-GDA0003155284190000102
h (u) is a predicted CART decision regression tree, H'vFor the divided regions, O is indicated as a divided region index, and O is indicated as the total number of divided regions. p is a radical ofoFor fixed output values of the region partitioned in step 7.21, q 'and s' are the optimal solutions iterated through step 7.21 and step 7.22.
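The exhaustive search of step 7.22 for the optimal splitting variable q and split point s (Eq. (8)) can be sketched as follows; all names are ours, and scanning only the unique feature values is an implementation choice, not something the patent specifies:

```python
import numpy as np

def best_split(U, w):
    """Scan every feature q and candidate split point s; return the (q, s)
    minimizing the sum of squared errors of the two resulting regions."""
    best_q, best_s, best_err = None, None, np.inf
    for q in range(U.shape[1]):
        for s in np.unique(U[:, q])[:-1]:          # candidate split points
            left, right = w[U[:, q] <= s], w[U[:, q] > s]
            # optimal p_1, p_2 for squared loss are the region means
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best_err:
                best_q, best_s, best_err = q, s, err
    return best_q, best_s

U = np.array([[0.0], [1.0], [2.0], [3.0]])  # one feature attribute u_v
w = np.array([1.0, 1.0, 10.0, 10.0])        # scores w_v
q, s = best_split(U, w)                     # the split u <= 1 separates the scores exactly
```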
Step 7.3: the scoring prediction learner adopts a forward step-by-step algorithm. The model of the k step is formed by the model of the k-1 step, namely the k step of the score prediction learner is closely related to the score prediction learner of the previous k-1 step, and the formula is as follows:
fk(Y′)=fk-1(Y′)+βk
fk(Y') prediction learner for k-th round of scoring, fk-1(Y') prediction learner for k-1 st round of scoring, betakRepresenting the residual error produced by the k-th round.
Step 7.4: and continuing iteration until the iteration is completed, and completing model building.
And (3) optimizing a model part:
Step 8: the GBDT model parameters are optimized with a Bayesian method, with the following specific steps:
Step 8.1: initialize the data set D' = {(x'_1, y'_1), ..., (x'_n, y'_n)}, where y'_i = f'(x'_i); the objective function f'(x') is the mapping from the dimension attributes in the data to the scores.
Step 8.2: the GBDT model is trained with the selected hyper-parameter combination x'_i to calculate f'(x'_i).
Step 8.3: the next hyper-parameter combination x'_{i+1} is calculated with an acquisition function.
Step 8.4: steps 8.2 and 8.3 are repeated for T' iterations.
Step 8.5: the hyper-parameter combination that optimizes the objective function f'(x') is output.
And step 9: and selecting the optimal hyperparameter combination obtained through Bayesian optimization to retrain the GBDT personalized recommendation model.
The personalized recommendation part:
Step 10: Top N recommendation and effect verification are performed with the prediction results of the final personalized recommendation model on the test set Test, with the following specific steps:
Step 10.1: set the value of N, i.e. the number of items recommended to each user, and define the number of users as count.
Step 10.2: for each user, the real recommendation list generated on the test set Test is recorded as T(all); score prediction is performed on the test set Test with the Bayesian-optimized GBDT recommendation model, and the obtained result is defined as the test score set.
Step 10.3: the test score set is sorted by score, the first N items are recommended to each user, and the Top N recommendation list obtained for each user is recorded as T(test).
Step 10.4: the precision and recall results on the test score set are verified.
Step 10.5: the length of T(test) is calculated.
Step 10.6: the length of T(all) is calculated.
Step 10.7: T(U), the intersection of each user's Top N recommendation list T(test) with T(all), is calculated.
Step 10.8: the precision is calculated:
Precision = |T(U)| / |T(test)|   (10)
The precision generated for each user is accumulated, and the sum is divided by count to obtain the average precision.
Step 10.9: the recall is calculated:
Recall = |T(U)| / |T(all)|   (11)
The recall generated for each user is accumulated, and the sum is divided by count to obtain the average recall.
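The per-user precision and recall of steps 10.7 through 10.9 (Eqs. (10) and (11)) reduce to a few lines; the toy user data below is fabricated purely for illustration:

```python
def precision_recall_at_n(top_n, relevant):
    """Precision = |T(U)|/|T(test)|, Recall = |T(U)|/|T(all)| for one user."""
    hits = len(set(top_n) & set(relevant))   # T(U), the intersection
    return hits / len(top_n), hits / len(relevant)

# {user: (Top N list T(test), real list T(all))} -- fabricated example data
users = {"u1": (["a", "b", "c"], ["a", "c", "d", "e"]),
         "u2": (["x", "y", "z"], ["y"])}
precisions, recalls = zip(*(precision_recall_at_n(t, a) for t, a in users.values()))
avg_precision = sum(precisions) / len(users)  # divide the accumulated sum by count
avg_recall = sum(recalls) / len(users)
```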
The above technical features constitute an embodiment of the invention, which has strong adaptability and good implementation effect; unnecessary technical features can be added or removed according to actual needs to meet the requirements of different situations.

Claims (7)

1. A personalized recommendation method based on ensemble learning is characterized by comprising the following steps:
step 1: analyzing the dimension attributes of the personalized recommendation data and dividing them into "user-item-score" data; performing data association on the related "user-item-score" dimensions;
step 2: after the processing is finished, analyzing the data type of each "user-item-score" dimension attribute and converting it into the data type required by ensemble learning;
step 3: generating feature attributes from the "score" attribute among the "user-item-score" dimension attributes;
step 4: normalizing all the obtained data as follows:
v' = (vv - min) / (max - min)   (1)
wherein vv represents the original value of the data, v' represents the value after normalization, min represents the minimum value of the column where vv is located, and max represents the maximum value of that column;
step 5: letting the "user-item-score" data set A in the original space have m sample points x_1, x_2, ..., x_m, each sample point x_i being an l-dimensional vector with i an integer from 1 to m, and X being the matrix whose columns are the m samples; reducing the dimension of the data set A with the manifold learning Locality Preserving Projection algorithm, the reduced data set B consisting of sample points y_1, y_2, ..., y_m, each sample point y_i being an n-dimensional vector with l > n, and Y being the matrix whose columns are the m reduced samples;
step 6: dividing the reduced data set B into a training set Train and a test set Test in the ratio 8:2, wherein Y' is the data matrix corresponding to the training set Train;
step 7: establishing a personalized recommendation model with the ensemble learning gradient boosting decision tree method;
step 8: optimizing the gradient boosting decision tree model parameters with a Bayesian method;
step 9: retraining the gradient boosting decision tree personalized recommendation model with the optimal hyper-parameter combination obtained through Bayesian optimization;
step 10: performing Top N recommendation and effect verification according to the prediction results of the final personalized recommendation model on the test set.
2. The ensemble learning-based personalized recommendation method according to claim 1, wherein: in step 3, the number of times each user scores the item is counted, and the formula is as follows:
CountRating(b) = |R(b)|   (2)
wherein b represents the b-th user in the "user-item-score" data set A, which contains d users in total, R(b) is the set of scores user b has given to the items, and CountRating(b) denotes "the total number of times user b has reviewed items".
3. The ensemble learning-based personalized recommendation method according to claim 1, wherein: the step 5 specifically includes the following steps:
step 5.1: constructing a graph: for every pair of samples x_i and x_j in the "user-item-score" data set A, the Euclidean distance between them is calculated as follows:
‖x_i − x_j‖ = sqrt( Σ_{r=1}^{l} (x_{i,r} − x_{j,r})² )  (3)
wherein ε is a manually set threshold, taken as the average distance over all sample pairs, and m is the total number of samples in the data set; if the Euclidean distance between two samples is smaller than ε, the two samples are considered very close to each other, and an edge is established between node i and node j of the graph;
step 5.2: determining the weights: if node i is connected with node j, the weight of the edge between them is calculated by the following heat kernel function:
ω_ij = exp( −‖x_i − x_j‖² / t )  (4)
wherein ω_ij represents the weight between node i and node j, x_i and x_j are samples in the "user-item-score" data set A, and t is a manually set real number greater than 0;
step 5.3: calculating a projection matrix, wherein a formula for calculating the projection matrix is as follows:
XLX^T a = λ XDX^T a  (5)
suppose the solutions of formula (5) are a_0, a_1, …, a_{l−1}, sorted so that their corresponding eigenvalues λ increase from small to large; the projection transformation matrix is C = (a_0, a_1, …, a_{n−1}), formed by the eigenvectors corresponding to the n smallest eigenvalues, and the dimensionality-reduced sample point is y_i = C^T x_i;
wherein X is the matrix X mentioned in step 5; the adjacency matrix W is formed by the weights ω_ij determined in step 5.2; the main diagonal of the diagonal matrix D consists of the weighted degrees of the vertices of the graph constructed in step 5.1, wherein the weighted degree of node i is the sum of the weights of all edges associated with that node, i.e., the sum of the elements of the i-th row of the adjacency matrix W; and the Laplacian matrix L is defined as L = D − W.
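The LPP procedure of claim 3 (graph construction with a mean-distance threshold, heat kernel weights, and the generalized eigenproblem of formula (5)) can be sketched with NumPy as follows; this is an illustrative sketch only: the generalized eigenproblem is solved approximately via a pseudo-inverse, and the sample data are random stand-ins:

```python
import numpy as np

def lpp(X, n_components, t=1.0):
    """Minimal Locality Preserving Projection sketch.
    X: (l, m) matrix, one l-dimensional sample per column."""
    m = X.shape[1]
    # pairwise Euclidean distances between samples (columns of X)
    diff = X[:, :, None] - X[:, None, :]
    dist = np.sqrt((diff ** 2).sum(axis=0))
    eps = dist[np.triu_indices(m, k=1)].mean()   # threshold = mean pair distance
    # heat kernel weights on edges shorter than eps (no self-loops)
    W = np.where((dist < eps) & (dist > 0), np.exp(-dist ** 2 / t), 0.0)
    D = np.diag(W.sum(axis=1))                   # weighted degree matrix
    L = D - W                                    # graph Laplacian L = D - W
    A, B = X @ L @ X.T, X @ D @ X.T
    # approximate solve of X L X^T a = lambda X D X^T a
    vals, vecs = np.linalg.eig(np.linalg.pinv(B) @ A)
    order = np.argsort(vals.real)
    C = vecs[:, order[:n_components]].real       # n smallest-eigenvalue directions
    return C.T @ X                               # reduced samples, one per column

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))     # 20 hypothetical five-dimensional samples
Y = lpp(X, n_components=2)
print(Y.shape)                   # (2, 20)
```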
4. The ensemble learning-based personalized recommendation method according to claim 1, wherein: the step 7 comprises the following steps:
step 7.1: the gradient boosting decision tree model is defined, and the formula is as follows:
f_K(Y′) = Σ_{k=1}^{K} h_k(Y′)  (6)
wherein Y′ is the matrix Y′ mentioned in step 6, k is the round index of the score prediction learner, and K is the total number of rounds; f_k(Y′) is the score prediction learner of the k-th round, and h_k(Y′) represents the k-th classification regression decision tree;
step 7.2: constructing a classification regression decision tree, namely h (Y') in the step 7.1;
step 7.3: the score prediction learner adopts a forward stagewise algorithm; the model of the k-th step is built from the model of the (k−1)-th step, i.e., the k-th round of the score prediction learner depends on the learners of the previous k−1 rounds, according to the following formula:
f_k(Y′) = f_{k−1}(Y′) + β_k  (7)
wherein f_k(Y′) is the score prediction learner of the k-th round, f_{k−1}(Y′) is the score prediction learner of the (k−1)-th round, and β_k represents the term fitted to the residual generated in the k-th round;
step 7.4: and continuing iteration until the iteration is completed, and completing model building.
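The forward stagewise construction of steps 7.1-7.4 can be sketched as follows (illustrative only; one-dimensional regression stumps stand in for the classification regression decision trees, and the data are hypothetical):

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression stump on a 1-D feature (least squares)."""
    best = (np.inf, None)
    for s in np.unique(x):
        left, right = residual[x <= s], residual[x > s]
        if len(left) == 0 or len(right) == 0:
            continue
        err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if err < best[0]:
            best = (err, (s, left.mean(), right.mean()))
    s, pl, pr = best[1]
    return lambda q: np.where(q <= s, pl, pr)

def gbdt_fit(x, y, rounds=20):
    """Forward stagewise boosting: each round fits a stump h_k to the
    current residual, so f_k = f_{k-1} + h_k as in formula (7)."""
    pred = np.full_like(y, y.mean(), dtype=float)
    for _ in range(rounds):
        h = fit_stump(x, y - pred)      # h_k fitted to the round-k residual
        pred = pred + h(x)              # f_k = f_{k-1} + h_k
    return pred

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.0, 1.2, 0.9, 3.1, 3.0, 2.9])
pred = gbdt_fit(x, y)
print(np.round(pred, 2))                # close to y after 20 rounds
```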
5. The ensemble learning-based personalized recommendation method according to claim 4, wherein: in the step 7.2, the method comprises the following steps:
step 7.21: partitioning the preprocessed data set B into regions H_1, H_2, …, H_o, whose output values are p_1, p_2, …, p_o respectively;
Step 7.22: recursively dividing each region into two sub-regions and determining an output value on each sub-region; selecting an optimal segmentation variable q and a segmentation point s according to the following formula;
min_{q,s} [ min_{p_1} Σ_{u_v ∈ H_1(q,s)} (w_v − p_1)² + min_{p_2} Σ_{u_v ∈ H_2(q,s)} (w_v − p_2)² ]  (8)
wherein p_1 is the output of the region H_1 divided in step 7.21 and p_2 is the output of the region H_2 divided in step 7.21; u_v and w_v respectively represent the feature attributes and the score of the v-th data sample in the corresponding region, the maximum value of v being the number of samples in that region; traversing the splitting variable q and, for each fixed q, scanning the splitting point s, the pair (q, s) that minimizes the above formula is selected; the region is then divided by the selected pair (q, s) and the corresponding output values are determined;
step 7.23: continuing to call steps 7.21 and 7.22 for the two sub-regions until a stop condition is met;
step 7.24: partitioning the input space into o regions H′_1, H′_2, …, H′_o and generating the score prediction classification regression decision tree according to the following formula:
h(u) = Σ_{v=1}^{o} p_v · I( u ∈ H′_v )  (9)
wherein h(u) is the score prediction classification regression decision tree; H′_v is the v-th divided region and o is the total number of divided regions; I(·) equals 1 when u falls in region H′_v and 0 otherwise; p_v is the fixed output value of the corresponding region divided in step 7.21; and the regions are determined by the optimal pairs (q′, s′) obtained by iterating step 7.21 and step 7.22.
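The exhaustive (q, s) search of steps 7.21-7.22, minimizing the two-region squared error of formula (8), can be sketched as follows (illustrative only; the feature matrix and scores are hypothetical):

```python
import numpy as np

def best_split(U, w):
    """Search for the (q, s) pair minimising the two-region squared error.
    U: (n_samples, n_features) feature attributes; w: scores."""
    best = (np.inf, None, None)
    for q in range(U.shape[1]):                  # candidate splitting variable q
        for s in np.unique(U[:, q]):             # candidate splitting point s
            left, right = w[U[:, q] <= s], w[U[:, q] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            # inner minimisation: the optimal outputs p1, p2 are the region means
            err = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if err < best[0]:
                best = (err, q, s)
    return best[1], best[2]

U = np.array([[1.0], [2.0], [10.0], [11.0]])     # one hypothetical feature
w = np.array([1.0, 1.0, 5.0, 5.0])               # hypothetical scores
print(best_split(U, w))                          # splits feature 0 at 2.0
```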
6. The ensemble learning-based personalized recommendation method according to claim 1, wherein: the step 8 comprises the following steps:
step 8.1: initializing the data set D′ = {(x′_1, y′_1), …, (x′_n, y′_n)}, wherein y′_i = f′(x′_i), and f′(x′) is the objective function mapping a hyperparameter combination to the model score;
step 8.2: training the gradient boosting decision tree model with the selected hyperparameter combination x′_i and calculating f′(x′_i);
step 8.3: calculating the next hyperparameter combination x′_{i+1} by means of an acquisition function;
Step 8.4: repeating the step 8.2 and the step 8.3, and iterating for T' times;
step 8.5: outputting the hyperparameter combination that optimizes the objective function f′(x′).
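The Bayesian optimization loop of claim 6 can be sketched as follows. This is an illustrative sketch only: a toy nearest-neighbour surrogate with a lower-confidence-bound rule stands in for the usual Gaussian-process surrogate and acquisition function, and the objective is a hypothetical stand-in for retraining the model with a given hyperparameter:

```python
import random

def objective(x):
    """Hypothetical validation loss for hyperparameter x (a stand-in for
    training the GBDT with, e.g., learning rate x and scoring it)."""
    return (x - 0.3) ** 2

def suggest(history, candidates, kappa=1.0):
    """Acquisition step: pick the candidate minimising a lower-confidence
    bound under a toy nearest-neighbour surrogate."""
    def lcb(c):
        nearest = min(history, key=lambda h: abs(h[0] - c))
        mean = nearest[1]                  # surrogate mean: nearest observed value
        sigma = abs(nearest[0] - c)        # crude uncertainty: distance to it
        return mean - kappa * sigma
    return min(candidates, key=lcb)

rng = random.Random(0)
history = [(x, objective(x)) for x in (0.05, 0.9)]    # initial evaluations
for _ in range(20):                                    # iterate T' times
    x_next = suggest(history, [rng.uniform(0, 1) for _ in range(50)])
    history.append((x_next, objective(x_next)))        # evaluate and record
best_x, best_y = min(history, key=lambda h: h[1])
print(round(best_x, 2))   # best evaluated hyperparameter (should approach 0.3)
```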
7. The ensemble learning-based personalized recommendation method according to claim 1, wherein: the step 10 includes the steps of:
step 10.1: setting the value of N, namely the number of items recommended to each user, and defining the number of users as count;
step 10.2: for each user, recording the real recommendation list generated on the Test set Test as T(all); performing score prediction on the Test set Test with the gradient boosting decision tree recommendation model obtained through Bayesian optimization, the obtained result being defined as the test score set;
step 10.3: sorting the test score set by score, recommending the top N items to each user, and recording the Top-N recommendation list obtained by each user as T(test);
step 10.4: verifying the precision and recall results on the test score set;
step 10.5: calculating the length (size) of T(test);
step 10.6: calculating the length (size) of T(all);
step 10.7: calculating the intersection T(U) of each user's Top-N recommendation list T(test) and the real list T(all);
step 10.8: calculating the precision:
Precision = |T(U)| / |T(test)|  (10)
accumulating the precision obtained for each user and dividing the sum by count to obtain the average precision;
step 10.9: calculating the recall ratio:
Recall = |T(U)| / |T(all)|  (11)
accumulating the recall obtained for each user and dividing the sum by count to obtain the average recall.
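The per-user precision and recall of steps 10.7-10.9 can be sketched as follows (illustrative only; the item identifiers and lists are hypothetical):

```python
def precision_recall(top_n, actual):
    """Top-N precision and recall for one user.
    top_n: recommended list T(test); actual: real list T(all)."""
    hit = len(set(top_n) & set(actual))     # |T(U)|, the intersection size
    return hit / len(top_n), hit / len(actual)

t_test = ["i1", "i2", "i3", "i4"]           # hypothetical Top-N list T(test)
t_all = ["i2", "i4", "i9"]                  # items the user actually liked, T(all)
p, r = precision_recall(t_test, t_all)
print(p, r)   # 0.5 and about 0.667
```

Averaging these per-user values over all count users gives the average precision and average recall described above.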
CN202110629501.6A 2021-03-26 2021-06-07 Personalized recommendation method based on ensemble learning Active CN113326433B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021103231807 2021-03-26
CN202110323180 2021-03-26

Publications (2)

Publication Number Publication Date
CN113326433A true CN113326433A (en) 2021-08-31
CN113326433B CN113326433B (en) 2023-10-10

Family

ID=77419834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110629501.6A Active CN113326433B (en) 2021-03-26 2021-06-07 Personalized recommendation method based on ensemble learning

Country Status (1)

Country Link
CN (1) CN113326433B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105843928A * 2016-03-28 2016-08-10 Xidian University Recommendation method based on double-layer matrix decomposition
CN108763362A * 2018-05-17 2018-11-06 Zhejiang University of Technology Top-N movie recommendation method based on random anchor point selection and local model weighted fusion
CN110109902A * 2019-03-18 2019-08-09 Guangdong University of Technology E-commerce platform recommendation system based on an ensemble learning method
CN110297978A * 2019-06-28 2019-10-01 Sichuan Jinmi Information Technology Co., Ltd. Personalized recommendation algorithm based on ensemble regression
CN110348580A * 2019-06-18 2019-10-18 4Paradigm (Beijing) Technology Co., Ltd. Method and apparatus for constructing a GBDT model, and prediction method and apparatus
WO2020233245A1 * 2019-05-20 2020-11-26 Shandong University of Science and Technology Method for bias tensor factorization with context feature auto-encoding based on regression tree
CN112183946A * 2020-09-07 2021-01-05 Tencent Music Entertainment Technology (Shenzhen) Co., Ltd. Multimedia content evaluation method, device and training method thereof

Non-Patent Citations (1)

Title
NIE Lisheng, "Personalized recommendation of learning resources based on behavior analysis", Computer Technology and Development, no. 07 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant