CN113326433A - Personalized recommendation method based on ensemble learning - Google Patents
- Publication number
- CN113326433A CN113326433A CN202110629501.6A CN202110629501A CN113326433A CN 113326433 A CN113326433 A CN 113326433A CN 202110629501 A CN202110629501 A CN 202110629501A CN 113326433 A CN113326433 A CN 113326433A
- Authority
- CN
- China
- Prior art keywords
- data
- user
- score
- personalized recommendation
- test
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06F18/24155—Bayesian classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Abstract
The invention relates to the field of machine learning and recommendation systems, in particular to a personalized recommendation method based on ensemble learning. The data preprocessing module is responsible for re-integrating the data features; it addresses the difficulty of extracting complex features by constructing new features and applying manifold-learning dimensionality reduction. The model building and optimization module is responsible for building a personalized ensemble-learning prediction model on the fused data and applying Bayesian optimization on top of that model to improve the accuracy of personalized recommendation. The personalized recommendation module is responsible for obtaining the prediction model's output and for producing and verifying the personalized recommendation result with a Top-N recommendation method. The method improves the accuracy of personalized recommendation through ensemble learning; in addition, it integrates the data features through manifold-learning dimensionality reduction, thereby solving the problem of difficult extraction of complex features.
Description
Technical Field
The invention relates to the field of machine learning and recommendation systems, in particular to a personalized recommendation method based on the manifold learning algorithm LPP (Locality Preserving Projections) and the ensemble learning algorithm GBDT (Gradient Boosting Decision Tree).
Background
In recent years, with the continuous advance of internet and computer technology, the internet has brought a huge volume of information data and aggravated the phenomenon of information overload. Although the range of information resources available to users has expanded, quickly and effectively screening out the information useful to a user from huge amounts of data has become a major problem in the development of the contemporary internet. Many existing web applications (e.g., web portals, search engines) are essentially ways of helping users filter information. However, these methods only meet users' mainstream requirements, do not consider personalization, and still do not solve the problem of information overload well. Personalized recommendation is an important information filtering means and an effective method for solving the problem of information overload.
With the development of machine learning, applying machine learning methods to the field of recommendation algorithms has become a trend. Personalized recommendation relies on many machine learning methods, such as support vector machines, decision trees, neural networks, deep learning, clustering, dimensionality reduction, regression prediction, and ensemble learning. A machine-learning-based personalized recommendation method can effectively address problems such as monotonous similarity measures, high similarity-computation complexity, difficulty in mining users' latent interests, difficulty in exploiting user tag information and demographic information, and difficulty in extracting item features. User tag information, demographic information, and item feature information perform poorly at solving the cold-start problem, yet they are necessary for capturing users' latent interests.
Disclosure of Invention
Object of the Invention
The invention provides a personalized recommendation method based on the Locality Preserving Projections algorithm and ensemble learning, aiming to solve the problem of information overload in recommendation systems and to improve the efficiency and precision of personalized recommendation.
Technical scheme
A personalized recommendation method based on ensemble learning is characterized by comprising the following steps:
Step 1: analyze the dimension attributes of the personalized recommendation data and organize them into "user-item-score" data; perform data association on the related user, item, and score dimensions;
Step 2: after this processing, analyze the data type of each "user-item-score" dimension attribute and convert it into the data type required by ensemble learning;
Step 3: generate feature attributes from the "score" attribute among the "user-item-score" dimension attributes;
Step 4: normalize all the obtained data, calculated as follows:

v' = (v - min) / (max - min)

where v represents the original value of the data, v' represents the value after normalization, min represents the minimum value of the column in which v is located, and max represents the maximum value of that column;
Step 5: let the "user-item-score" dataset A in the original space have m sample points x1, x2, ..., xm, where each sample point xi is an l-dimensional vector, i is an integer from 1 to m, and the matrix formed by the m samples as columns is X; reduce the dimensionality of dataset A with the manifold learning method LPP, so that the reduced dataset B consists of sample points y1, y2, ..., ym, where each yi is an n-dimensional vector and the m samples form a matrix Y by columns, with l > n;
Step 6: split the reduced dataset B in an 8:2 ratio into a training set Train and a test set Test, where the data matrix corresponding to the training set Train is Y';
Step 7: build a personalized recommendation model with the ensemble learning method GBDT;
Step 8: optimize the GBDT model parameters with a Bayesian method;
Step 9: retrain the GBDT personalized recommendation model with the optimal hyper-parameter combination obtained through Bayesian optimization;
Step 10: perform Top-N recommendation and effect verification based on the final prediction results of the personalized recommendation model on the test set.
In step 3, the number of times each user has scored items is counted, with the formula:

CountRating(b) = |R(b)|,  b = 1, 2, ..., d

where b denotes the b-th user in the "user-item-score" dataset A, which contains d users in total; R(b) is the set of scores given by user b to items; and CountRating(b) means "the total number of items reviewed by user b".
Step 5 specifically includes the following steps:
Step 5.1: construct a graph. For each sample xi in the "user-item-score" dataset A, compute its Euclidean distance ||xi - xj|| to every other sample xj, and test

||xi - xj|| < ε

where ε is a manually set threshold, generally taken as the mean pairwise distance of the samples, and m is the total number of samples in the dataset; if the Euclidean distance is smaller than ε, the two samples are considered very close, and an edge is created between node i and node j of the graph;
Step 5.2: determine the weights. If node i is connected to node j, the weight of the edge between them is computed with the heat kernel function:

ωij = exp(-||xi - xj||^2 / t)

where ωij represents the weight between node i and node j, xi and xj are samples in the "user-item-score" dataset A, and t is a manually set real number greater than 0;
Step 5.3: compute the projection matrix, with the formula:

X L X^T a = λ X D X^T a    (5)

Suppose its solutions are a0, a1, ..., a_{l-1}, with their corresponding eigenvalues λ ordered from small to large; the projection transformation matrix is then C = (a0, a1, ..., a_{l-1}), and the reduced sample points are yi = C^T xi.
Here X is the matrix X mentioned in step 5; the adjacency matrix W is formed by the weights ωij from step 5.2; the main diagonal of the diagonal matrix D holds the weighted degree of each vertex of the graph constructed in step 5.1, where the weighted degree of node i is the sum of the weights of all edges incident to it, i.e., the sum of the elements in row i of the adjacency matrix W; and the Laplacian matrix L is defined as L = D - W.
Step 7 comprises the following steps:
Step 7.1: define the GBDT model with the formula:

f_K(Y') = Σ_{k=1}^{K} h_k(Y')

where Y' is the matrix Y' mentioned in step 6, k is the round index of the score prediction learner, and K is the total number of rounds; f_k(Y') is the score prediction learner of the k-th round, and h_k(Y') represents the k-th CART (Classification and Regression Trees) decision regression tree;
Step 7.2: construct a CART decision regression tree, i.e., h(Y') in step 7.1;
Step 7.3: the score prediction learner uses a forward stagewise algorithm; the model of step k is built from the model of step k-1, i.e., the k-th score prediction learner is closely related to the learner of the previous k-1 steps, with the formula:

f_k(Y') = f_{k-1}(Y') + β_k    (7)

where f_k(Y') is the score prediction learner of round k, f_{k-1}(Y') is the score prediction learner of round k-1, and β_k represents the residual term produced in round k;
Step 7.4: continue iterating until the iterations are complete, at which point the model is built.
Step 7.2 comprises the following steps:
Step 7.21: partition the preprocessed dataset B into regions H1, H2, ..., Ho, whose output values are p1, p2, ..., po respectively;
Step 7.22: recursively divide each region into two sub-regions and determine the output value on each sub-region; select the optimal splitting variable q and split point s according to:

min_{q,s} [ min_{p1} Σ_{uv ∈ H1(q,s)} (wv - p1)^2 + min_{p2} Σ_{uv ∈ H2(q,s)} (wv - p2)^2 ]

where p1 is the output of region H1 divided in step 7.21, p2 is the output of region H2, and uv and wv represent the feature attributes and the score of the data in the corresponding region respectively, with the maximum value of v being the number of samples in the divided region; traverse the variable q, scan the split point s for each fixed splitting variable q, and select the pair (q, s) that minimizes the expression above; divide the region with the selected pair (q, s) and determine the corresponding output values;
Step 7.23: continue applying steps 7.21 and 7.22 to the two sub-regions until a stopping condition is met;
Step 7.24: repartition the input space into o regions H'1, H'2, ..., H'o and generate the score prediction CART decision regression tree:

h(u) = Σ_{o=1}^{O} p_o I(u ∈ H'_o)

where h(u) is the prediction CART decision regression tree, H'_o are the divided regions, o is the index of a divided region, and O is the total number of divided regions; p_o is the fixed output value of the region divided in step 7.21, and q' and s' are the optimal solutions iterated through steps 7.21 and 7.22.
Step 8 comprises the following steps:
Step 8.1: initialize the dataset D' = {(x'1, y'1), ..., (x'n, y'n)}, where y'_i = f'(x'_i); the objective function f'(x') is the mapping from the dimension attributes in the data to the scores;
Step 8.2: train the GBDT model with the selected hyper-parameter combination x'_i and compute f'(x'_i);
Step 8.3: use an acquisition function to compute the next hyper-parameter combination x'_{i+1};
Step 8.4: repeat steps 8.2 and 8.3 for T' iterations;
Step 8.5: output the hyper-parameter combination that optimizes the objective function f'(x').
Step 10 includes the following steps:
Step 10.1: set the value N, i.e., the number of items recommended to each user, and define the number of users as count;
Step 10.2: for each user, record the true recommendation list generated on the test set Test as T(all); perform score prediction on the test set Test with the Bayesian-optimized GBDT recommendation model, and define the obtained result as the test score set;
Step 10.3: sort the test score set by score, recommend the top N items to each user, and record the Top-N recommendation list obtained for each user as T(test);
Step 10.4: verify the precision and recall results on the test score set;
Step 10.5: compute the length |T(test)|;
Step 10.6: compute the length |T(all)|;
Step 10.7: compute the intersection T(U) of each user's Top-N recommendation list T(test) with T(all);
Step 10.8: compute the precision: Precision = |T(U)| / |T(test)|; accumulate the precision produced for each user and divide the sum by count to obtain the average precision;
Step 10.9: compute the recall: Recall = |T(U)| / |T(all)|; accumulate the recall produced for each user and divide the sum by count to obtain the average recall.
Advantages and effects
1. The invention applies techniques from the field of machine learning to the problem of information overload in today's society. It solves the difficulty of extracting complex features through manifold learning, reduces the dimensionality of the data feature attributes, shortens model training time, improves the learning ability of the model, and greatly improves recommendation efficiency.
2. Personalized recommendation is performed through ensemble learning, and the recommendation model is tuned through Bayesian optimization, which improves recommendation precision, allows useful information to be screened quickly and effectively from huge amounts of data, and improves the utilization efficiency of the information.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a flow chart of data feature preprocessing;
FIG. 3 is a flow chart of personalized recommendation.
Detailed Description
The following description of the embodiments of the present invention is provided with reference to the accompanying drawings, in order that those skilled in the art may better understand the invention.
A personalized recommendation method based on manifold learning LPP and ensemble learning GBDT can improve the accuracy of personalized recommendation through ensemble learning; in addition, the method applies manifold-learning dimensionality reduction to integrate the data features, thereby solving the problem of difficult extraction of complex features.
FIG. 1 is the general flow chart of the present invention, which includes the following 10 steps: steps 1-6 are the recommendation data preprocessing part of FIG. 1; step 7 is the personalized recommendation model construction part; steps 8 and 9 are the model optimization part; step 10 is the personalized recommendation part.
The data preprocessing module is responsible for re-integrating the data features; it addresses the difficulty of extracting complex features by constructing new features and applying manifold-learning dimensionality reduction. The model building and optimization module is responsible for building a personalized ensemble-learning prediction model on the fused data and applying Bayesian optimization on top of that model to improve the accuracy of personalized recommendation. The personalized recommendation module is responsible for obtaining the prediction model's output and for producing and verifying the personalized recommendation result with a Top-N recommendation method.
The detailed steps are as follows:
a recommended data preprocessing part:
FIG. 2 is a flow chart of the characteristic data preprocessing of the present invention, and the specific implementation steps are as follows:
Step 1: analyze the dimension attributes of the personalized recommendation data and organize them into "user-item-score" data; perform data association on the related user, item, and score dimensions.
Step 2: after this processing, analyze the data type of each "user-item-score" dimension attribute and convert it into the data type required by ensemble learning.
Step 3: generate feature attributes from the "score" attribute among the "user-item-score" dimension attributes, with the formula:

CountRating(b) = |R(b)|,  b = 1, 2, ..., d

where b denotes the b-th user in the "user-item-score" dataset A, which contains d users in total, and R(b) is the set of scores given by user b to items.
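By way of illustration, the per-user counting of step 3 can be sketched as follows; the (user, item, score) triple format of the dataset is an assumption made for this sketch, not part of the specification:

```python
from collections import Counter

def count_rating(records):
    """Count how many items each user has rated.

    records: iterable of (user, item, score) triples -- the
    "user-item-score" dataset A. Returns a mapping {user: count},
    i.e. CountRating(b) for every user b that appears.
    """
    return Counter(user for user, _item, _score in records)

# Toy dataset: user 1 rated two items, user 2 rated one.
data = [(1, 10, 4.0), (1, 11, 3.5), (2, 10, 5.0)]
counts = count_rating(data)
# counts[1] == 2, counts[2] == 1
```

The resulting counts can then be attached to each record as a new feature column before normalization.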
Step 4: normalize all the obtained data, calculated as follows:

v' = (v - min) / (max - min)

where v represents the original value of the data, v' the value after normalization, min the minimum value of the column in which v is located, and max the maximum value of that column.
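The min-max normalization of step 4 can be sketched per column as follows; the handling of a constant column (mapping it to 0) is an implementation assumption, since the formula is undefined when max equals min:

```python
def min_max_normalize(column):
    """Min-max normalization: v' = (v - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:                      # constant column: map everything to 0
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

scores = [1.0, 3.0, 5.0]
normalized = min_max_normalize(scores)
# normalized == [0.0, 0.5, 1.0]
```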
Step 5: let the "user-item-score" dataset A in the original space have m sample points x1, x2, ..., xm, where each sample point xi is an l-dimensional vector, i is an integer from 1 to m, and the matrix formed by the m samples as columns is X. Reduce the dimensionality of dataset A with the manifold learning method LPP; the reduced dataset B consists of sample points y1, y2, ..., ym, where each yi is an n-dimensional vector and the m samples form a matrix Y by columns, with l > n. The specific steps are as follows:
Step 5.1: construct a graph. For each sample xi in the "user-item-score" dataset A, compute its Euclidean distance ||xi - xj|| to every other sample xj, and test

||xi - xj|| < ε

where ε is a manually set threshold, generally taken as the mean pairwise distance of the samples, and m is the total number of samples in the dataset; if the distance is smaller than ε, the two samples are considered very close, and an edge is created between node i and node j of the graph.
Step 5.2: determine the weights. If node i is connected to node j, the weight of the edge between them is computed with the heat kernel function:

ωij = exp(-||xi - xj||^2 / t)

where ωij represents the weight between node i and node j, xi and xj are samples in the "user-item-score" dataset A, and t is a manually set real number greater than 0.
Step 5.3: compute the projection matrix, with the formula:

X L X^T a = λ X D X^T a

Suppose its solutions are a0, a1, ..., a_{l-1}, with their corresponding eigenvalues λ ordered from small to large; the projection transformation matrix is then C = (a0, a1, ..., a_{l-1}), and the reduced sample points are yi = C^T xi.
Here the adjacency matrix W is formed by the weights ωij from step 5.2; the main diagonal of the diagonal matrix D holds the weighted degree of each vertex of the graph constructed in step 5.1, where the weighted degree of node i is the sum of the weights of all edges incident to it, i.e., the sum of the elements in row i of the adjacency matrix W; and the Laplacian matrix L is defined as L = D - W.
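Steps 5.1-5.3 can be sketched as a minimal LPP implementation; the default choice of ε (mean pairwise distance), the value of t, and solving the generalized eigenproblem via a pseudo-inverse are implementation assumptions made for this sketch:

```python
import numpy as np

def lpp(X, n_components, eps=None, t=1.0):
    """Locality Preserving Projections (manifold learning LPP), a sketch.

    X: (l, m) matrix whose columns are the m sample points x_i.
    Returns the (n_components, m) matrix Y of reduced sample points
    y_i = C^T x_i, following steps 5.1-5.3.
    """
    l, m = X.shape
    # Step 5.1: build the graph -- connect i and j if ||x_i - x_j|| < eps.
    diffs = X[:, :, None] - X[:, None, :]
    dist = np.sqrt((diffs ** 2).sum(axis=0))        # (m, m) Euclidean distances
    if eps is None:
        eps = dist[np.triu_indices(m, 1)].mean()    # assumed default: mean distance
    adj = (dist < eps) & ~np.eye(m, dtype=bool)
    # Step 5.2: heat-kernel weights on connected pairs.
    W = np.where(adj, np.exp(-dist ** 2 / t), 0.0)
    D = np.diag(W.sum(axis=1))                      # weighted degrees
    L = D - W                                       # graph Laplacian
    # Step 5.3: solve X L X^T a = lambda X D X^T a; keep the smallest eigenvalues.
    A, B = X @ L @ X.T, X @ D @ X.T
    vals, vecs = np.linalg.eig(np.linalg.pinv(B) @ A)
    order = np.argsort(vals.real)[:n_components]
    C = vecs[:, order].real                         # projection matrix (l, n)
    return C.T @ X                                  # reduced data, (n, m)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))        # 20 samples in 5 dimensions
Y = lpp(X, n_components=2)
# Y.shape == (2, 20)
```

A production implementation would typically use a symmetric generalized eigensolver (e.g. scipy.linalg.eigh with two matrix arguments) instead of the pseudo-inverse shortcut used here.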
Step 6: split the reduced dataset B in an 8:2 ratio into a training set Train and a test set Test, where the data matrix corresponding to the training set Train is Y'.
Constructing a personalized recommendation model part:
Step 7: build a personalized recommendation model with the ensemble learning method GBDT; the process is shown schematically in FIG. 3. The specific steps are as follows:
Step 7.1: define the GBDT model with the formula:

f_K(Y') = Σ_{k=1}^{K} h_k(Y')

where Y' is the matrix Y' mentioned in step 6, k is the round index of the score prediction learner, and K is the total number of iterations of the score prediction learner; f_k(Y') is the score prediction learner of round k, and h_k(Y') represents the k-th CART decision regression tree.
Step 7.2: constructing a CART decision regression tree, namely h (Y') in the step 7.1, and specifically comprising the following steps:
step 7.21: partitioning the preprocessed data set B into H1,H2,...HoThe output value of each region is respectively as follows: p is a radical of1,p2,...,po。
Step 7.22: recursively divides each region into two sub-regions and determines an output value on each sub-region. And selecting an optimal segmentation variable q and a segmentation point s according to the following formula.
p1For the region H divided in step 7.211Output of p2For the region H divided in step 7.212Output of uvAnd wvRespectively expressed as the characteristic attribute and the score of the data in the corresponding region, wherein the maximum value of v is the number of samples in the divided region. The variable q is traversed, the fixed segmentation variable q is scanned for segmentation points s, and the pair (q, s) that makes the above formula reach the minimum value is selected. The selected pair (q, s) is used to divide the region and determine the corresponding output value.
Step 7.23: the steps 7.21 and 7.22 are continued to be invoked for both sub-areas until the stop condition is fulfilled.
Step 7.24: repartitioning of input space into o regions H'1,H′2,...H′oGenerating a score prediction CART decision regression tree, wherein the formula is as follows:
h (u) is a predicted CART decision regression tree, H'vFor the divided regions, O is indicated as a divided region index, and O is indicated as the total number of divided regions. p is a radical ofoFor fixed output values of the region partitioned in step 7.21, q 'and s' are the optimal solutions iterated through step 7.21 and step 7.22.
Step 7.3: the scoring prediction learner adopts a forward step-by-step algorithm. The model of the k step is formed by the model of the k-1 step, namely the k step of the score prediction learner is closely related to the score prediction learner of the previous k-1 step, and the formula is as follows:
fk(Y′)=fk-1(Y′)+βk
fk(Y') prediction learner for k-th round of scoring, fk-1(Y') prediction learner for k-1 st round of scoring, betakRepresenting the residual error produced by the k-th round.
Step 7.4: and continuing iteration until the iteration is completed, and completing model building.
Model optimization part:
Step 8: optimize the GBDT model parameters with a Bayesian method. The specific steps are as follows:
Step 8.1: initialize the dataset D' = {(x'1, y'1), ..., (x'n, y'n)}, where y'_i = f'(x'_i); the objective function f'(x') is the mapping from the dimension attributes in the data to the scores.
Step 8.2: train the GBDT model with the selected hyper-parameter combination x'_i and compute f'(x'_i).
Step 8.3: use an acquisition function to compute the next hyper-parameter combination x'_{i+1}.
Step 8.4: repeat steps 8.2 and 8.3 for T' iterations.
Step 8.5: output the hyper-parameter combination that optimizes the objective function f'(x').
And step 9: and selecting the optimal hyperparameter combination obtained through Bayesian optimization to retrain the GBDT personalized recommendation model.
The personalized recommendation part:
Step 10: perform Top-N recommendation and effect verification based on the final prediction results of the personalized recommendation model on the test set Test. The specific steps are as follows:
Step 10.1: set the value N, i.e., the number of items recommended to each user, and define the number of users as count.
Step 10.2: for each user, record the true recommendation list generated on the test set Test as T(all); perform score prediction on the test set Test with the Bayesian-optimized GBDT recommendation model, and define the obtained result as the test score set.
Step 10.3: sort the test score set by score, recommend the top N items to each user, and record the Top-N recommendation list obtained for each user as T(test).
Step 10.4: verify the precision and recall results on the test score set.
Step 10.5: compute the length |T(test)|.
Step 10.6: compute the length |T(all)|.
Step 10.7: compute the intersection T(U) of each user's Top-N recommendation list T(test) with T(all).
Step 10.8: compute the precision: Precision = |T(U)| / |T(test)|; accumulate the precision produced for each user and divide the sum by count to obtain the average precision.
Step 10.9: compute the recall: Recall = |T(U)| / |T(all)|; accumulate the recall produced for each user and divide the sum by count to obtain the average recall.
The technical features described above form an embodiment of the invention, which has strong adaptability and implementation effect; unnecessary technical features can be added or removed according to actual needs to meet the requirements of different situations.
Claims (7)
1. A personalized recommendation method based on ensemble learning, characterized by comprising the following steps:
step 1: analyzing the dimension attributes of the personalized recommendation data and organizing them into "user-item-score" data; performing data association on the related user, item, and score dimensions;
step 2: after this processing, analyzing the data type of each "user-item-score" dimension attribute and converting it into the data type required by ensemble learning;
step 3: generating feature attributes from the "score" attribute among the "user-item-score" dimension attributes;
step 4: normalizing all the obtained data, calculated as follows:

v' = (v - min) / (max - min)

wherein v represents the original value of the data, v' represents the value after normalization, min represents the minimum value of the column in which v is located, and max represents the maximum value of that column;
step 5: letting the "user-item-score" dataset A in the original space have m sample points x1, x2, ..., xm, wherein each sample point xi is an l-dimensional vector, i is an integer from 1 to m, and the matrix formed by the m samples as columns is X; reducing the dimensionality of dataset A with the manifold learning Locality Preserving Projections algorithm, wherein the reduced dataset B consists of sample points y1, y2, ..., ym, each yi is an n-dimensional vector, and the m samples form a matrix Y by columns, with l > n;
step 6: splitting the reduced dataset B in an 8:2 ratio into a training set Train and a test set Test, wherein the data matrix corresponding to the training set Train is Y';
step 7: building a personalized recommendation model with the ensemble learning gradient boosting decision tree method;
step 8: optimizing the gradient boosting decision tree model parameters with a Bayesian method;
step 9: retraining the gradient boosting decision tree personalized recommendation model with the optimal hyper-parameter combination obtained through Bayesian optimization;
step 10: performing Top-N recommendation and effect verification based on the final prediction results of the personalized recommendation model on the test set.
2. The ensemble-learning-based personalized recommendation method according to claim 1, characterized in that: in step 3, the number of times each user has scored items is counted, with the formula:

CountRating(b) = |R(b)|,  b = 1, 2, ..., d

wherein b denotes the b-th user in the "user-item-score" dataset A, which contains d users in total; R(b) is the set of scores given by user b to items; and CountRating(b) means "the total number of items reviewed by user b".
3. The ensemble-learning-based personalized recommendation method according to claim 1, characterized in that step 5 specifically comprises the following steps:
step 5.1: constructing a graph: for each sample xi in the "user-item-score" dataset A, computing its Euclidean distance ||xi - xj|| to every other sample xj and testing

||xi - xj|| < ε

wherein ε is a manually set threshold, generally taken as the mean pairwise distance of the samples, and m is the total number of samples in the dataset; if the Euclidean distance is smaller than ε, the two samples are considered very close, and an edge is created between node i and node j of the graph;
step 5.2: determining the weights: if node i is connected to node j, the weight of the edge between them is computed with the heat kernel function:

ωij = exp(-||xi - xj||^2 / t)

wherein ωij represents the weight between node i and node j, xi and xj are samples in the "user-item-score" dataset A, and t is a manually set real number greater than 0;
step 5.3: computing the projection matrix, with the formula:

X L X^T a = λ X D X^T a    (5)

supposing its solutions are a0, a1, ..., a_{l-1}, with their corresponding eigenvalues λ ordered from small to large, the projection transformation matrix is C = (a0, a1, ..., a_{l-1}), and the reduced sample points are yi = C^T xi;
wherein X is the matrix X mentioned in step 5; the adjacency matrix W is formed by the weights ωij from step 5.2; the main diagonal of the diagonal matrix D is the weighted degree of each vertex of the graph constructed in step 5.1, wherein the weighted degree of node i is the sum of the weights of all edges incident to it, i.e., the sum of the elements in row i of the adjacency matrix W; and the Laplacian matrix L is defined as L = D - W.
4. The ensemble learning-based personalized recommendation method according to claim 1, wherein: the step 7 comprises the following steps:
step 7.1: the gradient boosting decision tree model is defined, with the formula: f_K(Y') = Σ_{k=1}^{K} h_k(Y');
wherein Y' is the Y' mentioned in step 6, k is the round index of the score prediction learner, and K is the total number of rounds; f_k(Y') is the score prediction learner of the k-th round, and h_k(Y') denotes the k-th classification and regression decision tree;
step 7.2: constructing a classification regression decision tree, namely h (Y') in the step 7.1;
step 7.3: the score prediction learner adopts a forward stagewise algorithm; the model at step k is built from the model at step k-1, i.e. the k-th round of the score prediction learner depends closely on the learner of the preceding k-1 rounds, with the formula:
fk(Y′)=fk-1(Y′)+βk (7)
wherein f_k(Y') is the score prediction learner of the k-th round, f_{k-1}(Y') is that of the (k-1)-th round, and β_k represents the residual produced in the k-th round;
step 7.4: continuing the iteration until it is complete, at which point the model construction is finished.
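Steps 7.1 to 7.4 can be sketched as a forward stagewise loop. The sketch below substitutes a one-split regression stump for the full classification regression tree of step 7.2 and adds an illustrative shrinkage rate; K, lr, and n_bins are assumptions for the sketch, not claim elements:

```python
import numpy as np

def gbdt_fit_predict(Y_feat, target, K=20, lr=0.1, n_bins=4):
    """Forward stagewise boosting sketch of steps 7.1-7.4: each round
    fits a stump h_k to the current residual (the beta_k of eq. (7))
    and updates f_k = f_{k-1} + lr * h_k."""
    pred = np.full(len(target), target.mean())   # f_0: constant model
    for k in range(K):
        residual = target - pred                 # residual of round k
        x = Y_feat[:, 0]                         # stump on first feature
        best = None
        for s in np.quantile(x, np.linspace(0.1, 0.9, n_bins)):
            left = x <= s
            right = ~left
            if not left.any() or not right.any():
                continue
            p1, p2 = residual[left].mean(), residual[right].mean()
            err = ((residual - np.where(left, p1, p2)) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, s, p1, p2)
        _, s, p1, p2 = best
        pred = pred + lr * np.where(x <= s, p1, p2)   # f_k = f_{k-1} + ...
    return pred
```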
5. The ensemble learning-based personalized recommendation method according to claim 4, wherein: the step 7.2 comprises the following steps:
step 7.21: partitioning the preprocessed data set B into regions H_1, H_2, ..., H_o, whose output values are p_1, p_2, ..., p_o, respectively;
step 7.22: recursively dividing each region into two sub-regions and determining the output value on each sub-region, selecting the optimal splitting variable q and split point s according to the following formula: min_{q,s} [ min_{p_1} Σ_{u_v ∈ H_1(q,s)} (w_v - p_1)² + min_{p_2} Σ_{u_v ∈ H_2(q,s)} (w_v - p_2)² ];
wherein p_1 is the output of region H_1 divided in step 7.21, p_2 is the output of region H_2 divided in step 7.21, u_v and w_v respectively denote the feature attributes and the score of the data in the corresponding region, and the maximum value of v is the number of samples in the divided region; traverse the variable q and, for each fixed splitting variable q, scan the split point s, selecting the pair (q, s) that minimizes the above formula; divide the region with the selected pair (q, s) and determine the corresponding output values;
step 7.23: continuing to call steps 7.21 and 7.22 for the two sub-regions until a stop condition is met;
step 7.24: repartitioning the input space into o regions H'_1, H'_2, ..., H'_o and generating the score prediction classification regression decision tree, with the formula: h(u) = Σ_{o=1}^{O} p_o · I(u ∈ H'_o);
wherein h(u) is the score prediction classification regression decision tree, H'_v are the divided regions, o is the subscript of a divided region and O is the total number of divided regions; p_o is the fixed output value of the region partitioned in step 7.21, and q' and s' are the optimal solutions iterated through step 7.21 and step 7.22.
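The (q, s) search of step 7.22 can be sketched directly; the inner least-squares minimizers p_1 and p_2 are simply the region means. Function and argument names are illustrative:

```python
import numpy as np

def best_split(U, w):
    """Sketch of step 7.22: find the splitting variable q and split
    point s minimizing
    min_{q,s}[ min_{p1} sum_{u_v in H1}(w_v - p1)^2
             + min_{p2} sum_{u_v in H2}(w_v - p2)^2 ].
    U: (n, d) feature attributes u_v; w: (n,) scores w_v."""
    best = None
    for q in range(U.shape[1]):              # traverse the variable q
        for s in np.unique(U[:, q])[:-1]:    # scan split points s
            H1, H2 = U[:, q] <= s, U[:, q] > s
            p1, p2 = w[H1].mean(), w[H2].mean()   # minimizers = region means
            cost = ((w[H1] - p1) ** 2).sum() + ((w[H2] - p2) ** 2).sum()
            if best is None or cost < best[0]:
                best = (cost, q, s, p1, p2)
    return best[1:]                          # (q, s, p1, p2)
```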
6. The ensemble learning-based personalized recommendation method according to claim 1, wherein: the step 8 comprises the following steps:
step 8.1: initializing the data set D' = {(x'_1, y'_1), ..., (x'_n, y'_n)}, wherein y'_i = f'(x'_i); f'(x') is the mapping from the dimension attributes to the score in the data;
step 8.2: training the gradient boosting decision tree model with the selected hyperparameter combination x'_i and calculating f'(x'_i);
step 8.3: computing the next hyperparameter combination x'_{i+1} from x'_i using the acquisition function;
step 8.4: repeating step 8.2 and step 8.3 for T' iterations;
step 8.5: outputting the hyperparameter combination that optimizes the objective function f'(x').
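The loop of steps 8.1 to 8.5 can be sketched as follows. A real implementation would fit a surrogate model over D' and choose x'_{i+1} with an acquisition function such as expected improvement; this sketch substitutes a random proposal as a stand-in, keeping only the observe-then-propose structure of the claim (all names and the candidate space are illustrative):

```python
import random

def bayesian_opt_sketch(objective, space, T=25, seed=0):
    """Sketch of steps 8.1-8.5: `objective` plays f'(x')
    (hyperparameters -> validation loss, lower is better); `space`
    is a finite list of candidate hyperparameter dicts."""
    rng = random.Random(seed)
    history = []                                  # the data set D'
    x = rng.choice(space)                         # step 8.1: initialize
    for _ in range(T):                            # step 8.4: iterate T' times
        y = objective(x)                          # step 8.2: evaluate f'(x'_i)
        history.append((x, y))
        x = rng.choice(space)                     # step 8.3: propose x'_{i+1}
                                                  # (acquisition fn in practice)
    return min(history, key=lambda h: h[1])[0]    # step 8.5: best combination
```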
7. The ensemble learning-based personalized recommendation method according to claim 1, wherein: the step 10 includes the steps of:
step 10.1: setting the value N, i.e. the number of items recommended to each user, and defining the number of users as count;
step 10.2: for each user, denoting the real recommendation list generated on the Test set as T(all); performing score prediction on the Test set with the gradient boosting decision tree recommendation model refined by Bayesian optimization, and defining the result as the test score set;
step 10.3: sorting the test score set by score, recommending the top N items to each user, and denoting the Top-N recommendation list obtained for each user as T(test);
step 10.4: verifying the precision and recall results on the test score set;
step 10.5: calculating the length of T(test);
step 10.6: calculating the length of T(all);
step 10.7: calculating the intersection T(U) of each user's Top-N recommendation list T(test) and the real list T(all);
step 10.8: calculating the precision: accumulating the precision obtained for each user and dividing the sum by count to obtain the average precision;
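Steps 10.1 to 10.8 amount to averaged Precision@N over users. A minimal Python sketch (function and variable names are illustrative):

```python
def average_precision_at_n(t_all, t_test):
    """Sketch of steps 10.4-10.8: t_all maps user -> set of truly
    relevant items (T(all)); t_test maps user -> Top-N recommended
    list (T(test)). Returns precision averaged over users."""
    count = len(t_test)                                    # number of users
    total = 0.0
    for user, recommended in t_test.items():
        t_u = set(recommended) & t_all.get(user, set())    # T(U)
        total += len(t_u) / len(recommended)               # per-user precision
    return total / count                                   # average precision
```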
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2021103231807 | 2021-03-26 | ||
CN202110323180 | 2021-03-26 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113326433A true CN113326433A (en) | 2021-08-31 |
CN113326433B CN113326433B (en) | 2023-10-10 |
Family
ID=77419834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110629501.6A Active CN113326433B (en) | 2021-03-26 | 2021-06-07 | Personalized recommendation method based on ensemble learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113326433B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105843928A (en) * | 2016-03-28 | 2016-08-10 | 西安电子科技大学 | Recommendation method based on double-layer matrix decomposition |
CN108763362A (en) * | 2018-05-17 | 2018-11-06 | 浙江工业大学 | Method is recommended to the partial model Weighted Fusion Top-N films of selection based on random anchor point |
CN110109902A (en) * | 2019-03-18 | 2019-08-09 | 广东工业大学 | A kind of electric business platform recommender system based on integrated learning approach |
CN110297978A (en) * | 2019-06-28 | 2019-10-01 | 四川金蜜信息技术有限公司 | Personalized recommendation algorithm based on integrated recurrence |
CN110348580A (en) * | 2019-06-18 | 2019-10-18 | 第四范式(北京)技术有限公司 | Construct the method, apparatus and prediction technique, device of GBDT model |
WO2020233245A1 (en) * | 2019-05-20 | 2020-11-26 | 山东科技大学 | Method for bias tensor factorization with context feature auto-encoding based on regression tree |
CN112183946A (en) * | 2020-09-07 | 2021-01-05 | 腾讯音乐娱乐科技(深圳)有限公司 | Multimedia content evaluation method, device and training method thereof |
Non-Patent Citations (1)
Title |
---|
NIE Lisheng: "Personalized Recommendation of Learning Resources Based on Behavior Analysis" (基于行为分析的学习资源个性化推荐), Computer Technology and Development (计算机技术与发展), no. 07 *
Also Published As
Publication number | Publication date |
---|---|
CN113326433B (en) | 2023-10-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112733749B (en) | Real-time pedestrian detection method integrating attention mechanism | |
Chen et al. | A matting method based on full feature coverage | |
CN112101430B (en) | Anchor frame generation method for image target detection processing and lightweight target detection method | |
CN109948149B (en) | Text classification method and device | |
CN107683469A (en) | A kind of product classification method and device based on deep learning | |
US9323886B2 (en) | Performance predicting apparatus, performance predicting method, and program | |
CN110175628A (en) | A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation | |
CN108897791B (en) | Image retrieval method based on depth convolution characteristics and semantic similarity measurement | |
CN114841257B (en) | Small sample target detection method based on self-supervision comparison constraint | |
CN113674334B (en) | Texture recognition method based on depth self-attention network and local feature coding | |
CN105184260A (en) | Image characteristic extraction method, pedestrian detection method and device | |
CN107918772A (en) | Method for tracking target based on compressive sensing theory and gcForest | |
CN107291825A (en) | With the search method and system of money commodity in a kind of video | |
CN111738055A (en) | Multi-class text detection system and bill form detection method based on same | |
CN111833322B (en) | Garbage multi-target detection method based on improved YOLOv3 | |
CN110929848A (en) | Training and tracking method based on multi-challenge perception learning model | |
CN112115291B (en) | Three-dimensional indoor model retrieval method based on deep learning | |
CN110020435B (en) | Method for optimizing text feature selection by adopting parallel binary bat algorithm | |
US20220245510A1 (en) | Multi-dimensional model shape transfer | |
CN107491782A (en) | Utilize the image classification method for a small amount of training data of semantic space information | |
Castellano et al. | Deep convolutional embedding for digitized painting clustering | |
CN115641177A (en) | Prevent second and kill prejudgement system based on machine learning | |
CN116610778A (en) | Bidirectional image-text matching method based on cross-modal global and local attention mechanism | |
Saqib et al. | Intelligent dynamic gesture recognition using CNN empowered by edit distance | |
CN108257148B (en) | Target suggestion window generation method of specific object and application of target suggestion window generation method in target tracking |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||