CN106202151A

CN106202151A - A method for improving the diversity of a personalized recommendation system

Info

Publication number: CN106202151A
Application number: CN201610463223.0A
Authority: CN
Inventors: 李方敏; 栾悉道; 龙妍
Original assignee: Changsha University
Current assignee: Changsha University
Priority date: 2016-06-23
Filing date: 2016-06-23
Publication date: 2016-12-07

Abstract

The invention discloses a method for improving the diversity of a personalized recommendation system. Specifically, a user rating data set is first obtained; three commonly used recommendation algorithms are then run on the rating data set; the recommendation results are next subjected to threshold control and re-ranking; and the final result is presented to the user. Re-ranking the recommendation results changes the content of the final recommendation list and thereby improves the diversity of the system. At the same time, applying threshold control during ranking keeps the recommended items, as far as possible, among the items the user likes, which preserves the prediction accuracy of the recommendation system. Experimental results show that, compared with existing personalized recommendation methods, the proposed method substantially improves system diversity with only a small reduction in prediction accuracy, achieving a controllable balance between accuracy and diversity.

Description

A method for improving the diversity of a personalized recommendation system

Technical field

The invention belongs to the fields of the Internet, the mobile Internet and computer networks, and more particularly relates to a method for improving the diversity of a personalized recommendation system.

Background art

Internet retailing exhibits a long-tail effect: 80 percent of a merchant's sales volume comes from 20 percent of its commodities. If the sales of a retailer's long-tail items can be raised, the retailer's turnover can grow many times over while users' distinctive preferences are also met. In recent years, personalized recommendation systems that combine computer network technology with big data processing have therefore attracted great attention in e-commerce and have been widely applied. A personalized recommendation system is an information filtering system driven by the user's personal interests: it analyzes and mines information and data related to the user to discover the user's interests and implicit needs, and then makes recommendations accordingly. According to long-tail theory, if the services or commodities a merchant provides closely match users' individual needs, user satisfaction with and trust in the merchant will be higher, which in turn brings the merchant substantial profit. The personalized recommendation system is thus an important means for merchants to meet users' individual needs, and also an important means of addressing the long-tail phenomenon in Internet retailing.

At present, the most widely used methods in personalized recommendation systems are user-based collaborative filtering and item-based collaborative filtering.

In user-based collaborative filtering, users who have interacted with some of the same items are considered to be in the same neighborhood. The underlying assumption, supported by statistical data, is that users who had similar preferences in the past will continue to have similar preferences in the future. If a user buys or rates a new item, that item is recommended to the user's neighbors. The method mainly addresses how to make recommendations for a large user base and how to compute the similarity between users. Its drawback is that it does not consider the diversity of the recommendation system, which leaves the long-tail items of Internet retailers unsold.

Item-based collaborative filtering is essentially the same as the user-based method above, except that it computes the similarity between items rather than between users. This algorithm likewise fails to consider the diversity of the recommendation system, and so also leaves the long-tail items of Internet retailers unsold.

Summary of the invention

In view of the above drawbacks of, or needs for improvement in, the prior art, the invention provides a method for improving the diversity of a personalized recommendation system. Its purpose is to solve the technical problem that existing methods do not consider diversity in personalized recommendation, which leaves the long-tail items of Internet retailers unsold.

To achieve the above object, according to one aspect of the invention, a method for improving the diversity of a personalized recommendation system is provided, comprising the following steps:

(1) obtaining a user rating data set from a website and storing it as text, the data set including user IDs, the item IDs corresponding to each user ID, and each user's rating values for those items;

(2) applying a recommendation algorithm for personalized recommendation systems to the user rating data set obtained in step (1) to perform prediction and recommendation, thereby generating a corresponding recommendation list for each of the users in the data set, the recommendation list including the user ID, the item IDs corresponding to that user ID, and the user's predicted rating values for those items;

(3) computing, from the user rating data set, the popularity of each item, the popularity being determined by the number of users who have rated the item and by the users' rating values for the item; comparing each user's predicted rating value for an item with a control threshold; and ranking the items whose predicted rating value is greater than or equal to the control threshold by popularity, to obtain the final ranking result;

(4) taking the top several results of the ranking result and returning them to the user as the recommendation list.
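As an illustration of the text storage in step (1), the following is a minimal sketch of reading such a data set into memory. The record layout (one `user_id,item_id,rating` record per line) and the name `load_ratings` are assumptions made for the example; the source only states that the data set is stored as text and contains these three fields.

```python
from collections import defaultdict


def load_ratings(lines, sep=","):
    """Parse 'user_id<sep>item_id<sep>rating' records into {user: {item: rating}}.

    The record layout is an assumption for illustration only.
    """
    ratings = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, item_id, value = line.split(sep)[:3]
        ratings[user_id][item_id] = float(value)
    return dict(ratings)


sample = ["u1,i1,4.0", "u1,i2,3.5", "u2,i1,5.0"]
ratings = load_ratings(sample)
```

The nested dictionary gives direct access to R(u, i) as `ratings[u][i]`, which is the form the later steps operate on.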

Preferably, step (2) specifically comprises the following sub-steps:

(2-1) computing the similarity between all users from the obtained user rating data set:

sim(a, b) = \frac{\sum_{p \in P} R(a, p) R(b, p)}{\sqrt{\sum_{p \in P} R(a, p)^{2}} \sqrt{\sum_{p \in P} R(b, p)^{2}}}

where sim(a, b) denotes the similarity between users a and b, P denotes the set of all items, p denotes an item in the set P, and R(a, p) and R(b, p) denote the rating values of user a and user b for item p, respectively;
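The similarity in sub-step (2-1) is the cosine of the two users' rating vectors. A minimal sketch follows, assuming each user's ratings are held in a dictionary keyed by item and that an unrated item contributes a rating of 0 to the sums over P; the function name `cosine_sim` is hypothetical.

```python
import math


def cosine_sim(ratings_a, ratings_b):
    """sim(a, b) for two {item: rating} dicts; unrated items count as 0."""
    # Only items rated by both users contribute to the numerator.
    dot = sum(r * ratings_b[p] for p, r in ratings_a.items() if p in ratings_b)
    norm_a = math.sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = math.sqrt(sum(r * r for r in ratings_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no ratings is similar to nobody
    return dot / (norm_a * norm_b)
```

Because rating values are non-negative, the result lies between 0 and 1.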

(2-2) for each user, selecting the K users with the highest similarity to that user as the user's neighbors, where K is an integer between 50 and 300;

(2-3) for each user, analyzing the rating values of the items rated by the user's K neighbors to predict the items the user is most likely to rate highly, and recommending those items to the user.

Preferably, step (2-3) specifically uses the following formula:

R^{*}(u, i) = \overline{R(u)} + k \sum_{v \in N(u)} (R(v, i) - \overline{R(v)}) \times sim(u, v)

where

k = \frac{1}{\sum_{v \in N(u)} |sim(u, v)|};

substituting this into the above formula gives:

R^{*}(u, i) = \overline{R(u)} + \frac{\sum_{v \in N(u)} (R(v, i) - \overline{R(v)}) \times sim(u, v)}{\sum_{v \in N(u)} |sim(u, v)|}

where R^{*}(u, i) denotes the predicted rating value of user u for item i, \overline{R(u)} is the average rating value of user u over all the items the user has rated, k is a normalization factor, and N(u) denotes the set of all neighbors of user u.
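The prediction formula can be sketched as follows. The names `mean_rating` and `predict` are hypothetical, and `sims` is assumed to map each neighbor v in N(u) to sim(u, v). One further assumption is labeled in the code: neighbors who have not rated item i are skipped in both sums, a common practical choice, whereas the formula as written sums over all of N(u).

```python
def mean_rating(user_ratings):
    """Average rating of one user over the items that user has rated."""
    return sum(user_ratings.values()) / len(user_ratings)


def predict(ratings, sims, u, i):
    """R*(u, i): user u's mean plus the similarity-weighted mean deviation
    of the neighbours' ratings for item i."""
    num = 0.0
    den = 0.0
    for v, s in sims.items():
        if i not in ratings[v]:
            continue  # assumption: ignore neighbours without a rating for i
        num += (ratings[v][i] - mean_rating(ratings[v])) * s
        den += abs(s)
    base = mean_rating(ratings[u])
    return base if den == 0 else base + num / den
```

Subtracting each neighbor's own mean compensates for users who rate systematically high or low.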

Preferably, step (3) specifically comprises the following sub-steps:

(3-1) computing, from the user rating data set, the popularity of each item according to the number of users in the data set who have rated the item and the users' rating values for the item;

(3-2) comparing each user's predicted rating value for an item with a control threshold, and ranking the items whose predicted rating value is greater than or equal to the control threshold by popularity, to obtain the final ranking result.

Preferably, the popularity of an item in step (3-1) is obtained as

rank_{Popularity}(i) = |U(i)|, where U(i) = \{u \in U \mid \exists R(u, i)\}

where rank_{Popularity}(i) denotes ranking by item popularity, and |U(i)| denotes, over all users u in the user set U, the number of users for whom a rating value R(u, i) of item i exists.

rank_{ReversePrediction}(i) = R^{*}(u, i)

where rank_{ReversePrediction}(i) denotes ranking by predicted rating value in reverse order; this predicted rating value can represent the popularity of the item.

rank_{AverageRating}(i) = \overline{R(i)}

where

\overline{R(i)} = \frac{1}{|U(i)|} \sum_{u \in U(i)} R(u, i)

rank_{AverageRating}(i) denotes ranking by the average rating value of the item;

rank_{AbsoluteLikeability}(i) = |U_H(i)|

where U_H(i) = \{u \in U(i) \mid R(u, i) \geq T_H\}

rank_{AbsoluteLikeability}(i) denotes the absolute likeability of the item, and |U_H(i)| is the number of users u whose rating value for item i is not lower than the threshold T_H.

rank_{RelativeLikeability}(i) = |U_H(i)| / |U(i)|

where rank_{RelativeLikeability}(i) denotes the relative likeability of the item.

rank_{RatingVariance}(i) = \frac{1}{|U(i)|} \sum_{u \in U(i)} (R(u, i) - \overline{R(i)})^{2}

where rank_{RatingVariance}(i) denotes the degree to which users' rating values for the item deviate from the item's average rating value.
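Five of the ranking criteria above depend only on the ratings an item has received, so they can be computed in one pass over the data set. The sketch below assumes ratings are stored as `{user: {item: rating}}`; the function name `item_stats` and the dictionary keys are hypothetical. The sixth criterion, reverse prediction, is the predicted rating itself and is therefore per user, not per item, so it is omitted here.

```python
def item_stats(ratings, t_h=3.5):
    """Per-item ranking criteria from a {user: {item: rating}} data set."""
    by_item = {}
    for user_ratings in ratings.values():
        for item, r in user_ratings.items():
            by_item.setdefault(item, []).append(r)

    stats = {}
    for item, rs in by_item.items():
        n = len(rs)                                # |U(i)|
        mean = sum(rs) / n                         # average rating of item i
        liked = sum(1 for r in rs if r >= t_h)     # |U_H(i)|
        stats[item] = {
            "popularity": n,
            "average_rating": mean,
            "absolute_likeability": liked,
            "relative_likeability": liked / n,
            "rating_variance": sum((r - mean) ** 2 for r in rs) / n,
        }
    return stats
```

The default `t_h=3.5` follows the value the embodiment gives for T_H on a 5-point scale.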

Preferably, step (3-2) specifically uses the following formula:

rank_{x}(i, T_{R}) = \begin{cases} rank_{x}(i), & R^{*}(u, i) \in [T_{R}, T_{\max}] \\ rank_{Standard}(i), & R^{*}(u, i) \in [T_{H}, T_{R}) \end{cases}

where rank_x(i, T_R) denotes the function that ranks item i using the control threshold T_R, rank_x(i) denotes the item popularity obtained in step (3-1) above, T_max denotes the upper limit of the rating value (for example, 5 in a 5-point rating system), the control threshold satisfies T_R ∈ [T_H, T_max], and rank_Standard(i) is the existing standard ranking method, given by:

rank_{Standard}(i) = R^{*}(u, i)^{-1}.
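The piecewise ranking function can be sketched as a sort with a two-part key. This is only an illustration under stated assumptions: smaller rank values are taken to come first (consistent with rank_Standard being the inverse of the predicted rating), items at or above T_R are always placed ahead of the rest, and items predicted below T_H are dropped. The name `rerank` and its parameters are hypothetical.

```python
def rerank(predictions, rank_x, t_r, t_h=3.5, top_n=5):
    """Threshold-controlled re-ranking for one user.

    predictions: {item: predicted rating R*(u, i)}
    rank_x:      {item: criterion value, e.g. popularity}; smaller ranks first
    """
    def key(item):
        score = predictions[item]
        if score >= t_r:
            return (0, rank_x[item])   # head group, ordered by rank_x(i)
        return (1, 1.0 / score)        # tail group, standard order 1 / R*(u, i)

    # Only items the user is predicted to like are candidates.
    candidates = [i for i, s in predictions.items() if s >= t_h]
    return sorted(candidates, key=key)[:top_n]
```

With popularity as `rank_x`, a low `t_r` pushes less popular but still well-predicted items to the front, while a `t_r` above the highest prediction reproduces the standard ranking.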

According to another aspect of the invention, a system for improving the diversity of a personalized recommendation system is provided, comprising:

a first module for obtaining a user rating data set from a website and storing it as text, the data set including user IDs, the item IDs corresponding to each user ID, and each user's rating values for those items;

a second module for applying a recommendation algorithm for personalized recommendation systems to the user rating data set obtained by the first module to perform prediction and recommendation, thereby generating a corresponding recommendation list for each of the users in the data set, the recommendation list including the user ID, the item IDs corresponding to that user ID, and the user's predicted rating values for those items;

a third module for computing, from the user rating data set, the popularity of each item, the popularity being determined by the number of users who have rated the item and by the users' rating values for the item, comparing each user's predicted rating value for an item with a control threshold, and ranking the items whose predicted rating value is greater than or equal to the control threshold by popularity, to obtain the final ranking result;

a fourth module for taking the top several results of the ranking result and returning them to the user as the recommendation list.

In general, compared with the prior art, the above technical solutions conceived by the invention can achieve the following beneficial effects:

1. Compared with methods in conventional personalized recommendation systems, the invention is designed specifically to improve diversity in personalized recommendation and achieves a substantial gain on the diversity index, thereby alleviating the problem of unsold items caused by the long-tail effect in Internet retailing.

2. The invention is flexible to use and can be conveniently added to an existing personalized recommendation system without changing that system's core architecture, so it imposes no major change or burden on the original system.

3. Step (3) of the method of the invention maintains both the accuracy and the diversity of the personalized recommendation system.

4. Step (3) of the invention substantially increases the overall diversity of the recommendation results while sacrificing only a little recommendation accuracy.

Brief description of the drawings

Fig. 1 is a processing structure diagram of a personalized recommendation system to which the invention applies.

Fig. 2 is a schematic flow chart of recommendation in the user-based recommendation system of the invention.

Fig. 3 is a schematic diagram of neighbor selection in the invention.

Fig. 4 is a schematic diagram of the general idea of the threshold-based ranking algorithm of the invention.

Fig. 5 is the experimental design of the invention.

Fig. 6 shows the effect of the ranking algorithms on accuracy and diversity, where: (a) uses the popularity-based method, (b) uses the average rating method, (c) uses the absolute likeability method, (d) uses the relative likeability method, (e) uses the rating variance method, and (f) uses the reverse predicted rating method.

Fig. 7 shows the relation between accuracy loss and diversity.

Fig. 8 shows the proportion of long-tail items among the recommended items.

Fig. 9 is a flow chart of the method of the invention for improving the diversity of a personalized recommendation system.

Detailed description of the invention

To make the purpose, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it. Furthermore, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.

The overall idea of the invention is to first obtain a user rating data set, then run commonly used recommendation algorithms on the rating data set, next apply threshold control and re-ranking to the recommendation results, and finally present the result to the user. Re-ranking the recommendation results changes the content of the final recommendation list and thereby improves the diversity of the system. At the same time, applying threshold control during ranking keeps the recommended items, as far as possible, among the items the user likes, which preserves the prediction accuracy of the recommendation system.

A typical processing structure of a personalized recommendation system is shown in Fig. 1. Throughout the system, let U (users) be the set of all users participating in the personalized recommendation system and I (items) be the set of all items in the system (for example books, films and music). The relation between each user u ∈ U and item i ∈ I can be regarded as R(u, i), and the relation between R(u, i) and U, I is as follows:

R:U×I→Rating

In practical terms, R is a rating that indicates how much user u prefers item i. A recommendation system comprises two steps: prediction and recommendation. The first step, prediction, uses the known rating data to predict the ratings a user would likely give to items the user has not yet rated.

As shown in Fig. 9, the method of the invention for improving the diversity of a personalized recommendation system comprises the following steps:

(1) obtaining a user rating data set from a website and storing it as text, the data set including user IDs, the item IDs corresponding to each user ID, and each user's rating values for those items; specifically, in the invention the user rating data set is obtained with a web crawler;

(4) taking the top several results of the ranking result and returning them to the user as the recommendation list; in the invention, the top 5 results of the ranking result are returned to the user as the recommendation list.

As shown in Fig. 2, step (2) of the method of the invention specifically comprises the following sub-steps:

(2-1) computing the similarity between all users from the obtained user rating data set;

Specifically, this step uses the following formula:

sim(a, b) = \frac{\sum_{p \in P} R(a, p) R(b, p)}{\sqrt{\sum_{p \in P} R(a, p)^{2}} \sqrt{\sum_{p \in P} R(b, p)^{2}}}

(2-2) for each user, selecting the K users with the highest similarity to that user as the user's neighbors, as shown in Fig. 3; in this embodiment, K is an integer between 50 and 300;

(2-3) for each user, analyzing the rating values of the items rated by the user's K neighbors to predict the items the user is most likely to rate highly, and recommending those items to the user;

Specifically, this step uses the following formula:

R^{*}(u, i) = \overline{R(u)} + k \sum_{v \in N(u)} (R(v, i) - \overline{R(v)}) \times sim(u, v)

where k = \frac{1}{\sum_{v \in N(u)} |sim(u, v)|}; substituting this into the above formula gives:

R^{*}(u, i) = \overline{R(u)} + \frac{\sum_{v \in N(u)} (R(v, i) - \overline{R(v)}) \times sim(u, v)}{\sum_{v \in N(u)} |sim(u, v)|}

where R^{*}(u, i) denotes the predicted rating value of user u for item i, \overline{R(u)} is the average rating value of user u over all the items the user has rated, k is a normalization factor, and N(u) denotes the set of all neighbors of user u;
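Neighbor selection in sub-step (2-2) amounts to taking the K largest similarities for each user. A minimal sketch using the standard library follows; the name `top_k_neighbors` and the `{other_user: similarity}` input layout are assumptions for the example.

```python
import heapq


def top_k_neighbors(user, sim_row, k):
    """Return the k (neighbour, similarity) pairs most similar to `user`.

    sim_row: {other_user: sim(user, other_user)}; the user's own entry,
    if present, is excluded.
    """
    candidates = ((v, s) for v, s in sim_row.items() if v != user)
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])
```

`heapq.nlargest` avoids sorting the full similarity row, which matters when the number of users is much larger than K.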

Step (3) of the method of the invention specifically comprises the following sub-steps:

Specifically, the popularity of an item can be computed in the invention by any of the following six methods:

First, rank_{Popularity}(i) = |U(i)|, where U(i) = \{u \in U \mid \exists R(u, i)\}

Second, rank_{ReversePrediction}(i) = R^{*}(u, i)

Third, rank_{AverageRating}(i) = \overline{R(i)}

where

\overline{R(i)} = \frac{1}{|U(i)|} \sum_{u \in U(i)} R(u, i)

rank_{AverageRating}(i) denotes ranking by the average rating value of the item,

Fourth, rank_{AbsoluteLikeability}(i) = |U_H(i)|

where U_H(i) = \{u \in U(i) \mid R(u, i) \geq T_H\}

rank_{AbsoluteLikeability}(i) denotes the absolute likeability of the item, and |U_H(i)| is the number of users u whose rating value for item i is not lower than the threshold T_H. In this embodiment the threshold T_H can be set freely; a smaller value corresponds to users who rate items more strictly, and a larger value to users who rate more leniently. In the invention its value is 3.5 (on a standard 5-point scale);

Fifth, rank_{RelativeLikeability}(i) = |U_H(i)| / |U(i)|

Sixth, rank_{RatingVariance}(i) = \frac{1}{|U(i)|} \sum_{u \in U(i)} (R(u, i) - \overline{R(i)})^{2}

This step specifically uses the following formula:

rank_{x}(i, T_{R}) = \begin{cases} rank_{x}(i), & R^{*}(u, i) \in [T_{R}, T_{\max}] \\ rank_{Standard}(i), & R^{*}(u, i) \in [T_{H}, T_{R}) \end{cases}

where rank_x(i, T_R) denotes the function that ranks item i using the control threshold T_R, rank_x(i) denotes the item popularity obtained in step (3-1) above, T_max denotes the upper limit of the rating value (for example, 5 in a 5-point rating system), the control threshold satisfies T_R ∈ [T_H, T_max], and rank_Standard(i) is the existing standard ranking method, given by:

rank_{Standard}(i) = R^{*}(u, i)^{-1}

Here, once the control threshold T_R is introduced, items whose predicted rating is at least T_R are ranked by the original rank_x(i), while items whose predicted rating is below T_R are ranked by rank_Standard(i). In addition, all items with a predicted rating of at least T_R are placed ahead of all items with a predicted rating below T_R. Thus, as T_R increases, the list comes to favor items with higher accuracy and lower diversity, because the ordering moves closer to the standard ranking. As T_R decreases, the function rank_x(i, T_R) moves closer to rank_x(i), which raises the diversity of the system and lowers its accuracy. The balance between accuracy and diversity can therefore be set by choosing different control thresholds T_R. The general idea of the threshold-based ranking algorithm is shown in Fig. 4.

As shown in Fig. 4(a), the standard ranking method ranks candidate items directly by predicted rating: the higher the predicted rating value, the higher the item ranks. The 5 items with the highest predicted ratings are then recommended to the user, and to ensure that all recommended items match the user's preferences, only items with ratings above T_H are selected. The overall recommendation quality of the system is shown in the histogram alongside. As shown in Fig. 4(b), the ranking function rank_x(i) is applied, here the popularity-based ranking method, yielding a new recommendation list in which items of lower popularity but with predicted ratings above T_H are recommended to the user. In this list the user sees some niche items, which lie in the long tail of the overall recommendation system. Although their popularity is low, the user is likely to like them (the predicted rating value is above T_H), which indicates that they are well matched to the user. This ranking method raises the diversity of the system but also lowers its accuracy; the overall recommendation quality is again shown in the histogram alongside. As shown in Fig. 4(c), adjusting the control threshold T_R selects different items to recommend to the user, so that diversity is improved while the loss of accuracy is kept as small as possible.

To allow a fair and reasonable performance assessment, the invention defines several quantitative evaluation indexes for the evaluation of a personalized recommendation system.

(1) Accuracy

In the rating data, ratings take values from 1 to 5, and a higher value means the user prefers the item more. Following the usual definition for recommendation systems, items rated above 3.5 (the threshold for highly rated items, denoted T_H) are treated as "high-ranking" items, and items rated below 3.5 as "not high-ranking" items. In addition, because users in a real recommendation system typically attend only to a few of the most relevant recommended items, a recommendation system usually provides the N top-ranked items. The N items recommended to user u are denoted:

L_N(u) = \{i_1, ..., i_N\}

where

R^{*}(u, i_k) \geq T_H, k \in \{1, 2, ..., N\}

Accordingly, the accuracy of the recommendation system is assessed by the proportion of truly high-ranking items among the recommended items. The truly high-ranking items in a list are denoted correct(L_N(u)). The measure of how well the N most relevant high-ranking items are found and recommended to users is precision-in-top-N (the accuracy of the top-N recommended items), given by:

precision-in-top-N = \frac{\sum_{u \in U} |correct(L_N(u))|}{\sum_{u \in U} |L_N(u)|}

where

correct(L_N(u)) = \{i \in L_N(u) \mid R(u, i) \geq T_H\}
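The precision-in-top-N measure can be sketched as below. The function name and input layout are hypothetical, and one assumption is labeled in the code: a recommended item with no known true rating is counted as incorrect, a choice the source does not specify.

```python
def precision_in_top_n(rec_lists, actual, t_h=3.5):
    """Share of recommended items whose true rating is at least t_h.

    rec_lists: {user: [recommended items]}   -- L_N(u)
    actual:    {user: {item: true rating}}   -- R(u, i), e.g. from a test set
    """
    hits = 0
    total = 0
    for user, items in rec_lists.items():
        total += len(items)
        known = actual.get(user, {})
        # Assumption: items without a known true rating count as misses.
        hits += sum(1 for i in items if known.get(i, 0) >= t_h)
    return hits / total if total else 0.0
```

The sums run over all users before dividing, matching the ratio of sums in the formula instead of an average of per-user precisions.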

Accuracy alone, however, does not reliably surface the recommended items a user needs. A recommendation system must be not only accurate but also of practical value. Another evaluation index of the personalized recommendation system is therefore introduced next, namely the diversity of the recommendation system.

(2) Diversity

Diversity is used to assess the ability of a recommendation system to surface long-tail items. Diversity can be assessed by different methods; here it is assessed by the number of distinct items that the recommendation system recommends, given by:

diversity-in-top-N = \left| \bigcup_{u \in U} L_N(u) \right|

where diversity-in-top-N denotes the diversity of the top-N recommended items.
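The diversity measure is the size of the union of all users' recommendation lists. A minimal sketch, with a hypothetical function name and the same `{user: [items]}` layout assumed for the lists:

```python
def diversity_in_top_n(rec_lists):
    """Number of distinct items recommended across all users' top-N lists."""
    distinct = set()
    for items in rec_lists.values():
        distinct.update(items)  # set union removes repeated items
    return len(distinct)
```

Two systems with the same precision can differ widely on this count: one that recommends the same few popular items to everyone scores close to N.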

When system diversity is high, more items are displayed to users. In ordinary recommendation, popular items account for a very large share, so diversity is low. A good personalized recommendation system needs relatively high diversity so that more items get a chance to be recommended. Diversity is also an index that product providers care about greatly; a system with the highest possible diversity would recommend every item to at least one user.

Experiment embodiment

To verify that the threshold-controlled ranking method for personalized recommendation systems can indeed raise the diversity of the recommendation system with minimal loss of accuracy, the feasibility of the scheme is tested by the following experimental steps:

(1) Obtain the original MovieLens and Netflix data sets.

(2) Because the original data contains information unrelated to this algorithm, and the userId and movieId values are too large for efficient computation, the original data sets are preprocessed: redundant fields such as genre and timestamp are removed, and the rating data is remapped so that the userId and movieId values start counting from 1.

(3) Divide the source data set into a training set (containing 60% of the source data) and a test set (containing 40% of the source data), ensuring that the test set contains at least 5 film ratings for each user.

(4) On the training set, apply the user-based recommendation algorithm, the item-based recommendation algorithm and the recommendation algorithm based on singular value decomposition, obtaining six groups of predicted rating lists in total for MovieLens and Netflix.

(5) Apply the ranking-based method to the above recommendation lists to obtain the re-ranked lists, that is, the recommendation lists presented to users.

(6) Evaluate the performance of the proposed solution using precision-in-top-N and diversity-in-top-N.

The experimental design is shown in Fig. 5.
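Steps (2) and (3) of the experimental procedure can be sketched as follows. The function names, the `(userId, movieId, rating)` tuple layout and the fixed random seed are assumptions for the example, and the sketch does not enforce the requirement that the test set contain at least 5 ratings per user.

```python
import random


def remap_ids(records):
    """Rewrite userId and movieId as contiguous integers starting from 1."""
    user_map, movie_map = {}, {}
    out = []
    for user, movie, rating in records:
        # setdefault keeps the first ID assigned to each original value.
        u = user_map.setdefault(user, len(user_map) + 1)
        m = movie_map.setdefault(movie, len(movie_map) + 1)
        out.append((u, m, rating))
    return out


def split(records, train_fraction=0.6, seed=0):
    """Randomly split records into a training part and a test part."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

Calling `split` with five different seeds would give the five random training and test sets over which results are averaged.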

The experiments compare, on the MovieLens and Netflix data sets respectively, the effect of six different ranking methods on the recommendation results, and the resulting changes in prediction accuracy and diversity. The evaluation metrics are precision-in-top-N and diversity-in-top-N. The experiments use offline validation together with cross-validation. In offline validation the data set is first divided into a training set and a test set, the training set containing 60% of the original rating data and the test set 40%. A recommendation algorithm then predicts recommendation results from the ratings in the training set, the predictions are compared with the actual results in the test set, and evaluation metrics such as prediction accuracy and diversity are computed. In the experiments, 5 training sets and 5 test sets are generated at random, and the mean of the 5 experimental results is taken as the final result, so that the rating data is covered as fully as possible and the reliability of the experimental results is ensured.

Fig. 6 compares the prediction accuracy and diversity obtained with the six methods of step (3-1), demonstrating the feasibility of the diversity solution proposed by the invention and of the control-threshold-based ranking method. All data in the figure show the relation between diversity and accuracy obtained on the Netflix data set after the output of different recommendation algorithms is re-ranked under a control threshold. The control-threshold ranking method of the invention improves diversity to varying degrees, and the best result depends on the chosen data set and recommendation method. The designer of a personalized recommendation system can therefore flexibly choose among the ranking methods according to the application scenario and the collected data to achieve the best recommendation effect.

Fig. 7 shows the relation between accuracy loss and diversity for the six methods used in step (3-1) of the invention. The figure compares accuracy loss against diversity gain across all the control-threshold-based ranking algorithms. It can be seen that the proposed algorithm improves diversity at the cost of some prediction accuracy, and that different recommendation algorithms and ranking methods behave differently on different data sets. In these experiments, the popularity-based ranking method and the relative-likeability-based ranking method obtained the better diversity gains.

Fig. 8 evaluates the effect of the method of the invention on the long-tail behavior of the personalized recommendation system. The figure gives the percentage of long-tail items among the items recommended to all users. Because diversity is assessed by counting all distinct recommended items, diversity could be raised merely by recommending a few new items to a small number of users, so the diversity metric alone cannot establish whether the proposed method really changes the long-tail distribution of items. The effect of the ranking algorithm on the long-tail distribution is therefore evaluated here. Following the "80/20 rule" of the long-tail distribution, 20% of the items are defined as best-selling items and the remaining 80% as long-tail items. It can be seen that the proposed re-ranking algorithm significantly raises the percentage of long-tail items recommended. This shows that the proposed ranking algorithm improves not only the diversity index but also the share of long-tail items, and thus improves the distribution of long-tail items overall.
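The long-tail percentage described above can be sketched as follows. The source does not say how best-selling items are identified; this sketch assumes they are the 20% of items with the largest rating counts, and the function name and parameters are hypothetical.

```python
def long_tail_share(rec_lists, popularity, head_fraction=0.2):
    """Fraction of all recommendations that go to long-tail items.

    rec_lists:  {user: [recommended items]}
    popularity: {item: number of ratings}; assumed proxy for sales volume
    """
    ranked = sorted(popularity, key=popularity.get, reverse=True)
    head_size = max(1, int(len(ranked) * head_fraction))
    head = set(ranked[:head_size])  # best-selling items
    total = sum(len(items) for items in rec_lists.values())
    tail_hits = sum(
        1 for items in rec_lists.values() for i in items if i not in head
    )
    return tail_hits / total if total else 0.0
```

Unlike the distinct-item count, this share rises only when long-tail items make up a larger part of what users actually receive.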

Those skilled in the art will readily understand that the foregoing is merely a preferred embodiment of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. A method for improving the diversity of a personalized recommendation system, characterized in that it comprises the following steps:

(1) obtaining a user rating data set from a website and storing it as text, the user rating data set including user IDs, the item IDs corresponding to each user ID, and each user's rating of those items;

(2) applying a recommendation algorithm for personalized recommendation systems to the user rating data set obtained in step (1) for prediction and recommendation, thereby generating a corresponding recommendation list for each of the users in the user rating data set, the recommendation list including user IDs, the item IDs corresponding to each user ID, and each user's predicted rating of those items;

(3) for the user rating data set, determining the popularity of each item, the popularity being determined by the number of users who have rated the item and by those users' ratings of the item; comparing each user's predicted rating of an item with a control threshold; and ranking by popularity the items whose predicted rating is greater than or equal to the control threshold, to obtain the final ranking result.

2. The method according to claim 1, characterized in that step (2) specifically comprises the following sub-steps:

where sim(a, b) denotes the similarity between users a and b, P denotes the set of all items, p denotes an item in the set P, and R(a, p) and R(b, p) denote the ratings of item p by user a and user b, respectively;

(2-2) for each user, selecting the K users with the highest similarity to that user as the user's neighbor users, where K is an integer between 50 and 300;

(2-3) for each user, analysing the ratings of the items rated by the user's K neighbor users in order to predict the items that the user is most likely to rate highly, and recommending those items to the user.
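The neighbor selection of sub-step (2-2) can be sketched as follows. The similarity formula itself is not reproduced in this text, so the sketch assumes cosine similarity over co-rated items, which is one common choice and not necessarily the formula of the claim. The function names and the nested-dictionary data layout are hypothetical.

```python
import math

def cosine_sim(ra, rb):
    """Similarity of two users given their {item: rating} dicts.

    Assumption: cosine similarity over co-rated items; the claim's
    own formula for sim(a, b) is not shown in the text.
    """
    common = set(ra) & set(rb)
    if not common:
        return 0.0
    dot = sum(ra[p] * rb[p] for p in common)
    na = math.sqrt(sum(ra[p] ** 2 for p in common))
    nb = math.sqrt(sum(rb[p] ** 2 for p in common))
    return dot / (na * nb) if na and nb else 0.0

def top_k_neighbors(ratings, user, k):
    """Return the k users most similar to `user` as (neighbor, sim) pairs.

    ratings: dict mapping user -> {item: rating}.
    """
    sims = [
        (other, cosine_sim(ratings[user], r))
        for other, r in ratings.items()
        if other != user
    ]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]
```

The claim places K between 50 and 300; a toy data set would use a much smaller value, for example `top_k_neighbors(ratings, "a", 2)`.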

3. The method according to claim 2, characterized in that step (2-3) specifically uses the following formula:

where

Substituting into the above formula gives:

where R^*(u, i) denotes the predicted rating of item i by user u, the mean term is the average rating that user u has given across all of that user's items, k is a normalization factor, and N(u) denotes the set of all neighbor users of user u.
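The formulas of this claim are not reproduced in the text, so the following is only a sketch of one widely used prediction rule that matches the symbols defined above: the user's mean rating plus a normalized, similarity-weighted sum of the neighbors' deviations from their own means. Taking k as the reciprocal of the summed absolute similarities, and summing only over neighbors who rated the item, are assumptions, and the function names are hypothetical.

```python
def mean_rating(user_ratings):
    """Average of a user's ratings, given an {item: rating} dict."""
    return sum(user_ratings.values()) / len(user_ratings)

def predict(ratings, user, item, neighbors):
    """Predict R*(u, i) from the neighbor set N(u).

    ratings: dict mapping user -> {item: rating}.
    neighbors: list of (neighbor_id, similarity) pairs for `user`.
    Assumed rule (not confirmed by the source):
        R*(u, i) = mean(u) + k * sum(sim(u, v) * (R(v, i) - mean(v)))
    with k = 1 / sum(|sim(u, v)|) over neighbors v that rated item i.
    """
    base = mean_rating(ratings[user])
    weighted = 0.0
    norm = 0.0
    for other, sim in neighbors:
        if item in ratings[other]:
            weighted += sim * (ratings[other][item] - mean_rating(ratings[other]))
            norm += abs(sim)
    if norm == 0.0:
        return base  # no neighbor rated the item: fall back to the user's mean
    return base + weighted / norm
```

Centering each neighbor's rating on that neighbor's own mean keeps generous and strict raters comparable, which is why this form needs the user's average rating as well as the normalization factor.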

4. The method according to claim 3, characterized in that step (3) specifically comprises the following sub-steps:

(3-1) for the user rating data set, determining the popularity of each item from the number of users in the data set who have rated the item and from those users' ratings of the item;

(3-2) comparing each user's predicted rating of an item with the control threshold, and ranking by popularity the items whose predicted rating is greater than or equal to the control threshold, to obtain the final ranking result.

5. The method according to claim 4, characterized in that the process of obtaining the popularity of an item in step (3-1) can be expressed as

where rank_Popularity(i) denotes ranking by item popularity, measured as the number of users u in the set of all users U who have a rating R(u, i) for item i, that is, the size of the set U(i).

rank_{ReversePrediction}(i)=R^*(u,i)

where rank_{ReversePrediction}(i) denotes ranking by predicted rating in reverse order; the predicted rating can itself indicate the popularity of an item.

where

rank_{AverageRating}(i) denotes ranking by the average rating of an item.

rank_{AbsoluteLikeability}(i)=| U_H(i)|

where U_H(i) = {u ∈ U(i) | R(u, i) ≥ T_H}

rank_{AbsoluteLikeability}(i) denotes ranking by the absolute likeability of an item, and |U_H(i)| = |{u ∈ U(i) | R(u, i) ≥ T_H}| is the number of users u whose rating of item i is greater than or equal to the threshold T_H.
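The count-based measures defined so far can be computed in one pass over the rating data, as sketched below. The function name and the triple-based data layout are hypothetical, and the average-rating measure is assumed to be the plain arithmetic mean of an item's ratings, since its formula is not reproduced in the text.

```python
from collections import defaultdict

def ranking_values(ratings, t_h):
    """Compute per-item ranking values from (user, item, rating) triples.

    Returns three dicts keyed by item:
      popularity: |U(i)|, the number of users who rated the item
      average:    mean rating of the item (assumed arithmetic mean)
      absolute:   |U_H(i)|, the number of users who rated it >= t_h
    """
    by_item = defaultdict(dict)
    for user, item, rating in ratings:
        by_item[item][user] = rating
    popularity = {i: len(r) for i, r in by_item.items()}
    average = {i: sum(r.values()) / len(r) for i, r in by_item.items()}
    absolute = {
        i: sum(1 for v in r.values() if v >= t_h) for i, r in by_item.items()
    }
    return popularity, average, absolute
```

For example, with `t_h=4` an item rated 5, 3 and 4 by three users has popularity 3, average rating 4.0 and absolute likeability 2.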

rank_{RelativeLikeability}(i)=| U_H(i)/U(i)|

where rank_{RelativeLikeability}(i) denotes ranking by the relative likeability of an item,

or expressed as

10. The method according to any one of claims 5 to 9, characterized in that step (3-2) specifically uses the following formula:

where rank_x(i, T_R) denotes the function that ranks item i using the control threshold T_R, rank_x(i) denotes the item popularity measure described above, T_max denotes the upper limit of the rating scale (for example, 5 in a 5-point rating system), the control threshold satisfies T_R ∈ [T_H, T_max], and rank_Standard(i) is the existing standard ranking method, with:

rank_Standard(i) = R^*(u, i)^{-1}.
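The formula of step (3-2) is not reproduced in the text, so the sketch below illustrates only one plausible reading of the definitions above. Items whose predicted rating lies in [T_R, T_max] are ordered by the chosen measure rank_x(i). Items in [T_H, T_R) follow them in standard order. Smaller ranking values are assumed to rank first, which is consistent with rank_Standard(i) being the inverse of the predicted rating. The function name, the tie-breaking rule, and the data layout are hypothetical.

```python
def rerank(predictions, rank_x, t_r, t_h, top_n):
    """Re-rank one user's candidate items with a control threshold.

    predictions: dict item -> predicted rating R*(u, i) for one user.
    rank_x: dict item -> ranking value, e.g. item popularity
            (assumption: smaller values are ranked first).
    t_r: control threshold T_R; t_h: liking threshold T_H, t_h <= t_r.
    """
    # Only items the user is predicted to like are candidates at all.
    candidates = {i: r for i, r in predictions.items() if r >= t_h}
    above = [i for i, r in candidates.items() if r >= t_r]
    below = [i for i, r in candidates.items() if r < t_r]
    # At or above T_R: order by the chosen measure, ties by higher prediction.
    above.sort(key=lambda i: (rank_x[i], -candidates[i]))
    # In [T_H, T_R): standard order. For positive ratings, descending
    # prediction is the same as ascending R*(u, i)^{-1}.
    below.sort(key=lambda i: -candidates[i])
    return (above + below)[:top_n]
```

Under this reading, setting T_R = T_max reduces the result to the standard accuracy-first ranking, while lowering T_R towards T_H lets more items be ordered by rank_x(i) and so trades accuracy for diversity.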