CN105095477A

CN105095477A - Recommendation algorithm based on multi-index grading

Info

Publication number: CN105095477A
Application number: CN201510493550.6A
Authority: CN
Inventors: 陈健; 林世杭
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2015-08-12
Filing date: 2015-08-12
Publication date: 2015-11-25

Abstract

The invention discloses a recommendation algorithm based on multi-index scoring. The algorithm comprises the following steps: first, identifying index keywords; second, extracting opinion scores; third, constructing the user and commodity similarity matrices; fourth, using a bidirectional clustering algorithm to obtain a cluster matrix; fifth, producing recommendations within each single cluster; and sixth, using an integration function to obtain the final recommendation result. The algorithm addresses the personalized recommendation problem in which a user may have different index preferences for different commodities, and achieves higher accuracy and higher-quality recommendation results.

Description

A recommendation algorithm based on multi-index scoring

Technical field

The present invention relates to the technical field of e-commerce recommender systems, and in particular to a recommendation algorithm based on multi-index scoring.

Background art

By proactively pushing information or services that may interest a user, a recommender system helps the user obtain more useful information and saves retrieval time. Conventional recommender systems rely on collaborative filtering. Although collaborative filtering has succeeded within a certain range, it usually characterizes user preferences with a single overall rating. An overall rating captures only how much a user likes a commodity and says nothing about why the user likes it. To characterize user preferences more finely and improve recommendation accuracy, newer recommender systems should obtain and use the user's ratings on the different indices of a commodity. Here, an index denotes an attribute shared by commodities; for a hotel, for example, location, room and service are indices by which users can describe its quality.

With the emergence and development of Web 2.0 technology, more and more large websites encourage users to interact with them in various ways, which makes it possible for recommender systems to obtain and use multi-index rating information. In recent years, many researchers, while emphasizing the importance of multi-index ratings, have also pointed out that the reviews users write about commodities are significant: such reviews often contain a large amount of evaluation information about the commodities. In other words, besides being provided directly by users, multi-index ratings can also be obtained from commodity reviews by review mining techniques.

At present, both recommender systems based on multi-index ratings and recommender systems based on commodity review mining have produced research results. However, these results largely rest on one fixed assumption: a user holds the same index preferences for all commodities. In practice, this assumption deviates from everyday experience.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art by providing a recommendation algorithm based on multi-index scoring that solves the personalized recommendation problem in which a user may have different index preferences for different commodities.

To achieve the above object, the technical solution provided by the invention is a recommendation algorithm based on multi-index scoring, comprising the following steps:

1) Identification of index keywords

1.1) divide each review in the data set into sentences {x_1, x_2, ...}, and construct for each index a list consisting of a few characteristic keywords;

1.2) according to the initial keyword lists, label each sentence in the review corpus with the index with which it has the greatest word-frequency overlap;

1.3) use the χ² statistic to measure the word-frequency relation between each index and each keyword, and add the top t keywords with the highest word-frequency correlation to the keyword list of that index;

1.4) repeat the above process until the termination condition is met, that is, the keyword lists of the indices no longer change or the number of iterations reaches a threshold;

2) Opinion score extraction

After the commodity indices and their related feature keywords have been identified, the sentences in the reviews are parsed to extract the user's opinions on the indices or feature keywords;

For each review, its opinion score on the k-th index is calculated as follows:

o_{k} = \frac{\sum_{s \in OP_{k}} score(s)}{|OP_{k}|}

where s is an adjective expressing an opinion; OP_k is the set of opinion adjectives about the k-th index; |OP_k| is the number of elements in the set; and score(s) is the opinion polarity of adjective s, i.e. +1, -1 or 0. In this way, unstructured review text is converted into a vector of opinion scores O_{u,i} = [o_{u,i,1}, ..., o_{u,i,k}]. The extracted opinion scores lie in [-1, 1], whereas the multi-index ratings users provide directly lie in [1, 5]. To give both the same range, the opinion scores are converted to the interval [1, 5] by an equidistant (linear) transformation, with the following conversion formula:

o_{after} = o_{before} \times 2 + 3

where o_{before} and o_{after} are the opinion score before and after conversion respectively. This formula ensures that the multi-index rating data and the opinion score data have the same range, so that each can be used directly in the recommendation process and their effects compared;
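The two formulas of this step can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function names `opinion_score` and `rescale` are hypothetical, and returning `None` for an index with no opinion adjectives is an assumption, since the text does not define that case.

```python
def opinion_score(polarities):
    """Mean polarity o_k of the opinion adjectives about one index.

    `polarities` holds score(s) for every adjective s in OP_k (+1, -1 or 0).
    Returning None for an empty OP_k is an assumption; the text leaves it open.
    """
    if not polarities:
        return None
    return sum(polarities) / len(polarities)


def rescale(o_before):
    """Map an opinion score from [-1, 1] onto [1, 5]: o_after = o_before * 2 + 3."""
    return o_before * 2 + 3


# Three adjectives about one index: two positive, one negative.
o_k = opinion_score([+1, +1, -1])
o_after = rescale(o_k)
```

The endpoints behave as the text requires: a score of -1 maps to 1, a score of 0 to 3, and a score of +1 to 5.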

3) Construction of the user and commodity similarity matrices

In the recommender system, U = {u_1, ..., u_n} denotes the set of users and I = {i_1, ..., i_m} denotes the set of commodities, where n and m are the total numbers of users and commodities respectively. A user's evaluation of a commodity can be expressed as a rating vector r = [r_0, r_1, ..., r_k] composed of the overall rating and the multi-index ratings, where r_0 is the overall rating and r_1, ..., r_k are the ratings on the k indices. The rating vector can also be composed of the overall rating and the opinion scores, i.e. r = [r_0, o_1, ..., o_k], where o_1, ..., o_k are the opinion scores mined from the commodity reviews the user wrote. In experiments, r = [r_0, r_1, ..., r_k] can be directly replaced by r = [r_0, o_1, ..., o_k] and used in the clustering and recommendation process. The goal is to cluster the users {u_1, ..., u_n} and the commodities {i_1, ..., i_m} simultaneously into c clusters. The clustering result is represented as a partition matrix M ∈ [0,1]^{(n+m) × c}, in which each element M_{i,j} is the probability that object i belongs to cluster j; hence M_{i,j} > 0 when object i belongs to cluster j, and M_{i,j} = 0 otherwise. Because M_{i,j} directly reflects how likely object i is to belong to cluster j, each row of M is required to sum to 1. In addition, if the maximum number of clusters each object may join is limited to l, with 1 ≤ l ≤ c, each row of M contains at most l nonzero values. The partition matrix can be rewritten as:

M = \begin{bmatrix} P \\ Q \end{bmatrix}

where P ∈ [0,1]^{n × c} is the partition matrix for users and Q ∈ [0,1]^{m × c} is the partition matrix for commodities;

For users, the similarity matrix SU ∈ [-1,1]^{n × n} is constructed as follows:

SU_{x,y} = \begin{cases} \frac{1}{|CI_{x,y}|} \sum_{i \in CI_{x,y}} \frac{(r_{x,i} - \bar{r}_{x}) \cdot (r_{y,i} - \bar{r}_{y})}{|r_{x,i}| \cdot |r_{y,i}|} & \text{if } |CI_{x,y}| \neq 0 \\ 0 & \text{otherwise} \end{cases}

where r_{x,i} and r_{y,i} are the rating vectors of users u_x and u_y for commodity i, \bar{r}_{x} and \bar{r}_{y} are the average rating vectors of users u_x and u_y, CI_{x,y} is the set of commodities reviewed by both u_x and u_y, and |CI_{x,y}| is the number of commodities in CI_{x,y}. The similarity matrix SI ∈ [-1,1]^{m × m} for commodities can be constructed as follows:

SI_{x,y} = \begin{cases} \frac{1}{|CU_{x,y}|} \sum_{u \in CU_{x,y}} \frac{(r_{u,x} - \bar{r}_{x}) \cdot (r_{u,y} - \bar{r}_{y})}{|r_{u,x}| \cdot |r_{u,y}|} & \text{if } |CU_{x,y}| \neq 0 \\ 0 & \text{otherwise} \end{cases}

where r_{u,x} and r_{u,y} are the rating vectors of user u for commodities i_x and i_y, \bar{r}_{x} and \bar{r}_{y} are the average rating vectors of commodities i_x and i_y, CU_{x,y} is the set of users who have rated both i_x and i_y, and |CU_{x,y}| is the number of users in CU_{x,y};
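One possible reading of the user similarity SU_{x,y} is sketched below in Python. Two interpretations are assumptions rather than statements of the source: the product in the numerator is taken as a vector dot product, and |r| is taken as the Euclidean norm. The data layout (`ratings[user][commodity]` holding a rating vector) and all function names are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equally long rating vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def user_similarity(ratings, x, y):
    """SU_{x,y}: mean-centred, norm-scaled agreement over co-rated commodities."""
    common = set(ratings[x]) & set(ratings[y])      # CI_{x,y}
    if not common:
        return 0.0                                  # the "otherwise" branch
    mean_x = mean_vector(list(ratings[x].values()))
    mean_y = mean_vector(list(ratings[y].values()))
    total = 0.0
    for i in common:
        rx, ry = ratings[x][i], ratings[y][i]
        dx = [a - b for a, b in zip(rx, mean_x)]
        dy = [a - b for a, b in zip(ry, mean_y)]
        total += dot(dx, dy) / (norm(rx) * norm(ry))
    return total / len(common)


ratings = {
    "u1": {"h1": [5, 4, 5], "h2": [1, 2, 1]},
    "u2": {"h1": [4, 4, 4], "h2": [2, 2, 2]},
    "u3": {"h3": [3, 3, 3]},
}
s12 = user_similarity(ratings, "u1", "u2")
s13 = user_similarity(ratings, "u1", "u3")
```

The commodity similarity SI_{x,y} follows the same pattern with the roles of users and commodities exchanged.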

4) Using the bidirectional clustering algorithm to obtain the cluster matrix

To cluster users and commodities bidirectionally, closely related users or commodities are grouped together by minimizing the following objective function:

\epsilon(P, Q) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \left\| \frac{p_{i}}{\sqrt{D_{ii}^{row}}} - \frac{p_{j}}{\sqrt{D_{jj}^{col}}} \right\|^{2} \cdot SU_{i,j} \right) + \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \left\| \frac{q_{i}}{\sqrt{E_{ii}^{row}}} - \frac{q_{j}}{\sqrt{E_{jj}^{col}}} \right\|^{2} \cdot SI_{i,j} \right)

where p_i is the i-th row of the partition matrix P, and D^{row} and D^{col} are the diagonal matrices for users, computed as: [formulas missing in source]; q_i is the i-th row of the partition matrix Q, and E^{row} and E^{col} are the diagonal matrices for commodities, computed as: [formulas missing in source].

Through algebraic transformation, the above formula can be converted into:

where:

X = (D^{row})^{-\frac{1}{2}} \, SU \, (D^{col})^{-\frac{1}{2}}, \quad Y = (E^{row})^{-\frac{1}{2}} \, SI \, (E^{col})^{-\frac{1}{2}}, \quad K = \begin{bmatrix} I_{n} - X & 0 \\ 0 & I_{m} - Y \end{bmatrix}

I_n ∈ R^{n × n} denotes the identity matrix. The following optimization problem is then solved:

\min_{M} Tr(M^{T} K M)

subject to: M ∈ [0,1]^{(n+m) × c}, M 1_c = 1_{n+m}, |m_i| = l, i = 1, ..., (n+m). The parameter c is the number of clusters and l is the maximum number of clusters each user or commodity can belong to, with 1 ≤ l ≤ c; m_i denotes the i-th row of M, and |·| denotes the number of nonzero elements of a vector;

A two-stage strategy is proposed to solve the above problem, described as follows:

4.1) search for a shared low-dimensional space that represents all user and commodity information; the t-dimensional matrix Z' that optimally preserves the user and commodity information is obtained by solving the following problem:

\min_{Z} Tr(Z^{T} K Z)

subject to: Z ∈ [0,1]^{(n+m) × t}, Z^{T} Z = I_t, where I_t ∈ R^{t × t} is the identity matrix; the constraint Z^{T} Z = I_t mainly prevents arbitrary scaling of Z. Because K is a positive semidefinite matrix, the optimal solution Z' can be obtained by solving the eigenvalue problem KZ = λZ, i.e. Z' = [z_1, ..., z_t], where z_1, ..., z_t are the eigenvectors of K corresponding to its t smallest eigenvalues;

4.2) cluster users and commodities simultaneously, i.e. bidirectional clustering;

Since users and commodities may each appear in one or more clusters at the same time, the Fuzzy C-Means clustering algorithm is performed on the matrix Z' obtained above, which preserves the user and commodity information to the greatest extent. Running Fuzzy C-Means amounts to iteratively optimizing the following objective function:

\min J(M, V) = \min \sum_{i=1}^{m+n} \sum_{j=1}^{c} (M_{i,j})^{\theta} \, d(e_{i}, v_{j})^{2}

where M_{i,j} is the probability that element e_i belongs to cluster j, v_j is the center of cluster j, d(·) is the Euclidean distance function, and θ is the parameter controlling the fuzziness of the clustering result. In each iteration, the algorithm updates the elements of M and V according to the following formulas:

M_{i,j} = \frac{d(e_{i}, v_{j})^{2/(1-\theta)}}{\sum_{l=1}^{c} d(e_{i}, v_{l})^{2/(1-\theta)}}

v_{j} = \frac{\sum_{i=1}^{m+n} (M_{i,j})^{\theta} \cdot e_{i}}{\sum_{i=1}^{m+n} (M_{i,j})^{\theta}}

where j = 1, ..., c. If the change in the objective function J(M, V) between two consecutive iterations falls below the threshold ε, the algorithm terminates. After M has been solved, the l largest elements in each row, whose sum exceeds a predetermined threshold, are retained and normalized, so that the elements of each row of M sum to 1;
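The two Fuzzy C-Means update formulas and the final top-l retention can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the handling of a point that coincides with a cluster center (full membership in that cluster) is an assumption, and the predetermined threshold on the sum of the retained elements is omitted because the text does not give its value.

```python
import math


def update_memberships(points, centers, theta=2.0):
    """M[i][j] = d(e_i, v_j)^(2/(1-theta)) / sum_l d(e_i, v_l)^(2/(1-theta))."""
    power = 2.0 / (1.0 - theta)
    M = []
    for e in points:
        dists = [math.dist(e, v) for v in centers]
        if 0.0 in dists:
            # Assumption: a point lying exactly on a center belongs fully to it.
            hits = [1.0 if d == 0.0 else 0.0 for d in dists]
            M.append([h / sum(hits) for h in hits])
        else:
            weights = [d ** power for d in dists]
            total = sum(weights)
            M.append([w / total for w in weights])
    return M


def update_centers(points, M, theta=2.0):
    """v_j = sum_i M[i][j]^theta * e_i / sum_i M[i][j]^theta."""
    dim = len(points[0])
    centers = []
    for j in range(len(M[0])):
        weights = [row[j] ** theta for row in M]
        total = sum(weights)
        centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                        for d in range(dim)])
    return centers


def keep_top_l(row, l):
    """Keep the l largest memberships of one row and renormalize them to sum to 1."""
    keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:l])
    total = sum(row[j] for j in keep)
    return [row[j] / total if j in keep else 0.0 for j in range(len(row))]


points = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
centers = [[0.0, 0.5], [4.0, 4.0]]
M = update_memberships(points, centers)
new_centers = update_centers(points, M)
pruned = keep_top_l([0.5, 0.3, 0.2], 2)
```

In a full run, the two update functions would be alternated until the objective changes by less than ε, and `keep_top_l` would be applied once to every row of the converged M.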

5) Recommendation within a single cluster

5.1) Using the aggregate function algorithm to obtain recommendations within a single cluster

Recommendation methods based on an aggregate function generally assume that a user's overall rating of a commodity is closely related to the multi-index ratings, i.e. the overall rating is largely determined by the multi-index ratings. Such methods therefore use the multi-index ratings to build an aggregate function for the overall rating and predict the user's rating with it. Here, a regression method based on principal component analysis is used to build the aggregate function for the overall rating and to compute the recommendation result. Principal component analysis is a dimensionality-reduction method whose core idea is to represent all data samples by a small number of principal components extracted from them. Principal components are selected according to the variance of the sample data: each selected component is the one with the largest eigenvalue, i.e. variance, in the data sample. Principal components chosen in this way are mutually uncorrelated, which removes the effect of collinearity among the multi-index ratings. On this basis, the selected principal components are used to build the aggregate function for the dependent variable;

After an aggregate function for the overall rating has been built for the user by principal component regression, the target user's likely rating of a candidate commodity is predicted by the following formula:

r_{u,i} = \sum_{c=1}^{k} w_{c} \left( \frac{\sum_{u' \in CU} r_{u',c}}{|CU|} \right)

where r_{u,i} is the predicted rating of target user u for candidate commodity i, w_c is the coefficient of index c in the aggregate function, r_{u',c} is the rating of user u' for commodity i on index c, CU is the set of users in the same cluster who have rated commodity i, and |CU| is the number of users in CU;
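The prediction formula can be illustrated with a short Python sketch. It assumes the coefficients w_c have already been fitted for the target user by principal component regression, which is not shown; the function name `predict_aggregate`, the sample coefficients and the sample ratings are all hypothetical.

```python
def predict_aggregate(weights, cluster_ratings):
    """r_{u,i} = sum_c w_c * (mean rating of commodity i on index c over CU).

    weights:          coefficients w_1..w_k of the user's aggregate function
                      (assumed to be already fitted).
    cluster_ratings:  one index-rating vector [r_1..r_k] per user in CU.
    """
    if not cluster_ratings:
        return None  # assumption: no prediction when no cluster user rated i
    n = len(cluster_ratings)
    index_means = [sum(col) / n for col in zip(*cluster_ratings)]
    return sum(w * m for w, m in zip(weights, index_means))


# Two cluster users rated the commodity on three indices.
prediction = predict_aggregate([0.5, 0.3, 0.2], [[4, 5, 3], [2, 3, 5]])
```

Here the per-index means are [3, 4, 4], so the prediction is 0.5·3 + 0.3·4 + 0.2·4 = 3.5.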

5.2) Using the collaborative filtering algorithm to obtain recommendations within a single cluster

The core idea of collaborative filtering based on multi-index ratings is that even users clustered into the same cluster, who have identical or similar index preferences, do not have exactly the same index preferences. In other words, different users in the same cluster should be treated differently when predicting recommendation results. A collaborative filtering method based on multi-index ratings is therefore used to produce the recommendation results, with the following formula:

r_{u,i} = \bar{r}_{u} + \frac{\sum_{u' \in CU} sim(u, u') \times (r_{u',i} - \bar{r}_{u'})}{\sum_{u' \in CU} sim(u, u')}

where \bar{r}_{u} is the mean overall rating of user u, r_{u',i} is the overall rating of user u' for commodity i, and sim(u, u') is the interest similarity between users u and u' computed from the multi-index ratings;

The similarity is computed from the Euclidean distance, as follows:

The two rating vectors of users u_x and u_y for commodity i are r_{x,i} = [r_{x,1}, ..., r_{x,k}] and r_{y,i} = [r_{y,1}, ..., r_{y,k}], and their Euclidean distance is calculated as follows:

d(r_{x,i}, r_{y,i}) = \sqrt{\sum_{c=0}^{k} |r_{x,c} - r_{y,c}|^{2}}

The overall distance between users u_x and u_y is the mean of the Euclidean distances between their rating vectors over the commodities they have both reviewed, that is:

d(u_{x}, u_{y}) = \sum_{i \in CI} \frac{d(r_{x,i}, r_{y,i})}{|CI|}

The higher the interest similarity of two users, the smaller their overall distance should be; in other words, the two are inversely related. The interest similarity of users u_x and u_y is therefore calculated as follows:

sim(u_{x}, u_{y}) = \frac{1}{1 + d(u_{x}, u_{y})};
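The distance-based similarity and the in-cluster prediction formula can be sketched together in Python. The data layout is hypothetical: `vectors[user][commodity]` holds a rating vector and `overall[user][commodity]` holds an overall rating. Returning a similarity of 0 when two users share no commodity, and falling back to the user's mean when no neighbour contributes, are assumptions not stated in the text.

```python
import math


def user_distance(vecs_x, vecs_y):
    """Mean Euclidean distance over the commodities both users reviewed (CI)."""
    common = set(vecs_x) & set(vecs_y)
    if not common:
        return None
    return sum(math.dist(vecs_x[i], vecs_y[i]) for i in common) / len(common)


def sim(vecs_x, vecs_y):
    """sim = 1 / (1 + d); 0.0 when there is no common commodity (assumption)."""
    d = user_distance(vecs_x, vecs_y)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def mean_overall(overall, user):
    return sum(overall[user].values()) / len(overall[user])


def predict_cf(u, item, overall, vectors, cluster_users):
    """r_{u,i} = mean(u) + sum sim*(r_{u',i} - mean(u')) / sum sim over CU."""
    num = den = 0.0
    for v in cluster_users:
        if v == u or item not in overall[v]:
            continue                      # CU: cluster users who rated the item
        s = sim(vectors[u], vectors[v])
        num += s * (overall[v][item] - mean_overall(overall, v))
        den += s
    base = mean_overall(overall, u)
    return base if den == 0 else base + num / den


vectors = {"u": {"h1": [4, 4]}, "v": {"h1": [4, 4], "h2": [5, 5]}}
overall = {"u": {"h1": 4.0}, "v": {"h1": 3.0, "h2": 5.0}}
predicted = predict_cf("u", "h2", overall, vectors, ["u", "v"])
```

In this toy data, u and v agree exactly on h1, so their similarity is 1 and the prediction for h2 is u's mean (4.0) plus v's deviation on h2 from v's own mean (5.0 - 4.0).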

6) Using the integration function algorithm to obtain the final recommendation result

With the bidirectional clustering algorithm based on multi-index ratings adopted above, the same user or commodity may appear in several clusters simultaneously after clustering. Each time, the recommendation algorithm predicts a recommendation result using only the rating data within one cluster, so one or more recommendation results originating from different clusters are obtained. A suitable strategy is therefore needed to integrate these results and return them to the target user as the final recommendation result. The clustering algorithm rests on the following two assumptions: (1) if two users give identical or similar overall ratings and multi-index ratings to one or more commodities, the two users very probably belong to one or more common clusters; (2) if two commodities are given identical or similar overall ratings and multi-index ratings by one or more users, the two commodities very probably belong to one or more common clusters. Therefore, the element M_{i,j} of the partition matrix M obtained after clustering, which describes the cluster membership of users and commodities, can be regarded as the degree of similarity between object i and the other objects in cluster j: for a user, the similarity of index preferences with the other users in the cluster; for a commodity, the similarity with the other commodities in the cluster in how users review it. When integrating multiple recommendation results, the similarity indicator values of both the user and the commodity, i.e. M_{i,j}, must be taken into account, which leads to the following integration strategy:

R_{u,i} = \begin{cases} \sum_{l=1}^{h} Pre(u, i, l) \cdot M_{u,l} \cdot M_{i,l} & \text{if } u \text{ and } i \text{ belong to one or more clusters simultaneously} \\ 0 & \text{otherwise} \end{cases}

where R_{u,i} is the final predicted rating of user u for commodity i and Pre(u, i, l) is the recommendation result for user u and commodity i in cluster l. Under this integration strategy, the recommendation algorithm produces a prediction only when user u and commodity i belong to one or more clusters simultaneously, i.e. M_{u,l} ≠ 0 and M_{i,l} ≠ 0 for l = 1, ..., h, h ≤ c. In addition, the parameter h is the number of clusters with the largest membership probabilities that are considered during recommendation: only the information of the top h clusters by membership probability is used to generate recommendations, mainly to filter out noise.
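A minimal Python sketch of the integration strategy follows. The text does not say whether the "top h" clusters are ranked by the user's membership, the commodity's membership, or both; ranking shared clusters by the product M_{u,l} · M_{i,l} is an assumption made here for illustration, and the function name `final_score` is hypothetical.

```python
def final_score(pre, m_user, m_item, h):
    """R_{u,i}: membership-weighted sum of per-cluster predictions.

    pre[l]     prediction Pre(u, i, l) produced inside cluster l
    m_user[l]  membership M_{u,l} of the user in cluster l
    m_item[l]  membership M_{i,l} of the commodity in cluster l
    h          maximum number of shared clusters taken into account
    """
    shared = [l for l in range(len(pre)) if m_user[l] > 0 and m_item[l] > 0]
    # Assumption: rank the shared clusters by the product of both memberships.
    shared.sort(key=lambda l: m_user[l] * m_item[l], reverse=True)
    # sum() over an empty list is 0, which matches the "otherwise" branch.
    return sum(pre[l] * m_user[l] * m_item[l] for l in shared[:h])


pre = [4.0, 3.0, 5.0]
one_shared = final_score(pre, [0.5, 0.5, 0.0], [1.0, 0.0, 0.0], h=2)
top_one = final_score(pre, [0.6, 0.4, 0.0], [0.5, 0.5, 0.0], h=1)
none_shared = final_score(pre, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], h=2)
```

In the first call only cluster 0 is shared, giving 4.0 · 0.5 · 1.0; in the second, h = 1 keeps only the shared cluster with the larger membership product.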

In step 1), the χ² statistic measuring the word-frequency correlation between a feature keyword w and an index A is calculated as follows:

\chi^{2}(w, A) = \frac{C \times (C_{1} C_{4} - C_{2} C_{3})^{2}}{(C_{1} + C_{3}) \times (C_{4} + C_{2}) \times (C_{1} + C_{2}) \times (C_{4} + C_{3})}

where C is the number of occurrences of all feature keywords, C_1 is the number of times feature keyword w appears in sentences belonging to index A, C_2 is the number of times w appears in sentences not belonging to index A, C_3 is the number of sentences that belong to index A but do not contain w, and C_4 is the number of sentences that neither belong to index A nor contain w.
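As a worked illustration, the statistic can be computed directly from the five counts. The function name `chi_square` and the sample counts are hypothetical, and returning 0.0 when the denominator vanishes is an assumption, since the text does not cover that case.

```python
def chi_square(c, c1, c2, c3, c4):
    """Chi-square correlation between keyword w and index A from counts C, C1..C4."""
    denom = (c1 + c3) * (c4 + c2) * (c1 + c2) * (c4 + c3)
    if denom == 0:
        return 0.0  # assumption: treat a degenerate table as "no correlation"
    return c * (c1 * c4 - c2 * c3) ** 2 / denom


# Hypothetical counts: w is concentrated in sentences labeled with index A.
score = chi_square(c=100, c1=30, c2=10, c3=20, c4=40)
```

With these counts the numerator is 100 · (1200 - 200)² = 10⁸ and the denominator is 50 · 50 · 40 · 60 = 6 · 10⁶, so the score is about 16.67.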

Compared with the prior art, the invention has the following advantages and beneficial effects:

The invention uses a bidirectional clustering algorithm to cluster users and commodities simultaneously into different clusters according to the users' evaluations of the commodities. After clustering, users in the same cluster should hold the same index preferences for the commodities belonging to that cluster. Note that with the bidirectional clustering algorithm proposed here, a user or commodity can belong to one or more different clusters at the same time. On the basis of bidirectional clustering, two different recommendation algorithms are further proposed: 1) an aggregate function algorithm based on principal component regression: because the traditional aggregate function algorithm based on least-squares regression is unsuited to cold-start users and cannot eliminate collinearity among the ratings of multiple indices, principal component regression is used to build an aggregate function for each user and compute the recommendation result; 2) a collaborative filtering algorithm: as in traditional multi-index collaborative filtering, the multi-dimensional rating data provided by users is used to compute the similarity between users, and recommendations are then produced in combination with the overall ratings users provide; the difference is that the proposed algorithm, when computing user similarity or considering overall ratings, uses only information from the same cluster, i.e. from users with identical or similar index preferences, which can improve recommendation quality. Users have different index preferences for different commodities. The bidirectional clustering algorithm proposed here distinguishes the different index preferences of users and the corresponding commodities, and applying either of the two proposed recommendation algorithms on this basis further improves the recommendation results; of the two, multi-index collaborative filtering outperforms the aggregate function algorithm based on principal component regression in both accuracy and coverage. Compared with the traditional aggregate function algorithm based on least-squares regression, the aggregate function algorithm based on principal component regression can handle the linear dependence among the ratings of different indices, so that the user's index preferences are described more accurately. Both multi-index ratings and opinion scores help improve recommendation quality: multi-index ratings help improve recommendation coverage, while opinion scores help improve recommendation accuracy.

Brief description of the drawings

Fig. 1 is the flow chart of the recommendation algorithm of the invention.

Detailed description of the embodiments

The invention is further described below with reference to a specific embodiment.

As shown in Fig. 1, the recommendation algorithm based on multi-index scoring described in this embodiment comprises the following steps:

1) Identification of index keywords

When writing a commodity review, a user may comment not only on an index of the commodity directly but also on related features under that index. For example, when commenting on the service index of a hotel, a user may also mention other service-related features, as in "all the staff were brilliant", where "staff" is a feature keyword related to the service index. Clearly, whether the user comments on the index itself or on a feature keyword related to it, the opinion should be used to compute the opinion score of that index. We therefore identify index keywords with a bootstrapping-based algorithm, which identifies the keywords of each index by mining the word-frequency relations between the index and candidate keywords. Its main flow is as follows:

1.1) Divide each review in the data set into sentences {x_1, x_2, ...}, and construct for each index a list consisting of a few characteristic keywords; in hotel recommendation, for example, the list for the location index can be chosen as {location, area, street, bus}.

1.2) According to the initial keyword lists, label each sentence in the review corpus with the index with which it has the greatest word-frequency overlap.

1.3) Use the χ² statistic to measure the word-frequency relation between each index and each keyword, and add the top t keywords with the highest word-frequency correlation to the keyword list of that index.

1.4) Repeat the above process until the termination condition is met, that is, the keyword lists of the indices no longer change or the number of iterations reaches a threshold.

The χ² statistic measuring the word-frequency correlation between a feature keyword w and an index A is calculated as follows:

\chi^{2}(w, A) = \frac{C \times (C_{1} C_{4} - C_{2} C_{3})^{2}}{(C_{1} + C_{3}) \times (C_{4} + C_{2}) \times (C_{1} + C_{2}) \times (C_{4} + C_{3})}

Note that we regard feature keywords as consisting mainly of nouns or noun phrases, so when running the index identification algorithm we use only nouns and noun phrases to form the candidate feature vocabulary v. For example, when processing the review sentence "Staff were excellent; harbor view room was quite a good size by Hong Kong standards and everything ran smoothly", we use the noun "staff" and the noun phrase "harbor view room" to build the candidate feature vocabulary v. In addition, the number of indices can be determined either from prior knowledge about the research object (such as the index definitions in the multi-index ratings users provide directly) or from actual observation. In our experiments we combine the two: we first build the index categories from the indices defined in the multi-index ratings and then supplement these preliminary categories according to the actual reviews.
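Steps 1.1 to 1.4, restricted to noun candidates as described above, can be sketched as a small bootstrapping loop. This is a simplified illustration under several assumptions that are not in the source: each sentence is given as an already extracted list of nouns (no part-of-speech tagging is performed), C is taken to be the number of sentences, counts are per sentence rather than per occurrence, and only positively associated candidates (C_1 C_4 > C_2 C_3) are admitted. All function names and the toy data are hypothetical.

```python
def chi_square(c, c1, c2, c3, c4):
    denom = (c1 + c3) * (c4 + c2) * (c1 + c2) * (c4 + c3)
    return c * (c1 * c4 - c2 * c3) ** 2 / denom if denom else 0.0


def label(nouns, keywords):
    """Step 1.2: choose the index whose keyword list overlaps the sentence most."""
    overlap = {a: sum(1 for w in nouns if w in ks) for a, ks in keywords.items()}
    best = max(overlap, key=overlap.get)
    return best if overlap[best] > 0 else None   # None: no keyword matched


def bootstrap(sentences, seeds, t=1, max_iter=10):
    keywords = {a: set(ks) for a, ks in seeds.items()}
    vocab = {w for nouns in sentences for w in nouns}
    for _ in range(max_iter):                    # step 1.4: iteration threshold
        labels = [label(nouns, keywords) for nouns in sentences]
        updated = {a: set(ks) for a, ks in keywords.items()}
        for a in keywords:
            scores = {}
            for w in vocab - keywords[a]:
                c1 = c2 = c3 = c4 = 0
                for nouns, lab in zip(sentences, labels):
                    has_w, in_a = w in nouns, lab == a
                    if has_w and in_a:
                        c1 += 1
                    elif has_w:
                        c2 += 1
                    elif in_a:
                        c3 += 1
                    else:
                        c4 += 1
                if c1 * c4 > c2 * c3:            # assumption: positive association only
                    scores[w] = chi_square(len(sentences), c1, c2, c3, c4)
            # Step 1.3: add the top-t candidates (ties broken alphabetically).
            top = sorted(scores, key=lambda w: (-scores[w], w))[:t]
            updated[a].update(top)
        if updated == keywords:                  # step 1.4: lists unchanged
            break
        keywords = updated
    return keywords


sentences = [["location", "bus"], ["location", "metro"], ["metro", "area"],
             ["staff", "service"], ["service", "staff"], ["staff", "reception"]]
seeds = {"location": ["location", "area", "street", "bus"], "service": ["service"]}
result = bootstrap(sentences, seeds, t=1)
```

On this toy corpus the loop first adds "metro" to the location list and "staff" to the service list; once "staff" is a service keyword, the last sentence is labeled as service and "reception" is added in the next round, after which the lists stop changing.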

2) Opinion score extraction

After the commodity indices and their related feature keywords have been identified, we parse the sentences in the reviews to extract the user's opinions on the indices or feature keywords. Parsing a review sentence yields the grammatical dependencies between its words. Like most work on commodity review mining, we regard adjectives as the main carriers of user opinion. Most user opinions take one of two grammatical forms: adjectival modifiers, as in "a great location", where the adjective "great" modifies the keyword "location"; and constructions in which the keyword is the subject described by the adjective, as in "all the staff were brilliant", where the keyword "staff" is the subject described by the adjective "brilliant". By identifying these syntactic patterns through parsing, we can extract the opinions users express on product feature keywords. To convert these opinions into rating data, we use the Subjective Clue Lexicon to determine the sentiment polarity of the adjectives applied to feature keywords, i.e. positive, negative or neutral. Adjectives with positive polarity, such as "good", "excellent", "brilliant" and "wonderful", are assigned +1; adjectives with negative polarity, such as "bad", "awful" and "disappointed", are assigned -1; and adjectives judged neutral, such as "normal" and "average", are assigned 0. Note that during opinion score extraction, if a negation word such as "not", "no" or "never" is detected in the sentence, the corresponding opinion polarity is reversed. For each review, the opinion score on the k-th index is then calculated as follows:

o_{k} = \frac{\sum_{s \in OP_{k}} score(s)}{|OP_{k}|}

where s is an adjective expressing an opinion; OP_k is the set of opinion adjectives about the k-th index; |OP_k| is the number of elements in the set; and score(s) is the opinion polarity of adjective s, i.e. +1, -1 or 0. In this way, unstructured review text is converted into a vector of opinion scores O_{u,i} = [o_{u,i,1}, ..., o_{u,i,k}]. Note that the extracted opinion scores lie in [-1, 1], whereas the multi-index ratings users provide directly lie in [1, 5]. To give both the same range, we convert the opinion scores to the interval [1, 5] by an equidistant (linear) transformation, with the following conversion formula:

o_{after} = o_{before} \times 2 + 3

where o_{before} and o_{after} are the opinion score before and after conversion respectively. This formula ensures that the multi-index rating data and the opinion score data have the same range, so that each can be used directly in the recommendation process and their effects compared.
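The polarity assignment and the negation rule of this step can be illustrated as follows. The tiny `POLARITY` dictionary is a hypothetical stand-in built from the example adjectives in the text; it is not the Subjective Clue Lexicon, and treating unknown adjectives as neutral is an assumption. Grammatical parsing is not shown: the adjective is assumed to have been extracted already.

```python
# Hypothetical mini-lexicon built from the examples in the text.
POLARITY = {
    "good": 1, "excellent": 1, "brilliant": 1, "wonderful": 1,
    "bad": -1, "awful": -1, "disappointed": -1,
    "normal": 0, "average": 0,
}
NEGATIONS = {"not", "no", "never"}


def adjective_polarity(adjective, sentence_tokens):
    """Return score(s) for one opinion adjective, reversed if the sentence is negated."""
    polarity = POLARITY.get(adjective, 0)   # assumption: unknown adjectives are neutral
    if NEGATIONS & set(sentence_tokens):
        polarity = -polarity
    return polarity


plain = adjective_polarity("brilliant", "all the staff were brilliant".split())
negated = adjective_polarity("good", "the room was not good".split())
```

The resulting values are the score(s) terms that enter the opinion score formula for the index the adjective refers to.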

3) user and commodity similarity matrix build

In commending system, we use U={u ₁..., u _nrepresent the set of user, I={i ₁..., i _mrepresent the set of commodity, wherein n and m represents the sum of user and commodity respectively; User can be expressed as one for the evaluation of a certain commodity and be marked the scoring vector r=[r formed by comprehensive grading and multi objective ₀, r ₁, r _k], wherein r ₀represent comprehensive grading, r ₁, r _krepresent the scoring about k index, this scoring vector also can be made up of comprehensive grading and opinion score, i.e. r=[r ₀, o ₁, o _k], wherein r ₀, o ₁, o _krepresent from the comment on commodity that user writes, excavate the opinion score obtained; In experimentation, can directly by r=[r ₀, r ₁, r _k] replace with r=[r ₀, o ₁, o _k] and among the process of cluster and recommendation; Target is simultaneously to user { u ₁..., u _nand commodity { i ₁..., i _mcluster is c bunch; Cluster result should be represented as a partitioned matrix M ∈ [0,1] ^{(n+m) × c}, wherein each element M _i,jrepresent that corresponding element object i belongs to the probability of bunch j, therefore, M when element object i belongs to bunch j time _i,j> 0, otherwise M _i,j=0; Due to M _i,jsize directly reacted the possibility that this element object i belongs to bunch j, so every a line sum of partitioned matrix M requires to be 1; In addition, if limit that each element object can add bunch maximum number, such as l bunch, i.e. 1≤l≤c, so at most only may obtain l nonzero value in the every a line in M; Above-mentioned partitioned matrix can be rewritten as:

M = [\begin{matrix} P \\ Q \end{matrix}]

where P ∈ [0,1]^{n × c} is the partition matrix for users and Q ∈ [0,1]^{m × c} is the partition matrix for commodities.

To express the proposed solution mathematically, we first define a similarity matrix for users and one for commodities. For users, the similarity matrix SU ∈ [-1,1]^{n × n} is built as follows:

{SU}_{x,y} = \begin{cases} \left( \sum_{i \in {CI}_{x,y}} \frac{(r_{x,i} - \overline{r_{x}}) \cdot (r_{y,i} - \overline{r_{y}})}{|r_{x,i}| \cdot |r_{y,i}|} \right) / |{CI}_{x,y}| & \text{if } |{CI}_{x,y}| \neq 0 \\ 0 & \text{otherwise} \end{cases}

For commodities, the similarity matrix SI is built analogously:

{SI}_{x,y} = \begin{cases} \left( \sum_{u \in {CU}_{x,y}} \frac{(r_{u,x} - \overline{r_{x}}) \cdot (r_{u,y} - \overline{r_{y}})}{|r_{u,x}| \cdot |r_{u,y}|} \right) / |{CU}_{x,y}| & \text{if } |{CU}_{x,y}| \neq 0 \\ 0 & \text{otherwise} \end{cases}

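In SI, r_{u,x} and r_{u,y} are the rating vectors of user u for commodities i_x and i_y, respectively; CU_{x,y} is the set of users who have rated both i_x and i_y; and |CU_{x,y}| is the number of users in CU_{x,y}. In SU, \overline{r_{x}} and \overline{r_{y}} are the average rating vectors of users u_x and u_y; by the same symmetry, in SI they are the average rating vectors of commodities i_x and i_y. The remaining symbols of SU are defined symmetrically: r_{x,i} and r_{y,i} are the rating vectors of users u_x and u_y for commodity i, and CI_{x,y} is the set of commodities rated by both users.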
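The construction of SU can be sketched in NumPy as follows. This is a minimal illustration, not the patented implementation: it assumes that "·" in the numerator is the dot product of the mean-centered rating vectors, that |r| is the Euclidean norm, and that every observed rating vector is nonzero. The function name `user_similarity` and the boolean mask `rated` are hypothetical.

```python
import numpy as np


def user_similarity(R, rated):
    """Build SU from rating vectors.

    R     : (n, m, k+1) array; R[x, i] is the rating vector of user x for commodity i.
    rated : (n, m) boolean array; True where user x has rated commodity i.
    """
    n = R.shape[0]
    Rz = np.where(rated[:, :, None], R, 0.0)       # ignore unrated entries
    counts = np.maximum(rated.sum(axis=1), 1)      # avoid division by zero
    means = Rz.sum(axis=1) / counts[:, None]       # average rating vector per user
    SU = np.zeros((n, n))
    for x in range(n):
        for y in range(n):
            common = np.flatnonzero(rated[x] & rated[y])   # CI_{x,y}
            if common.size == 0:
                continue                                   # SU[x, y] stays 0
            num = ((R[x, common] - means[x]) * (R[y, common] - means[y])).sum(axis=1)
            den = (np.linalg.norm(R[x, common], axis=1)
                   * np.linalg.norm(R[y, common], axis=1))
            SU[x, y] = (num / den).mean()                  # average over CI_{x,y}
    return SU
```

The commodity matrix SI could be obtained with the same function by swapping the first two axes of `R` and transposing `rated`.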

4) Obtaining the cluster matrix with a bidirectional clustering algorithm

To cluster users and commodities bidirectionally, we propose to group closely related users or commodities together by minimizing the following objective function:

\epsilon(P, Q) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \left\| \frac{p_{i}}{\sqrt{D_{ii}^{row}}} - \frac{p_{j}}{\sqrt{D_{jj}^{col}}} \right\|^{2} \cdot {SU}_{i,j} \right) + \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \left\| \frac{q_{i}}{\sqrt{E_{ii}^{row}}} - \frac{q_{j}}{\sqrt{E_{jj}^{col}}} \right\|^{2} \cdot {SI}_{i,j} \right)

Through algebraic transformation, the above formula can be rewritten in the trace form Tr(M^{T} K M) that appears in the optimization problem below, where:

X = (D^{row})^{-\frac{1}{2}} \, SU \, (D^{col})^{-\frac{1}{2}}, \quad Y = (E^{row})^{-\frac{1}{2}} \, SI \, (E^{col})^{-\frac{1}{2}}, \quad K = \begin{bmatrix} I_{n} - X & 0 \\ 0 & I_{m} - Y \end{bmatrix}

Here I_n ∈ R^{n × n} denotes the n × n identity matrix. The problem under study is thus converted into solving the following optimization problem:

\min_{M} \mathrm{Tr}(M^{T} K M)

subject to: M ∈ [0,1]^{(n+m) × c}, M 1_c = 1_{n+m}, and |p_i| = l for i = 1, ..., (n+m), where p_i denotes the i-th row of M. The parameter c is the number of clusters, and l is the maximum number of clusters to which each user or commodity can belong, with 1 ≤ l ≤ c. Here the symbol |·| denotes the number of nonzero elements of a vector.

We propose a two-stage strategy to solve the above problem, described as follows:

\min_{Z} \mathrm{Tr}(Z^{T} K Z)

subject to: Z ∈ [0,1]^{(n+m) × t} and Z^{T} Z = I_t, where I_t ∈ R^{t × t} denotes the identity matrix. The constraint Z^{T} Z = I_t mainly prevents the matrix Z from being scaled arbitrarily. Because K is a positive semidefinite matrix, the optimal solution Z' can be obtained by solving the eigenvalue problem KZ = λZ; that is, Z' = [z_1, ..., z_t], where z_1, ..., z_t are the eigenvectors of K corresponding to its t smallest eigenvalues.
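The first stage can be sketched with NumPy's symmetric eigensolver. This is an illustration under stated assumptions: the normalized matrices X and Y are taken as given and assumed symmetric, the helper names `build_K` and `smallest_eigenvectors` are hypothetical, and the sketch does not enforce the range constraint on Z (eigenvector entries may be negative).

```python
import numpy as np


def build_K(X, Y):
    """Assemble the block-diagonal matrix K = diag(I_n - X, I_m - Y)."""
    n, m = X.shape[0], Y.shape[0]
    return np.block([
        [np.eye(n) - X, np.zeros((n, m))],
        [np.zeros((m, n)), np.eye(m) - Y],
    ])


def smallest_eigenvectors(K, t):
    """Return the t smallest eigenvalues of K and their eigenvectors as columns of Z."""
    K = (K + K.T) / 2.0                    # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(K)   # eigh returns eigenvalues in ascending order
    return eigvals[:t], eigvecs[:, :t]
```

The columns returned by `numpy.linalg.eigh` are orthonormal, so the constraint Z^T Z = I_t holds for the returned matrix by construction.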

Considering that users and commodities may each appear in one or more clusters simultaneously, we propose to run the Fuzzy C-Means clustering algorithm on the matrix Z', which retains the user and commodity information to the greatest extent. Running Fuzzy C-Means amounts to iteratively optimizing (that is, minimizing) the following objective function:

\min J(M, V) = \min \sum_{i=1}^{m+n} \sum_{j=1}^{c} (M_{i,j})^{\theta} \, d(e_{i}, v_{j})^{2}

M_{i,j} = \frac{d(e_{i}, v_{j})^{2/(1-\theta)}}{\sum_{l=1}^{c} d(e_{i}, v_{l})^{2/(1-\theta)}}

v_{j} = \frac{\sum_{i=1}^{m+n} (M_{i,j})^{\theta} \cdot e_{i}}{\sum_{i=1}^{m+n} (M_{i,j})^{\theta}}

where j = 1, ..., c. The algorithm terminates once the change in the objective function J(M, V) between two consecutive iterations falls below the threshold ε. After the matrix M has been solved, the l largest elements in each row, whose sum exceeds a predetermined threshold (for example, 0.9), are retained and normalized, so that the elements in each row of M sum to 1.
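The iteration can be sketched as below. This is a minimal Fuzzy C-Means illustration under assumptions not stated in the text: e_i is taken to be the i-th row of the embedding matrix, d is the Euclidean distance, the fuzzifier θ is greater than 1, and memberships start from a random initialization. The names `fuzzy_c_means` and `keep_top_l` are hypothetical, and `keep_top_l` implements only one reading of the retention step (it keeps the l largest entries per row and omits the threshold check).

```python
import numpy as np


def fuzzy_c_means(E, c, theta=2.0, eps=1e-6, max_iter=300, seed=0):
    """Fuzzy C-Means on the rows e_i of E; returns memberships M and centres V."""
    rng = np.random.default_rng(seed)
    M = rng.random((E.shape[0], c))
    M /= M.sum(axis=1, keepdims=True)              # each row of M sums to 1
    prev_J = np.inf
    for _ in range(max_iter):
        W = M ** theta
        V = (W.T @ E) / W.sum(axis=0)[:, None]     # centre update v_j
        d = np.linalg.norm(E[:, None, :] - V[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                   # guard against zero distance
        P = d ** (2.0 / (1.0 - theta))
        M = P / P.sum(axis=1, keepdims=True)       # membership update M_{i,j}
        J = float(((M ** theta) * d ** 2).sum())   # objective J(M, V)
        if abs(prev_J - J) < eps:                  # change below threshold: stop
            break
        prev_J = J
    return M, V


def keep_top_l(M, l):
    """Keep the l largest memberships in each row, zero the rest, renormalize."""
    idx = np.argsort(M, axis=1)[:, -l:]
    out = np.zeros_like(M)
    np.put_along_axis(out, idx, np.take_along_axis(M, idx, axis=1), axis=1)
    return out / out.sum(axis=1, keepdims=True)
```

A typical call would be `M, V = fuzzy_c_means(Z, c)` followed by `M = keep_top_l(M, l)`, where `Z` holds the retained eigenvectors as columns.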

In short, the proposed clustering algorithm based on multi-index ratings first builds similarity matrices for users and for commodities according to the characteristics of the multi-index ratings, and then converts the clustering problem into an optimization problem. On this basis, and drawing on the idea of the Fuzzy C-Means clustering algorithm, it clusters users and commodities bidirectionally. The probability distribution of users and commodities over the different clusters characterizes the index preferences that users apply when evaluating commodities.

5) Recommendation within a single cluster

Recommendation methods based on an aggregate function generally assume that a user's comprehensive rating of a commodity is closely related to the multi-index ratings, that is, the comprehensive rating is largely determined by the multi-index ratings. Such methods therefore use the multi-index ratings to build an aggregate function for the comprehensive rating and predict the user's rating with that function. We propose to build the aggregate function with a regression method based on principal component analysis and to use it to compute the recommendation result. Principal component analysis is a dimensionality-reduction technique whose core idea is to represent all data samples by a small number of principal components extracted from them. Principal components are selected according to the variance of the sample data along each component (the corresponding eigenvalue): each selected component is the one with the largest remaining variance. Notably, the principal components chosen in this way are mutually uncorrelated, which removes the effect of collinearity among the multi-index ratings. On this basis, the selected principal components are used to build the aggregate function for the dependent variable:

r_{u,i} = \sum_{c=1}^{k} w_{c} \left( \sum_{u' \in CU} r_{u',c} / |CU| \right)

where r_{u,i} is the predicted rating of target user u for candidate commodity i, w_c is the coefficient of index c in the aggregate function, r_{u',c} is the rating given by user u' to commodity i on index c, CU is the set of users in the same cluster who have rated commodity i, and |CU| is the number of users in CU.
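One way to obtain the coefficients w_c is principal component regression, sketched below with NumPy. This is an illustration rather than the method as patented: the function names `fit_pcr` and `predict_in_cluster` are hypothetical, and the sketch centers the data and therefore carries an intercept b, which the formula above does not show explicitly.

```python
import numpy as np


def fit_pcr(X, y, n_components):
    """Principal component regression; returns (w, b) such that y ≈ X @ w + b.

    X : (samples, k) multi-index ratings; y : (samples,) comprehensive ratings.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                        # (k, p) retained directions
    scores = Xc @ V                                # mutually uncorrelated scores
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    w = V @ gamma                                  # coefficients on the original indices
    b = y_mean - x_mean @ w
    return w, b


def predict_in_cluster(w, b, cluster_ratings):
    """cluster_ratings: (|CU|, k) index ratings of commodity i by users in the cluster."""
    return float(cluster_ratings.mean(axis=0) @ w + b)
```

Retaining fewer components than k discards the low-variance directions, which is where collinearity among the index ratings would otherwise destabilize the coefficients.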

The core idea of collaborative filtering based on multi-index ratings is that even users clustered into the same cluster, and thus sharing identical or similar index preferences, do not have exactly the same index preferences. In other words, different users in the same cluster should be treated differently when predicting recommendation results. We therefore propose to produce recommendation results with collaborative filtering based on multi-index ratings, computed as follows:

r_{u,i} = \overline{r_{u}} + \frac{\sum_{u' \in CU} \mathrm{sim}(u, u') \times (r_{u',i} - \overline{r_{u'}})}{\sum_{u' \in CU} \mathrm{sim}(u, u')}

where \overline{r_{u}} is the average comprehensive rating of user u, r_{u',i} is the comprehensive rating of user u' for commodity i, and sim(u, u') is the interest similarity between users u and u', computed from the multi-index ratings.

We compute this similarity with a method based on the Euclidean distance, described as follows:

d(r_{x,i}, r_{y,i}) = \sqrt{\sum_{c=0}^{k} |r_{x,c} - r_{y,c}|^{2}}

d(u_{x}, u_{y}) = \sum_{i \in CI} \frac{d(r_{x,i}, r_{y,i})}{|CI|}

\mathrm{sim}(u_{x}, u_{y}) = \frac{1}{1 + d(u_{x}, u_{y})}.
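The similarity and the in-cluster collaborative filtering prediction can be sketched as follows. The function names `euclidean_similarity` and `predict_cf` are hypothetical, and the sketch assumes that the two users share at least one co-rated commodity and that the neighbour set CU is non-empty.

```python
import numpy as np


def euclidean_similarity(Rx, Ry):
    """sim(u_x, u_y) = 1 / (1 + mean Euclidean distance over co-rated commodities).

    Rx, Ry : (|CI|, k+1) arrays; row i holds each user's rating vector for the
             i-th commodity that both users have rated.
    """
    d_items = np.linalg.norm(Rx - Ry, axis=1)    # d(r_{x,i}, r_{y,i}) per commodity
    return 1.0 / (1.0 + d_items.mean())          # d(u_x, u_y) is the mean distance


def predict_cf(r_bar_u, sims, neighbour_ratings, neighbour_means):
    """Add the similarity-weighted deviations of the neighbours to the user's mean."""
    sims = np.asarray(sims, dtype=float)
    deviations = (np.asarray(neighbour_ratings, dtype=float)
                  - np.asarray(neighbour_means, dtype=float))
    return float(r_bar_u + (sims * deviations).sum() / sims.sum())
```

Identical rating vectors give a distance of 0 and hence a similarity of 1; the similarity decreases toward 0 as the mean distance grows.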

The bidirectional clustering algorithm based on multi-index ratings adopted above allows the same user or commodity to appear in several clusters at once, and the proposed recommendation algorithm predicts each recommendation result using only the rating data inside a single cluster. One or more recommendation results, derived from different clusters, are therefore obtained, and a suitable strategy is needed to integrate them and return a final recommendation result to the target user. The proposed clustering algorithm rests on two hypotheses: (1) if two users give identical or similar comprehensive ratings and multi-index ratings to one or more of the same commodities, the two users very likely belong to one or more common clusters; (2) if two commodities receive identical or similar comprehensive ratings and multi-index ratings from one or more users, the two commodities very likely belong to one or more common clusters. Accordingly, each element M_{i,j} of the partition matrix M obtained after clustering can be regarded as the degree of similarity between element i and the other elements of cluster j: for a user, the similarity of index preferences with the other users in the cluster; for a commodity, the similarity of how it is evaluated by users compared with the other commodities in the cluster. When integrating multiple recommendation results, these similarity indicators for users and commodities, that is, the values M_{i,j}, need to be taken into account, which leads to the following integration strategy:

R_{u,i} = \begin{cases} \sum_{l=1}^{h} \mathrm{Pre}(u, i, l) \cdot M_{u,l} \cdot M_{i,l} & \text{if } u \text{ and } i \text{ belong to one or more common clusters} \\ 0 & \text{otherwise} \end{cases}

where R_{u,i} is the final predicted rating of user u for commodity i, and Pre(u, i, l) is the recommendation result of user u for commodity i in cluster l. Note that under this strategy the proposed algorithm can produce a prediction only when user u and commodity i belong to one or more clusters simultaneously, that is, M_{u,l} ≠ 0 and M_{i,l} ≠ 0 for l = 1, ..., h, with h ≤ c. The parameter h is the number of clusters with the largest membership probabilities that are considered when producing a recommendation: only the information of the top h clusters by membership probability is taken into account, mainly to filter out noise. In the actual experiments, we chose h = 4.
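The integration strategy can be sketched as below. The function name `combine_predictions` is hypothetical, and ranking the shared clusters by the product M_{u,l} · M_{i,l} is an assumption of this sketch; the text only states that the h clusters with the largest membership probabilities are considered.

```python
import numpy as np


def combine_predictions(pre, m_user, m_item, h=4):
    """Integrate in-cluster predictions into the final rating R_{u,i}.

    pre    : (c,) in-cluster predictions Pre(u, i, l), one per cluster l.
    m_user : (c,) row of M for user u;  m_item : (c,) row of M for commodity i.
    """
    pre = np.asarray(pre, dtype=float)
    weight = np.asarray(m_user, dtype=float) * np.asarray(m_item, dtype=float)
    shared = np.flatnonzero(weight > 0)          # clusters containing both u and i
    if shared.size == 0:
        return 0.0                               # no common cluster: no prediction
    order = np.argsort(weight[shared])[::-1][:h]
    top = shared[order]                          # top-h shared clusters (assumed ranking)
    return float((pre[top] * weight[top]).sum())
```

With `h=1`, only the single shared cluster with the largest weight contributes to the result.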

This document proposes a novel multi-index rating recommendation algorithm. First, users and commodities are clustered simultaneously according to the users' multi-index rating information on commodities; after clustering, users in the same cluster have identical or similar index preferences for the commodities in that cluster. On the basis of this bidirectional clustering, two different algorithms are proposed to produce recommendations from the clustering result: an aggregate-function algorithm based on principal component regression, and multi-index collaborative filtering. This document also proposes extracting multi-index ratings from the commodity reviews written by users, applying them to the clustering and recommendation process described above, and comparing their effect with that of the multi-index ratings given directly by users. The results show that the present invention achieves higher accuracy.

Compared with collaborative filtering that uses only the comprehensive rating, a recommendation algorithm that uses multi-index ratings or opinion scores can provide users with higher-quality recommendation results. Multi-index ratings take full account of the user's satisfaction with each index. Moreover, when writing commodity reviews, users tend to comment on the quality of the indices they care about; in other words, compared with the multi-index ratings given by users, the opinion scores mined from reviews better reflect users' index preferences and thus yield higher-quality recommendation results. The approach is therefore worth promoting.

The embodiment described above is only a preferred embodiment of the present invention and does not limit its scope of implementation; any change made according to the shape and principle of the present invention shall therefore fall within the protection scope of the present invention.

Claims

1. A recommendation algorithm based on multi-index rating, characterized in that it comprises the following steps:

1) Identification of index keywords

2) Extraction of opinion scores

o_{k} = \frac{\sum_{s \in {OP}_{k}} \mathrm{score}(s)}{|{OP}_{k}|}

o_{after} = o_{before} \times 2 + 3

3) Construction of the user and commodity similarity matrices

M = [\begin{matrix} P \\ Q \end{matrix}]

For users, the similarity matrix SU ∈ [-1,1]^{n × n} is built as follows:

{SU}_{x,y} = \begin{cases} \left( \sum_{i \in {CI}_{x,y}} \frac{(r_{x,i} - \overline{r_{x}}) \cdot (r_{y,i} - \overline{r_{y}})}{|r_{x,i}| \cdot |r_{y,i}|} \right) / |{CI}_{x,y}| & \text{if } |{CI}_{x,y}| \neq 0 \\ 0 & \text{otherwise} \end{cases}

{SI}_{x,y} = \begin{cases} \left( \sum_{u \in {CU}_{x,y}} \frac{(r_{u,x} - \overline{r_{x}}) \cdot (r_{u,y} - \overline{r_{y}})}{|r_{u,x}| \cdot |r_{u,y}|} \right) / |{CU}_{x,y}| & \text{if } |{CU}_{x,y}| \neq 0 \\ 0 & \text{otherwise} \end{cases}

4) Obtaining the cluster matrix with a bidirectional clustering algorithm

\epsilon(P, Q) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \left\| \frac{p_{i}}{\sqrt{D_{ii}^{row}}} - \frac{p_{j}}{\sqrt{D_{jj}^{col}}} \right\|^{2} \cdot {SU}_{i,j} \right) + \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \left\| \frac{q_{i}}{\sqrt{E_{ii}^{row}}} - \frac{q_{j}}{\sqrt{E_{jj}^{col}}} \right\|^{2} \cdot {SI}_{i,j} \right)

Through algebraic transformation, the above formula can be rewritten in the trace form Tr(M^{T} K M) that appears in the optimization problem below, where:

X = (D^{row})^{-\frac{1}{2}} \, SU \, (D^{col})^{-\frac{1}{2}},

Y = (E^{row})^{-\frac{1}{2}} \, SI \, (E^{col})^{-\frac{1}{2}},

K = \begin{bmatrix} I_{n} - X & 0 \\ 0 & I_{m} - Y \end{bmatrix}

\min_{M} \mathrm{Tr}(M^{T} K M)

\min_{Z} \mathrm{Tr}(Z^{T} K Z)

\min J(M, V) = \min \sum_{i=1}^{m+n} \sum_{j=1}^{c} (M_{i,j})^{\theta} \, d(e_{i}, v_{j})^{2}

M_{i,j} = \frac{d(e_{i}, v_{j})^{2/(1-\theta)}}{\sum_{l=1}^{c} d(e_{i}, v_{l})^{2/(1-\theta)}}

v_{j} = \frac{\sum_{i=1}^{m+n} (M_{i,j})^{\theta} \cdot e_{i}}{\sum_{i=1}^{m+n} (M_{i,j})^{\theta}}

where i = 1, ..., (m+n) and j = 1, ..., c; the algorithm terminates once the change in the objective function J(M, V) between two consecutive iterations falls below the threshold ε; after the matrix M has been solved, the l largest elements in each row, whose sum exceeds a predetermined threshold, are retained and normalized, so that the elements in each row of M sum to 1;

5) Recommendation within a single cluster

r_{u,i} = \sum_{c=1}^{k} w_{c} \left( \sum_{u' \in CU} r_{u',c} / |CU| \right)

r_{u,i} = \overline{r_{u}} + \frac{\sum_{u' \in CU} \mathrm{sim}(u, u') \times (r_{u',i} - \overline{r_{u'}})}{\sum_{u' \in CU} \mathrm{sim}(u, u')}

The two rating vectors of users u_x and u_y for commodity i are r_{x,i} = [r_{x,1}, ..., r_{x,k}] and r_{y,i} = [r_{y,1}, ..., r_{y,k}], and the Euclidean distance between them is calculated as follows:

d(r_{x,i}, r_{y,i}) = \sqrt{\sum_{c=0}^{k} |r_{x,c} - r_{y,c}|^{2}}

d(u_{x}, u_{y}) = \sum_{i \in CI} \frac{d(r_{x,i}, r_{y,i})}{|CI|}

\mathrm{sim}(u_{x}, u_{y}) = \frac{1}{1 + d(u_{x}, u_{y})};

R_{u,i} = \begin{cases} \sum_{l=1}^{h} \mathrm{Pre}(u, i, l) \cdot M_{u,l} \cdot M_{i,l} & \text{if } u \text{ and } i \text{ belong to one or more common clusters} \\ 0 & \text{otherwise} \end{cases}

2. The recommendation algorithm based on multi-index rating according to claim 1, characterized in that: in step 1), the χ² statistic, which measures the word-frequency dependence between a feature keyword ω and an index A, is calculated as follows:

\chi^{2}(\omega, A) = \frac{C \times (C_{1} C_{4} - C_{2} C_{3})^{2}}{(C_{1} + C_{3}) \times (C_{4} + C_{2}) \times (C_{1} + C_{2}) \times (C_{4} + C_{3})}