CN105095477A - Recommendation algorithm based on multi-index grading - Google Patents

Info

Publication number
CN105095477A
Authority
CN
China
Prior art keywords
user
commodity
represent
index
bunch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510493550.6A
Other languages
Chinese (zh)
Inventor
陈健
林世杭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT
Priority to CN201510493550.6A
Publication of CN105095477A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a recommendation algorithm based on multi-index grading. The algorithm comprises the following steps: first, identifying index keywords; second, extracting opinion scores; third, constructing user and commodity similarity matrices; fourth, using a two-way clustering algorithm to obtain a cluster matrix; fifth, producing recommendations within each single cluster; and sixth, using an integration function to obtain the final recommendation result. The algorithm addresses the personalized recommendation problem in which a user may have different index preferences for different commodities, achieves high accuracy, and yields higher-quality recommendation results.

Description

A recommendation algorithm based on multi-index grading
Technical field
The present invention relates to the technical field of e-commerce recommender systems, and in particular to a recommendation algorithm based on multi-index grading.
Background
By proactively pushing information or services that a user may be interested in, a recommender system helps the user obtain more useful information and saves retrieval time. Conventional recommender systems rely on collaborative filtering. Although collaborative filtering has been successful within a certain range, it usually characterizes a user's preferences with a single overall rating. An overall rating only captures how much a user likes a commodity; it says nothing about why the user likes it. To characterize user preferences in finer detail and improve recommendation accuracy, newer recommender systems should obtain and use the user's ratings on the different indexes of a commodity. Here, an index denotes an attribute shared by commodities of the same kind. For a hotel, for example, location, room and service are indexes by which a user can describe the hotel's quality.
With the emergence and development of Web 2.0 technology, more and more large websites encourage users to interact with the site in various ways, which makes it possible for recommender systems to obtain and use multi-index rating information. In recent years, many researchers, while emphasizing the importance of multi-index ratings, have also pointed out that the reviews users write for commodities are significant: such reviews often contain a large amount of evaluative information. In other words, besides being provided directly by users, multi-index ratings can also be obtained from commodity reviews through review mining techniques.
At present, both recommender systems based on multi-index ratings and those based on review mining have achieved some research results. However, these results largely rest on a fixed assumption: that a user applies the same index preferences to all commodities. In fact, this assumption is at odds with everyday experience.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a recommendation algorithm based on multi-index grading that solves the personalized recommendation problem in which a user may have different index preferences for different commodities.
To achieve the above object, the technical solution provided by the invention is a recommendation algorithm based on multi-index grading, comprising the following steps:
1) Identification of index keywords
1.1) Split each review in the data set into sentences $\{x_1, x_2, \ldots\}$, and construct for each index an initial list made up of a small number of characteristic keywords;
1.2) Using the initial keyword lists, label each sentence in the review corpus with the index whose keywords it overlaps most in word frequency;
1.3) Use the $\chi^2$ statistic to measure the word-frequency relationship between each index and each keyword, and add the top $t$ keywords with the highest word-frequency dependence to that index's keyword list;
1.4) Repeat the above process until a termination condition is met, namely the keyword lists of the indexes no longer change or the number of iterations reaches a threshold;
2) Opinion score extraction
After the commodity indexes and their related feature keywords have been identified, the sentences in each review are parsed and the user's opinions on the indexes or feature keywords are extracted;
For each review, the opinion score for the $k$-th index is calculated as follows:
$$o_k=\frac{\sum_{s\in OP_k}\mathrm{score}(s)}{|OP_k|}$$
where $s$ is an adjective expressing an opinion; $OP_k$ is the set of opinion adjectives about the $k$-th index; $|OP_k|$ is the number of elements in that set; and $\mathrm{score}(s)$ is the opinion polarity of adjective $s$, namely $+1$, $-1$ or $0$. In this way, an unstructured review text is converted into a vector of opinion scores $O_{u,i}=[o_{u,i,1},\ldots,o_{u,i,k}]$. The extracted opinion scores lie in $[-1,1]$, whereas the multi-index ratings given directly by users lie in $[1,5]$. To give both the same range, the opinion scores are mapped into the interval $[1,5]$ by an equidistant (linear) conversion:
$$o_{after}=o_{before}\times 2+3$$
where $o_{before}$ and $o_{after}$ are the opinion scores before and after conversion. This formula ensures that the multi-index rating data and the opinion score data share the same range, so that each can be used directly in the recommendation process and their effects compared;
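The two formulas of step 2 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and returning `None` for an index with no opinion adjectives is an assumption, since the text does not say how that case is handled.

```python
def opinion_score(polarities):
    """Mean polarity of the opinion adjectives found for one index.

    `polarities` holds the score(s) values (+1, -1 or 0) of the set OP_k.
    """
    if not polarities:
        return None  # assumption: no adjectives means no score for this index
    return sum(polarities) / len(polarities)


def rescale(o_before):
    """Map an opinion score from [-1, 1] onto the rating range [1, 5]."""
    return o_before * 2 + 3


# Two positive, one negative and one neutral adjective about the same index.
o = opinion_score([+1, +1, -1, 0])
print(o, rescale(o))
```

The end points behave as expected: a score of -1 maps to 1 and a score of +1 maps to 5.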
3) Construction of the user and commodity similarity matrices
In the recommender system, $U=\{u_1,\ldots,u_n\}$ denotes the set of users and $I=\{i_1,\ldots,i_m\}$ denotes the set of commodities, where $n$ and $m$ are the total numbers of users and commodities respectively. A user's evaluation of a commodity can be expressed as a rating vector $r=[r_0,r_1,\ldots,r_k]$ composed of the overall rating and the multi-index ratings, where $r_0$ is the overall rating and $r_1,\ldots,r_k$ are the ratings on the $k$ indexes. The rating vector can also be composed of the overall rating and the opinion scores, i.e. $r=[r_0,o_1,\ldots,o_k]$, where $o_1,\ldots,o_k$ are the opinion scores mined from the reviews written by the user. In experiments, $r=[r_0,r_1,\ldots,r_k]$ can be replaced directly by $r=[r_0,o_1,\ldots,o_k]$ and used in the clustering and recommendation process. The goal is to cluster the users $\{u_1,\ldots,u_n\}$ and the commodities $\{i_1,\ldots,i_m\}$ simultaneously into $c$ clusters. The clustering result is represented as a partition matrix $M\in[0,1]^{(n+m)\times c}$, in which each element $M_{i,j}$ is the probability that object $i$ belongs to cluster $j$; hence $M_{i,j}>0$ when object $i$ belongs to cluster $j$, and $M_{i,j}=0$ otherwise. Because the size of $M_{i,j}$ directly reflects the likelihood that object $i$ belongs to cluster $j$, each row of $M$ is required to sum to 1. In addition, if the maximum number of clusters each object may join is limited to $l$, with $1\le l\le c$, then each row of $M$ contains at most $l$ nonzero values. The partition matrix can be written as:
$$M=\begin{bmatrix}P\\Q\end{bmatrix}$$
where $P\in[0,1]^{n\times c}$ is the partition matrix for users and $Q\in[0,1]^{m\times c}$ is the partition matrix for commodities;
For users, the similarity matrix $SU\in[-1,1]^{n\times n}$ is built as follows:
$$SU_{x,y}=\begin{cases}\dfrac{1}{|CI_{x,y}|}\displaystyle\sum_{i\in CI_{x,y}}\dfrac{(r_{x,i}-\bar{r}_x)\cdot(r_{y,i}-\bar{r}_y)}{|r_{x,i}|\cdot|r_{y,i}|} & \text{if } |CI_{x,y}|\neq 0\\[2mm] 0 & \text{otherwise}\end{cases}$$
where $r_{x,i}$ and $r_{y,i}$ are the rating vectors of users $u_x$ and $u_y$ for commodity $i$, $\bar{r}_x$ and $\bar{r}_y$ are the average rating vectors of users $u_x$ and $u_y$, $CI_{x,y}$ is the set of commodities reviewed by both $u_x$ and $u_y$, and $|CI_{x,y}|$ is the number of commodities in $CI_{x,y}$. The similarity matrix for commodities, $SI\in[-1,1]^{m\times m}$, is built as follows:
$$SI_{x,y}=\begin{cases}\dfrac{1}{|CU_{x,y}|}\displaystyle\sum_{u\in CU_{x,y}}\dfrac{(r_{u,x}-\bar{r}_x)\cdot(r_{u,y}-\bar{r}_y)}{|r_{u,x}|\cdot|r_{u,y}|} & \text{if } |CU_{x,y}|\neq 0\\[2mm] 0 & \text{otherwise}\end{cases}$$
where $r_{u,x}$ and $r_{u,y}$ are the rating vectors of user $u$ for commodities $i_x$ and $i_y$, $\bar{r}_x$ and $\bar{r}_y$ are the average rating vectors of commodities $i_x$ and $i_y$, $CU_{x,y}$ is the set of users who have rated both $i_x$ and $i_y$, and $|CU_{x,y}|$ is the number of users in $CU_{x,y}$;
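A minimal sketch of one entry of the user similarity matrix $SU$ follows. The nested-dictionary data layout and the function name are hypothetical, and $|r|$ is read as the Euclidean norm of a rating vector, which is an assumption about the notation.

```python
import numpy as np


def user_similarity(ratings, x, y):
    """One entry SU[x, y] of the user similarity matrix.

    `ratings` maps user -> {commodity: rating vector [r0, r1, ..., rk]}.
    Assumption: |r| in the formula is the Euclidean norm of the vector.
    """
    common = set(ratings[x]) & set(ratings[y])  # CI_{x,y}
    if not common:
        return 0.0
    mean_x = np.mean([np.asarray(v, float) for v in ratings[x].values()], axis=0)
    mean_y = np.mean([np.asarray(v, float) for v in ratings[y].values()], axis=0)
    total = 0.0
    for i in common:
        rx = np.asarray(ratings[x][i], float)
        ry = np.asarray(ratings[y][i], float)
        total += np.dot(rx - mean_x, ry - mean_y) / (
            np.linalg.norm(rx) * np.linalg.norm(ry)
        )
    return total / len(common)


ratings = {
    "a": {"h1": [5, 4, 5], "h2": [3, 2, 3]},
    "b": {"h1": [5, 4, 5], "h2": [3, 2, 3]},
    "c": {"h3": [1, 1, 1]},
}
print(user_similarity(ratings, "a", "b"))  # shared commodities h1 and h2
print(user_similarity(ratings, "a", "c"))  # no shared commodity, so 0.0
```

The commodity matrix $SI$ can be computed the same way after transposing the data so that it is keyed by commodity first.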
4) Using the two-way clustering algorithm to obtain the cluster matrix
To cluster users and commodities in both directions, closely related users or commodities are grouped together by minimizing the following objective function:
$$\varepsilon(P,Q)=\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\left\|\frac{p_i}{\sqrt{D^{row}_{ii}}}-\frac{p_j}{\sqrt{D^{col}_{jj}}}\right\|^2\cdot SU_{i,j}\right)+\sum_{i=1}^{m}\sum_{j=1}^{m}\left(\left\|\frac{q_i}{\sqrt{E^{row}_{ii}}}-\frac{q_j}{\sqrt{E^{col}_{jj}}}\right\|^2\cdot SI_{i,j}\right)$$
where $p_i$ is the $i$-th row of the partition matrix $P$, and $D^{row}$ and $D^{col}$ are diagonal matrices associated with the users; $q_i$ is the $i$-th row of the partition matrix $Q$, and $E^{row}$ and $E^{col}$ are diagonal matrices associated with the commodities (the formulas defining these diagonal matrices are not preserved in this text).
By algebraic manipulation, the objective above can be rewritten in a matrix form (not preserved in this text) that uses the following quantities:
$$X=(D^{row})^{-\frac{1}{2}}\,SU\,(D^{col})^{-\frac{1}{2}},\quad Y=(E^{row})^{-\frac{1}{2}}\,SI\,(E^{col})^{-\frac{1}{2}},\quad K=\begin{bmatrix}I_n-X & 0\\ 0 & I_m-Y\end{bmatrix}$$
where $I_n\in\mathbb{R}^{n\times n}$ is the identity matrix. The following optimization problem is then solved:
$$\min_{M}\ \mathrm{Tr}(M^{T}KM)$$
subject to: $M\in[0,1]^{(n+m)\times c}$, $M\mathbf{1}_c=\mathbf{1}_{n+m}$, $|m_i|=l$, $i=1,\ldots,(n+m)$, where $m_i$ is the $i$-th row of $M$. The parameter $c$ is the number of clusters and $l$ is the maximum number of clusters each user or commodity may belong to, with $1\le l\le c$. The symbol $|\cdot|$ here denotes the number of nonzero elements of a vector;
A two-stage strategy is proposed to solve the above problem, described as follows:
4.1) Search for a shared low-dimensional space that represents all user and commodity information. The $t$-dimensional matrix $Z'$ that best preserves the user and commodity information is obtained by solving:
$$\min_{Z}\ \mathrm{Tr}(Z^{T}KZ)$$
subject to: $Z\in[0,1]^{(n+m)\times t}$, $Z^{T}Z=I_t$, where $I_t\in\mathbb{R}^{t\times t}$ is the identity matrix. The constraint $Z^{T}Z=I_t$ mainly prevents $Z$ from being scaled arbitrarily. Because $K$ is a positive semidefinite matrix, the optimal solution $Z'$ can be obtained by solving the eigenvalue problem $KZ=\lambda Z$, that is, $Z'=[z_1,\ldots,z_t]$, where $z_1,\ldots,z_t$ are the eigenvectors of $K$ associated with its $t$ smallest eigenvalues;
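Stage 4.1 can be sketched with NumPy as below. Two points are assumptions rather than statements from the text: the diagonal matrices are taken to hold the row sums and column sums of the similarity matrices (their definitions are not preserved here), and the similarity matrices are taken to be symmetric with positive row and column sums so that the square roots exist and `numpy.linalg.eigh` applies. The function name is hypothetical.

```python
import numpy as np


def shared_embedding(SU, SI, t):
    """Return Z', the eigenvectors of K for its t smallest eigenvalues."""

    def normalise(S):
        # Assumption: D^row / D^col hold the row / column sums of S.
        d_row = S.sum(axis=1)
        d_col = S.sum(axis=0)
        return S / np.sqrt(np.outer(d_row, d_col))  # D_row^-1/2 S D_col^-1/2

    n, m = len(SU), len(SI)
    K = np.zeros((n + m, n + m))
    K[:n, :n] = np.eye(n) - normalise(SU)   # I_n - X
    K[n:, n:] = np.eye(m) - normalise(SI)   # I_m - Y
    # eigh returns eigenvalues in ascending order with orthonormal eigenvectors.
    _, eigvecs = np.linalg.eigh(K)
    return eigvecs[:, :t]


SU = np.array([[1.0, 0.5], [0.5, 1.0]])
SI = np.array([[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]])
Z = shared_embedding(SU, SI, t=2)
print(Z.shape)  # one row per user and per commodity, t columns
```

Because the returned eigenvectors are orthonormal, the constraint $Z^{T}Z=I_t$ holds by construction.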
4.2) Cluster users and commodities simultaneously, i.e. two-way clustering;
Since users and commodities may each appear in one or more clusters at the same time, the Fuzzy C-Means clustering algorithm is run on the matrix $Z'$ obtained above, which preserves the user and commodity information as far as possible. Running Fuzzy C-Means amounts to iteratively optimizing the following objective function:
$$\min J(M,V)=\min\sum_{i=1}^{m+n}\sum_{j=1}^{c}(M_{i,j})^{\theta}\,d(e_i,v_j)^{2}$$
where $M_{i,j}$ is the probability that element $e_i$ belongs to cluster $j$, $v_j$ is the center of cluster $j$, $d(\cdot)$ is the Euclidean distance function, and $\theta$ is the parameter that controls the fuzziness of the clustering result. In each iteration, the algorithm updates the elements of $M$ and $V$ as follows:
$$M_{i,j}=\frac{d(e_i,v_j)^{2/(1-\theta)}}{\sum_{l=1}^{c}d(e_i,v_l)^{2/(1-\theta)}}$$
$$v_j=\frac{\sum_{i=1}^{m+n}M_{i,j}^{\theta}\cdot e_i}{\sum_{i=1}^{m+n}M_{i,j}^{\theta}}$$
where $j=1,\ldots,c$. If the change in the objective $J(M,V)$ between two consecutive iterations is no greater than the threshold $\varepsilon$, the algorithm terminates. Once $M$ has been solved, in each row the $l$ largest elements whose sum exceeds a predetermined threshold are retained and then normalized, so that the elements of every row of $M$ sum to 1;
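The update rules of stage 4.2 can be sketched as follows. This is a plain Fuzzy C-Means loop, not a tuned implementation; the function names, the random initialisation and the default parameters are hypothetical. The helper `keep_top_l` simplifies the post-processing step: it keeps the `l` largest memberships per row and renormalises, and leaves out the additional sum-threshold check mentioned in the text.

```python
import numpy as np


def fuzzy_c_means(E, c, theta=2.0, eps=1e-6, max_iter=100, seed=0):
    """Return (M, V): membership matrix and cluster centers for the rows of E."""
    rng = np.random.default_rng(seed)
    M = rng.random((len(E), c))
    M /= M.sum(axis=1, keepdims=True)        # each row sums to 1
    prev = np.inf
    for _ in range(max_iter):
        W = M ** theta
        V = (W.T @ E) / W.sum(axis=0)[:, None]                 # center update
        D = np.linalg.norm(E[:, None, :] - V[None, :, :], axis=2)
        D = np.maximum(D, 1e-12)                                # avoid division by zero
        P = D ** (2.0 / (1.0 - theta))
        M = P / P.sum(axis=1, keepdims=True)                    # membership update
        J = np.sum((M ** theta) * D ** 2)                       # objective value
        if abs(prev - J) < eps:
            break
        prev = J
    return M, V


def keep_top_l(M, l):
    """Keep the l largest memberships in each row and renormalise (simplified)."""
    out = np.zeros_like(M)
    idx = np.argsort(M, axis=1)[:, -l:]
    rows = np.arange(len(M))[:, None]
    out[rows, idx] = M[rows, idx]
    return out / out.sum(axis=1, keepdims=True)


E = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
M, V = fuzzy_c_means(E, c=2)
print(np.round(M, 3))
print(keep_top_l(M, l=1))
```

In the patent's pipeline the rows of `E` would be the rows of $Z'$ from stage 4.1; the toy points here only show that nearby rows end up with the same dominant cluster.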
5) Recommendation within a single cluster
5.1) Using the aggregate function algorithm to obtain recommendations within a single cluster
Recommendation methods based on an aggregate function generally assume that a user's overall rating of a commodity is closely related to the multi-index ratings, that is, the overall rating is largely determined by the multi-index ratings. Such methods therefore use the multi-index ratings to build an aggregate function for the overall rating, and predict the user's rating with that function. Here, a regression method based on principal component analysis is used to build the aggregate function for the overall rating and to compute the recommendation result. Principal component analysis is a dimensionality-reduction method whose core idea is to represent all data samples by a small number of principal components extracted from them. Principal components are chosen mainly according to the variance (eigenvalues) of the sample data: each selected component has the largest variance among those remaining. Components chosen in this way are mutually uncorrelated, which removes the effect of collinearity among the multi-index ratings. On this basis, the selected principal components are used to build the aggregate function for the dependent variable;
After the aggregate function for the overall rating has been built for a user by regression on the principal components, the target user's likely rating of a candidate commodity is predicted by:
$$r_{u,i}=\sum_{c=1}^{k}w_c\left(\frac{\sum_{u'\in CU}r_{u',c}}{|CU|}\right)$$
where $r_{u,i}$ is the predicted rating of target user $u$ for candidate commodity $i$, $w_c$ is the coefficient of index $c$ in the aggregate function, $r_{u',c}$ is the rating of user $u'$ for commodity $i$ on index $c$, $CU$ is the set of users in the same cluster who have rated commodity $i$, and $|CU|$ is the number of users in $CU$;
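The sketch below illustrates step 5.1 under stated assumptions. `pcr_weights` is a generic principal component regression written with NumPy's SVD and least squares; the patent does not spell out its fitting procedure, so the centring step and the intercept are assumptions, and both function names are hypothetical. `predict_in_cluster` applies the prediction formula to the index ratings that the users of one cluster gave to the candidate commodity.

```python
import numpy as np


def pcr_weights(R, r0, n_components):
    """Principal component regression of overall ratings r0 on index ratings R.

    R has shape (samples, k). Returns (w, intercept) in the original index space.
    """
    mu, y_mean = R.mean(axis=0), r0.mean()
    Rc = R - mu                                   # centre the index ratings
    _, _, Vt = np.linalg.svd(Rc, full_matrices=False)
    V = Vt[:n_components].T                       # leading principal directions
    T = Rc @ V                                    # uncorrelated component scores
    gamma, *_ = np.linalg.lstsq(T, r0 - y_mean, rcond=None)
    w = V @ gamma
    return w, y_mean - mu @ w


def predict_in_cluster(w, cluster_ratings, intercept=0.0):
    """Weighted sum of the cluster's average rating on each index."""
    return float(intercept + cluster_ratings.mean(axis=0) @ w)


rng = np.random.default_rng(0)
R = rng.uniform(1, 5, size=(20, 3))               # made-up index ratings
true_w = np.array([0.5, 0.3, 0.2])
w, b = pcr_weights(R, R @ true_w, n_components=3)
print(np.round(w, 3), round(b, 3))
print(predict_in_cluster(w, np.array([[4.0, 4.0, 4.0], [2.0, 2.0, 2.0]])))
```

Using fewer components than indexes is what provides the protection against collinearity; keeping all three here simply makes the toy weights recoverable.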
5.2) Using the collaborative filtering algorithm to obtain recommendations within a single cluster
The core idea of collaborative filtering based on multi-index ratings is that even users who have been placed in the same cluster, and therefore share identical or similar index preferences, do not have exactly the same index preferences. In other words, when predicting recommendation results, different users in the same cluster should be treated differently. Collaborative filtering based on multi-index ratings is therefore used to produce the recommendation result, computed as:
$$r_{u,i}=\bar{r}_u+\frac{\sum_{u'\in CU}\mathrm{sim}(u,u')\times(r_{u',i}-\bar{r}_{u'})}{\sum_{u'\in CU}\mathrm{sim}(u,u')}$$
where $\bar{r}_u$ is the mean overall rating of user $u$, $r_{u',i}$ is the overall rating of user $u'$ for commodity $i$, and $\mathrm{sim}(u,u')$ is the interest similarity between users $u$ and $u'$ computed from the multi-index ratings;
The similarity is computed from the Euclidean distance, as follows:
Let the two rating vectors of users $u_x$ and $u_y$ for commodity $i$ be $r_{x,i}=[r_{x,1},\ldots,r_{x,k}]$ and $r_{y,i}=[r_{y,1},\ldots,r_{y,k}]$. Their Euclidean distance is:
$$d(r_{x,i},r_{y,i})=\sqrt{\sum_{c=0}^{k}|r_{x,c}-r_{y,c}|^{2}}$$
The overall distance between users $u_x$ and $u_y$ is the mean Euclidean distance of the rating vectors over the commodities they have both reviewed, that is:
$$d(u_x,u_y)=\frac{\sum_{i\in CI}d(r_{x,i},r_{y,i})}{|CI|}$$
The higher the interest similarity of two users, the smaller their overall distance should be; in other words, the two are inversely related. The interest similarity of users $u_x$ and $u_y$ is therefore computed as:
$$\mathrm{sim}(u_x,u_y)=\frac{1}{1+d(u_x,u_y)};$$
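The distance, similarity and prediction formulas of step 5.2 can be put together as in the sketch below. The function names and the tuple layout for neighbours are hypothetical; returning a similarity of 0 when two users share no commodity, and falling back to the user's own mean when all similarities are 0, are assumptions.

```python
import math


def euclidean(rx, ry):
    """Euclidean distance between two rating vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rx, ry)))


def similarity(vectors_x, vectors_y):
    """sim = 1 / (1 + mean distance over commonly reviewed commodities).

    Each argument maps commodity -> rating vector for one user.
    """
    common = set(vectors_x) & set(vectors_y)
    if not common:
        return 0.0  # assumption: no shared commodity means no similarity
    d = sum(euclidean(vectors_x[i], vectors_y[i]) for i in common) / len(common)
    return 1.0 / (1.0 + d)


def predict(mean_u, neighbours):
    """Similarity-weighted deviation from the mean.

    `neighbours` lists (sim, neighbour's overall rating of the commodity,
    neighbour's mean overall rating) for users in the same cluster.
    """
    den = sum(s for s, _, _ in neighbours)
    if den == 0:
        return mean_u  # assumption: fall back to the user's own mean
    num = sum(s * (r - m) for s, r, m in neighbours)
    return mean_u + num / den


print(euclidean([1, 2], [4, 6]))
print(similarity({"h1": [5, 4, 5]}, {"h1": [5, 4, 5]}))
print(predict(3.0, [(1.0, 5, 4), (0.5, 2, 3)]))
```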
6) Using the integration function to obtain the final recommendation result
With the two-way clustering algorithm based on multi-index ratings adopted above, the same user or commodity may appear in several clusters at once after clustering. Each time, the recommendation algorithm uses only the rating data inside one cluster to predict a recommendation result, so one or more recommendation results from different clusters are obtained. A suitable strategy is therefore needed to integrate these results and return the final recommendation result to the target user. The clustering algorithm rests on two assumptions: (1) if two users give identical or similar overall ratings and multi-index ratings to one or more commodities, the two users very likely belong to one or more of the same clusters; (2) if two commodities are given identical or similar overall ratings and multi-index ratings by one or more users, the two commodities very likely belong to one or more of the same clusters. Consequently, an element $M_{i,j}$ of the partition matrix $M$ obtained after clustering can be regarded as the degree of similarity between object $i$ and the other objects in cluster $j$: for a user, the similarity of index preferences with the other users in the cluster; for a commodity, the similarity with the other commodities in the cluster in terms of how users review it. When integrating several recommendation results, these similarity indicators $M_{i,j}$ for users and commodities need to be taken into account, which leads to the following integration strategy:
$$R_{u,i}=\begin{cases}\displaystyle\sum_{l=1}^{h}\mathrm{Pre}(u,i,l)\cdot M_{u,l}\cdot M_{i,l} & \text{if } u \text{ and } i \text{ belong to one or more common clusters}\\[2mm] 0 & \text{otherwise}\end{cases}$$
where $R_{u,i}$ is the final predicted rating of user $u$ for commodity $i$, and $\mathrm{Pre}(u,i,l)$ is the recommendation result for user $u$ and commodity $i$ in cluster $l$. Under this strategy, the algorithm produces a prediction only when user $u$ and commodity $i$ belong to one or more clusters at the same time, i.e. $M_{u,l}\neq 0$ and $M_{i,l}\neq 0$ for $l=1,\ldots,h$, $h\le c$. The parameter $h$ is the number of clusters with the largest membership probabilities that are considered when producing a recommendation: only the information from the top $h$ clusters is used, mainly in order to filter out noise.
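A minimal sketch of this integration step follows. The function name and the dictionary layout are hypothetical, and ranking the shared clusters by the product of the two membership values to pick the top `h` is an assumption, since the text only says that the `h` clusters with the largest membership probabilities are kept.

```python
def integrate(pre, M_u, M_i, h):
    """Combine per-cluster predictions into the final score R_{u,i}.

    pre : {cluster: Pre(u, i, cluster)} predictions from single clusters
    M_u : {cluster: membership of user u}
    M_i : {cluster: membership of commodity i}
    """
    shared = [l for l in pre if M_u.get(l, 0) > 0 and M_i.get(l, 0) > 0]
    # Assumption: rank shared clusters by the product of both memberships.
    shared.sort(key=lambda l: M_u[l] * M_i[l], reverse=True)
    # An empty selection sums to 0, the "otherwise" branch of the formula.
    return sum(pre[l] * M_u[l] * M_i[l] for l in shared[:h])


# The user is in clusters 0 and 1, the commodity only in cluster 0.
print(integrate({0: 4.0, 1: 2.0}, {0: 0.5, 1: 0.5}, {0: 1.0}, h=2))
```

As in the formula, the result is a membership-weighted sum rather than a weighted average, so it is best read as a ranking score.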
In step 1), the $\chi^2$ statistic that measures the word-frequency dependence between a feature keyword $w$ and an index $A$ is calculated as follows:
$$\chi^{2}(w,A)=\frac{C\times(C_1C_4-C_2C_3)^{2}}{(C_1+C_3)\times(C_4+C_2)\times(C_1+C_2)\times(C_4+C_3)}$$
where $C$ is the number of occurrences of all feature keywords, $C_1$ is the number of times feature keyword $w$ appears in sentences belonging to index $A$, $C_2$ is the number of times $w$ appears in sentences not belonging to index $A$, $C_3$ is the number of sentences that belong to index $A$ but do not contain $w$, and $C_4$ is the number of sentences that neither belong to index $A$ nor contain $w$.
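The statistic translates directly into code. In this sketch the function name is hypothetical, and returning 0.0 when a factor of the denominator is zero is an assumption, since the text does not cover that case.

```python
def chi_square(C, c1, c2, c3, c4):
    """Chi-square dependence between a feature keyword w and an index A.

    C  : occurrences of all feature keywords
    c1 : occurrences of w in sentences belonging to A
    c2 : occurrences of w in sentences not belonging to A
    c3 : sentences belonging to A that do not contain w
    c4 : sentences neither belonging to A nor containing w
    """
    den = (c1 + c3) * (c4 + c2) * (c1 + c2) * (c4 + c3)
    if den == 0:
        return 0.0  # assumption: treat an undefined statistic as no dependence
    return C * (c1 * c4 - c2 * c3) ** 2 / den


print(chi_square(100, 30, 10, 20, 40))
```

Candidate keywords would be sorted by this value for each index, and the top `t` added to that index's keyword list.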
Compared with the prior art, the present invention has the following advantages and beneficial effects:
The invention uses a two-way clustering algorithm to cluster users and commodities simultaneously into different clusters according to the users' evaluations of commodities. After clustering, users in the same cluster should have the same index preferences for the commodities belonging to that cluster. Note that with the two-way clustering algorithm proposed here, a user or commodity can belong to one or more different clusters at the same time. On the basis of two-way clustering, two different recommendation algorithms are further proposed: 1) An aggregate function algorithm based on principal component regression. Traditional aggregate function algorithms based on least-squares regression are not suited to cold-start users and cannot eliminate the collinearity among the ratings of multiple indexes, so we propose to use principal component regression to build an aggregate function for each user and compute the recommendation result. 2) A collaborative filtering algorithm. As in traditional multi-index collaborative filtering, the multi-dimensional rating data provided by users is used to compute the similarity between users, and recommendations are then produced in combination with the overall ratings provided by users. The difference is that, when computing user similarity or considering overall ratings, the proposed algorithm only considers information located in the same cluster, i.e. information from users with identical or similar index preferences, which improves recommendation quality. Users have different index preferences for different commodities. The two-way clustering algorithm proposed here can distinguish the different index preferences of users and the corresponding commodities, and applying either of the two proposed recommendation algorithms on this basis further improves the recommendation results; of the two, the multi-index collaborative filtering algorithm outperforms the aggregate function algorithm based on principal component regression in both recommendation accuracy and coverage. Compared with the traditional aggregate function algorithm based on least-squares regression, the aggregate function algorithm based on principal component regression proposed here can handle the linear effects that exist among the ratings of different indexes, so that the user's index preferences are described more accurately. Both multi-index ratings and opinion scores help improve recommendation quality: multi-index ratings help improve recommendation coverage, while opinion scores help improve recommendation accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of the recommendation algorithm of the present invention.
Embodiment
The invention is further described below with reference to a specific embodiment.
As shown in Fig. 1, the recommendation algorithm based on multi-index grading described in this embodiment comprises the following steps:
1) Identification of index keywords
When writing a commodity review, a user may comment directly on an index of the commodity, and may also comment on features related to that index. For example, when commenting on the service index of a hotel, a user may also mention other service-related features, as in "all the staff were brilliant", where "staff" is a feature keyword related to the service index. Clearly, whether an opinion is about the index itself or about a feature keyword related to it, it should be used to compute the opinion score of that index. We therefore use a bootstrapping (self-propagating) algorithm to identify the keywords of each index. The algorithm identifies the keywords corresponding to an index by mining the word-frequency relationships between candidate keywords. Its main flow is as follows:
1.1) Split each review in the data set into sentences $\{x_1, x_2, \ldots\}$, and construct for each index an initial list made up of a small number of characteristic keywords. For hotel recommendation, for example, the list for the location index can be chosen as {location, area, street, bus}.
1.2) Using the initial keyword lists, label each sentence in the review corpus with the index whose keywords it overlaps most in word frequency.
1.3) Use the $\chi^2$ statistic to measure the word-frequency relationship between each index and each keyword, and add the top $t$ keywords with the highest word-frequency dependence to that index's keyword list.
1.4) Repeat the above process until a termination condition is met, namely the keyword lists of the indexes no longer change or the number of iterations reaches a threshold.
The $\chi^2$ statistic that measures the word-frequency dependence between a feature keyword $w$ and an index $A$ is calculated as follows:
$$\chi^{2}(w,A)=\frac{C\times(C_1C_4-C_2C_3)^{2}}{(C_1+C_3)\times(C_4+C_2)\times(C_1+C_2)\times(C_4+C_3)}$$
where $C$ is the number of occurrences of all feature keywords, $C_1$ is the number of times feature keyword $w$ appears in sentences belonging to index $A$, $C_2$ is the number of times $w$ appears in sentences not belonging to index $A$, $C_3$ is the number of sentences that belong to index $A$ but do not contain $w$, and $C_4$ is the number of sentences that neither belong to index $A$ nor contain $w$.
Note that we take feature keywords to consist mainly of nouns or noun phrases, so when running the index identification algorithm we only use nouns or noun phrases to form the candidate feature vocabulary $V$. For example, when processing the review sentence "Staff were excellent; harbor view room was quite a good size by Hong Kong standards and everything ran smoothly", we would use the noun "staff" and the noun phrase "harbor view room" to build the candidate feature vocabulary $V$. In addition, the number of indexes can be determined either from prior knowledge about the research object (such as the index definitions in the multi-index ratings that users provide directly) or from actual observation, which can settle both the number and the definitions of the indexes. In our experiments we combine the two: we first build the index categories from the indexes defined in the multi-index ratings, and then supplement these preliminary categories according to the actual reviews.
2) Opinion score extraction
After the commodity indexes and their related feature keywords have been identified, we parse the sentences in each review and extract the user's opinions on the indexes or feature keywords. Parsing a review sentence yields the grammatical dependencies between its words. Like most research on commodity review mining, we regard adjectives as the main carriers of user opinion. Most user opinions take one of two grammatical forms. The first is the adjectival modifier, as in "a great location", where the adjective "great" modifies the keyword "location". The second is the form in which the keyword is the subject that the adjective describes, as in "all the staff were brilliant", where the keyword "staff" is the subject described by the adjective "brilliant". By identifying these specific syntactic patterns through parsing, we can extract the opinions that users express about commodity feature keywords. To convert these opinions into rating data, we use the Subjective Clue Lexicon to determine the sentiment polarity of the adjectives that express opinions on feature keywords: positive, negative or neutral. After polarity assignment, adjectives with positive polarity such as "good", "excellent", "brilliant" and "wonderful" are assigned $+1$; adjectives with negative polarity such as "bad", "awful" and "disappointed" are assigned $-1$; and adjectives judged to be neutral, such as "normal" and "average", are assigned $0$. Note that during opinion score extraction, if a negation word such as "not", "no" or "never" is detected in the sentence, the corresponding opinion polarity is reversed.
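The polarity assignment and negation rule can be sketched as below. The tiny dictionary only contains the example adjectives listed above and stands in for a full sentiment lexicon; the function name, treating unknown adjectives as neutral, and checking for negation anywhere in the sentence (rather than via the parse tree) are simplifying assumptions.

```python
# Example adjectives from the text; a real lexicon would be far larger.
POLARITY = {
    "good": 1, "excellent": 1, "brilliant": 1, "wonderful": 1,
    "bad": -1, "awful": -1, "disappointed": -1,
    "normal": 0, "average": 0,
}
NEGATIONS = {"not", "no", "never"}


def adjective_polarity(adjective, sentence_tokens):
    """Return +1, -1 or 0 for an opinion adjective, flipped under negation."""
    score = POLARITY.get(adjective.lower(), 0)  # assumption: unknown means neutral
    if NEGATIONS & {t.lower() for t in sentence_tokens}:
        score = -score
    return score


print(adjective_polarity("brilliant", "all the staff were brilliant".split()))
print(adjective_polarity("good", "the room was not good".split()))
```

A dependency parser would normally decide which adjective attaches to which keyword and whether the negation actually scopes over it; the sentence-level check here is the coarsest reading of the rule.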
For each review, the opinion score for the $k$-th index is calculated as follows:
$$o_k=\frac{\sum_{s\in OP_k}\mathrm{score}(s)}{|OP_k|}$$
where $s$ is an adjective expressing an opinion; $OP_k$ is the set of opinion adjectives about the $k$-th index; $|OP_k|$ is the number of elements in that set; and $\mathrm{score}(s)$ is the opinion polarity of adjective $s$, namely $+1$, $-1$ or $0$. In this way, an unstructured review text is converted into a vector of opinion scores $O_{u,i}=[o_{u,i,1},\ldots,o_{u,i,k}]$. Note that the extracted opinion scores lie in $[-1,1]$, whereas the multi-index ratings given directly by users lie in $[1,5]$. To give both the same range, we map the opinion scores into the interval $[1,5]$ by an equidistant (linear) conversion:
$$o_{after}=o_{before}\times 2+3$$
where $o_{before}$ and $o_{after}$ are the opinion scores before and after conversion. This formula ensures that the multi-index rating data and the opinion score data share the same range, so that each can be used directly in the recommendation process and their effects compared.
3) Construction of the user and commodity similarity matrices
In the recommender system, we use $U=\{u_1,\ldots,u_n\}$ to denote the set of users and $I=\{i_1,\ldots,i_m\}$ to denote the set of commodities, where $n$ and $m$ are the total numbers of users and commodities respectively. A user's evaluation of a commodity can be expressed as a rating vector $r=[r_0,r_1,\ldots,r_k]$ composed of the overall rating and the multi-index ratings, where $r_0$ is the overall rating and $r_1,\ldots,r_k$ are the ratings on the $k$ indexes. The rating vector can also be composed of the overall rating and the opinion scores, i.e. $r=[r_0,o_1,\ldots,o_k]$, where $o_1,\ldots,o_k$ are the opinion scores mined from the reviews written by the user. In experiments, $r=[r_0,r_1,\ldots,r_k]$ can be replaced directly by $r=[r_0,o_1,\ldots,o_k]$ and used in the clustering and recommendation process. The goal is to cluster the users $\{u_1,\ldots,u_n\}$ and the commodities $\{i_1,\ldots,i_m\}$ simultaneously into $c$ clusters. The clustering result is represented as a partition matrix $M\in[0,1]^{(n+m)\times c}$, in which each element $M_{i,j}$ is the probability that object $i$ belongs to cluster $j$; hence $M_{i,j}>0$ when object $i$ belongs to cluster $j$, and $M_{i,j}=0$ otherwise. Because the size of $M_{i,j}$ directly reflects the likelihood that object $i$ belongs to cluster $j$, each row of $M$ is required to sum to 1. In addition, if the maximum number of clusters each object may join is limited to $l$, with $1\le l\le c$, then each row of $M$ contains at most $l$ nonzero values. The partition matrix can be written as:
$$M=\begin{bmatrix}P\\Q\end{bmatrix}$$
where $P\in[0,1]^{n\times c}$ is the partition matrix for users and $Q\in[0,1]^{m\times c}$ is the partition matrix for commodities.
To formulate the proposed solution mathematically, we first define one similarity matrix for users and one for commodities. For users, the similarity matrix $SU \in [-1,1]^{n \times n}$ is built as follows:
$$SU_{x,y} = \begin{cases} \dfrac{1}{|CI_{x,y}|} \displaystyle\sum_{i \in CI_{x,y}} \dfrac{(r_{x,i} - \bar{r}_x) \cdot (r_{y,i} - \bar{r}_y)}{|r_{x,i}| \cdot |r_{y,i}|} & \text{if } |CI_{x,y}| \neq 0 \\ 0 & \text{otherwise} \end{cases}$$
where $r_{x,i}$ and $r_{y,i}$ are the rating vectors of users $u_x$ and $u_y$ for commodity $i$, $\bar{r}_x$ and $\bar{r}_y$ are the average rating vectors of users $u_x$ and $u_y$, $CI_{x,y}$ is the set of commodities that both $u_x$ and $u_y$ have commented on, and $|CI_{x,y}|$ is the number of commodities in $CI_{x,y}$. The similarity matrix for commodities, $SI \in [-1,1]^{m \times m}$, is built analogously:
$$SI_{x,y} = \begin{cases} \dfrac{1}{|CU_{x,y}|} \displaystyle\sum_{u \in CU_{x,y}} \dfrac{(r_{u,x} - \bar{r}_x) \cdot (r_{u,y} - \bar{r}_y)}{|r_{u,x}| \cdot |r_{u,y}|} & \text{if } |CU_{x,y}| \neq 0 \\ 0 & \text{otherwise} \end{cases}$$
where $r_{u,x}$ and $r_{u,y}$ are the rating vectors of user $u$ for commodities $i_x$ and $i_y$, $\bar{r}_x$ and $\bar{r}_y$ are the average rating vectors of commodities $i_x$ and $i_y$, $CU_{x,y}$ is the set of users who have rated both $i_x$ and $i_y$, and $|CU_{x,y}|$ is the number of users in $CU_{x,y}$.
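The user similarity can be sketched as follows. This is only an illustration under stated assumptions: $|r|$ is read as the Euclidean norm of the rating vector, the average rating vector is taken over all commodities the user has rated, and ratings are stored in plain dictionaries keyed by a hypothetical commodity id.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    # Assumption: |r| in the formula is the Euclidean norm of the rating vector.
    return math.sqrt(dot(a, a))


def mean_vector(vectors):
    size = len(vectors)
    return [sum(col) / size for col in zip(*vectors)]


def user_similarity(ratings_x, ratings_y):
    """ratings_*: dict mapping commodity id -> rating vector [r0, r1, ..., rk]."""
    common = set(ratings_x) & set(ratings_y)           # CI_{x,y}
    if not common:
        return 0.0                                     # the "otherwise" branch
    # Assumption: the average is taken over everything the user has rated.
    mean_x = mean_vector(list(ratings_x.values()))
    mean_y = mean_vector(list(ratings_y.values()))
    total = 0.0
    for i in common:
        cx = [a - b for a, b in zip(ratings_x[i], mean_x)]
        cy = [a - b for a, b in zip(ratings_y[i], mean_y)]
        total += dot(cx, cy) / (norm(ratings_x[i]) * norm(ratings_y[i]))
    return total / len(common)
```

The commodity similarity follows the same pattern with the roles of users and commodities exchanged.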
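4) Obtaining the cluster matrix with a biclustering algorithm
To bicluster users and commodities, we propose to associate closely related users or commodities by minimizing the following objective function:
$$\epsilon(P, Q) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \left\| \frac{p_i}{\sqrt{D^{row}_{ii}}} - \frac{p_j}{\sqrt{D^{col}_{jj}}} \right\|^2 \cdot SU_{i,j} \right) + \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \left\| \frac{q_i}{\sqrt{E^{row}_{ii}}} - \frac{q_j}{\sqrt{E^{col}_{jj}}} \right\|^2 \cdot SI_{i,j} \right)$$
where $p_i$ is the $i$-th row of the partition matrix $P$, and $D^{row}$ and $D^{col}$ are the diagonal matrices for users, computed as $D^{row}_{ii} = \sum_{j=1}^{n} SU_{i,j}$ and $D^{col}_{jj} = \sum_{i=1}^{n} SU_{i,j}$; $q_i$ is the $i$-th row of the partition matrix $Q$, and $E^{row}$ and $E^{col}$ are the diagonal matrices for commodities, computed as $E^{row}_{ii} = \sum_{j=1}^{m} SI_{i,j}$ and $E^{col}_{jj} = \sum_{i=1}^{m} SI_{i,j}$.
By algebraic transformation, minimizing the above objective can be expressed in terms of the trace $\mathrm{Tr}(M^{T} K M)$, where:
$$X = (D^{row})^{-\frac{1}{2}} \, SU \, (D^{col})^{-\frac{1}{2}}, \quad Y = (E^{row})^{-\frac{1}{2}} \, SI \, (E^{col})^{-\frac{1}{2}}, \quad K = \begin{bmatrix} I_n - X & 0 \\ 0 & I_m - Y \end{bmatrix}$$
and $I_n \in \mathbb{R}^{n \times n}$ denotes the identity matrix. The problem under study is thus converted into the following optimization problem:
$$\min_{M} \mathrm{Tr}(M^{T} K M)$$
subject to: $M \in [0,1]^{(n+m) \times c}$, $M 1_c = 1_{n+m}$, $|m_i| = l$, $i = 1, \ldots, (n+m)$, where $m_i$ is the $i$-th row of $M$. The parameter $c$ is the number of clusters and $l$ is the maximum number of clusters to which each user or commodity may belong, with $1 \le l \le c$; the symbol $|\cdot|$ denotes the number of nonzero elements of a vector.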
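The matrices $X$, $Y$ and $K$ used in this optimization problem can be assembled as in the following sketch. It assumes that the diagonal entries of $D^{row}$, $D^{col}$, $E^{row}$ and $E^{col}$ are the row and column sums of the corresponding similarity matrix, and that all of these sums are positive so that the inverse square roots exist; the function names are hypothetical.

```python
import math


def normalise(S):
    """Return D_row^{-1/2} S D_col^{-1/2} for a square similarity matrix S."""
    n = len(S)
    row = [sum(S[i][j] for j in range(n)) for i in range(n)]  # assumed D^row_ii
    col = [sum(S[i][j] for i in range(n)) for j in range(n)]  # assumed D^col_jj
    if min(row) <= 0 or min(col) <= 0:
        raise ValueError("this sketch needs positive row and column sums")
    return [[S[i][j] / math.sqrt(row[i] * col[j]) for j in range(n)]
            for i in range(n)]


def build_K(SU, SI):
    """Assemble the block-diagonal matrix K = diag(I_n - X, I_m - Y)."""
    X, Y = normalise(SU), normalise(SI)
    n, m = len(X), len(Y)
    K = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            K[i][j] = (1.0 if i == j else 0.0) - X[i][j]
    for i in range(m):
        for j in range(m):
            K[n + i][n + j] = (1.0 if i == j else 0.0) - Y[i][j]
    return K
```

The off-diagonal blocks stay zero, so users and commodities interact only through the shared cluster columns of $M$.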
We propose a two-stage strategy for solving the above problem, described as follows:
4.1) Search for a shared low-dimensional space that represents all users and commodities. The $t$-dimensional matrix $Z'$ that best preserves the user and commodity information is obtained by solving:
$$\min_{Z} \mathrm{Tr}(Z^{T} K Z)$$
subject to: $Z \in [0,1]^{(n+m) \times t}$, $Z^{T} Z = I_t$, where $I_t \in \mathbb{R}^{t \times t}$ denotes the identity matrix. The constraint $Z^{T} Z = I_t$ mainly prevents arbitrary scaling of $Z$. Because $K$ is a positive semidefinite matrix, the optimal solution $Z'$ can be obtained by solving the eigenvalue problem $KZ = \lambda Z$, i.e. $Z' = [z_1, \ldots, z_t]$, where $z_1, \ldots, z_t$ are the eigenvectors of $K$ corresponding to its $t$ smallest eigenvalues.
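The reason the eigenvectors with the smallest eigenvalues solve this problem can be sketched with the standard Rayleigh-Ritz (Ky Fan) argument. The sketch assumes that $K$ is symmetric and considers only the orthogonality constraint, not the entry-wise range constraint on $Z$.

```latex
\begin{aligned}
K &= \sum_{i=1}^{n+m} \lambda_i \, z_i z_i^{T},
  \qquad 0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_{n+m}, \\
\min_{Z^{T} Z = I_t} \mathrm{Tr}(Z^{T} K Z) &= \sum_{i=1}^{t} \lambda_i,
  \qquad \text{attained at } Z' = [z_1, \ldots, z_t].
\end{aligned}
```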
4.2) Cluster users and commodities simultaneously, i.e. biclustering.
Since users and commodities may each appear in one or more clusters at the same time, we propose to run the Fuzzy C-Means clustering algorithm on the matrix $Z'$, which preserves the user and commodity information to the greatest extent. Running Fuzzy C-Means amounts to iteratively optimizing (i.e. minimizing) the following objective function:
$$\min J(M, V) = \min \sum_{i=1}^{m+n} \sum_{j=1}^{c} (M_{i,j})^{\theta} \, d(e_i, v_j)^2$$
where $M_{i,j}$ is the probability that element $e_i$ belongs to cluster $j$, $v_j$ is the center of cluster $j$, $d(\cdot)$ is the Euclidean distance function, and $\theta$ is the parameter that controls the fuzziness of the clustering result. In each iteration, the algorithm updates the elements of $M$ and $V$ according to:
$$M_{i,j} = \frac{d(e_i, v_j)^{2/(1-\theta)}}{\sum_{l=1}^{c} d(e_i, v_l)^{2/(1-\theta)}}$$
$$v_j = \frac{\sum_{i=1}^{m+n} M_{i,j}^{\theta} \cdot e_i}{\sum_{i=1}^{m+n} M_{i,j}^{\theta}}$$
where $j = 1, \ldots, c$. The algorithm terminates when the change in the objective function $J(M, V)$ between two consecutive iterations falls below the threshold $\varepsilon$. After $M$ has been solved, in each row the $l$ largest elements, whose sum exceeds a preset threshold (e.g. 0.9), are retained and normalized, so that every row of $M$ sums to 1.
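The update rules above can be sketched in plain Python as follows. This is a minimal illustration rather than the inventors' implementation: the initial centers are picked deterministically from evenly spaced rows (an assumption), $\theta > 1$ is assumed, and the row post-processing keeps the $l$ largest memberships without checking the sum threshold.

```python
import math


def fuzzy_c_means(points, c, theta=2.0, eps=1e-6, max_iter=100):
    """Return (M, V): membership rows and cluster centers for `points`."""
    n, dim = len(points), len(points[0])
    # Assumption: evenly spaced rows serve as deterministic initial centers.
    centers = [list(points[(j * n) // c]) for j in range(c)]
    previous, M = None, []
    for _ in range(max_iter):
        # Membership update: M_ij proportional to d(e_i, v_j)^(2 / (1 - theta)).
        M = []
        for e in points:
            dists = [math.dist(e, v) for v in centers]
            if min(dists) == 0.0:
                # e coincides with a center: share full membership among those.
                w = [1.0 if d == 0.0 else 0.0 for d in dists]
            else:
                w = [d ** (2.0 / (1.0 - theta)) for d in dists]
            total = sum(w)
            M.append([x / total for x in w])
        # Center update: mean of the points weighted by M_ij^theta.
        new_centers = []
        for j in range(c):
            weights = [M[i][j] ** theta for i in range(n)]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(centers[j])  # empty cluster keeps its center
            else:
                new_centers.append([sum(w * p[d] for w, p in zip(weights, points)) / total
                                    for d in range(dim)])
        centers = new_centers
        objective = sum(M[i][j] ** theta * math.dist(points[i], centers[j]) ** 2
                        for i in range(n) for j in range(c))
        # Stop once the objective changes by less than eps between iterations.
        if previous is not None and abs(previous - objective) < eps:
            break
        previous = objective
    return M, centers


def keep_top_l(row, l):
    """Keep the l largest memberships of a row and renormalize them to sum to 1."""
    keep = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:l]
    total = sum(row[j] for j in keep)
    return [row[j] / total if j in keep else 0.0 for j in range(len(row))]
```

In the setting of the text, `points` would be the rows of $Z'$, one per user or commodity, and `keep_top_l` would be applied to every row of the returned membership matrix.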
In short, the proposed clustering algorithm based on multi-index ratings first builds similarity matrices for users and for commodities from the characteristics of the multi-index ratings, then converts the clustering problem into an optimization problem, and on that basis applies the idea of the Fuzzy C-Means clustering algorithm to bicluster users and commodities. The probability distributions of users and commodities over the different clusters characterize the index preferences that users apply when evaluating commodities.
5) Recommendation within a single cluster
5.1) Recommendation within a single cluster using the aggregate function algorithm
Recommendation methods based on an aggregate function generally assume that a user's overall rating of a commodity is closely related to the multi-index ratings, i.e. that the overall rating is largely determined by the multi-index ratings. Such methods therefore use the multi-index ratings to build an aggregate function for the overall rating and predict the user's rating with that function. We propose to build the aggregate function for the overall rating with a regression method based on principal component analysis and to use it to compute the recommendation result. Principal component analysis is a dimensionality reduction method whose core idea is to represent all data samples by a small number of principal components extracted from them. Principal components are selected mainly according to the variance (eigenvalue) of the sample data, i.e. each selected principal component is the one with the largest variance in the data sample. It is worth noting that principal components selected in this way are mutually uncorrelated, which removes the effect of collinearity among the multi-index ratings. On this basis, the selected principal components are used to build the aggregate function for the dependent variable.
After the aggregate function for the overall rating has been built for users by principal component regression, the rating that the target user is likely to give a candidate commodity is predicted by the following formula:
$$r_{u,i} = \sum_{c=1}^{k} w_c \left( \sum_{u' \in CU} r_{u',c} \, / \, |CU| \right)$$
where $r_{u,i}$ is the predicted rating of target user $u$ for candidate commodity $i$, $w_c$ is the coefficient of index $c$ in the aggregate function, $r_{u',c}$ is the rating of user $u'$ for commodity $i$ on index $c$, $CU$ is the set of users who are in the same cluster and have rated commodity $i$, and $|CU|$ is the number of users in $CU$.
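The prediction step can be sketched as below. The coefficients $w_c$ are taken as given, since fitting them by principal component regression is not shown here; the function name and the choice to return `None` when nobody in the cluster has rated the commodity are assumptions.

```python
def predict_by_aggregate(weights, cluster_ratings):
    """weights: [w_1, ..., w_k]; cluster_ratings: one index-rating vector
    [r_1, ..., r_k] per user in CU who rated commodity i."""
    if not cluster_ratings:
        return None  # assumption: no prediction without raters in the cluster
    size = len(cluster_ratings)
    # Mean rating of the cluster on each index c.
    means = [sum(col) / size for col in zip(*cluster_ratings)]
    return sum(w * m for w, m in zip(weights, means))


prediction = predict_by_aggregate([0.5, 0.5], [[4, 2], [2, 4]])
```

With two indices weighted equally and cluster means of 3 on both, the predicted rating is 3.0.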
5.2) Recommendation within a single cluster using the collaborative filtering algorithm
The core idea of the collaborative filtering algorithm based on multi-index ratings is that even users clustered into the same cluster, who share identical or similar index preferences, do not have exactly the same index preferences. In other words, when predicting the recommendation result, different users in the same cluster should be treated differently. We therefore propose to produce the recommendation result with a collaborative filtering algorithm based on multi-index ratings, computed as:
$$r_{u,i} = \bar{r}_u + \frac{\sum_{u' \in CU} sim(u, u') \times (r_{u',i} - \bar{r}_{u'})}{\sum_{u' \in CU} sim(u, u')}$$
where $\bar{r}_u$ and $\bar{r}_{u'}$ are the mean overall ratings of users $u$ and $u'$, $r_{u',i}$ is the overall rating of user $u'$ for commodity $i$, and $sim(u, u')$ is the interest similarity between users $u$ and $u'$ computed from the multi-index ratings.
We adopt a computation based on the Euclidean distance, described as follows:
The rating vectors of users $u_x$ and $u_y$ for commodity $i$ are $r_{x,i} = [r_{x,1}, \ldots, r_{x,k}]$ and $r_{y,i} = [r_{y,1}, \ldots, r_{y,k}]$; their Euclidean distance is calculated as:
$$d(r_{x,i}, r_{y,i}) = \sqrt{\sum_{c=0}^{k} |r_{x,c} - r_{y,c}|^2}$$
The overall distance between users $u_x$ and $u_y$ is the mean of the Euclidean distances between their rating vectors over the commodities that both have commented on, that is:
$$d(u_x, u_y) = \frac{\sum_{i \in CI} d(r_{x,i}, r_{y,i})}{|CI|}$$
The higher the interest similarity of two users, the smaller their overall distance should be; in other words, the two quantities are inversely related. The interest similarity of users $u_x$ and $u_y$ is therefore calculated as:
$$sim(u_x, u_y) = \frac{1}{1 + d(u_x, u_y)}$$
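The distance-based similarity and the prediction formula can be sketched as follows. The data layout (dictionaries keyed by a hypothetical commodity id, neighbours passed as tuples) is an assumption, as is returning a similarity of 0 for users with no commonly rated commodity and falling back to the target user's mean when all similarities are 0.

```python
import math


def vector_distance(rx, ry):
    """Euclidean distance between two rating vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rx, ry)))


def interest_similarity(ratings_x, ratings_y):
    """ratings_*: dict mapping commodity id -> rating vector."""
    common = set(ratings_x) & set(ratings_y)            # CI
    if not common:
        return 0.0  # assumption: no shared commodities means no similarity
    distance = sum(vector_distance(ratings_x[i], ratings_y[i])
                   for i in common) / len(common)
    return 1.0 / (1.0 + distance)


def predict_cf(target_mean, neighbours):
    """neighbours: (similarity, overall rating of commodity i, neighbour's mean
    overall rating) for each user in CU."""
    denominator = sum(s for s, _, _ in neighbours)
    if denominator == 0:
        return target_mean  # assumption: fall back to the user's own mean
    numerator = sum(s * (r - m) for s, r, m in neighbours)
    return target_mean + numerator / denominator
```

Identical rating vectors give a distance of 0 and therefore the maximum similarity of 1.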
6) Obtaining the final recommendation result with the integration function algorithm
The biclustering algorithm based on multi-index ratings adopted above allows the same user or commodity to appear in several clusters after clustering, and the proposed recommendation algorithms predict a recommendation result from the rating data of one cluster at a time. One or more recommendation results from different clusters are therefore obtained, and a suitable strategy is needed to integrate them into the final recommendation result returned to the target user. The proposed clustering algorithm rests on two assumptions: (1) if two users give identical or similar overall ratings and multi-index ratings to one or more commodities, the two users very likely belong to one or more of the same clusters; (2) if two commodities receive identical or similar overall ratings and multi-index ratings from one or more users, the two commodities very likely belong to one or more of the same clusters. An element $M_{i,j}$ of the partition matrix $M$, which describes the membership distribution of users and commodities after clustering, can therefore be regarded as the degree of similarity between object $i$ and the other objects in cluster $j$: for a user, the similarity of index preference to the other users in the cluster; for a commodity, the similarity to the other commodities in the cluster in terms of how users commented on them. When several recommendation results are integrated, these similarity indicators for the user and the commodity, i.e. $M_{i,j}$, must be taken into account, which leads to the following integration strategy:
$$R_{u,i} = \begin{cases} \displaystyle\sum_{l=1}^{h} Pre(u, i, l) \cdot M_{u,l} \cdot M_{i,l} & \text{if } u \text{ and } i \text{ belong to one or more common clusters} \\ 0 & \text{otherwise} \end{cases}$$
where $R_{u,i}$ is the final predicted rating of user $u$ for commodity $i$, and $Pre(u, i, l)$ is the recommendation result for user $u$ and commodity $i$ in cluster $l$. Note that with this integration strategy the proposed algorithm can produce a prediction only when user $u$ and commodity $i$ belong to one or more clusters at the same time, i.e. $M_{u,l} \neq 0$ and $M_{i,l} \neq 0$ for some $l = 1, \ldots, h$, $h \le c$. The parameter $h$ is the number of clusters with the largest membership probabilities that are considered when producing a recommendation: only the information of the top $h$ clusters by membership probability is taken into account, mainly in order to filter out noise. In the actual experiments, we choose $h = 4$.
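The integration strategy can be sketched as follows. The text does not say whose membership probability ranks the clusters, so this sketch ranks the shared clusters by the user's membership probability, which is an assumption; the function name and the dictionary layout of the per-cluster predictions are likewise hypothetical.

```python
def integrate(predictions, user_row, item_row, h=4):
    """predictions: dict cluster index -> Pre(u, i, l);
    user_row / item_row: the rows of M for user u and commodity i."""
    shared = [l for l in range(len(user_row))
              if user_row[l] > 0 and item_row[l] > 0 and l in predictions]
    if not shared:
        return 0.0  # u and i share no cluster: no prediction can be produced
    # Assumption: the top-h clusters are ranked by the user's membership.
    top = sorted(shared, key=lambda l: user_row[l], reverse=True)[:h]
    return sum(predictions[l] * user_row[l] * item_row[l] for l in top)


score = integrate({0: 4.0, 1: 3.0}, [0.5, 0.5, 0.0], [1.0, 0.0, 0.0])
```

Here only cluster 0 contains both the user and the commodity, so the result is 4.0 × 0.5 × 1.0 = 2.0.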
This document proposes a novel recommendation algorithm based on multi-index ratings. Users and commodities are first clustered simultaneously according to the users' multi-index ratings of commodities; after clustering, users in the same cluster have identical or similar index preferences for the commodities in that cluster. On the basis of this biclustering, two different algorithms are proposed that use the clustering result to produce recommendations: an aggregate function algorithm based on principal component regression, and a multi-index collaborative filtering algorithm. It is further proposed to extract multi-index ratings from the commodity comments written by users and to apply them in the above clustering and recommendation process, comparing their effect with that of the multi-index ratings given directly by users. The results show that the present invention achieves higher accuracy.
Compared with collaborative filtering that uses only the overall rating, a recommendation algorithm that uses multi-index ratings or opinion scores can provide users with higher-quality recommendation results. Multi-index ratings take full account of the user's satisfaction with each index, and when writing a commodity comment users tend to evaluate the indices they care about. In other words, compared with the multi-index ratings that users provide, the opinion scores mined from comments better reflect users' index preferences and therefore yield higher-quality recommendation results; the approach is worth promoting.
The embodiment described above is only a preferred embodiment of the present invention and does not limit its scope of implementation; any change made according to the shape and principle of the present invention shall fall within the protection scope of the present invention.

Claims (2)

1. A recommendation algorithm based on multi-index ratings, characterized by comprising the following steps:
1) Identification of index keywords
1.1) split each comment in the data set into sentences $\{x_1, x_2, \ldots\}$, and construct a list composed of feature keywords;
1.2) according to the initial keyword list, label each sentence in the comment corpus with the index with which it has the largest term-frequency match;
1.3) use the $\chi^2$ statistic to measure the term-frequency dependence between each index and each keyword, and add the top $t$ keywords with the highest term-frequency dependence to the keyword list of that index;
1.4) repeat the above process until the algorithm meets the termination condition, i.e. the keyword lists of the indices remain unchanged or the number of iterations reaches a threshold;
2) Opinion score extraction
After the commodity indices and their associated feature keywords have been identified, the sentences in the comments are parsed to extract the user's opinions on the indices or feature keywords;
For each comment, the opinion score for the $k$-th index is calculated as follows:
$$o_k = \frac{\sum_{s \in OP_k} score(s)}{|OP_k|}$$
where $s$ denotes an adjective expressing an opinion; $OP_k$ denotes the set of opinion adjectives about the $k$-th index; $|OP_k|$ denotes the number of elements in that set; $score(s)$ denotes the opinion polarity of adjective $s$, namely +1, -1 or 0; in this way, an unstructured comment text is converted into a vector of opinion scores $O_{u,i} = [o_{u,i,1}, \ldots, o_{u,i,k}]$; the extracted opinion scores lie in [-1, 1], whereas the multi-index ratings that users give directly lie in [1, 5]; to give both the same range, the opinion scores are converted to the interval [1, 5] by an equidistant (linear) mapping, using the following conversion formula:
$$o_{after} = o_{before} \times 2 + 3$$
where $o_{before}$ and $o_{after}$ denote the opinion score before and after conversion, respectively; with this formula, the multi-index rating data and the opinion score data share the same range, so each can be fed directly into the recommendation process and their effects compared;
3) Construction of the user and commodity similarity matrices
In the recommender system, $U = \{u_1, \ldots, u_n\}$ denotes the set of users and $I = \{i_1, \ldots, i_m\}$ the set of commodities, where $n$ and $m$ are the total numbers of users and commodities, respectively; a user's evaluation of a commodity can be expressed as a rating vector consisting of the overall rating and the multi-index ratings, $r = [r_0, r_1, \ldots, r_k]$, where $r_0$ is the overall rating and $r_1, \ldots, r_k$ are the ratings on the $k$ indices; the rating vector can also consist of the overall rating and opinion scores, i.e. $r = [r_0, o_1, \ldots, o_k]$, where $o_1, \ldots, o_k$ are the opinion scores mined from the comment the user wrote about the commodity; in experiments, $r = [r_0, r_1, \ldots, r_k]$ can be replaced directly with $r = [r_0, o_1, \ldots, o_k]$ throughout the clustering and recommendation process; the goal is to cluster the users $\{u_1, \ldots, u_n\}$ and the commodities $\{i_1, \ldots, i_m\}$ simultaneously into $c$ clusters; the clustering result is represented as a partition matrix $M \in [0,1]^{(n+m) \times c}$, where each element $M_{i,j}$ is the probability that object $i$ belongs to cluster $j$, hence $M_{i,j} > 0$ when object $i$ belongs to cluster $j$, and $M_{i,j} = 0$ otherwise; because the magnitude of $M_{i,j}$ directly reflects how likely object $i$ is to belong to cluster $j$, every row of $M$ is required to sum to 1; in addition, if the number of clusters that each object may join is limited to at most $l$, with $1 \le l \le c$, then each row of $M$ contains at most $l$ nonzero values; the partition matrix can be written in block form as:
$$M = \begin{bmatrix} P \\ Q \end{bmatrix}$$
where $P \in [0,1]^{n \times c}$ is the partition matrix for users and $Q \in [0,1]^{m \times c}$ is the partition matrix for commodities;
For users, the similarity matrix $SU \in [-1,1]^{n \times n}$ is built as follows:
$$SU_{x,y} = \begin{cases} \dfrac{1}{|CI_{x,y}|} \displaystyle\sum_{i \in CI_{x,y}} \dfrac{(r_{x,i} - \bar{r}_x) \cdot (r_{y,i} - \bar{r}_y)}{|r_{x,i}| \cdot |r_{y,i}|} & \text{if } |CI_{x,y}| \neq 0 \\ 0 & \text{otherwise} \end{cases}$$
where $r_{x,i}$ and $r_{y,i}$ are the rating vectors of users $u_x$ and $u_y$ for commodity $i$, $\bar{r}_x$ and $\bar{r}_y$ are the average rating vectors of users $u_x$ and $u_y$, $CI_{x,y}$ is the set of commodities that both $u_x$ and $u_y$ have commented on, and $|CI_{x,y}|$ is the number of commodities in $CI_{x,y}$; the similarity matrix for commodities, $SI \in [-1,1]^{m \times m}$, is built as follows:
$$SI_{x,y} = \begin{cases} \dfrac{1}{|CU_{x,y}|} \displaystyle\sum_{u \in CU_{x,y}} \dfrac{(r_{u,x} - \bar{r}_x) \cdot (r_{u,y} - \bar{r}_y)}{|r_{u,x}| \cdot |r_{u,y}|} & \text{if } |CU_{x,y}| \neq 0 \\ 0 & \text{otherwise} \end{cases}$$
where $r_{u,x}$ and $r_{u,y}$ are the rating vectors of user $u$ for commodities $i_x$ and $i_y$, $\bar{r}_x$ and $\bar{r}_y$ are the average rating vectors of commodities $i_x$ and $i_y$, $CU_{x,y}$ is the set of users who have rated both $i_x$ and $i_y$, and $|CU_{x,y}|$ is the number of users in $CU_{x,y}$;
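4) Obtaining the cluster matrix with a biclustering algorithm
To bicluster users and commodities, closely related users or commodities are associated by minimizing the following objective function:
$$\epsilon(P, Q) = \sum_{i=1}^{n} \sum_{j=1}^{n} \left( \left\| \frac{p_i}{\sqrt{D^{row}_{ii}}} - \frac{p_j}{\sqrt{D^{col}_{jj}}} \right\|^2 \cdot SU_{i,j} \right) + \sum_{i=1}^{m} \sum_{j=1}^{m} \left( \left\| \frac{q_i}{\sqrt{E^{row}_{ii}}} - \frac{q_j}{\sqrt{E^{col}_{jj}}} \right\|^2 \cdot SI_{i,j} \right)$$
where $p_i$ is the $i$-th row of the partition matrix $P$, and $D^{row}$ and $D^{col}$ are the diagonal matrices for users, computed as $D^{row}_{ii} = \sum_{j=1}^{n} SU_{i,j}$ and $D^{col}_{jj} = \sum_{i=1}^{n} SU_{i,j}$; $q_i$ is the $i$-th row of the partition matrix $Q$, and $E^{row}$ and $E^{col}$ are the diagonal matrices for commodities, computed as $E^{row}_{ii} = \sum_{j=1}^{m} SI_{i,j}$ and $E^{col}_{jj} = \sum_{i=1}^{m} SI_{i,j}$;
By algebraic transformation, minimizing the above objective can be expressed in terms of the trace $\mathrm{Tr}(M^{T} K M)$, where:
$$X = (D^{row})^{-\frac{1}{2}} \, SU \, (D^{col})^{-\frac{1}{2}}, \quad Y = (E^{row})^{-\frac{1}{2}} \, SI \, (E^{col})^{-\frac{1}{2}}, \quad K = \begin{bmatrix} I_n - X & 0 \\ 0 & I_m - Y \end{bmatrix}$$
and $I_n \in \mathbb{R}^{n \times n}$ denotes the identity matrix; the following optimization problem is then solved:
$$\min_{M} \mathrm{Tr}(M^{T} K M)$$
subject to: $M \in [0,1]^{(n+m) \times c}$, $M 1_c = 1_{n+m}$, $|m_i| = l$, $i = 1, \ldots, (n+m)$, where $m_i$ is the $i$-th row of $M$; the parameter $c$ is the number of clusters and $l$ is the maximum number of clusters to which each user or commodity may belong, with $1 \le l \le c$; the symbol $|\cdot|$ denotes the number of nonzero elements of a vector;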
A two-stage strategy is used to solve the above problem, described as follows:
4.1) search for a shared low-dimensional space that represents all users and commodities; the $t$-dimensional matrix $Z'$ that best preserves the user and commodity information is obtained by solving:
$$\min_{Z} \mathrm{Tr}(Z^{T} K Z)$$
subject to: $Z \in [0,1]^{(n+m) \times t}$, $Z^{T} Z = I_t$, where $I_t \in \mathbb{R}^{t \times t}$ denotes the identity matrix; the constraint $Z^{T} Z = I_t$ mainly prevents arbitrary scaling of $Z$; because $K$ is a positive semidefinite matrix, the optimal solution $Z'$ can be obtained by solving the eigenvalue problem $KZ = \lambda Z$, i.e. $Z' = [z_1, \ldots, z_t]$, where $z_1, \ldots, z_t$ are the eigenvectors of $K$ corresponding to its $t$ smallest eigenvalues;
4.2) cluster users and commodities simultaneously, i.e. biclustering;
Since users and commodities may each appear in one or more clusters at the same time, the Fuzzy C-Means clustering algorithm is run on the matrix $Z'$, which preserves the user and commodity information to the greatest extent; running Fuzzy C-Means amounts to iteratively optimizing the following objective function:
$$\min J(M, V) = \min \sum_{i=1}^{m+n} \sum_{j=1}^{c} (M_{i,j})^{\theta} \, d(e_i, v_j)^2$$
where $M_{i,j}$ is the probability that element $e_i$ belongs to cluster $j$, $v_j$ is the center of cluster $j$, $d(\cdot)$ is the Euclidean distance function, and $\theta$ is the parameter that controls the fuzziness of the clustering result; in each iteration, the algorithm updates the elements of $M$ and $V$ according to:
$$M_{i,j} = \frac{d(e_i, v_j)^{2/(1-\theta)}}{\sum_{l=1}^{c} d(e_i, v_l)^{2/(1-\theta)}}$$
$$v_j = \frac{\sum_{i=1}^{m+n} M_{i,j}^{\theta} \cdot e_i}{\sum_{i=1}^{m+n} M_{i,j}^{\theta}}$$
where $i = 1, \ldots, (m+n)$ and $j = 1, \ldots, c$; the algorithm terminates when the change in the objective function $J(M, V)$ between two consecutive iterations falls below the threshold $\varepsilon$; after $M$ has been solved, in each row the $l$ largest elements, whose sum exceeds a preset threshold, are retained and normalized, so that every row of $M$ sums to 1;
5) Recommendation within a single cluster
5.1) Recommendation within a single cluster using the aggregate function algorithm
Recommendation methods based on an aggregate function generally assume that a user's overall rating of a commodity is closely related to the multi-index ratings, i.e. that the overall rating is largely determined by the multi-index ratings; such methods therefore use the multi-index ratings to build an aggregate function for the overall rating and predict the user's rating with that function; the aggregate function for the overall rating is built with a regression method based on principal component analysis and used to compute the recommendation result; principal component analysis is a dimensionality reduction method whose core idea is to represent all data samples by a small number of principal components extracted from them; principal components are selected mainly according to the variance (eigenvalue) of the sample data, i.e. each selected principal component is the one with the largest variance in the data sample; principal components selected in this way are mutually uncorrelated, which removes the effect of collinearity among the multi-index ratings; on this basis, the selected principal components are used to build the aggregate function for the dependent variable;
After the aggregate function for the overall rating has been built for users by principal component regression, the rating that the target user is likely to give a candidate commodity is predicted by the following formula:
$$r_{u,i} = \sum_{c=1}^{k} w_c \left( \sum_{u' \in CU} r_{u',c} \, / \, |CU| \right)$$
where $r_{u,i}$ is the predicted rating of target user $u$ for candidate commodity $i$, $w_c$ is the coefficient of index $c$ in the aggregate function, $r_{u',c}$ is the rating of user $u'$ for commodity $i$ on index $c$, $CU$ is the set of users who are in the same cluster and have rated commodity $i$, and $|CU|$ is the number of users in $CU$;
5.2) Recommendation within a single cluster using the collaborative filtering algorithm
The core idea of the collaborative filtering algorithm based on multi-index ratings is that even users clustered into the same cluster, who share identical or similar index preferences, do not have exactly the same index preferences; in other words, when predicting the recommendation result, different users in the same cluster should be treated differently; the recommendation result is therefore produced with a collaborative filtering algorithm based on multi-index ratings, computed as:
$$r_{u,i} = \bar{r}_u + \frac{\sum_{u' \in CU} sim(u, u') \times (r_{u',i} - \bar{r}_{u'})}{\sum_{u' \in CU} sim(u, u')}$$
where $\bar{r}_u$ and $\bar{r}_{u'}$ are the mean overall ratings of users $u$ and $u'$, $r_{u',i}$ is the overall rating of user $u'$ for commodity $i$, and $sim(u, u')$ is the interest similarity between users $u$ and $u'$ computed from the multi-index ratings;
A computation based on the Euclidean distance is adopted, described as follows:
The rating vectors of users $u_x$ and $u_y$ for commodity $i$ are $r_{x,i} = [r_{x,1}, \ldots, r_{x,k}]$ and $r_{y,i} = [r_{y,1}, \ldots, r_{y,k}]$; their Euclidean distance is calculated as:
$$d(r_{x,i}, r_{y,i}) = \sqrt{\sum_{c=0}^{k} |r_{x,c} - r_{y,c}|^2}$$
The overall distance between users $u_x$ and $u_y$ is the mean of the Euclidean distances between their rating vectors over the commodities that both have commented on, that is:
$$d(u_x, u_y) = \frac{\sum_{i \in CI} d(r_{x,i}, r_{y,i})}{|CI|}$$
The higher the interest similarity of two users, the smaller their overall distance should be; in other words, the two quantities are inversely related; the interest similarity of users $u_x$ and $u_y$ is therefore calculated as:
$$sim(u_x, u_y) = \frac{1}{1 + d(u_x, u_y)};$$
6) Obtaining the final recommendation result with the integration function algorithm
The biclustering algorithm based on multi-index ratings adopted above allows the same user or commodity to appear in several clusters after clustering, and the proposed recommendation algorithms predict a recommendation result from the rating data of one cluster at a time; one or more recommendation results from different clusters are therefore obtained, and a suitable strategy is needed to integrate them into the final recommendation result returned to the target user; the proposed clustering algorithm rests on two assumptions: (1) if two users give identical or similar overall ratings and multi-index ratings to one or more commodities, the two users very likely belong to one or more of the same clusters; (2) if two commodities receive identical or similar overall ratings and multi-index ratings from one or more users, the two commodities very likely belong to one or more of the same clusters; an element $M_{i,j}$ of the partition matrix $M$, which describes the membership distribution of users and commodities after clustering, can therefore be regarded as the degree of similarity between object $i$ and the other objects in cluster $j$: for a user, the similarity of index preference to the other users in the cluster; for a commodity, the similarity to the other commodities in the cluster in terms of how users commented on them; when several recommendation results are integrated, these similarity indicators for the user and the commodity, i.e. $M_{i,j}$, must be taken into account, which leads to the following integration strategy:
$$R_{u,i} = \begin{cases} \displaystyle\sum_{l=1}^{h} Pre(u, i, l) \cdot M_{u,l} \cdot M_{i,l} & \text{if } u \text{ and } i \text{ belong to one or more common clusters} \\ 0 & \text{otherwise} \end{cases}$$
where $R_{u,i}$ is the final predicted rating of user $u$ for commodity $i$, and $Pre(u, i, l)$ is the recommendation result for user $u$ and commodity $i$ in cluster $l$; with this integration strategy, the recommendation algorithm can produce a prediction only when user $u$ and commodity $i$ belong to one or more clusters at the same time, i.e. $M_{u,l} \neq 0$ and $M_{i,l} \neq 0$ for some $l = 1, \ldots, h$, $h \le c$; the parameter $h$ is the number of clusters with the largest membership probabilities that are considered when producing a recommendation: only the information of the top $h$ clusters by membership probability is taken into account, mainly in order to filter out noise.
2. The recommendation algorithm based on multi-index ratings according to claim 1, characterized in that: in step 1), the $\chi^2$ statistic that measures the term-frequency dependence between a feature keyword $w$ and an index $A$ is calculated as follows:
$$\chi^2(w, A) = \frac{C \times (C_1 C_4 - C_2 C_3)^2}{(C_1 + C_3) \times (C_4 + C_2) \times (C_1 + C_2) \times (C_4 + C_3)}$$
where $C$ is the total number of occurrences of all feature keywords, $C_1$ is the number of times feature keyword $w$ appears in sentences belonging to index $A$, $C_2$ is the number of times $w$ appears in sentences not belonging to index $A$, $C_3$ is the number of sentences that belong to index $A$ but do not contain $w$, and $C_4$ is the number of sentences that neither belong to index $A$ nor contain $w$.
CN201510493550.6A 2015-08-12 2015-08-12 Recommendation algorithm based on multi-index grading Pending CN105095477A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510493550.6A CN105095477A (en) 2015-08-12 2015-08-12 Recommendation algorithm based on multi-index grading

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510493550.6A CN105095477A (en) 2015-08-12 2015-08-12 Recommendation algorithm based on multi-index grading

Publications (1)

Publication Number Publication Date
CN105095477A true CN105095477A (en) 2015-11-25

Family

ID=54575913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510493550.6A Pending CN105095477A (en) 2015-08-12 2015-08-12 Recommendation algorithm based on multi-index grading

Country Status (1)

Country Link
CN (1) CN105095477A (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105824878A (en) * 2016-03-08 2016-08-03 上海大学 Product recommending method based on support vector machine regression model
CN106503096A (en) * 2016-10-14 2017-03-15 上海斐讯数据通信技术有限公司 Social networkies based on distributed noise control sound interference recommend method and system
WO2017129033A1 (en) * 2016-01-29 2017-08-03 阿里巴巴集团控股有限公司 Question recommendation method and device
CN107123032A (en) * 2017-05-02 2017-09-01 北京邮电大学 A kind of item recommendation method and device
CN107169830A (en) * 2017-05-15 2017-09-15 南京大学 A kind of personalized recommendation method based on cluster PU matrix decompositions
CN107220831A (en) * 2017-04-06 2017-09-29 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of user group's division methods and system based on anti-fake traceability system
CN107274247A (en) * 2017-05-03 2017-10-20 浙江工商大学 Wisdom based on cloud computing perceives recommendation method
CN107818165A (en) * 2017-10-31 2018-03-20 平安科技(深圳)有限公司 Marketing client screening technique, electronic installation and storage medium based on tag library
CN107862070A (en) * 2017-11-22 2018-03-30 华南理工大学 Online classroom discussion short text instant grouping method and system based on text clustering
CN107944911A (en) * 2017-11-18 2018-04-20 电子科技大学 A kind of recommendation method of the commending system based on text analyzing
CN108665301A (en) * 2018-03-20 2018-10-16 鹏璨文化创意(上海)股份有限公司 Multi-object oriented exhibition interactive platform
CN109190023A (en) * 2018-08-15 2019-01-11 深圳信息职业技术学院 The method, apparatus and terminal device of Collaborative Recommendation
CN109408702A (en) * 2018-08-29 2019-03-01 昆明理工大学 A kind of mixed recommendation method based on sparse edge noise reduction autocoding
CN109408728A (en) * 2018-11-30 2019-03-01 安徽大学 A kind of difference secret protection recommended method based on covering algorithm
CN109559169A (en) * 2018-11-26 2019-04-02 上海财经大学 Method for identifying sharp users based on online user scoring
CN109640128A (en) * 2018-12-04 2019-04-16 南昌航空大学 A kind of TV user watching behavior feature extracting method and system
CN109726233A (en) * 2018-12-28 2019-05-07 浙江省公众信息产业有限公司 For portraying the method, computer system and readable medium of user image
CN109933716A (en) * 2019-01-15 2019-06-25 深圳心跳智能科技有限公司 A kind of personalized hotel's intelligent recommendation algorithm based on customer action Habit Preference
CN110110139A (en) * 2019-04-19 2019-08-09 北京奇艺世纪科技有限公司 The method, apparatus and electronic equipment that a kind of pair of recommendation results explain
CN110110230A (en) * 2019-04-26 2019-08-09 华南理工大学 A kind of recommended method to be scored based on user with comment
CN110992215A (en) * 2019-12-10 2020-04-10 浙江力石科技股份有限公司 Semantic analysis-based travel service recommendation system, database and recommendation method
CN111222332A (en) * 2020-01-06 2020-06-02 华南理工大学 Commodity recommendation method combining attention network and user emotion
WO2020133398A1 (en) * 2018-12-29 2020-07-02 深圳市欢太科技有限公司 Application recommendation method and apparatus, server and computer-readable storage medium
CN111444438A (en) * 2020-03-24 2020-07-24 北京百度网讯科技有限公司 Method, device, equipment and storage medium for determining recall permission rate of recall strategy
WO2021208695A1 (en) * 2020-11-19 2021-10-21 平安科技(深圳)有限公司 Method and apparatus for target item recommendation, electronic device, and computer readable storage medium
CN115098931A (en) * 2022-07-20 2022-09-23 江苏艾佳家居用品有限公司 Small sample analysis method for mining personalized requirements of indoor design of user
CN115329078A (en) * 2022-08-11 2022-11-11 北京百度网讯科技有限公司 Text data processing method, device, equipment and storage medium
CN115511506A (en) * 2022-09-30 2022-12-23 中国电子科技集团公司第十五研究所 Enterprise credit rating method, device, terminal equipment and storage medium
CN116362761A (en) * 2023-03-06 2023-06-30 北京三维天地科技股份有限公司 Verification detection mechanism recommendation method and system based on data aggregation recommendation algorithm
CN116541607A (en) * 2023-07-04 2023-08-04 量子数科科技有限公司 Intelligent recommendation method based on commodity retrieval data analysis
CN116977034A (en) * 2023-09-22 2023-10-31 北京世纪飞讯科技有限公司 Internet brand user management method and system based on big data

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014020214A1 (en) * 2012-08-01 2014-02-06 Universitat Politècnica De Catalunya Improved method for spectral clustering by computer and uses thereof
CN104503973A (en) * 2014-11-14 2015-04-08 浙江大学软件学院(宁波)管理中心(宁波软件教育中心) Recommendation method based on singular value decomposition and classifier combination

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
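Chen Guanliang (陈冠良): "Research on Recommendation Algorithms Based on Multi-Index Scoring" (基于多指标评分的推荐算法研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *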

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017129033A1 (en) * 2016-01-29 2017-08-03 阿里巴巴集团控股有限公司 Question recommendation method and device
TWI772287B (en) * 2016-01-29 2022-08-01 香港商阿里巴巴集團服務有限公司 Recommended methods and equipment for problems
CN105824878A (en) * 2016-03-08 2016-08-03 上海大学 Product recommending method based on support vector machine regression model
CN106503096A (en) * 2016-10-14 2017-03-15 上海斐讯数据通信技术有限公司 Social network recommendation method and system based on distributed noise interference prevention
CN106503096B (en) * 2016-10-14 2020-02-04 上海斐讯数据通信技术有限公司 Social network recommendation method and system based on distributed noise interference prevention
CN107220831A (en) * 2017-04-06 2017-09-29 广东顺德中山大学卡内基梅隆大学国际联合研究院 A kind of user group's division methods and system based on anti-fake traceability system
CN107123032A (en) * 2017-05-02 2017-09-01 北京邮电大学 A kind of item recommendation method and device
CN107123032B (en) * 2017-05-02 2020-11-13 北京邮电大学 Article recommendation method and device
CN107274247A (en) * 2017-05-03 2017-10-20 浙江工商大学 Wisdom based on cloud computing perceives recommendation method
CN107169830A (en) * 2017-05-15 2017-09-15 南京大学 A kind of personalized recommendation method based on cluster PU matrix decompositions
CN107169830B (en) * 2017-05-15 2020-11-03 南京大学 Personalized recommendation method based on clustering PU matrix decomposition
CN107818165A (en) * 2017-10-31 2018-03-20 平安科技(深圳)有限公司 Marketing client screening technique, electronic installation and storage medium based on tag library
CN107944911A (en) * 2017-11-18 2018-04-20 电子科技大学 A kind of recommendation method of the commending system based on text analyzing
CN107944911B (en) * 2017-11-18 2021-12-03 电子科技大学 Recommendation method of recommendation system based on text analysis
CN107862070A (en) * 2017-11-22 2018-03-30 华南理工大学 Online classroom discussion short text instant grouping method and system based on text clustering
CN107862070B (en) * 2017-11-22 2021-08-10 华南理工大学 Online classroom discussion short text instant grouping method and system based on text clustering
CN108665301A (en) * 2018-03-20 2018-10-16 鹏璨文化创意(上海)股份有限公司 Multi-object oriented exhibition interactive platform
CN108665301B (en) * 2018-03-20 2021-10-19 鹏璨文化创意(上海)股份有限公司 Multi-object oriented exhibition interactive platform
CN109190023A (en) * 2018-08-15 2019-01-11 深圳信息职业技术学院 The method, apparatus and terminal device of Collaborative Recommendation
CN109408702A (en) * 2018-08-29 2019-03-01 昆明理工大学 A kind of mixed recommendation method based on sparse edge noise reduction autocoding
CN109408702B (en) * 2018-08-29 2021-07-16 昆明理工大学 Mixed recommendation method based on sparse edge noise reduction automatic coding
CN109559169B (en) * 2018-11-26 2022-12-13 上海财经大学 Method for identifying sharp users based on online user scoring
CN109559169A (en) * 2018-11-26 2019-04-02 上海财经大学 Method for identifying sharp users based on online user scoring
CN109408728A (en) * 2018-11-30 2019-03-01 安徽大学 A kind of difference secret protection recommended method based on covering algorithm
CN109640128B (en) * 2018-12-04 2021-01-05 南昌航空大学 Television user watching behavior feature extraction method and system
CN109640128A (en) * 2018-12-04 2019-04-16 南昌航空大学 A kind of TV user watching behavior feature extracting method and system
CN109726233A (en) * 2018-12-28 2019-05-07 浙江省公众信息产业有限公司 For portraying the method, computer system and readable medium of user image
WO2020133398A1 (en) * 2018-12-29 2020-07-02 深圳市欢太科技有限公司 Application recommendation method and apparatus, server and computer-readable storage medium
CN109933716A (en) * 2019-01-15 2019-06-25 深圳心跳智能科技有限公司 A kind of personalized hotel's intelligent recommendation algorithm based on customer action Habit Preference
CN110110139A (en) * 2019-04-19 2019-08-09 北京奇艺世纪科技有限公司 The method, apparatus and electronic equipment that a kind of pair of recommendation results explain
CN110110230A (en) * 2019-04-26 2019-08-09 华南理工大学 A kind of recommended method to be scored based on user with comment
CN110992215A (en) * 2019-12-10 2020-04-10 浙江力石科技股份有限公司 Semantic analysis-based travel service recommendation system, database and recommendation method
CN110992215B (en) * 2019-12-10 2023-10-13 浙江力石科技股份有限公司 Travel service recommendation system, database and recommendation method based on semantic analysis
CN111222332A (en) * 2020-01-06 2020-06-02 华南理工大学 Commodity recommendation method combining attention network and user emotion
CN111444438B (en) * 2020-03-24 2023-09-01 北京百度网讯科技有限公司 Method, device, equipment and storage medium for determining quasi-recall rate of recall strategy
CN111444438A (en) * 2020-03-24 2020-07-24 北京百度网讯科技有限公司 Method, device, equipment and storage medium for determining recall permission rate of recall strategy
WO2021208695A1 (en) * 2020-11-19 2021-10-21 平安科技(深圳)有限公司 Method and apparatus for target item recommendation, electronic device, and computer readable storage medium
CN115098931A (en) * 2022-07-20 2022-09-23 江苏艾佳家居用品有限公司 Small sample analysis method for mining personalized requirements of indoor design of user
CN115329078A (en) * 2022-08-11 2022-11-11 北京百度网讯科技有限公司 Text data processing method, device, equipment and storage medium
CN115329078B (en) * 2022-08-11 2024-03-12 北京百度网讯科技有限公司 Text data processing method, device, equipment and storage medium
CN115511506A (en) * 2022-09-30 2022-12-23 中国电子科技集团公司第十五研究所 Enterprise credit rating method, device, terminal equipment and storage medium
CN116362761A (en) * 2023-03-06 2023-06-30 北京三维天地科技股份有限公司 Verification detection mechanism recommendation method and system based on data aggregation recommendation algorithm
CN116362761B (en) * 2023-03-06 2024-04-05 北京三维天地科技股份有限公司 Verification detection mechanism recommendation method and system based on data aggregation recommendation algorithm
CN116541607B (en) * 2023-07-04 2023-09-15 量子数科科技有限公司 Intelligent recommendation method based on commodity retrieval data analysis
CN116541607A (en) * 2023-07-04 2023-08-04 量子数科科技有限公司 Intelligent recommendation method based on commodity retrieval data analysis
CN116977034A (en) * 2023-09-22 2023-10-31 北京世纪飞讯科技有限公司 Internet brand user management method and system based on big data
CN116977034B (en) * 2023-09-22 2023-12-08 北京世纪飞讯科技有限公司 Internet brand user management method and system based on big data

Similar Documents

Publication Publication Date Title
CN105095477A (en) Recommendation algorithm based on multi-index grading
CN109543178B (en) Method and system for constructing judicial text label system
CN105488024B (en) The abstracting method and device of Web page subject sentence
CN101231634B (en) Autoabstract method for multi-document
CN103823859B (en) Name recognition algorithm based on combination of decision-making tree rules and multiple statistic models
CN101751455B (en) Method for automatically generating title by adopting artificial intelligence technology
CN110532379B (en) Electronic information recommendation method based on LSTM user comment sentiment analysis
CN102194013A (en) Domain-knowledge-based short text classification method and text classification system
CN103810299A (en) Image retrieval method on basis of multi-feature fusion
CN106156333B (en) A kind of improvement list class collaborative filtering method of mosaic society's information
CN102193936A (en) Data classification method and device
CN105893609A (en) Mobile APP recommendation method based on weighted mixing
CN103778227A (en) Method for screening useful images from retrieved images
CN111523055B (en) Collaborative recommendation method and system based on agricultural product characteristic attribute comment tendency
CN107315738A (en) A kind of innovation degree appraisal procedure of text message
CN105488077A (en) Content tag generation method and apparatus
CN105138508A (en) Preference diffusion based context recommendation system
CN104008187B (en) Semi-structured text matching method based on the minimum edit distance
CN110134799B (en) BM25 algorithm-based text corpus construction and optimization method
CN100543735C (en) File similarity measure method based on file structure
CN112905739B (en) False comment detection model training method, detection method and electronic equipment
CN109241527B (en) Automatic generation method of false comment data set of Chinese commodity
CN106250925B (en) A kind of zero Sample video classification method based on improved canonical correlation analysis
CN111191051B (en) Method and system for constructing emergency knowledge map based on Chinese word segmentation technology
CN104239496A (en) Collaborative filtering method based on integration of fuzzy weight similarity measurement and clustering

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20151125
