CN110096640A - User's similarity calculating method in Collaborative Filtering Recommendation System based on classification of the items - Google Patents
- Publication number
- CN110096640A (application number CN201910176852.9A)
- Authority
- CN
- China
- Prior art keywords
- user
- hierarchical structure
- structure tree
- node
- project
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract
The present invention provides a method for calculating user similarity based on item classification in a collaborative filtering recommendation system. The items rated by each user are classified to form a corresponding hierarchy tree, and the similarity between users is calculated from the relationships, within the hierarchy tree, among the items that each user has rated. When the user rating matrix is sparse, many users may have no rated items in common, and traditional cosine similarity then yields a similarity of 0 for all of them. With the method based on item classification, even if two users have not rated any common item, as long as the items they rated are related in the hierarchy tree, the contribution of those items to the user similarity can be calculated, so that the user similarity is computed more accurately and suitable similar neighbors are found for the target user.
Description
Technical field
The present invention relates to personalized recommendation systems, and in particular to a method for calculating user similarity based on item classification in a collaborative filtering recommendation system.
Background art
With the rapid development of Internet and information technology, the volume of information on the Internet has risen sharply, and it has become increasingly difficult for users to find the information they need. While people can access richer and more varied information resources, they also urgently need to extract useful information from massive data quickly and accurately. Against this background, personalized recommendation systems emerged and have been widely applied. A personalized recommendation system collects and analyzes users' characteristic information and historical behavior data to extract their interests and preferences, and thereby provides users with accurate personalized recommendation services.
Among the many personalized recommendation technologies, collaborative filtering is currently the most successful and the most widely used, and it is widely applied in e-commerce systems. Its core idea is to use a user-item rating data set to select the users whose interests are similar to those of the target user as the nearest-neighbor set, and then to predict the target user's rating of each item, i.e. the degree of preference, from the combined ratings of the nearest neighbors, so as to make recommendations for the target user. Its advantages are that it does not need to analyze the features of items along each dimension and that it can help users discover new points of interest. However, when items have hierarchical features formed by a standard classification method, traditional similarity calculation methods do not consider the relationships between items. This leads to large errors in the calculated user similarity and in turn affects the selection of nearest neighbors.
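Collaborative filtering recommendation algorithms are based on a user-item rating data set. Let the total number of users be m and the total number of items be n; $d_1, \dots, d_n$ denote the items and USER1, ..., USERm denote the users. $R_{i,j}$ is the rating of user i for item j; the higher the rating, the stronger the preference of user i for item j. The user-item rating matrix, with one row per user and one column per item, is as follows:

$$
R_{m \times n} =
\begin{pmatrix}
R_{1,1} & R_{1,2} & \cdots & R_{1,n} \\
R_{2,1} & R_{2,2} & \cdots & R_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
R_{m,1} & R_{m,2} & \cdots & R_{m,n}
\end{pmatrix}
$$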
The main existing user similarity calculation method is cosine similarity. If the similarity between user i and user j is denoted sim(i, j), then:

$$
sim(i,j) = \frac{\sum_{c \in I_{i,j}} R_{i,c}\, R_{j,c}}{\sqrt{\sum_{c \in I_i} R_{i,c}^2}\ \sqrt{\sum_{c \in I_j} R_{j,c}^2}}
$$

where $I_{i,j}$ denotes the set of items rated by both user i and user j, $I_i$ and $I_j$ denote the sets of items rated by user i and user j respectively, $R_{i,c}$ denotes the rating of user i for item c, and $R_{j,c}$ denotes the rating of user j for item c. The value range of sim(i, j) is [0, 1]; a larger value indicates a higher similarity between user i and user j.
The traditional cosine similarity method relies only on the rating values to calculate user similarity and does not take the characteristic information of the items into account, so the resulting user similarity is inaccurate. Moreover, when the user rating matrix is sparse, it becomes more likely that several users have no rated items in common, in which case the similarity obtained with traditional cosine similarity is 0 for all of them. Highly similar neighbors therefore cannot be found accurately for the target user.
The following example, shown in Fig. 1, illustrates the problem of using traditional cosine similarity to calculate user similarity when the items have hierarchical features.
Assume the items are pieces of music, corresponding to the leaf nodes in Fig. 1. Going down from the root node Root, the levels represent the music genre (Rock and Classical), the band the music belongs to (Beatles and Stones), and the specific piece of music. This is a coarse-to-fine classification process. Assume the following users have rated the following music: A: $(b_1, b_2)$, B: $(b_3, b_4)$, C: $(s_1, s_2)$, with every rating set to 1 by default.
Cosine similarity gives sim(A, B) = sim(A, C) = sim(B, C) = 0. Intuitively, however, sim(A, B) > sim(B, C): A and B both like music by the Beatles, whereas B and C only share a liking for Rock music, so in the hierarchy tree above the distance between A and B is smaller than the distance between B and C.
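The zero result in the example above can be checked with a minimal Python sketch. The function name `cosine_similarity` and the dictionary representation of ratings are assumptions made for illustration; only the item names and the default rating of 1 come from the text.

```python
from math import sqrt

def cosine_similarity(ratings_i, ratings_j):
    """Cosine similarity between two users given as {item: rating} dicts."""
    common = set(ratings_i) & set(ratings_j)            # items rated by both users
    numerator = sum(ratings_i[c] * ratings_j[c] for c in common)
    norm_i = sqrt(sum(r * r for r in ratings_i.values()))
    norm_j = sqrt(sum(r * r for r in ratings_j.values()))
    if norm_i == 0 or norm_j == 0:                      # a user with no ratings
        return 0.0
    return numerator / (norm_i * norm_j)

# Ratings from the background example, all set to 1.
A = {"b1": 1, "b2": 1}
B = {"b3": 1, "b4": 1}
C = {"s1": 1, "s2": 1}

# No pair shares a rated item, so every numerator is 0.
print(cosine_similarity(A, B), cosine_similarity(A, C), cosine_similarity(B, C))
```

Because the numerator only sums over commonly rated items, any pair of users with disjoint rating sets gets 0 regardless of how closely related their items are.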
Summary of the invention
The purpose of the present invention is to provide, for the case where items have hierarchical features formed by a standard classification method, a user similarity calculation method based on item classification with higher calculation accuracy for collaborative filtering recommendation systems. The technical solution adopted by the invention is as follows:
A user similarity calculation method based on item classification in a collaborative filtering recommendation system, comprising:
classifying the items rated by each user to form a corresponding hierarchy tree, and calculating the similarity between users from the relationships, within the hierarchy tree, among the items that each user has rated.
The user similarity calculation method based on item classification in a collaborative filtering recommendation system specifically comprises the following.
Let the user-item rating matrix be:

$$
R_{m \times n} =
\begin{pmatrix}
R_{1,1} & R_{1,2} & \cdots & R_{1,n} \\
R_{2,1} & R_{2,2} & \cdots & R_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
R_{m,1} & R_{m,2} & \cdots & R_{m,n}
\end{pmatrix}
$$

where the total number of users is m and the total number of items is n; $d_1, \dots, d_n$ denote the items and USER1, ..., USERm denote the users; $R_{i,j}$ is the rating of user i for item j.
The ratings of user i and user j in the n-dimensional item space can each be represented by an n-dimensional vector, namely $C_i = \{(d_t, R_{i,t}) \mid t \in \{1, \dots, n\}\}$ and $C_j = \{(d_t, R_{j,t}) \mid t \in \{1, \dots, n\}\}$.
Assume that all items are classified to form an initial hierarchy tree. Keep the leaf nodes that represent the items rated by user i, trace upward from these leaf nodes and keep their ancestor nodes all the way to the root node Root, and delete all other nodes. In this way the vector $C_i$ generates a hierarchy tree $T_i$. Similarly, the vector $C_j$ generates a hierarchy tree $T_j$.
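The pruning step described above can be sketched as follows. This is an illustration under stated assumptions: the child-to-parent map `PARENT` is a hypothetical encoding of the music taxonomy from the background example, and `user_tree` is a name introduced here, not one used by the patent.

```python
# Assumed item taxonomy as a child -> parent map; Root has no parent entry.
PARENT = {
    "Rock": "Root", "Classical": "Root",
    "Beatles": "Rock", "Stones": "Rock",
    "b1": "Beatles", "b2": "Beatles", "b3": "Beatles", "b4": "Beatles",
    "s1": "Stones", "s2": "Stones",
}

def user_tree(rated_items, parent=PARENT):
    """Return the node set of the pruned tree for one user.

    Keeps every rated leaf plus all of its ancestors up to Root;
    every node not reached this way is implicitly deleted.
    """
    kept = set()
    for node in rated_items:
        # Walk upward; stop early once we meet a path that was already kept.
        while node is not None and node not in kept:
            kept.add(node)
            node = parent.get(node)
    return kept

T_A = user_tree(["b1", "b2"])
print(sorted(T_A))  # ['Beatles', 'Rock', 'Root', 'b1', 'b2']
```

Representing each pruned tree as a set of node names is enough for the later steps, because the parent links are shared with the initial tree and only membership needs to be tested.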
Find the common node $c(l, T_j)$. If leaf node l of hierarchy tree $T_i$ also appears in hierarchy tree $T_j$, leaf node l itself is taken as the common node $c(l, T_j)$. Otherwise, find the ancestor nodes of leaf node l in hierarchy tree $T_i$ that also appear in hierarchy tree $T_j$. When there are several such ancestor nodes, the one with the largest depth is taken as the common node $c(l, T_j)$; when there is only one such ancestor node, that node is taken as the common node $c(l, T_j)$.
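A minimal sketch of the common-node search follows, assuming the same hypothetical child-to-parent encoding of the music taxonomy and pruned trees stored as node sets. The helper name `common_node` is illustrative only.

```python
# Assumed taxonomy (child -> parent); Root has no parent entry.
PARENT = {
    "Rock": "Root", "Classical": "Root",
    "Beatles": "Rock", "Stones": "Rock",
    "b1": "Beatles", "b2": "Beatles", "b3": "Beatles", "b4": "Beatles",
    "s1": "Stones", "s2": "Stones",
}

def common_node(leaf, tree_j, parent=PARENT):
    """Deepest node on the path leaf -> Root that also belongs to tree_j.

    Walking upward from the leaf, the first node found in tree_j is the
    deepest shared one, which covers both the "leaf itself" case and the
    "deepest common ancestor" case. Returns None if nothing is shared.
    """
    node = leaf
    while node is not None:
        if node in tree_j:
            return node
        node = parent.get(node)
    return None

# Pruned trees of users B (rated b3, b4) and C (rated s1, s2) as node sets.
T_B = {"Root", "Rock", "Beatles", "b3", "b4"}
T_C = {"Root", "Rock", "Stones", "s1", "s2"}

print(common_node("b1", T_B))  # Beatles
print(common_node("b1", T_C))  # Rock
print(common_node("b3", T_B))  # b3
```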
Let $M(l, T_j)$ denote the matching degree between leaf node l of hierarchy tree $T_i$ and hierarchy tree $T_j$. $M(l, T_j)$ is calculated as:

$$
M(l, T_j) = \frac{depth(c(l, T_j))}{depth(l)}
$$

where depth(l) denotes the depth of leaf node l and $depth(c(l, T_j))$ denotes the depth of the common node of l and $T_j$ described above.
Combining the ratings given by user i, the matching degrees between the items rated by user i and user j are accumulated in turn over those items to obtain the one-way similarity si(i, j) between user i and user j, where $d_t$ denotes an item.
The one-way similarity si(j, i) between user j and user i is obtained in the same way.
Since si(i, j) and si(j, i) are in general not equal, the similarity between user i and user j is expressed as the average of the two, namely:
sim(i, j) = (si(i, j) + si(j, i)) / 2.
Further,
depth(Root) = 0, and the depth of every other node increases by one per level going downward. If leaf node l of hierarchy tree $T_i$ also appears in hierarchy tree $T_j$, the matching degree of l with $T_j$ is 1. If the only ancestor node of leaf node l of hierarchy tree $T_i$ that appears in hierarchy tree $T_j$ is the root node Root, the matching degree of l with $T_j$ is 0.
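The depth convention and the two boundary cases can be illustrated with a short sketch. It assumes that the matching degree is the depth of the common node divided by the depth of the leaf, which is consistent with the boundary values 1 and 0 stated above and with the value 0.67 in the worked example; the taxonomy map and the item `c1` are hypothetical.

```python
# Assumed taxonomy (child -> parent); "c1" is a hypothetical Classical item.
PARENT = {
    "Rock": "Root", "Classical": "Root",
    "Beatles": "Rock", "Stones": "Rock",
    "b1": "Beatles", "b3": "Beatles", "b4": "Beatles",
    "c1": "Classical",
}

def depth(node):
    """Number of edges from Root down to node, so depth('Root') == 0."""
    d = 0
    while node in PARENT:
        node = PARENT[node]
        d += 1
    return d

def matching_degree(leaf, tree_j):
    """depth(common node) / depth(leaf); assumes tree_j contains Root."""
    node = leaf
    while node not in tree_j:        # climb until a node kept in tree_j is hit
        node = PARENT[node]
    return depth(node) / depth(leaf)

T_B = {"Root", "Rock", "Beatles", "b3", "b4"}   # tree of a user who rated b3, b4

print(matching_degree("b3", T_B))            # leaf itself is in T_B -> 1.0
print(matching_degree("c1", T_B))            # only Root is shared   -> 0.0
print(round(matching_degree("b1", T_B), 2))  # shares Beatles        -> 0.67
```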
The advantages of the present invention are as follows:
(1) When calculating user similarity, the traditional cosine similarity method can only account for the contribution of items that both users have rated, and does not consider the features of the items along each dimension. When different items are highly similar to each other, they should also contribute to the user similarity. The user similarity calculation method based on item classification takes the hierarchical features of the items into account and can calculate the similarity between users more accurately. In a collaborative filtering recommendation system it therefore helps to select similar neighbors accurately for the target user and to predict the target user's ratings of other items from those neighbors.
(2) When the user rating matrix is sparse, many users may have no rated items in common, and traditional cosine similarity then yields a similarity of 0 for all of them. With the user similarity calculation method based on item classification, even if two users have not rated any common item, as long as the items they rated are related in the hierarchy tree, the contribution of those items to the user similarity can be calculated, so that the user similarity is computed more accurately and suitable similar neighbors are found for the target user.
Brief description of the drawings
Fig. 1 is a schematic diagram of the item hierarchy tree referred to in the background of the invention.
Fig. 2 is a schematic diagram of the hierarchy tree generated from the ratings of user A in an example of the invention.
Fig. 3 is a schematic diagram of the hierarchy tree generated from the ratings of user B in an example of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings and specific examples.
Let the user-item rating matrix be:

$$
R_{m \times n} =
\begin{pmatrix}
R_{1,1} & R_{1,2} & \cdots & R_{1,n} \\
R_{2,1} & R_{2,2} & \cdots & R_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
R_{m,1} & R_{m,2} & \cdots & R_{m,n}
\end{pmatrix}
$$

where the total number of users is m and the total number of items is n; $d_1, \dots, d_n$ denote the items and USER1, ..., USERm denote the users; $R_{i,j}$ is the rating of user i for item j.
From the user-item rating matrix, the ratings of users i and j in the n-dimensional item space can each be represented by an n-dimensional vector, namely $C_i = \{(d_t, R_{i,t}) \mid t \in \{1, \dots, n\}\}$ and $C_j = \{(d_t, R_{j,t}) \mid t \in \{1, \dots, n\}\}$.
Assume that all items are classified to form a hierarchy tree similar to the one in Fig. 1. Keep the leaf nodes that represent the items rated by user i, trace upward from these leaf nodes and keep their ancestor nodes all the way to the root node Root, and delete all other nodes. In this way the vector $C_i$ generates a hierarchy tree $T_i$. Similarly, the vector $C_j$ generates a hierarchy tree $T_j$.
Find the common node $c(l, T_j)$. If leaf node l of hierarchy tree $T_i$ also appears in hierarchy tree $T_j$, leaf node l itself is taken as the common node $c(l, T_j)$. Otherwise, find the ancestor nodes of leaf node l in hierarchy tree $T_i$ that also appear in hierarchy tree $T_j$. When there are several such ancestor nodes, the one with the largest depth is taken as the common node $c(l, T_j)$; when there is only one such ancestor node, that node is taken as the common node $c(l, T_j)$.
Let $M(l, T_j)$ denote the matching degree between leaf node l of hierarchy tree $T_i$ and hierarchy tree $T_j$. $M(l, T_j)$ is calculated as:

$$
M(l, T_j) = \frac{depth(c(l, T_j))}{depth(l)}
$$

where depth(l) denotes the depth of leaf node l and $depth(c(l, T_j))$ denotes the depth of the common node of l and $T_j$. depth(Root) = 0, and the depth of every other node increases by one per level going downward. If leaf node l of hierarchy tree $T_i$ also appears in hierarchy tree $T_j$, then $M(l, T_j) = 1$. If the only ancestor node of leaf node l of hierarchy tree $T_i$ that appears in hierarchy tree $T_j$ is the root node Root, then $M(l, T_j) = 0$.
Combining the ratings given by user i, the matching degrees between the items rated by user i and user j are accumulated in turn over those items to obtain the one-way similarity si(i, j) between user i and user j, where $d_t$ denotes an item.
The one-way similarity si(j, i) between user j and user i is obtained in the same way.
Since si(i, j) and si(j, i) are in general not equal, the similarity between user i and user j can be expressed as the average of the two, namely:
sim(i, j) = (si(i, j) + si(j, i)) / 2.
In one example, the ratings of users A and B are expressed as $C_1 = \{(b_1, 1), (b_2, 1)\}$ and $C_2 = \{(b_3, 1), (b_4, 1)\}$. The hierarchy trees $T_1$ and $T_2$ are generated from $C_1$ and $C_2$ and are shown in Fig. 2 and Fig. 3 respectively.
Calculation gives depth(Root) = 0, depth(Rock) = 1, depth(Beatles) = 2, depth($b_1$) = 3 and depth($b_2$) = 3. The deepest node that $b_1$ and $b_2$ share with $T_2$ is Beatles, so each has a matching degree of 2/3 with $T_2$. Then si(A, B) = 0.67; similarly, si(B, A) = 0.67, and therefore sim(A, B) = 0.67. The same method gives sim(A, C) = 0.33 and sim(B, C) = 0.33.
It can be seen that sim(A, B) > sim(B, C) > 0, which demonstrates that the above similarity calculation method is reasonable.
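The worked example can be reproduced end to end with the sketch below. Several points are assumptions rather than statements from the text: the taxonomy map, the ratio form of the matching degree, and in particular the normalization of the one-way similarity as a rating-weighted average, which is chosen here because it reproduces the reported values 0.67 and 0.33. All function names are illustrative.

```python
# Assumed taxonomy (child -> parent); Root has no parent entry.
PARENT = {
    "Rock": "Root", "Classical": "Root",
    "Beatles": "Rock", "Stones": "Rock",
    "b1": "Beatles", "b2": "Beatles", "b3": "Beatles", "b4": "Beatles",
    "s1": "Stones", "s2": "Stones",
}

def path_to_root(node):
    """List of nodes from node up to Root, inclusive."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(node):
    return len(path_to_root(node)) - 1          # depth(Root) == 0

def build_tree(ratings):
    """Pruned tree of a user: rated leaves plus all their ancestors."""
    return {n for item in ratings for n in path_to_root(item)}

def matching_degree(leaf, tree_j):
    # First node on the upward path that is in tree_j is the deepest shared one.
    common = next(n for n in path_to_root(leaf) if n in tree_j)
    return depth(common) / depth(leaf)

def one_way_similarity(ratings_i, ratings_j):
    """Assumed form: rating-weighted average of matching degrees."""
    tree_j = build_tree(ratings_j)               # user j must have rated something
    weighted = sum(r * matching_degree(item, tree_j)
                   for item, r in ratings_i.items())
    return weighted / sum(ratings_i.values())

def similarity(ratings_i, ratings_j):
    return (one_way_similarity(ratings_i, ratings_j)
            + one_way_similarity(ratings_j, ratings_i)) / 2

A = {"b1": 1, "b2": 1}
B = {"b3": 1, "b4": 1}
C = {"s1": 1, "s2": 1}

print(round(similarity(A, B), 2))  # 0.67
print(round(similarity(A, C), 2))  # 0.33
print(round(similarity(B, C), 2))  # 0.33
```

With all ratings equal to 1, a plain average over the rated items would give the same numbers, so the example alone does not distinguish between the two normalizations.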
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the invention has been described in detail with reference to examples, those skilled in the art should understand that the technical solution of the invention may be modified or equivalently replaced without departing from its spirit and scope, and all such modifications and replacements shall fall within the scope of the claims of the present invention.
Claims (3)
1. A user similarity calculation method based on item classification in a collaborative filtering recommendation system, characterized in that
the items rated by each user are classified to form a corresponding hierarchy tree, and the similarity between users is calculated from the relationships, within the hierarchy tree, among the items that each user has rated.
2. The user similarity calculation method based on item classification in a collaborative filtering recommendation system according to claim 1, characterized in that it specifically comprises:
letting the user-item rating matrix be:

$$
R_{m \times n} =
\begin{pmatrix}
R_{1,1} & R_{1,2} & \cdots & R_{1,n} \\
R_{2,1} & R_{2,2} & \cdots & R_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
R_{m,1} & R_{m,2} & \cdots & R_{m,n}
\end{pmatrix}
$$

where the total number of users is m and the total number of items is n; $d_1, \dots, d_n$ denote the items and USER1, ..., USERm denote the users; $R_{i,j}$ is the rating of user i for item j;
the ratings of user i and user j in the n-dimensional item space can each be represented by an n-dimensional vector, namely $C_i = \{(d_t, R_{i,t}) \mid t \in \{1, \dots, n\}\}$ and $C_j = \{(d_t, R_{j,t}) \mid t \in \{1, \dots, n\}\}$;
assuming that all items are classified to form an initial hierarchy tree; keeping the leaf nodes that represent the items rated by user i, tracing upward from these leaf nodes and keeping their ancestor nodes all the way to the root node Root, and deleting all other nodes, so that the vector $C_i$ generates a hierarchy tree $T_i$; similarly, the vector $C_j$ generates a hierarchy tree $T_j$;
finding the common node $c(l, T_j)$: if leaf node l of hierarchy tree $T_i$ also appears in hierarchy tree $T_j$, leaf node l itself is taken as the common node $c(l, T_j)$; otherwise, the ancestor nodes of leaf node l in hierarchy tree $T_i$ that also appear in hierarchy tree $T_j$ are found; when there are several such ancestor nodes, the one with the largest depth is taken as the common node $c(l, T_j)$; when there is only one such ancestor node, that node is taken as the common node $c(l, T_j)$;
letting $M(l, T_j)$ denote the matching degree between leaf node l of hierarchy tree $T_i$ and hierarchy tree $T_j$, $M(l, T_j)$ is calculated as:

$$
M(l, T_j) = \frac{depth(c(l, T_j))}{depth(l)}
$$

where depth(l) denotes the depth of leaf node l and $depth(c(l, T_j))$ denotes the depth of the common node of l and $T_j$;
combining the ratings given by user i, the matching degrees between the items rated by user i and user j are accumulated in turn over those items to obtain the one-way similarity si(i, j) between user i and user j, where $d_t$ denotes an item;
the one-way similarity si(j, i) between user j and user i is obtained in the same way;
since si(i, j) and si(j, i) are in general not equal, the similarity between user i and user j is expressed as the average of the two, namely:
sim(i, j) = (si(i, j) + si(j, i)) / 2.
3. The user similarity calculation method based on item classification in a collaborative filtering recommendation system according to claim 2, characterized in that
depth(Root) = 0, and the depth of every other node increases by one per level going downward; if leaf node l of hierarchy tree $T_i$ also appears in hierarchy tree $T_j$, the matching degree of l with $T_j$ is 1; if the only ancestor node of leaf node l of hierarchy tree $T_i$ that appears in hierarchy tree $T_j$ is the root node Root, the matching degree of l with $T_j$ is 0.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910176852.9A CN110096640A (en) | 2019-03-08 | 2019-03-08 | User's similarity calculating method in Collaborative Filtering Recommendation System based on classification of the items |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110096640A true CN110096640A (en) | 2019-08-06 |
Family
ID=67443866
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910176852.9A Pending CN110096640A (en) | 2019-03-08 | 2019-03-08 | User's similarity calculating method in Collaborative Filtering Recommendation System based on classification of the items |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110096640A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110647683A (en) * | 2019-09-17 | 2020-01-03 | 北京邮电大学 | Information recommendation method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105320719A (en) * | 2015-01-16 | 2016-02-10 | 焦点科技股份有限公司 | Crowdfunding website project recommendation method based on project tag and graphical relationship |
CN109086281A (en) * | 2017-06-14 | 2018-12-25 | 成都淞幸科技有限责任公司 | A kind of supplier's recommended method based on arest neighbors Collaborative Filtering Recommendation Algorithm |
Non-Patent Citations (2)
Title |
---|
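Zhang Xiaomin (张晓敏) et al.: "Personalized recommendation algorithm based on concept hierarchy tree" (基于概念层次树的个性化推荐算法), Computer Engineering (《计算机工程》) * |
Xiao Min (肖敏) et al.: "Collaborative filtering recommendation algorithm based on item semantic similarity" (基于项目语义相似度的协同过滤推荐算法), Journal of Wuhan University of Technology (《武汉理工大学学报》) * |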
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20190806 |