CN110096640A - User similarity calculation method based on item classification in a collaborative filtering recommendation system - Google Patents

User similarity calculation method based on item classification in a collaborative filtering recommendation system

Info

Publication number
CN110096640A
Authority
CN
China
Prior art keywords
user
hierarchical structure
structure tree
node
project
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910176852.9A
Other languages
Chinese (zh)
Inventor
台宪青
姚文峰
崔光霁
马治杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu IoT Research and Development Center
Original Assignee
Jiangsu IoT Research and Development Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu IoT Research and Development Center
Priority to CN201910176852.9A
Publication of CN110096640A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization

Abstract

The present invention provides a user similarity calculation method based on item classification in a collaborative filtering recommendation system. The items rated by each user are classified to form a corresponding hierarchical structure tree, and the similarity between users is calculated from the relationships, within the hierarchical structure tree, among the items each user has rated. When the user rating matrix is sparse, several users may have rated no items in common, in which case traditional cosine similarity gives a similarity of 0 for all of them. With the user similarity calculation method based on item classification, even if two users have not rated any identical item, as long as the items they rated are related within the hierarchical structure tree, the contribution of those items to user similarity can be calculated. User similarity is therefore computed more accurately, and similar neighbors are correctly found for the target user.

Description

User similarity calculation method based on item classification in a collaborative filtering recommendation system
Technical field
The present invention relates to personalized recommendation systems, and in particular to a user similarity calculation method based on item classification in a collaborative filtering recommendation system.
Background
With the rapid development of Internet and information technology, the amount of information on the Internet has risen sharply, and it is increasingly difficult for users to find the information they need. While people can obtain richer and more varied information resources, there is also an urgent need to extract useful information from massive data quickly and accurately. Against this background, personalized recommendation systems emerged and have been widely applied. A personalized recommendation system extracts a user's interests and preferences by collecting and analyzing the user's characteristic information and historical behavior data, and thereby provides the user with accurate personalized recommendation services.
Among the many personalized recommendation technologies, collaborative filtering is currently the most successful and most widely applied, and is used extensively in e-commerce systems. Its core idea is to use the user-item rating data set to select users whose interests are similar to the target user's as the nearest-neighbor set, and then to predict the target user's rating of each item, that is, the degree of preference, from the combined ratings of the nearest neighbors, so as to make corresponding recommendations for the target user. Its advantages are that it does not need to analyze the individual feature dimensions of items and that it can help users discover new points of interest. However, when items have hierarchical structure features formed by a standard classification method, traditional similarity calculation methods do not consider the relationships between items, which leads to large errors in the calculated user similarity and in turn affects the selection of nearest neighbors.
Collaborative filtering recommendation algorithms are based on the user-item rating data set. Let the total number of users be m and the total number of items be n; d_1, ..., d_n denote the items and USER1, ..., USERm denote the users; R_{i,j} is the rating of user i on item j, and the higher the rating, the greater user i's preference for item j. The user-item rating matrix, with one row per user and one column per item, is as follows:
$$\begin{pmatrix} R_{1,1} & R_{1,2} & \cdots & R_{1,n} \\ R_{2,1} & R_{2,2} & \cdots & R_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ R_{m,1} & R_{m,2} & \cdots & R_{m,n} \end{pmatrix}$$
The existing user similarity calculation method is mainly the cosine similarity calculation method. If the similarity between user i and user j is denoted sim(i, j), then:
$$sim(i,j) = \frac{\sum_{c \in I_{i,j}} R_{i,c} R_{j,c}}{\sqrt{\sum_{c \in I_i} R_{i,c}^2}\,\sqrt{\sum_{c \in I_j} R_{j,c}^2}}$$
where I_{i,j} denotes the set of items rated by both user i and user j, I_i and I_j denote the sets of items rated by user i and user j respectively, R_{i,c} denotes the rating of user i on item c, and R_{j,c} denotes the rating of user j on item c. The value range of sim(i, j) is [0, 1]; a larger value indicates a higher similarity between user i and user j.
The traditional cosine similarity calculation method relies only on individual rating values to calculate user similarity and does not take the characteristic information of items into account, so the resulting user similarity is inaccurate. Moreover, when the user rating matrix is sparse, it becomes more likely that several users have rated no items in common, in which case traditional cosine similarity gives a similarity of 0 for all of them. Neighbors with high similarity therefore cannot be found accurately for the target user.
The following example illustrates the problem with traditional cosine similarity when items have hierarchical structure features, as shown in Fig. 1.
Assume the items are pieces of music, corresponding to the leaf nodes in Fig. 1. Going down from the root node Root, the levels represent the music genre (Rock and Classical), the band the music belongs to (Beatles and Stones), and the specific piece of music; this is a coarse-to-fine classification process. Assume the following users have rated the following pieces of music: A: (b1, b2), B: (b3, b4), C: (s1, s2), with each user's rating of each piece defaulting to 1.
With the cosine similarity calculation method, sim(A, B) = sim(A, C) = sim(B, C) = 0. Intuitively, however, sim(A, B) > sim(B, C), because A and B both like music by the Beatles, whereas B and C only share a liking for Rock music; in the hierarchical structure tree above, A and B are closer to each other than B and C are.
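The zero result can be checked with a minimal Python sketch of cosine similarity over sparse ratings. The dictionary representation and the function name `cosine_similarity` are illustrative assumptions, not part of the patent text.

```python
from math import sqrt

def cosine_similarity(ratings_i, ratings_j):
    """Cosine similarity between two users given {item: rating} dicts."""
    # Only items rated by both users contribute to the numerator.
    common = set(ratings_i) & set(ratings_j)
    dot = sum(ratings_i[c] * ratings_j[c] for c in common)
    norm_i = sqrt(sum(r * r for r in ratings_i.values()))
    norm_j = sqrt(sum(r * r for r in ratings_j.values()))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)

# Ratings from the music example; every rating defaults to 1.
A = {"b1": 1, "b2": 1}
B = {"b3": 1, "b4": 1}
C = {"s1": 1, "s2": 1}

for name, (u, v) in {"A,B": (A, B), "A,C": (A, C), "B,C": (B, C)}.items():
    print(name, cosine_similarity(u, v))
```

Because no pair of users shares a rated item, every numerator is empty and all three similarities come out as 0, regardless of how close the items are in the hierarchy.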
Summary of the invention
The purpose of the present invention is to provide a user similarity calculation method based on item classification with higher computational accuracy in a collaborative filtering recommendation system, for the case where items have hierarchical structure features formed by a standard classification method. The technical solution adopted by the present invention is:
A user similarity calculation method based on item classification in a collaborative filtering recommendation system, comprising:
classifying the items rated by each user to form a corresponding hierarchical structure tree, and calculating the similarity between users from the relationships, within the hierarchical structure tree, among the items rated by each user.
The user similarity calculation method based on item classification in a collaborative filtering recommendation system specifically includes:
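Let the user-item rating matrix be:
$$\begin{pmatrix} R_{1,1} & R_{1,2} & \cdots & R_{1,n} \\ R_{2,1} & R_{2,2} & \cdots & R_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ R_{m,1} & R_{m,2} & \cdots & R_{m,n} \end{pmatrix}$$
where the total number of users is m, the total number of items is n, d_1, ..., d_n denote the items, USER1, ..., USERm denote the users, and R_{i,j} is the rating of user i on item j.
The ratings of user i and user j in the n-dimensional item space can each be represented by an n-dimensional vector, namely C_i = {(d_t, R_{i,t}) | t ∈ {1, ..., n}} and C_j = {(d_t, R_{j,t}) | t ∈ {1, ..., n}}.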
Assume that all items are classified to form an initial hierarchical structure tree. The leaf nodes representing the items rated by user i are retained, together with the ancestor nodes of these leaf nodes, tracing upward all the way to the root node Root; all other nodes are deleted. In this way a hierarchical structure tree T_i is generated from vector C_i. Similarly, a hierarchical structure tree T_j is generated from vector C_j.
Find the common node. If leaf node l in hierarchical structure tree T_i also appears in hierarchical structure tree T_j, leaf node l itself is taken as the common node. Otherwise, find the ancestor nodes of leaf node l in T_i that also appear in T_j: when there are several such ancestor nodes, the one with the greatest depth is taken as the common node; when there is only one such ancestor node, it is taken as the common node.
The matching degree between leaf node l of hierarchical structure tree T_i and hierarchical structure tree T_j is calculated as:
$$\text{matching degree}(l, T_j) = \frac{\text{depth}(\text{common node of } l \text{ and } T_j)}{\text{depth}(l)}$$
where depth(l) denotes the depth of leaf node l and the numerator is the depth of the common node defined above.
Combining user i's rating values, the matching degrees between each item d_t rated by user i and user j are accumulated in turn to obtain the unidirectional similarity si(i, j) between user i and user j.
The unidirectional similarity si(j, i) between user j and user i is obtained in the same way.
Since in general si(i, j) ≠ si(j, i), the similarity between user i and user j is expressed as the average of the two, namely:
sim(i, j) = (si(i, j) + si(j, i)) / 2.
Further,
depth(Root) = 0, and the depth of every other node increases by one per level going down. If leaf node l of hierarchical structure tree T_i also appears in hierarchical structure tree T_j, the matching degree is 1. If the only ancestor of leaf node l of T_i that appears in T_j is the root node Root, the matching degree is 0.
The advantages of the present invention are:
(1) When calculating user similarity, the traditional cosine similarity calculation method can only account for the contribution of items that both users have rated, and does not consider the individual feature dimensions of items. When different items are themselves highly similar, they should also contribute to user similarity. The user similarity calculation method based on item classification takes the hierarchical structure features of items into account and calculates the similarity between users more accurately, which helps a collaborative filtering recommendation system choose similar neighbors for the target user and predict the target user's ratings of other items from those neighbors.
(2) When the user rating matrix is sparse, several users may have rated no items in common, in which case traditional cosine similarity gives a similarity of 0 for all of them. With the user similarity calculation method based on item classification, even if two users have not rated any identical item, as long as the items they rated are related within the hierarchical structure tree, the contribution of those items to user similarity can be calculated, so that user similarity is computed accurately and similar neighbors are correctly found for the target user.
Brief description of the drawings
Fig. 1 is a schematic diagram of the item hierarchical structure tree referred to in the background section.
Fig. 2 is a schematic diagram of the hierarchical structure tree generated from user A's ratings in an example of the present invention.
Fig. 3 is a schematic diagram of the hierarchical structure tree generated from user B's ratings in an example of the present invention.
Detailed description
The invention is further described below with reference to the drawings and a specific example.
Let the user-item rating matrix be:
$$\begin{pmatrix} R_{1,1} & R_{1,2} & \cdots & R_{1,n} \\ R_{2,1} & R_{2,2} & \cdots & R_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ R_{m,1} & R_{m,2} & \cdots & R_{m,n} \end{pmatrix}$$
where the total number of users is m, the total number of items is n, d_1, ..., d_n denote the items, USER1, ..., USERm denote the users, and R_{i,j} is the rating of user i on item j.
From the user-item rating matrix, the ratings of user i and user j in the n-dimensional item space can each be represented by an n-dimensional vector, namely C_i = {(d_t, R_{i,t}) | t ∈ {1, ..., n}} and C_j = {(d_t, R_{j,t}) | t ∈ {1, ..., n}}.
Assume that all items are classified to form a hierarchical structure tree similar to the one in Fig. 1. The leaf nodes representing the items rated by user i are retained, together with the ancestor nodes of these leaf nodes, tracing upward all the way to the root node Root; all other nodes are deleted. In this way a hierarchical structure tree T_i is generated from vector C_i. Similarly, a hierarchical structure tree T_j is generated from vector C_j.
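The pruning step can be sketched in Python as follows. The child-to-parent map `PARENT` is an assumed layout of the music hierarchy inferred from the background example (the figure itself is not reproduced in this text), and `user_tree` is a hypothetical helper name.

```python
# Child -> parent map for the full item hierarchy (assumed layout,
# inferred from the Rock / Classical / Beatles / Stones example).
PARENT = {
    "Rock": "Root", "Classical": "Root",
    "Beatles": "Rock", "Stones": "Rock",
    "b1": "Beatles", "b2": "Beatles", "b3": "Beatles", "b4": "Beatles",
    "s1": "Stones", "s2": "Stones",
}

def user_tree(rated_items, parent=PARENT, root="Root"):
    """Return the node set of T_i: the rated leaves plus all their ancestors."""
    kept = {root}
    for leaf in rated_items:
        node = leaf
        # Walk upward until reaching the root or a node that is already
        # kept (whose ancestors are then already kept as well).
        while node != root and node not in kept:
            kept.add(node)
            node = parent[node]
    return kept

print(sorted(user_tree(["b1", "b2"])))
```

Representing T_i as a plain node set is enough here because the parent links of the full hierarchy are unchanged by pruning; only membership in the user's tree matters for the later steps.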
Find the common node. If leaf node l in hierarchical structure tree T_i also appears in hierarchical structure tree T_j, leaf node l itself is taken as the common node. Otherwise, find the ancestor nodes of leaf node l in T_i that also appear in T_j: when there are several such ancestor nodes, the one with the greatest depth is taken as the common node; when there is only one such ancestor node, it is taken as the common node.
The matching degree between leaf node l of hierarchical structure tree T_i and hierarchical structure tree T_j is calculated as:
$$\text{matching degree}(l, T_j) = \frac{\text{depth}(\text{common node of } l \text{ and } T_j)}{\text{depth}(l)}$$
where depth(l) denotes the depth of leaf node l and the numerator is the depth of the common node defined above; depth(Root) = 0, and the depth of every other node increases by one per level going down. If leaf node l of T_i also appears in T_j, the matching degree is 1. If the only ancestor of leaf node l of T_i that appears in T_j is the root node Root, the matching degree is 0.
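A minimal Python sketch of the common-node search and the matching degree follows. It assumes the matching degree is the ratio of the common node's depth to the leaf's depth, which agrees with the boundary values 1 and 0 stated above and with the worked example; the tree layout in `PARENT` and the function names are illustrative assumptions.

```python
# Child -> parent map for the full item hierarchy (assumed layout).
PARENT = {
    "Rock": "Root", "Classical": "Root",
    "Beatles": "Rock", "Stones": "Rock",
    "b1": "Beatles", "b2": "Beatles", "b3": "Beatles", "b4": "Beatles",
    "s1": "Stones", "s2": "Stones",
}

def depth(node, parent=PARENT, root="Root"):
    """Number of edges from the root down to node; depth(Root) = 0."""
    d = 0
    while node != root:
        node = parent[node]
        d += 1
    return d

def common_node(leaf, tree_j_nodes, parent=PARENT, root="Root"):
    """Deepest node on the path leaf -> Root that also appears in T_j."""
    node = leaf
    # The first hit while walking upward is the deepest shared node.
    while node != root and node not in tree_j_nodes:
        node = parent[node]
    return node

def matching_degree(leaf, tree_j_nodes, parent=PARENT, root="Root"):
    """depth(common node) / depth(leaf), assumed form of the formula."""
    shared = common_node(leaf, tree_j_nodes, parent, root)
    return depth(shared, parent, root) / depth(leaf, parent, root)

# T_B for user B, who rated b3 and b4.
T_B = {"Root", "Rock", "Beatles", "b3", "b4"}
print(common_node("b1", T_B), matching_degree("b1", T_B))
```

Walking upward from the leaf and stopping at the first node found in T_j avoids enumerating all shared ancestors and then taking the maximum depth; both approaches select the same node.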
Combining user i's rating values, the matching degrees between each item d_t rated by user i and user j are accumulated in turn to obtain the unidirectional similarity si(i, j) between user i and user j.
The unidirectional similarity si(j, i) between user j and user i is obtained in the same way.
Since in general si(i, j) ≠ si(j, i), the similarity between user i and user j can be expressed as the average of the two, namely:
sim(i, j) = (si(i, j) + si(j, i)) / 2.
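The accumulation and averaging can be sketched as below. The exact formula for si(i, j) is not reproduced in this text, so the sketch assumes a rating-weighted mean of matching degrees (normalized by the sum of user i's ratings); that normalization is an assumption chosen because it keeps si in [0, 1] and matches the worked example. The matching degree is passed in as a callable, and all function names are hypothetical.

```python
def unidirectional_similarity(ratings_i, match_to_j):
    """si(i, j) as a rating-weighted mean of matching degrees (assumed form).

    ratings_i  : {item: rating} for user i
    match_to_j : callable mapping an item of user i to its matching
                 degree against user j's tree T_j
    """
    total = sum(ratings_i.values())
    if total == 0:
        return 0.0
    weighted = sum(r * match_to_j(item) for item, r in ratings_i.items())
    return weighted / total

def similarity(ratings_i, ratings_j, match_i_to_j, match_j_to_i):
    """sim(i, j) = (si(i, j) + si(j, i)) / 2, as stated in the text."""
    si_ij = unidirectional_similarity(ratings_i, match_i_to_j)
    si_ji = unidirectional_similarity(ratings_j, match_j_to_i)
    return (si_ij + si_ji) / 2

# Users A and B from the example: every cross-tree matching degree is 2/3.
A = {"b1": 1, "b2": 1}
B = {"b3": 1, "b4": 1}
print(round(similarity(A, B, lambda item: 2 / 3, lambda item: 2 / 3), 2))
```

Averaging the two directions makes the final measure symmetric even when the users rated different numbers of items or used different rating scales.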
In one example, the ratings of users A and B on their respective items are C_1 = {(b1, 1), (b2, 1)} and C_2 = {(b3, 1), (b4, 1)}. C_1 and C_2 generate the hierarchical structure trees T_1 and T_2, shown in Fig. 2 and Fig. 3 respectively.
Calculation gives depth(Root) = 0, depth(Rock) = 1, depth(Beatles) = 2, depth(b1) = 3 and depth(b2) = 3. The common node of b1 and of b2 with T_2 is Beatles, so each matching degree is 2/3 ≈ 0.67 and si(A, B) = 0.67. Similarly, si(B, A) = 0.67, and therefore sim(A, B) = 0.67. The same method gives sim(A, C) = 0.33 and sim(B, C) = 0.33.
Thus sim(A, B) > sim(B, C) > 0, which demonstrates the reasonableness of the above similarity calculation method.
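The arithmetic behind these values can be written out as follows. This is a sketch that takes the matching degree as the ratio of the common node's depth to the leaf's depth and, as an assumption, takes si as the rating-weighted mean of matching degrees; the names match and T_B, T_C (the trees of users B and C) are notation introduced here for illustration.

```latex
\begin{aligned}
\mathrm{match}(b_1, T_B) &= \frac{\mathrm{depth}(\text{Beatles})}{\mathrm{depth}(b_1)} = \frac{2}{3} \approx 0.67,
  \qquad \mathrm{match}(b_2, T_B) = \frac{2}{3} \\
si(A, B) &= \frac{1 \cdot \tfrac{2}{3} + 1 \cdot \tfrac{2}{3}}{1 + 1} = \frac{2}{3} \approx 0.67,
  \qquad si(B, A) = \frac{2}{3} \\
sim(A, B) &= \frac{si(A, B) + si(B, A)}{2} = \frac{2}{3} \approx 0.67 \\
\mathrm{match}(b_1, T_C) &= \frac{\mathrm{depth}(\text{Rock})}{\mathrm{depth}(b_1)} = \frac{1}{3} \approx 0.33
  \;\Rightarrow\; sim(A, C) = sim(B, C) = \frac{1}{3} \approx 0.33
\end{aligned}
```

For the pairs involving C, the deepest shared node is Rock at depth 1, because C's items s1 and s2 sit under Stones rather than Beatles; by symmetry every matching degree in those pairs is 1/3.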
Finally, it should be noted that the above specific embodiment is intended only to illustrate the technical solution of the present invention and not to limit it. Although the invention has been described in detail with reference to an example, those of ordinary skill in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from its spirit and scope, and all such modifications and replacements shall fall within the scope of the claims of the present invention.

Claims (3)

1. A user similarity calculation method based on item classification in a collaborative filtering recommendation system, characterized in that
the items rated by each user are classified to form a corresponding hierarchical structure tree, and the similarity between users is calculated from the relationships, within the hierarchical structure tree, among the items rated by each user.
2. The user similarity calculation method based on item classification in a collaborative filtering recommendation system according to claim 1, characterized in that it specifically includes:
Let the user-item rating matrix be:
$$\begin{pmatrix} R_{1,1} & R_{1,2} & \cdots & R_{1,n} \\ R_{2,1} & R_{2,2} & \cdots & R_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ R_{m,1} & R_{m,2} & \cdots & R_{m,n} \end{pmatrix}$$
where the total number of users is m, the total number of items is n, d_1, ..., d_n denote the items, USER1, ..., USERm denote the users, and R_{i,j} is the rating of user i on item j;
the ratings of user i and user j in the n-dimensional item space can each be represented by an n-dimensional vector, namely C_i = {(d_t, R_{i,t}) | t ∈ {1, ..., n}} and C_j = {(d_t, R_{j,t}) | t ∈ {1, ..., n}};
assuming that all items are classified to form an initial hierarchical structure tree, the leaf nodes representing the items rated by user i are retained, together with the ancestor nodes of these leaf nodes, tracing upward all the way to the root node Root, and all other nodes are deleted, so that a hierarchical structure tree T_i is generated from vector C_i; similarly, a hierarchical structure tree T_j is generated from vector C_j;
finding the common node: if leaf node l in hierarchical structure tree T_i also appears in hierarchical structure tree T_j, leaf node l itself is taken as the common node; otherwise, the ancestor nodes of leaf node l in T_i that also appear in T_j are found, and when there are several such ancestor nodes, the one with the greatest depth is taken as the common node, while when there is only one such ancestor node, it is taken as the common node;
the matching degree between leaf node l of hierarchical structure tree T_i and hierarchical structure tree T_j is calculated as:
$$\text{matching degree}(l, T_j) = \frac{\text{depth}(\text{common node of } l \text{ and } T_j)}{\text{depth}(l)}$$
where depth(l) denotes the depth of leaf node l and the numerator is the depth of the common node defined above;
combining user i's rating values, the matching degrees between each item d_t rated by user i and user j are accumulated in turn to obtain the unidirectional similarity si(i, j) between user i and user j;
the unidirectional similarity si(j, i) between user j and user i is obtained in the same way;
since in general si(i, j) ≠ si(j, i), the similarity between user i and user j is expressed as the average of the two, namely:
sim(i, j) = (si(i, j) + si(j, i)) / 2.
3. The user similarity calculation method based on item classification in a collaborative filtering recommendation system according to claim 2, characterized in that
depth(Root) = 0, and the depth of every other node increases by one per level going down; if leaf node l of hierarchical structure tree T_i also appears in hierarchical structure tree T_j, the matching degree is 1; if the only ancestor of leaf node l of T_i that appears in T_j is the root node Root, the matching degree is 0.
CN201910176852.9A 2019-03-08 2019-03-08 User similarity calculation method based on item classification in a collaborative filtering recommendation system Pending CN110096640A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910176852.9A CN110096640A (en) 2019-03-08 2019-03-08 User similarity calculation method based on item classification in a collaborative filtering recommendation system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910176852.9A CN110096640A (en) 2019-03-08 2019-03-08 User similarity calculation method based on item classification in a collaborative filtering recommendation system

Publications (1)

Publication Number Publication Date
CN110096640A true CN110096640A (en) 2019-08-06

Family

ID=67443866

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910176852.9A Pending CN110096640A (en) 2019-03-08 2019-03-08 User similarity calculation method based on item classification in a collaborative filtering recommendation system

Country Status (1)

Country Link
CN (1) CN110096640A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110647683A (en) * 2019-09-17 2020-01-03 北京邮电大学 Information recommendation method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320719A (en) * 2015-01-16 2016-02-10 焦点科技股份有限公司 Crowdfunding website project recommendation method based on project tag and graphical relationship
CN109086281A (en) * 2017-06-14 2018-12-25 成都淞幸科技有限责任公司 A kind of supplier's recommended method based on arest neighbors Collaborative Filtering Recommendation Algorithm

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
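张晓敏 et al.: "Personalized recommendation algorithm based on concept hierarchy tree", 《计算机工程》 (Computer Engineering) *
肖敏 et al.: "Collaborative filtering recommendation algorithm based on item semantic similarity", 《武汉理工大学学报》 (Journal of Wuhan University of Technology) *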

Similar Documents

Publication Publication Date Title
CN106802956B (en) Movie recommendation method based on weighted heterogeneous information network
JP6434542B2 (en) Understanding tables for searching
CN107590128B (en) Paper homonymy author disambiguation method based on high-confidence characteristic attribute hierarchical clustering method
CN105706078A (en) Automatic definition of entity collections
CN106484764A (en) User's similarity calculating method based on crowd portrayal technology
CN102750379B (en) Fast character string matching method based on filtering type
Xue et al. Ontology alignment based on instance using NSGA-II
CN106708929A (en) Video program search method and device
WO2015051481A1 (en) Determining collection membership in a data graph
CN115563313A (en) Knowledge graph-based document book semantic retrieval system
Deepak et al. Operators for similarity search: Semantics, techniques and usage scenarios
JP2018180789A (en) Query clustering device, method, and program
KR20120087214A (en) Friend recommendation method for SNS user, recording medium for the same, and SNS and server using the same
CN109902143B (en) Multi-keyword extended retrieval method based on ciphertext
CN113342994B (en) Recommendation system based on non-sampling cooperative knowledge graph network
Zhao et al. An improved user identification method across social networks via tagging behaviors
CN103064907A (en) System and method for topic meta search based on unsupervised entity relation extraction
JP7092194B2 (en) Information processing equipment, judgment method, and program
CN110096640A (en) User similarity calculation method based on item classification in a collaborative filtering recommendation system
CN106202349A (en) Web page classifying dictionary creation method and device
Zaharieva et al. Cross-platform social event detection
CN108427730A (en) It is a kind of that method is recommended based on the Social Label of random walk and condition random field
CN109885797B (en) Relational network construction method based on multi-identity space mapping
CN109919459B (en) Method for measuring influence among social network objects
CN114022233A (en) Novel commodity recommendation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190806