CN104915388A - Book tag recommendation method based on spectral clustering and crowdsourcing technology - Google Patents


Publication number
CN104915388A
Authority
CN
China
Prior art keywords
term
cluster
user
matrix
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510270676.7A
Other languages
Chinese (zh)
Other versions
CN104915388B (en)
Inventor
张寅
魏宝刚
尹彦飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201510270676.7A
Publication of CN104915388A
Application granted
Publication of CN104915388B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a book tag recommendation method based on spectral clustering and a crowdsourcing technology. After the method is applied to a digital library system, a Laplacian matrix is constructed through a retrieval click log of a user, retrieval words are clustered through the spectral clustering, a spectral clustering result is continuously optimized through the crowdsourcing technology, and finally, the optimized result is applied to a recommendation system. According to the method, the retrieval words of the user serve as tags, the clustering accuracy of the retrieval words is improved through combination of the spectral clustering and the crowdsourcing technology, and therefore the accuracy of the system in the aspect of tag recommendation is improved.

Description

A book tag recommendation method based on spectral clustering and crowdsourcing technology
Technical field
The invention belongs to the field of book tag recommendation and relates to a book tag recommendation method based on spectral clustering and crowdsourcing technology.
Background art
As the volume of information on the Internet keeps increasing, information is growing explosively, and classifying it reasonably and efficiently has become the key to using it effectively. Traditional classification is performed mainly by hand, which is unsustainable when the amount of information is massive. A new, tag-centered way of classifying information has therefore emerged and become a key part of Internet applications. In a digital library system, tags come mainly from book information. In addition, as users use the system, their search terms and book indexing information can also be added to the system as a kind of tag. With tags as the link, users and books are brought closer together, and users find books more efficiently.
At the same time, recommender systems are receiving more and more attention in applications that handle massive data. The way users obtain information has moved from web-wide information retrieval, to vertical retrieval of domain knowledge, and now to recommender systems. Information is acquired ever faster and is increasingly personalized for different users, and the contribution of recommender systems to system usability is increasingly significant. Clustering algorithms are a key method of data mining. In a recommender system, clustering algorithms are used to cluster items and users, and the clustering result is refined through iterative runs of the algorithm.
Summary of the invention
The object of the invention is to address the insufficient use of search terms in existing recommender systems by providing a book tag recommendation method for digital libraries based on spectral clustering and crowdsourcing technology.
The object of the invention is achieved through the following technical solution: a book tag recommendation method based on spectral clustering and crowdsourcing technology, comprising the following steps:
(1) Screen out the users' search data and search-click data from the log collection system or Web logs;
(2) Using the users' search data and search-click data, build a term-book matrix, and obtain the term-term Laplacian matrix from the term-book matrix;
(3) Apply spectral clustering to the Laplacian matrix to obtain a clustering result of the search terms;
(4) Use crowdsourcing technology to continuously optimize the clustering result obtained in step 3;
(5) Map the user's past search records onto the clustering result optimized in step 4, and recommend the mapped clustering result to the user as tags.
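Step 1 can be illustrated with a minimal sketch. The text does not describe the log format, so the tab-separated layout used here (user, action, query, optional book id), the action names `search` and `click`, and the function name `split_log` are all assumptions made for illustration only.

```python
from typing import Iterable, List, Tuple


def split_log(lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Separate search records from search-click records.

    Assumed (hypothetical) line format: user<TAB>action<TAB>query[<TAB>book_id],
    where action is "search" or "click". Any other action is ignored.
    """
    searches: List[Tuple[str, str]] = []
    clicks: List[Tuple[str, str, str]] = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # malformed line: too few fields
        user, action, query = fields[0], fields[1], fields[2].strip()
        if not query:
            continue  # no search term to record
        if action == "search":
            searches.append((user, query))
        elif action == "click" and len(fields) >= 4:
            clicks.append((user, query, fields[3]))
    return searches, clicks


log = [
    "u1\tsearch\tspectral clustering",
    "u1\tclick\tspectral clustering\tb1",
    "u2\tlogin\t-",
    "u2\tsearch\tgraph partition",
    "u2\tclick\tgraph partition\tb1",
]
searches, clicks = split_log(log)
```

Keeping the user id on every record is a deliberate choice in this sketch: step 4 later needs to know which users issued the search terms of a cluster.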
Further, step 2 is specifically: the set of search terms of all users, Q = {q_1, q_2, ..., q_n}, is obtained from the users' search data, where n is the total number of search terms and each q is an individual search term; the set of books clicked for the search terms, B = {b_1, b_2, ..., b_m}, is obtained from the users' search-click data, where m is the total number of clicked books and each b is an individual book; the term-book matrix M is obtained from the search-term set Q and the clicked-book set B, and each entry of M is defined in terms of I_ij,
where I_ij is the correspondence between the i-th search term and the j-th book. For each book, if several search terms all have click behavior on that book, these search terms are related to one another. The term-term matrix D is built from these relations: an entry of D is 1 if the two search terms are related and 0 otherwise. The sum of each column of D is placed on the diagonal and all other positions are set to 0, forming a new matrix W. The Laplacian matrix L is obtained by the formula L = D - W.
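The construction in step 2 can be sketched as follows. Several details are assumptions rather than statements of the patent: an entry of the term-book matrix is taken to be 1 when the term led to a click on the book, a term is not treated as related to itself, and the Laplacian is formed in the conventional way, as the diagonal column-sum matrix minus the term-term matrix. The text names the term-term matrix D and the diagonal matrix W and writes L = D - W, which under those names is the negative of the matrix computed here. The function name `build_laplacian` is hypothetical.

```python
def build_laplacian(clicks):
    """clicks: iterable of (term, book) pairs taken from the search-click data."""
    clicks = list(clicks)
    terms = sorted({t for t, _ in clicks})
    books = sorted({b for _, b in clicks})
    term_index = {t: i for i, t in enumerate(terms)}
    book_index = {b: j for j, b in enumerate(books)}
    n, m = len(terms), len(books)

    # Term-book matrix M (assumption: 1 if the term led to a click on the book).
    M = [[0] * m for _ in range(n)]
    for t, b in clicks:
        M[term_index[t]][book_index[b]] = 1

    # Term-term matrix: 1 if two distinct terms share at least one clicked book.
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and any(M[i][c] and M[j][c] for c in range(m)):
                adj[i][j] = 1

    # Column sums of the term-term matrix, to be placed on the diagonal.
    col_sums = [sum(adj[r][c] for r in range(n)) for c in range(n)]

    # Conventional graph Laplacian (assumed sign): diagonal minus term-term matrix.
    L = [[(col_sums[i] if i == j else 0) - adj[i][j] for j in range(n)]
         for i in range(n)]
    return terms, M, adj, L


terms, M, adj, L = build_laplacian(
    [("q1", "b1"), ("q2", "b1"), ("q2", "b2"), ("q3", "b2")]
)
```

In this small example q1 and q2 share book b1, q2 and q3 share book b2, and q1 and q3 share nothing, so the terms form a chain and every row of L sums to zero.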
Further, step 3 is specifically: for spectral clustering, the selected objective function RatioCut is:
\mathrm{RatioCut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \overline{A_i})}{|A_i|} = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \overline{A_i})}{|A_i|}
where k is the number of clusters, A_i denotes the i-th cluster, |A_i| denotes the number of search terms in the i-th cluster, \overline{A_i} denotes the set of all clusters other than A_i, and W(A_i, \overline{A_i}) denotes the sum of the weights between the i-th cluster and the other clusters, computed from W(a, b), the weight between cluster a and cluster b. From the properties of the Laplacian matrix L, it follows that minimizing the objective function RatioCut is equivalent to a minimization over the Laplacian matrix. SVD matrix decomposition is therefore used to reduce the dimensionality of the Laplacian matrix, and the K-means clustering algorithm is used to cluster the dimension-reduced Laplacian matrix.
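A minimal NumPy sketch of step 3 follows, under stated assumptions. The Laplacian is taken in its conventional positive semidefinite form (diagonal minus term-term matrix), so that its singular values equal its eigenvalues and the last k right-singular vectors returned by `numpy.linalg.svd` span the smallest-eigenvalue subspace. The K-means loop is a plain Lloyd iteration with a deterministic farthest-first initialization, which is a choice of this sketch and not something the text specifies. W(A, B) is assumed to be the sum of pairwise weights between the members of A and B. The function names are hypothetical.

```python
import numpy as np


def spectral_clusters(L, k, iters=20):
    """Embed terms with the k smallest singular vectors of L, then run K-means."""
    L = np.asarray(L, dtype=float)
    # Singular values come back in descending order, so the last k rows of vt
    # belong to the k smallest singular values.
    _, _, vt = np.linalg.svd(L)
    emb = vt[-k:].T  # one k-dimensional point per search term

    # Farthest-first initialization (assumption of this sketch).
    centers = [emb[0]]
    for _ in range(1, k):
        dist = np.min([((emb - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(emb[int(dist.argmax())])
    centers = np.array(centers)

    for _ in range(iters):
        d = ((emb[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = emb[labels == c].mean(axis=0)
    return labels


def ratio_cut(adj, labels):
    """RatioCut = 1/2 * sum_i W(A_i, complement of A_i) / |A_i|."""
    adj = np.asarray(adj, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        inside = labels == c
        total += 0.5 * adj[inside][:, ~inside].sum() / inside.sum()
    return total


# Two triangles of related terms joined by a single relation (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = np.zeros((6, 6))
for a, b in edges:
    adj[a, b] = adj[b, a] = 1.0
L = np.diag(adj.sum(axis=0)) - adj
labels = spectral_clusters(L, k=2)
score = ratio_cut(adj, labels)
```

The example graph has an obvious two-way split, which makes it easy to check that the embedding separates the two triangles and that the RatioCut of that split is small.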
Further, step 4 is specifically: the users associated with the search terms in the clustering result obtained in step 3 are taken as the selected crowdsourcing users, and the clustering result is sent to the selected users by e-mail. The feedback of a selected user on a search term Query takes one of three values:
positive feedback means that the user considers the search term to fit the topic of its cluster, negative feedback means that the user considers the search term not to fit the topic of the cluster, and zero feedback means that it is hard to judge whether the search term fits the topic. Based on the selected users' feedback on a cluster, the cluster is handled in one of the following three ways:
(a) The selected users' feedback shows that the cluster expresses a topic well. This shows in two respects: negative feedback results are fewer than positive feedback results, and the users' feedback does not contradict one another. In this case, the search terms with negative feedback are deleted from the cluster, and the search terms with positive and zero feedback are retained;
(b) The selected users' feedback is inconsistent and does not show whether the clustering is good or bad: several users give different, or even opposite, feedback on the same search term. This means that the current selected users' feedback is not yet sufficient to judge the cluster, so new users need to be introduced and the crowdsourcing task distributed again;
(c) The selected users' feedback shows that the cluster has no clear topic: the selected users' feedback differs or is opposite for more than 50% of the search terms. In this case, the cluster is deleted directly.
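The three handling rules can be sketched as a small decision function. The encoding of feedback as +1, 0 and -1, the function name `triage_cluster`, and the returned action names are assumptions for illustration. The text also does not say what to do when users agree but negative feedback is not outnumbered by positive feedback; this sketch treats that case as needing another round of crowdsourcing, which is likewise an assumption.

```python
from typing import Dict, List, Tuple


def triage_cluster(terms: List[str], feedback: Dict[str, List[int]]) -> Tuple[str, List[str]]:
    """Decide how to handle one cluster.

    feedback maps each term to the votes it received, one per responding user:
    +1 (fits the topic), -1 (does not fit), 0 (hard to judge).
    Returns (action, terms kept in the cluster).
    """
    # A term is conflicted when users gave it different or opposite feedback.
    conflicted = [t for t in terms if len(set(feedback.get(t, []))) > 1]

    if len(conflicted) > 0.5 * len(terms):
        return "delete", []                    # case (c): no clear topic
    if conflicted:
        return "redistribute", list(terms)     # case (b): ask new users

    # No contradictions: every term has one agreed vote (0 if nobody answered).
    votes = {t: (feedback[t][0] if feedback.get(t) else 0) for t in terms}
    positives = sum(v > 0 for v in votes.values())
    negatives = sum(v < 0 for v in votes.values())
    if negatives < positives:
        # case (a): drop negatively rated terms, keep positive and zero ones
        return "keep", [t for t in terms if votes[t] >= 0]

    # Not covered by the text; assumed to need more feedback.
    return "redistribute", list(terms)


cluster = ["tang poetry", "song lyrics", "linear algebra"]
action, kept = triage_cluster(
    cluster,
    {"tang poetry": [1, 1], "song lyrics": [1], "linear algebra": [-1, -1]},
)
```

Checking case (c) before case (b) resolves the overlap between them: any disagreement triggers redistribution unless it affects more than half of the terms, in which case the cluster is dropped.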
The beneficial effects of the invention are as follows: the method clusters users' search-term information with spectral clustering and continuously optimizes the clustering result with crowdsourcing technology, ultimately using search terms to improve book tag recommendation. Building on the clustering result, the invention proposes optimizing it through crowdsourcing: feedback from multiple users who judge the clustering result is collected and used to optimize the clusters, and the optimized result is applied in the recommender system.
Brief description of the drawings
Fig. 1 is a flowchart of the book tag recommendation method based on spectral clustering and crowdsourcing technology according to the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, the book tag recommendation method based on spectral clustering and crowdsourcing technology of the invention comprises the following steps:
(1) Screen out the users' search data and search-click data from the log collection system or Web logs;
(2) Using the users' search data and search-click data, build a term-book matrix, and obtain the term-term Laplacian matrix from the term-book matrix. Specifically: the set of search terms of all users, Q = {q_1, q_2, ..., q_n}, is obtained from the users' search data, where n is the total number of search terms and each q is an individual search term; the set of books clicked for the search terms, B = {b_1, b_2, ..., b_m}, is obtained from the users' search-click data, where m is the total number of clicked books and each b is an individual book; the term-book matrix M is obtained from the search-term set Q and the clicked-book set B, and each entry of M is defined in terms of I_ij,
where I_ij is the correspondence between the i-th search term and the j-th book. For each book, if several search terms all have click behavior on that book, these search terms are related to one another. The term-term matrix D is built from these relations: an entry of D is 1 if the two search terms are related and 0 otherwise. The sum of each column of D is placed on the diagonal and all other positions are set to 0, forming a new matrix W. The Laplacian matrix L is obtained by the formula L = D - W.
(3) Apply spectral clustering to the Laplacian matrix to obtain a clustering result of the search terms. Specifically: for spectral clustering, the selected objective function RatioCut is:
\mathrm{RatioCut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \overline{A_i})}{|A_i|} = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \overline{A_i})}{|A_i|}
where k is the number of clusters, A_i denotes the i-th cluster, |A_i| denotes the number of search terms in the i-th cluster, \overline{A_i} denotes the set of all clusters other than A_i, and W(A_i, \overline{A_i}) denotes the sum of the weights between the i-th cluster and the other clusters, computed from W(a, b), the weight between cluster a and cluster b. From the properties of the Laplacian matrix L, it follows that minimizing the objective function RatioCut is equivalent to a minimization over the Laplacian matrix. SVD matrix decomposition is therefore used to reduce the dimensionality of the Laplacian matrix, and the K-means clustering algorithm is used to cluster the dimension-reduced Laplacian matrix.
(4) Use crowdsourcing technology to continuously optimize the clustering result obtained in step 3. Specifically: the users associated with the search terms in the clustering result obtained in step 3 are taken as the selected crowdsourcing users, and the clustering result is sent to the selected users by e-mail. The feedback of a selected user on a search term Query takes one of three values:
positive feedback means that the user considers the search term to fit the topic of its cluster, negative feedback means that the user considers the search term not to fit the topic of the cluster, and zero feedback means that it is hard to judge whether the search term fits the topic. Based on the selected users' feedback on a cluster, the cluster is handled in one of the following three ways:
(a) The selected users' feedback shows that the cluster expresses a topic well. This shows in two respects: negative feedback results are fewer than positive feedback results, and the users' feedback does not contradict one another. In this case, the search terms with negative feedback are deleted from the cluster, and the search terms with positive and zero feedback are retained;
(b) The selected users' feedback is inconsistent and does not show whether the clustering is good or bad: several users give different, or even opposite, feedback on the same search term. This means that the current selected users' feedback is not yet sufficient to judge the cluster, so new users need to be introduced and the crowdsourcing task distributed again;
(c) The selected users' feedback shows that the cluster has no clear topic: the selected users' feedback differs or is opposite for more than 50% of the search terms. In this case, the cluster is deleted directly.
(5) Map the user's past search records onto the clustering result optimized in step 4, and recommend the mapped clustering result to the user as tags.
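Step 5 can be illustrated with a short sketch. The text only says that past search records are mapped onto the optimized clusters and that the mapped clusters are recommended as tags; the ranking by number of matching past terms, the exclusion of terms the user has already searched, the `limit` parameter, and the function name `recommend_tags` are assumptions of this sketch.

```python
def recommend_tags(history, clusters, limit=5):
    """Recommend terms from clusters that contain the user's past search terms.

    history:  the user's past search terms
    clusters: optimized clusters, each an iterable of search terms
    """
    seen = set(history)
    matched = []
    for cluster in clusters:
        members = set(cluster)
        overlap = len(seen & members)
        if overlap:
            # Candidates are cluster terms the user has not searched yet.
            matched.append((overlap, sorted(members - seen)))

    # Clusters sharing more terms with the history come first (assumed ranking).
    matched.sort(key=lambda item: -item[0])

    tags = []
    for _, candidates in matched:
        for tag in candidates:
            if tag not in tags:
                tags.append(tag)
    return tags[:limit]


clusters = [
    {"spectral clustering", "graph cut", "k-means"},
    {"tang poetry", "song lyrics"},
    {"graph cut", "network flow"},
]
tags = recommend_tags(["graph cut", "k-means"], clusters)
```

Here the first cluster matches two past terms and the third matches one, so their remaining terms are recommended in that order, and the unrelated poetry cluster contributes nothing.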

Claims (4)

1. A book tag recommendation method based on spectral clustering and crowdsourcing technology, characterized by comprising the following steps:
(1) Screen out the users' search data and search-click data from the log collection system or Web logs;
(2) Using the users' search data and search-click data, build a term-book matrix, and obtain the term-term Laplacian matrix from the term-book matrix;
(3) Apply spectral clustering to the Laplacian matrix to obtain a clustering result of the search terms;
(4) Use crowdsourcing technology to continuously optimize the clustering result obtained in step 3;
(5) Map the user's past search records onto the clustering result optimized in step 4, and recommend the mapped clustering result to the user as tags.
2. The book tag recommendation method based on spectral clustering and crowdsourcing technology according to claim 1, characterized in that step 2 is specifically: the set of search terms of all users, Q = {q_1, q_2, ..., q_n}, is obtained from the users' search data, where n is the total number of search terms and each q is an individual search term; the set of books clicked for the search terms, B = {b_1, b_2, ..., b_m}, is obtained from the users' search-click data, where m is the total number of clicked books and each b is an individual book; the term-book matrix M is obtained from the search-term set Q and the clicked-book set B, and each entry of M is defined in terms of I_ij,
where I_ij is the correspondence between the i-th search term and the j-th book. For each book, if several search terms all have click behavior on that book, these search terms are related to one another. The term-term matrix D is built from these relations: an entry of D is 1 if the two search terms are related and 0 otherwise. The sum of each column of D is placed on the diagonal and all other positions are set to 0, forming a new matrix W. The Laplacian matrix L is obtained by the formula L = D - W.
3. The book tag recommendation method based on spectral clustering and crowdsourcing technology according to claim 1, characterized in that step 3 is specifically: for spectral clustering, the selected objective function RatioCut is:
\mathrm{RatioCut}(A_1, \ldots, A_k) = \frac{1}{2} \sum_{i=1}^{k} \frac{W(A_i, \overline{A_i})}{|A_i|} = \sum_{i=1}^{k} \frac{\mathrm{cut}(A_i, \overline{A_i})}{|A_i|}
where k is the number of clusters, A_i denotes the i-th cluster, |A_i| denotes the number of search terms in the i-th cluster, \overline{A_i} denotes the set of all clusters other than A_i, and W(A_i, \overline{A_i}) denotes the sum of the weights between the i-th cluster and the other clusters, computed from W(a, b), the weight between cluster a and cluster b. From the properties of the Laplacian matrix L, it follows that minimizing the objective function RatioCut is equivalent to a minimization over the Laplacian matrix. SVD matrix decomposition is therefore used to reduce the dimensionality of the Laplacian matrix, and the K-means clustering algorithm is used to cluster the dimension-reduced Laplacian matrix.
4. The book tag recommendation method based on spectral clustering and crowdsourcing technology according to claim 1, characterized in that step 4 is specifically: the users associated with the search terms in the clustering result obtained in step 3 are taken as the selected crowdsourcing users, and the clustering result is sent to the selected users by e-mail. The feedback of a selected user on a search term Query takes one of three values:
positive feedback means that the user considers the search term to fit the topic of its cluster, negative feedback means that the user considers the search term not to fit the topic of the cluster, and zero feedback means that it is hard to judge whether the search term fits the topic. Based on the selected users' feedback on a cluster, the cluster is handled in one of the following three ways:
(a) The selected users' feedback shows that the cluster expresses a topic well. This shows in two respects: negative feedback results are fewer than positive feedback results, and the users' feedback does not contradict one another. In this case, the search terms with negative feedback are deleted from the cluster, and the search terms with positive and zero feedback are retained;
(b) The selected users' feedback is inconsistent and does not show whether the clustering is good or bad: several users give different, or even opposite, feedback on the same search term. This means that the current selected users' feedback is not yet sufficient to judge the cluster, so new users need to be introduced and the crowdsourcing task distributed again;
(c) The selected users' feedback shows that the cluster has no clear topic: the selected users' feedback differs or is opposite for more than 50% of the search terms. In this case, the cluster is deleted directly.
CN201510270676.7A 2015-03-11 2015-05-26 Book tag recommendation method based on spectral clustering and crowdsourcing technology Active CN104915388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510270676.7A CN104915388B (en) 2015-03-11 2015-05-26 Book tag recommendation method based on spectral clustering and crowdsourcing technology

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201510107290 2015-03-11
CN2015101072904 2015-03-11
CN201510270676.7A CN104915388B (en) 2015-03-11 2015-05-26 Book tag recommendation method based on spectral clustering and crowdsourcing technology

Publications (2)

Publication Number Publication Date
CN104915388A true CN104915388A (en) 2015-09-16
CN104915388B CN104915388B (en) 2018-03-16

Family

ID=54084451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510270676.7A Active CN104915388B (en) 2015-03-11 2015-05-26 Book tag recommendation method based on spectral clustering and crowdsourcing technology

Country Status (1)

Country Link
CN (1) CN104915388B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101901450A (en) * 2010-07-14 2010-12-01 中兴通讯股份有限公司 Media content recommendation method and media content recommendation system
JP2013084216A (en) * 2011-10-12 2013-05-09 Ntt Docomo Inc Fixed phrase discrimination device and fixed phrase discrimination method
CN102376063A (en) * 2011-11-29 2012-03-14 北京航空航天大学 Social-label-based method for optimizing personalized recommendation system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李默等: "基于标签和关联规则挖掘的图书组合推荐系统模型研究", 《计算机应用研究》 *
罗琳等: "标签技术在高效图书馆OPAC系统中的应用调查", 《图书情报工作》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426826A (en) * 2015-11-09 2016-03-23 张静 Tag noise correction based crowd-sourced tagging data quality improvement method
CN106202184A (en) * 2016-06-27 2016-12-07 华中科技大学 Personalized book recommendation method and system for university libraries
CN106202184B (en) * 2016-06-27 2019-05-31 华中科技大学 Personalized book recommendation method and system for university libraries
CN107301199A (en) * 2017-05-17 2017-10-27 北京融数云途科技有限公司 Data tag generation method and device
CN107301199B (en) * 2017-05-17 2021-02-12 北京融数云途科技有限公司 Data tag generation method and device
CN110851706A (en) * 2019-10-10 2020-02-28 百度在线网络技术(北京)有限公司 Training method and device for user click model, electronic equipment and storage medium
CN110851706B (en) * 2019-10-10 2022-11-01 百度在线网络技术(北京)有限公司 Training method and device for user click model, electronic equipment and storage medium
US11838377B2 (en) 2019-10-10 2023-12-05 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus, electronic device and storage medium for training user click model
US11113580B2 (en) 2019-12-30 2021-09-07 Industrial Technology Research Institute Image classification system and method

Also Published As

Publication number Publication date
CN104915388B (en) 2018-03-16

Similar Documents

Publication Publication Date Title
CN102254043B (en) Semantic mapping-based clothing image retrieving method
CN108363821A (en) Information pushing method, device, terminal device and storage medium
CN104484431B (en) Multi-source personalized news web page recommendation method based on domain ontology
CN104915388A (en) Book tag recommendation method based on spectral clustering and crowdsourcing technology
CN104008106B (en) Method and device for obtaining hot topics
CN103049440A (en) Recommendation processing method and processing system for related articles
CN102708130A (en) Scalable engine that computes user micro-segments for offer matching
CN103793489A (en) Method for discovering topics of communities in on-line social network
CN104391908B (en) Multi-keyword indexing method on graphs based on locality-sensitive hashing
CN106547864A (en) Personalized search method based on query expansion
CN102566945A (en) Method and system for realizing automatic acquisition and on-demand printing of book
CN104615734B (en) Community management service big data processing system and processing method thereof
CN106227510A (en) Application recommendation method and device
CN112632405A (en) Recommendation method, device, equipment and storage medium
CN107291895A (en) Fast hierarchical document search method
Tian et al. A music recommendation system based on logistic regression and eXtreme gradient boosting
CN104899702B (en) Decoration norm for detailed estimates management system based on big data and management method
CN103761286A (en) Method for retrieving service resources on basis of user interest
CN115936624A (en) Basic level data management method and device
CN101840438B (en) Retrieval system oriented to meta keywords of source document
CN113918724A (en) Method for constructing river and lake health knowledge map
CN110489665B (en) Microblog personalized recommendation method based on scene modeling and convolutional neural network
CN107657067B (en) Cosine distance-based method and system for rapidly pushing frontier scientific and technological information
CN109803022A (en) Digital resource sharing system and service method thereof
CN114722304A (en) Community search method based on theme on heterogeneous information network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant