CN104915388A - Book tag recommendation method based on spectral clustering and crowdsourcing technology - Google Patents
Book tag recommendation method based on spectral clustering and crowdsourcing technology
- Publication number
- CN104915388A (application number CN201510270676.7A)
- Authority
- CN
- China
- Prior art keywords
- term
- cluster
- user
- matrix
- result
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a book tag recommendation method based on spectral clustering and a crowdsourcing technology. After the method is applied to a digital library system, a Laplacian matrix is constructed through a retrieval click log of a user, retrieval words are clustered through the spectral clustering, a spectral clustering result is continuously optimized through the crowdsourcing technology, and finally, the optimized result is applied to a recommendation system. According to the method, the retrieval words of the user serve as tags, the clustering accuracy of the retrieval words is improved through combination of the spectral clustering and the crowdsourcing technology, and therefore the accuracy of the system in the aspect of tag recommendation is improved.
Description
Technical field
The invention belongs to the field of book tag recommendation based on spectral clustering and crowdsourcing, and relates to a book tag recommendation method based on spectral clustering and crowdsourcing technology.
Background art
With the continuous growth of information on the Internet, the volume of information is increasing explosively, and classifying it sensibly and efficiently has become the key to using it effectively. Traditional classification is carried out mainly by hand, which is unsustainable at this scale. A new form of classification centered on tags has therefore gradually emerged and become a key element of Internet applications. In a digital library system, tags come mainly from book information. In addition, as users work with the system, their search terms and book index information can also be added to the system as tags. With tags as the link, the relationship between users and books is brought closer and users find books more efficiently.
At the same time, recommendation systems are receiving more and more attention in applications that handle massive data. The way users obtain information has moved from web-wide information retrieval, to vertical retrieval of domain knowledge, to today's recommendation systems. Information is acquired ever faster and is increasingly personalized for different users, and the contribution of recommendation systems to system usability is increasingly significant. Clustering is a key method of data mining. In a recommendation system, clustering algorithms are used to cluster items and users, and the clustering is refined through iterative runs of the algorithm.
Summary of the invention
The object of the invention is to address the shortcomings of existing recommendation systems in their use of search terms by providing a book tag recommendation method for digital libraries based on spectral clustering and crowdsourcing technology.
The object of the invention is achieved through the following technical solution: a book tag recommendation method based on spectral clustering and crowdsourcing technology, comprising the following steps:
(1) filter out the users' retrieval data and retrieval click data from the result collection system or Web logs;
(2) use the users' retrieval data and retrieval click data to build a search term-book matrix, and derive the search term-search term Laplacian matrix from the search term-book matrix;
(3) apply spectral clustering to the Laplacian matrix to obtain clusters of search terms;
(4) use crowdsourcing to continuously optimize the clustering result obtained in step 3;
(5) map the user's past search records onto the clustering result optimized in step 4, and recommend the mapped clusters to the user as tags.
Further, step 2 is specifically as follows. The set of search terms of all users, $Q = \{q_1, q_2, \ldots, q_n\}$, is obtained from the users' retrieval data, where $n$ is the total number of search terms and each $q$ is an individual search term. The set of books clicked for the search terms, $B = \{b_1, b_2, \ldots, b_m\}$, is obtained from the users' retrieval click data, where $m$ is the total number of clicked books and each $b$ is an individual book. The search term-book matrix $M$ is built from $Q$ and $B$, and each of its entries is defined as:

$$M_{ij} = I_{ij}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, m$$

where $I_{ij}$ is the correspondence between the $i$-th search term and the $j$-th book. For each book, if several search terms have all led to clicks on that book, those search terms are considered related. The search term-search term matrix $D$ is built from these relations: an entry of $D$ is 1 if the two search terms are related and 0 otherwise. The elements of each column of $D$ are summed and the sums are placed on the diagonal, with all other positions set to 0, which forms a new matrix $W$. The Laplacian matrix $L$ is obtained from the formula $L = D - W$.
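As an illustration of step 2, the construction can be sketched in plain Python. The sketch assumes that the correspondence I_ij is 1 when the i-th search term led to a click on the j-th book and 0 otherwise, that a search term is not linked to itself, and that the click log is available as (search term, book) pairs. The function name `build_matrices` and the sample data are hypothetical. The text states the formula L = D - W with D as the 0/1 matrix and W as the diagonal matrix; the sketch computes the conventional graph Laplacian, diagonal matrix minus 0/1 matrix (W - D in the text's naming), which is an assumption about the intended sign.

```python
def build_matrices(clicks):
    """Build M, D, W and L from (search term, clicked book) pairs."""
    terms = sorted({q for q, _ in clicks})
    books = sorted({b for _, b in clicks})
    term_index = {q: i for i, q in enumerate(terms)}
    book_index = {b: j for j, b in enumerate(books)}
    n, m = len(terms), len(books)

    # Search term-book matrix M: assumed binary click indicator.
    M = [[0] * m for _ in range(n)]
    for q, b in clicks:
        M[term_index[q]][book_index[b]] = 1

    # Search term-search term matrix D: 1 if two distinct terms share a clicked book.
    D = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and any(M[i][c] and M[j][c] for c in range(m)):
                D[i][j] = 1

    # Diagonal matrix W: column sums of D on the diagonal, 0 elsewhere.
    W = [[sum(D[r][i] for r in range(n)) if i == j else 0 for j in range(n)]
         for i in range(n)]

    # Conventional Laplacian (assumption): diagonal matrix minus 0/1 matrix.
    L = [[W[i][j] - D[i][j] for j in range(n)] for i in range(n)]
    return terms, books, M, D, W, L


# Hypothetical click log.
clicks = [
    ("python", "b1"), ("programming", "b1"),
    ("programming", "b2"), ("coding", "b2"),
    ("cooking", "b3"), ("recipes", "b3"),
]
terms, books, M, D, W, L = build_matrices(clicks)
for term, row in zip(terms, L):
    print(term, row)
```

With this sign convention every row of L sums to zero, which is a quick sanity check on the construction.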
Further, step 3 is specifically as follows. For spectral clustering, the chosen objective function RatioCut is:

$$\mathrm{RatioCut}(A_1, \ldots, A_k) = \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{|A_i|}$$

where $k$ is the number of clusters, $A_i$ is the $i$-th cluster, $|A_i|$ is the number of search terms in the $i$-th cluster, $\bar{A}_i$ is the set of all clusters other than $A_i$, and $W(A_i, \bar{A}_i)$ is the sum of the weights between the $i$-th cluster and the other clusters, computed as

$$W(A_i, \bar{A}_i) = \sum_{a \in A_i,\; b \in \bar{A}_i} W(a, b)$$

where $W(a, b)$ is the weight between $a$ and $b$. From the properties of the Laplacian matrix $L$ it follows that minimizing the objective function RatioCut is equivalent to a minimization over the Laplacian matrix. SVD matrix decomposition is therefore used to reduce the dimensionality of the Laplacian matrix, and the K-means clustering algorithm is then used to cluster the dimension-reduced Laplacian matrix.
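To make the RatioCut objective concrete, the following minimal sketch evaluates it for two candidate partitions of a small graph. It assumes the form sum over clusters of W(A_i, complement of A_i) / |A_i|; some formulations add a constant factor of 1/2, which does not change which partition is the minimizer. The function name `ratio_cut` and the six-node example graph are hypothetical.

```python
def ratio_cut(weights, clusters):
    """Sum over clusters of (weight leaving the cluster) / (cluster size)."""
    n = len(weights)
    total = 0.0
    for cluster in clusters:
        inside = set(cluster)
        # W(A_i, complement): weights between members and non-members.
        cut = sum(weights[a][b] for a in inside for b in range(n) if b not in inside)
        total += cut / len(inside)
    return total


# Hypothetical graph: two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
weights = [[0] * 6 for _ in range(6)]
for a, b in edges:
    weights[a][b] = weights[b][a] = 1

balanced = ratio_cut(weights, [[0, 1, 2], [3, 4, 5]])
lopsided = ratio_cut(weights, [[0], [1, 2, 3, 4, 5]])
print(balanced, lopsided)
```

The balanced split cuts a single edge and scores lower than the split that isolates one node, which is the behavior the objective is meant to reward.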
Further, step 4 is specifically as follows. The users who correspond to the search terms in the clusters obtained in step 3 are taken as the selected crowdsourcing users, and the clustering result is sent to them by email. The feedback of a selected user is defined as:

$$\mathrm{Feedback}(\mathit{Query}) = \begin{cases} 1, & \text{positive feedback} \\ 0, & \text{zero feedback} \\ -1, & \text{negative feedback} \end{cases}$$

where Query denotes a search term; positive feedback means the user considers that the search term fits the topic of the cluster it is in, negative feedback means the user considers that it does not fit the topic of the cluster, and zero feedback means it is difficult to judge whether the search term fits the topic. Depending on the selected users' feedback on a cluster, the cluster is handled in one of the following three ways:
(a) The selected users' feedback shows that the cluster represents a topic well. This shows in two ways: there are fewer negative feedback results than positive ones, and the users' feedback does not contradict each other. In this case, the search terms that received negative feedback are deleted from the cluster, and the search terms with positive or zero feedback are retained.
(b) The selected users' feedback is inconsistent and does not show how good the cluster is. This shows in several users giving different or even opposite feedback on the same search term. In this case the feedback of the current selected users is not yet sufficient to judge the cluster, so new users must be brought in and the crowdsourcing task distributed again.
(c) The selected users' feedback shows that the cluster has no clear topic. This shows in the selected users' feedback being different or opposite for more than 50% of the search terms. In this case the cluster is deleted outright.
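The three handling cases can be sketched as a small decision function. This is only an illustration under stated assumptions: feedback is encoded as 1 (positive), 0 (zero) and -1 (negative); a search term counts as conflicting when users gave it more than one distinct value; case (c) is checked before case (b); and a cluster with no conflicts but at least as many negative as positive terms, which the text does not cover, is sent back for redistribution. The function name `process_cluster` and the sample data are hypothetical.

```python
def process_cluster(feedback):
    """feedback maps each search term to the list of values (1, 0, -1) given by users.

    Returns (action, terms), where action is "keep", "reassign" or "delete".
    """
    conflicting = [t for t, votes in feedback.items() if len(set(votes)) > 1]

    # Case (c): feedback differs for more than 50% of the terms, delete the cluster.
    if len(conflicting) > 0.5 * len(feedback):
        return "delete", []

    # Case (b): some users contradict each other, ask new users.
    if conflicting:
        return "reassign", sorted(feedback)

    positive = [t for t, votes in feedback.items() if set(votes) == {1}]
    negative = [t for t, votes in feedback.items() if set(votes) == {-1}]

    # Case (a): consistent feedback with fewer negatives than positives.
    if len(negative) < len(positive):
        return "keep", sorted(t for t in feedback if t not in negative)

    # Not covered by the text: treated as inconclusive (assumption).
    return "reassign", sorted(feedback)


case_a = process_cluster({"python": [1, 1], "java": [1], "cooking": [-1, -1], "misc": [0]})
case_b = process_cluster({"python": [1, -1], "java": [1, 1], "go": [1]})
case_c = process_cluster({"a": [1, -1], "b": [0, 1], "c": [1]})
print(case_a, case_b, case_c)
```

Checking case (c) first means a heavily disputed cluster is dropped instead of being sent out again; the text does not fix this order, so it is a design choice of the sketch.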
The beneficial effects of the invention are as follows. The method uses spectral clustering to cluster users' search terms and uses crowdsourcing to continuously optimize the clustering result, so that search terms are ultimately used to improve book tag recommendation. Building on the clustering result, the invention proposes optimizing it through crowdsourcing: feedback from multiple users who judge the clusters is collected and used to optimize them, and the optimized result is applied in the recommendation system.
Brief description of the drawings
Fig. 1 is a flowchart of the book tag recommendation method based on spectral clustering and crowdsourcing technology according to the invention.
Detailed description
The invention is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, the book tag recommendation method based on spectral clustering and crowdsourcing technology of the invention comprises the following steps:
(1) filter out the users' retrieval data and retrieval click data from the result collection system or Web logs;
(2) use the users' retrieval data and retrieval click data to build a search term-book matrix, and derive the search term-search term Laplacian matrix from the search term-book matrix. Specifically, the set of search terms of all users, $Q = \{q_1, q_2, \ldots, q_n\}$, is obtained from the users' retrieval data, where $n$ is the total number of search terms and each $q$ is an individual search term. The set of books clicked for the search terms, $B = \{b_1, b_2, \ldots, b_m\}$, is obtained from the users' retrieval click data, where $m$ is the total number of clicked books and each $b$ is an individual book. The search term-book matrix $M$ is built from $Q$ and $B$, and each of its entries is defined as:

$$M_{ij} = I_{ij}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, m$$

where $I_{ij}$ is the correspondence between the $i$-th search term and the $j$-th book. For each book, if several search terms have all led to clicks on that book, those search terms are considered related. The search term-search term matrix $D$ is built from these relations: an entry of $D$ is 1 if the two search terms are related and 0 otherwise. The elements of each column of $D$ are summed and the sums are placed on the diagonal, with all other positions set to 0, which forms a new matrix $W$. The Laplacian matrix $L$ is obtained from the formula $L = D - W$.
(3) apply spectral clustering to the Laplacian matrix to obtain clusters of search terms. Specifically, for spectral clustering, the chosen objective function RatioCut is:

$$\mathrm{RatioCut}(A_1, \ldots, A_k) = \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{|A_i|}$$

where $k$ is the number of clusters, $A_i$ is the $i$-th cluster, $|A_i|$ is the number of search terms in the $i$-th cluster, $\bar{A}_i$ is the set of all clusters other than $A_i$, and $W(A_i, \bar{A}_i)$ is the sum of the weights between the $i$-th cluster and the other clusters, computed as

$$W(A_i, \bar{A}_i) = \sum_{a \in A_i,\; b \in \bar{A}_i} W(a, b)$$

where $W(a, b)$ is the weight between $a$ and $b$. From the properties of the Laplacian matrix $L$ it follows that minimizing the objective function RatioCut is equivalent to a minimization over the Laplacian matrix. SVD matrix decomposition is therefore used to reduce the dimensionality of the Laplacian matrix, and the K-means clustering algorithm is then used to cluster the dimension-reduced Laplacian matrix.
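The SVD-plus-K-means procedure of step 3 can be sketched with NumPy as follows. This is a minimal sketch, not the patented implementation: it assumes the Laplacian is the conventional one (degree matrix minus 0/1 adjacency matrix), keeps the singular vectors that belong to the k smallest singular values as the reduced representation, and uses a small hand-written K-means with a deterministic farthest-point initialization. The names `spectral_embedding` and `kmeans` and the example graph are hypothetical.

```python
import numpy as np


def spectral_embedding(laplacian, k):
    """Reduce each node to k coordinates using the SVD of the Laplacian."""
    # np.linalg.svd returns singular values in descending order, so the last
    # k columns of u belong to the k smallest singular values.
    u, _, _ = np.linalg.svd(laplacian)
    return u[:, -k:]


def kmeans(points, k, max_iter=100):
    """Plain K-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(dist))])
    centers = np.array(centers)
    for _ in range(max_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Hypothetical graph of six search terms: two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = np.zeros((6, 6))
for a, b in edges:
    adjacency[a, b] = adjacency[b, a] = 1.0
laplacian = np.diag(adjacency.sum(axis=0)) - adjacency

labels = kmeans(spectral_embedding(laplacian, 2), 2)
print(labels)
```

Because the conventional Laplacian is symmetric and positive semidefinite, its SVD coincides with its eigendecomposition, so the retained singular vectors are the eigenvectors with the smallest eigenvalues.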
(4) use crowdsourcing to continuously optimize the clustering result obtained in step 3. Specifically, the users who correspond to the search terms in the clusters obtained in step 3 are taken as the selected crowdsourcing users, and the clustering result is sent to them by email. The feedback of a selected user is defined as:

$$\mathrm{Feedback}(\mathit{Query}) = \begin{cases} 1, & \text{positive feedback} \\ 0, & \text{zero feedback} \\ -1, & \text{negative feedback} \end{cases}$$

where Query denotes a search term; positive feedback means the user considers that the search term fits the topic of the cluster it is in, negative feedback means the user considers that it does not fit the topic of the cluster, and zero feedback means it is difficult to judge whether the search term fits the topic. Depending on the selected users' feedback on a cluster, the cluster is handled in one of the following three ways:
(a) The selected users' feedback shows that the cluster represents a topic well. This shows in two ways: there are fewer negative feedback results than positive ones, and the users' feedback does not contradict each other. In this case, the search terms that received negative feedback are deleted from the cluster, and the search terms with positive or zero feedback are retained.
(b) The selected users' feedback is inconsistent and does not show how good the cluster is. This shows in several users giving different or even opposite feedback on the same search term. In this case the feedback of the current selected users is not yet sufficient to judge the cluster, so new users must be brought in and the crowdsourcing task distributed again.
(c) The selected users' feedback shows that the cluster has no clear topic. This shows in the selected users' feedback being different or opposite for more than 50% of the search terms. In this case the cluster is deleted outright.
(5) map the user's past search records onto the clustering result optimized in step 4, and recommend the mapped clusters to the user as tags.
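Step 5 can be illustrated with a short sketch that maps a user's past searches onto the optimized clusters and proposes the remaining search terms of the matched clusters as tags. The ranking rule (terms from clusters that match more of the user's history come first, ties broken alphabetically), the `limit` parameter, the function name `recommend_tags` and the sample data are all assumptions made for illustration; the text only states that the mapped clusters are recommended as tags.

```python
from collections import Counter


def recommend_tags(history, clusters, limit=5):
    """Recommend search terms from clusters that overlap the user's history."""
    seen = set(history)
    scores = Counter()
    for cluster in clusters:
        hits = len(seen & set(cluster))
        if hits == 0:
            continue  # the user never searched for anything in this cluster
        for term in cluster:
            if term not in seen:
                scores[term] += hits
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:limit]]


# Hypothetical optimized clusters and search history.
clusters = [
    ["python", "programming", "coding"],
    ["cooking", "recipes"],
    ["history", "war"],
]
tags = recommend_tags(["python", "coding", "cooking"], clusters)
print(tags)
```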
Claims (4)
1. A book tag recommendation method based on spectral clustering and crowdsourcing technology, characterized in that it comprises the following steps:
(1) filter out the users' retrieval data and retrieval click data from the result collection system or Web logs;
(2) use the users' retrieval data and retrieval click data to build a search term-book matrix, and derive the search term-search term Laplacian matrix from the search term-book matrix;
(3) apply spectral clustering to the Laplacian matrix to obtain clusters of search terms;
(4) use crowdsourcing to continuously optimize the clustering result obtained in step 3;
(5) map the user's past search records onto the clustering result optimized in step 4, and recommend the mapped clusters to the user as tags.
2. The book tag recommendation method based on spectral clustering and crowdsourcing technology according to claim 1, characterized in that step 2 is specifically as follows. The set of search terms of all users, $Q = \{q_1, q_2, \ldots, q_n\}$, is obtained from the users' retrieval data, where $n$ is the total number of search terms and each $q$ is an individual search term. The set of books clicked for the search terms, $B = \{b_1, b_2, \ldots, b_m\}$, is obtained from the users' retrieval click data, where $m$ is the total number of clicked books and each $b$ is an individual book. The search term-book matrix $M$ is built from $Q$ and $B$, and each of its entries is defined as:

$$M_{ij} = I_{ij}, \quad i = 1, \ldots, n, \quad j = 1, \ldots, m$$

where $I_{ij}$ is the correspondence between the $i$-th search term and the $j$-th book. For each book, if several search terms have all led to clicks on that book, those search terms are considered related. The search term-search term matrix $D$ is built from these relations: an entry of $D$ is 1 if the two search terms are related and 0 otherwise. The elements of each column of $D$ are summed and the sums are placed on the diagonal, with all other positions set to 0, which forms a new matrix $W$. The Laplacian matrix $L$ is obtained from the formula $L = D - W$.
3. The book tag recommendation method based on spectral clustering and crowdsourcing technology according to claim 1, characterized in that step 3 is specifically as follows. For spectral clustering, the chosen objective function RatioCut is:

$$\mathrm{RatioCut}(A_1, \ldots, A_k) = \sum_{i=1}^{k} \frac{W(A_i, \bar{A}_i)}{|A_i|}$$

where $k$ is the number of clusters, $A_i$ is the $i$-th cluster, $|A_i|$ is the number of search terms in the $i$-th cluster, $\bar{A}_i$ is the set of all clusters other than $A_i$, and $W(A_i, \bar{A}_i)$ is the sum of the weights between the $i$-th cluster and the other clusters, computed as

$$W(A_i, \bar{A}_i) = \sum_{a \in A_i,\; b \in \bar{A}_i} W(a, b)$$

where $W(a, b)$ is the weight between $a$ and $b$. From the properties of the Laplacian matrix $L$ it follows that minimizing the objective function RatioCut is equivalent to a minimization over the Laplacian matrix. SVD matrix decomposition is therefore used to reduce the dimensionality of the Laplacian matrix, and the K-means clustering algorithm is then used to cluster the dimension-reduced Laplacian matrix.
4. The book tag recommendation method based on spectral clustering and crowdsourcing technology according to claim 1, characterized in that step 4 is specifically as follows. The users who correspond to the search terms in the clusters obtained in step 3 are taken as the selected crowdsourcing users, and the clustering result is sent to them by email. The feedback of a selected user is defined as:

$$\mathrm{Feedback}(\mathit{Query}) = \begin{cases} 1, & \text{positive feedback} \\ 0, & \text{zero feedback} \\ -1, & \text{negative feedback} \end{cases}$$

where Query denotes a search term; positive feedback means the user considers that the search term fits the topic of the cluster it is in, negative feedback means the user considers that it does not fit the topic of the cluster, and zero feedback means it is difficult to judge whether the search term fits the topic. Depending on the selected users' feedback on a cluster, the cluster is handled in one of the following three ways:
(a) The selected users' feedback shows that the cluster represents a topic well. This shows in two ways: there are fewer negative feedback results than positive ones, and the users' feedback does not contradict each other. In this case, the search terms that received negative feedback are deleted from the cluster, and the search terms with positive or zero feedback are retained.
(b) The selected users' feedback is inconsistent and does not show how good the cluster is. This shows in several users giving different or even opposite feedback on the same search term. In this case the feedback of the current selected users is not yet sufficient to judge the cluster, so new users must be brought in and the crowdsourcing task distributed again.
(c) The selected users' feedback shows that the cluster has no clear topic. This shows in the selected users' feedback being different or opposite for more than 50% of the search terms. In this case the cluster is deleted outright.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510270676.7A CN104915388B (en) | 2015-03-11 | 2015-05-26 | Book tag recommendation method based on spectral clustering and crowdsourcing technology |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510107290 | 2015-03-11 | ||
CN2015101072904 | 2015-03-11 | ||
CN201510270676.7A CN104915388B (en) | 2015-03-11 | 2015-05-26 | Book tag recommendation method based on spectral clustering and crowdsourcing technology |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104915388A (en) | 2015-09-16 |
CN104915388B CN104915388B (en) | 2018-03-16 |
Family ID: 54084451
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510270676.7A (Active) CN104915388B (en) | 2015-03-11 | 2015-05-26 | Book tag recommendation method based on spectral clustering and crowdsourcing technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104915388B (en) |
- 2015-05-26: CN application CN201510270676.7A, patent CN104915388B, status Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101901450A (en) * | 2010-07-14 | 2010-12-01 | 中兴通讯股份有限公司 | Media content recommendation method and media content recommendation system |
JP2013084216A (en) * | 2011-10-12 | 2013-05-09 | Ntt Docomo Inc | Fixed phrase discrimination device and fixed phrase discrimination method |
CN102376063A (en) * | 2011-11-29 | 2012-03-14 | 北京航空航天大学 | Social-label-based method for optimizing personalized recommendation system |
Non-Patent Citations (2)
Title |
---|
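Li Mo et al.: "Research on a book combination recommendation system model based on tags and association rule mining" (基于标签和关联规则挖掘的图书组合推荐系统模型研究), Application Research of Computers (《计算机应用研究》) * |
Luo Lin et al.: "Survey on the application of tag technology in efficient library OPAC systems" (标签技术在高效图书馆OPAC系统中的应用调查), Library and Information Service (《图书情报工作》) * |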
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105426826A (en) * | 2015-11-09 | 2016-03-23 | 张静 | Tag noise correction based crowd-sourced tagging data quality improvement method |
CN106202184A (en) * | 2016-06-27 | 2016-12-07 | 华中科技大学 | A kind of books personalized recommendation method towards libraries of the universities and system |
CN106202184B (en) * | 2016-06-27 | 2019-05-31 | 华中科技大学 | A kind of books personalized recommendation method and system towards libraries of the universities |
CN107301199A (en) * | 2017-05-17 | 2017-10-27 | 北京融数云途科技有限公司 | A kind of data label generation method and device |
CN107301199B (en) * | 2017-05-17 | 2021-02-12 | 北京融数云途科技有限公司 | Data tag generation method and device |
CN110851706A (en) * | 2019-10-10 | 2020-02-28 | 百度在线网络技术(北京)有限公司 | Training method and device for user click model, electronic equipment and storage medium |
CN110851706B (en) * | 2019-10-10 | 2022-11-01 | 百度在线网络技术(北京)有限公司 | Training method and device for user click model, electronic equipment and storage medium |
US11838377B2 (en) | 2019-10-10 | 2023-12-05 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus, electronic device and storage medium for training user click model |
US11113580B2 (en) | 2019-12-30 | 2021-09-07 | Industrial Technology Research Institute | Image classification system and method |
Also Published As
Publication number | Publication date |
---|---|
CN104915388B (en) | 2018-03-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102254043B (en) | Semantic mapping-based clothing image retrieving method | |
CN108363821A (en) | A kind of information-pushing method, device, terminal device and storage medium | |
CN104484431B (en) | A kind of multi-source Personalize News webpage recommending method based on domain body | |
CN104915388A (en) | Book tag recommendation method based on spectral clustering and crowdsourcing technology | |
CN104008106B (en) | A kind of method and device obtaining much-talked-about topic | |
CN103049440A (en) | Recommendation processing method and processing system for related articles | |
CN102708130A (en) | Scalable engine that computes user micro-segments for offer matching | |
CN103793489A (en) | Method for discovering topics of communities in on-line social network | |
CN104391908B (en) | Multiple key indexing means based on local sensitivity Hash on a kind of figure | |
CN106547864A (en) | A kind of Personalized search based on query expansion | |
CN102566945A (en) | Method and system for realizing automatic acquisition and on-demand printing of book | |
CN104615734B (en) | A kind of community management service big data processing system and its processing method | |
CN106227510A (en) | Method and device is recommended in application | |
CN112632405A (en) | Recommendation method, device, equipment and storage medium | |
CN107291895A (en) | A kind of quick stratification document searching method | |
Tian et al. | A music recommendation system based on logistic regression and eXtreme gradient boosting | |
CN104899702B (en) | Decoration norm for detailed estimates management system based on big data and management method | |
CN103761286A (en) | Method for retrieving service resources on basis of user interest | |
CN115936624A (en) | Basic level data management method and device | |
CN101840438B (en) | Retrieval system oriented to meta keywords of source document | |
CN113918724A (en) | Method for constructing river and lake health knowledge map | |
CN110489665B (en) | Microblog personalized recommendation method based on scene modeling and convolutional neural network | |
CN107657067B (en) | Cosine distance-based leading-edge scientific and technological information rapid pushing method and system | |
CN109803022A (en) | A kind of digitalization resource shared system and its method of servicing | |
CN114722304A (en) | Community search method based on theme on heterogeneous information network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||