CN104915388B - Book tag recommendation method based on spectral clustering and crowdsourcing technology - Google Patents

Info

Publication number
CN104915388B
Authority
CN
China
Prior art keywords
term
user
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510270676.7A
Other languages
Chinese (zh)
Other versions
CN104915388A (en)
Inventor
张寅
魏宝刚
尹彦飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201510270676.7A
Publication of CN104915388A
Application granted
Publication of CN104915388B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a book tag recommendation method based on spectral clustering and crowdsourcing technology, applied to digital library systems. A Laplacian matrix is built from users' search click logs, and the search terms are clustered using spectral clustering. Crowdsourcing is then used to continuously optimize the clustering result, and the optimized result is finally applied in a recommender system. By treating users' search terms as tags and combining spectral clustering with crowdsourcing, the invention improves the accuracy of search term clustering and thereby the accuracy of the system's tag recommendations.

Description

Book tag recommendation method based on spectral clustering and crowdsourcing technology
Technical field
The invention belongs to the field of book tag recommendation based on spectral clustering and crowdsourcing, and relates to a book tag recommendation method based on spectral clustering and crowdsourcing technology.
Background art
As the amount of information on the Internet keeps increasing, information is growing explosively, and classifying it sensibly and efficiently has become the key to using it effectively. Traditional classification is carried out mainly by hand, which cannot be sustained for massive amounts of information. A new way of classifying information, centred on tags, has therefore gradually emerged and has become key to Internet applications. In a digital library system, tags come mainly from book information. In addition, as users use the system, their search terms and book index information can also be added to the system as a kind of tag. With tags as the link, users and books are brought closer together, which makes it more efficient for users to find books.
Meanwhile, recommender systems, as an application of massive data, are receiving growing attention. The way users obtain information has moved from web-wide information retrieval, to vertical retrieval of domain knowledge, and on to today's recommender systems. Information is acquired ever faster and is increasingly personalized for different users, and the contribution of recommender systems to system usability is increasingly significant. Clustering is a key data mining method. In a recommender system, clustering algorithms are used to cluster items and users, and the clustering is optimized through iterative runs of the algorithm.
Summary of the invention
The object of the invention is to address the insufficient use of search terms in existing recommender systems by providing a book tag recommendation method for digital libraries based on spectral clustering and crowdsourcing technology.
The object of the invention is achieved through the following technical solution: a book tag recommendation method based on spectral clustering and crowdsourcing technology, comprising the following steps:
(1) filter the users' search data and search click data out of the result collection system or the Web logs;
(2) using the users' search data and search click data, build a term-book matrix, and derive the term-term Laplacian matrix from the term-book matrix;
(3) cluster the Laplacian matrix using spectral clustering to obtain the clustering result of the search terms;
(4) continuously optimize the clustering result obtained in step 3 using crowdsourcing;
(5) map each user's past search records onto the clustering result optimized in step 4, and recommend the mapped cluster structure to the user as tags.
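Step (1) above can be sketched as a simple log filter. The text does not describe the log format, so the tab-separated layout `user<TAB>action<TAB>query[<TAB>book_id]`, the action names `search` and `click`, and the function name `filter_log` are all assumptions made for illustration only.

```python
from typing import Iterable, List, Tuple

def filter_log(lines: Iterable[str]) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str, str]]]:
    """Split raw log lines into search records and search click records.

    Assumed (hypothetical) line format: user<TAB>action<TAB>query[<TAB>book_id]
    """
    searches, clicks = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # too short to be a search-related record
        user, action, query = fields[0], fields[1], fields[2].strip()
        if not query:
            continue  # records without a search term are dropped
        if action == "search":
            searches.append((user, query))
        elif action == "click" and len(fields) >= 4 and fields[3]:
            clicks.append((user, query, fields[3]))
    return searches, clicks

sample = [
    "u1\tsearch\tspectral clustering",
    "u1\tclick\tspectral clustering\tb1",
    "u2\tlogin\t",                      # unrelated record, filtered out
    "u2\tclick\tgraph theory\tb1",
]
searches, clicks = filter_log(sample)
print(searches)
print(clicks)
```

The click records keep the search term together with the clicked book, which is the pairing that step (2) needs.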
Further, step 2 is specifically: obtain the search term set $Q = \{q_1, q_2, \ldots, q_n\}$ of all users from the users' search data, where n is the total number of search terms and each q is an individual search term; obtain the set of books clicked through from search terms, $B = \{b_1, b_2, \ldots, b_m\}$, from the users' search click data, where m is the total number of clicked books and each b is an individual book; from the search term set Q and the clicked book set B, build the term-book matrix M, each entry of which is defined as follows:
$$M_{ij} = I_{ij}$$
where $I_{ij}$ is the correspondence between the i-th search term and the j-th book. For each book, if several search terms all have click behaviour on that book, those search terms are related to one another. A term-term matrix D is built from these relations: an entry of D is 1 if the two search terms are related and 0 otherwise. The sum of each column of D is placed on the diagonal and every other position is set to 0, which forms a new matrix W. The Laplacian matrix L is obtained from the formula L = D - W.
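As a rough illustration of step 2, the sketch below builds the four matrices with NumPy. It assumes that the term-book correspondence is a 0/1 click indicator and that a search term is not related to itself; neither point is stated in the text. `build_matrices` and its arguments are hypothetical names. The matrix names follow the text: D holds the 0/1 term-term relations and W holds the column sums of D on its diagonal, so L = D - W as written here is the negative of the conventional graph Laplacian (degree matrix minus adjacency matrix).

```python
import numpy as np

def build_matrices(clicks, terms, books):
    """clicks: iterable of (term, book) pairs. Returns M, D, W, L as named in the text."""
    t_idx = {q: i for i, q in enumerate(terms)}
    b_idx = {b: j for j, b in enumerate(books)}
    M = np.zeros((len(terms), len(books)), dtype=int)
    for q, b in clicks:
        M[t_idx[q], b_idx[b]] = 1      # assumption: 0/1 click indicator
    shared = M @ M.T                   # shared[i, k] = books clicked from both terms
    D = (shared > 0).astype(int)       # 1 if two terms share a clicked book
    np.fill_diagonal(D, 0)             # assumption: no self-relation
    W = np.diag(D.sum(axis=0))         # column sums placed on the diagonal
    L = D - W                          # formula from the text
    return M, D, W, L

terms = ["q1", "q2", "q3"]
books = ["b1", "b2"]
clicks = [("q1", "b1"), ("q2", "b1"), ("q3", "b2")]
M, D, W, L = build_matrices(clicks, terms, books)
print(L)
```

In this toy input q1 and q2 both lead to clicks on b1, so they are related, while q3 stays isolated; every row of L sums to zero.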
Further, step 3 is specifically: for spectral clustering, the chosen objective function RatioCut is:
$$\mathrm{RatioCut}(A_1, \ldots, A_k) = \frac{1}{2}\sum_{i=1}^{k}\frac{W(A_i, \bar{A_i})}{|A_i|} = \sum_{i=1}^{k}\frac{\mathrm{cut}(A_i, \bar{A_i})}{|A_i|}$$
where k is the number of clusters, $A_i$ is the i-th cluster, $|A_i|$ is the number of search terms in the i-th cluster, $\bar{A_i}$ is the set of all clusters other than $A_i$, and $W(A_i, \bar{A_i})$ is the sum of the weights between the i-th cluster and the other clusters, calculated as
$$W(A_i, \bar{A_i}) = \sum_{a \in A_i,\ b \in \bar{A_i}} W(a, b)$$
where W(a, b) is the weight between a and b. From the properties of the Laplacian matrix L, it follows that minimizing the objective function RatioCut is equivalent to a minimization over the Laplacian matrix. The Laplacian matrix is therefore reduced in dimension using SVD matrix decomposition, and the K-means clustering algorithm is used to cluster the dimension-reduced Laplacian matrix.
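The following is a minimal sketch of step 3 under stated assumptions, not the patented implementation. It assumes 0/1 weights W(a, b), takes the singular vectors belonging to the k smallest singular values as the reduced representation (the text only says that SVD is used for dimension reduction), and uses a small hand-written K-means with a deterministic farthest-point initialisation so that the example is reproducible. All function names are hypothetical.

```python
import numpy as np

def ratio_cut(adj, labels):
    """RatioCut = 1/2 * sum_i W(A_i, complement of A_i) / |A_i| for a labelling."""
    total = 0.0
    for c in np.unique(labels):
        inside = labels == c
        total += adj[inside][:, ~inside].sum() / inside.sum()
    return 0.5 * total

def kmeans(X, k, iters=50):
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clusters(L, k):
    # L is symmetric, so its singular values are the absolute eigenvalues.
    U, s, _ = np.linalg.svd(np.asarray(L, dtype=float))
    embedding = U[:, -k:]              # vectors for the k smallest singular values
    return kmeans(embedding, k)

# Toy term-term relations: terms 0-1-2 form a chain, terms 3-4 form a pair.
adj = np.array([[0, 1, 0, 0, 0],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 0, 0],
                [0, 0, 0, 0, 1],
                [0, 0, 0, 1, 0]])
L = adj - np.diag(adj.sum(axis=0))     # L = D - W with the naming used in the text
labels = spectral_clusters(L, k=2)
print(labels, ratio_cut(adj, labels))
```

Because the two groups of terms share no relations, the partition found has a RatioCut of zero, while splitting the chain between terms 1 and 2 would give a strictly larger value.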
Further, step 4 is specifically: the users corresponding to the search terms in the clustering result obtained in step 3 are taken as the selected users for crowdsourcing, and the clustering result is sent to the selected users by e-mail. The feedback of a selected user on a search term, denoted Query, takes one of three values: positive feedback means the user considers that the search term fits the topic of the cluster it is in, negative feedback means the user considers that the search term does not fit the topic of the cluster, and zero feedback means it is hard to judge whether the search term fits the topic. Depending on the selected users' feedback on a cluster, the cluster is handled in one of the following three ways:
(a) The selected users' feedback shows that the cluster represents some topic well. This shows in two respects: the negative feedback is less than the positive feedback, and the users' feedback does not contradict itself. In this case, the search terms with negative feedback are deleted from the cluster, and the search terms with positive or zero feedback are kept;
(b) The selected users' feedback is confused and does not show whether the clustering is good or bad: several users give different or even opposite feedback on the same search term. This means that the feedback of the current selected users is not yet sufficient to judge the cluster, so new users need to be brought in and the crowdsourcing task distributed again;
(c) The selected users' feedback shows that the cluster has no clear topic: for more than 50% of the search terms, the selected users' feedback differs or is opposite. In this case the cluster is deleted outright.
The beneficial effects of the invention are as follows: the method clusters users' search term information using spectral clustering and continuously optimizes the clustering result using crowdsourcing, ultimately using search terms to improve book tag recommendation. Building on the clustering result, the invention proposes optimizing it through crowdsourcing: feedback on the clustering result is collected from multiple users to judge and optimize it, and the optimized result is applied in the recommender system.
Brief description of the drawings
Fig. 1 is a flow chart of the book tag recommendation method of the invention based on spectral clustering and crowdsourcing technology.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawing.
As shown in Fig. 1, the book tag recommendation method of the invention based on spectral clustering and crowdsourcing technology comprises the following steps:
(1) Filter the users' search data and search click data out of the result collection system or the Web logs;
(2) Using the users' search data and search click data, build a term-book matrix, and derive the term-term Laplacian matrix from the term-book matrix. Specifically: obtain the search term set $Q = \{q_1, q_2, \ldots, q_n\}$ of all users from the users' search data, where n is the total number of search terms and each q is an individual search term; obtain the set of books clicked through from search terms, $B = \{b_1, b_2, \ldots, b_m\}$, from the users' search click data, where m is the total number of clicked books and each b is an individual book; from the search term set Q and the clicked book set B, build the term-book matrix M, each entry of which is defined as follows:
$$M_{ij} = I_{ij}$$
where $I_{ij}$ is the correspondence between the i-th search term and the j-th book. For each book, if several search terms all have click behaviour on that book, those search terms are related to one another. A term-term matrix D is built from these relations: an entry of D is 1 if the two search terms are related and 0 otherwise. The sum of each column of D is placed on the diagonal and every other position is set to 0, which forms a new matrix W. The Laplacian matrix L is obtained from the formula L = D - W.
(3) Cluster the Laplacian matrix using spectral clustering to obtain the clustering result of the search terms. Specifically: for spectral clustering, the chosen objective function RatioCut is:
$$\mathrm{RatioCut}(A_1, \ldots, A_k) = \frac{1}{2}\sum_{i=1}^{k}\frac{W(A_i, \bar{A_i})}{|A_i|} = \sum_{i=1}^{k}\frac{\mathrm{cut}(A_i, \bar{A_i})}{|A_i|}$$
where k is the number of clusters, $A_i$ is the i-th cluster, $|A_i|$ is the number of search terms in the i-th cluster, $\bar{A_i}$ is the set of all clusters other than $A_i$, and $W(A_i, \bar{A_i})$ is the sum of the weights between the i-th cluster and the other clusters, calculated as
$$W(A_i, \bar{A_i}) = \sum_{a \in A_i,\ b \in \bar{A_i}} W(a, b)$$
where W(a, b) is the weight between a and b. From the properties of the Laplacian matrix L, it follows that minimizing the objective function RatioCut is equivalent to a minimization over the Laplacian matrix. The Laplacian matrix is therefore reduced in dimension using SVD matrix decomposition, and the K-means clustering algorithm is used to cluster the dimension-reduced Laplacian matrix.
(4) Continuously optimize the clustering result obtained in step 3 using crowdsourcing. Specifically: the users corresponding to the search terms in the clustering result obtained in step 3 are taken as the selected users for crowdsourcing, and the clustering result is sent to the selected users by e-mail. The feedback of a selected user on a search term, denoted Query, takes one of three values: positive feedback means the user considers that the search term fits the topic of the cluster it is in, negative feedback means the user considers that the search term does not fit the topic of the cluster, and zero feedback means it is hard to judge whether the search term fits the topic. Depending on the selected users' feedback on a cluster, the cluster is handled in one of the following three ways:
(a) The selected users' feedback shows that the cluster represents some topic well. This shows in two respects: the negative feedback is less than the positive feedback, and the users' feedback does not contradict itself. In this case, the search terms with negative feedback are deleted from the cluster, and the search terms with positive or zero feedback are kept;
(b) The selected users' feedback is confused and does not show whether the clustering is good or bad: several users give different or even opposite feedback on the same search term. This means that the feedback of the current selected users is not yet sufficient to judge the cluster, so new users need to be brought in and the crowdsourcing task distributed again;
(c) The selected users' feedback shows that the cluster has no clear topic: for more than 50% of the search terms, the selected users' feedback differs or is opposite. In this case the cluster is deleted outright.
(5) Map each user's past search records onto the clustering result optimized in step 4, and recommend the mapped cluster structure to the user as tags.
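Step (5) can be sketched as a lookup from a user's past search terms into the optimized clusters. The text does not specify how the mapped clusters are turned into a tag list, so the choices below (recommending the other search terms of each matched cluster, skipping terms the user has already searched, and capping the list with `limit`) are assumptions, as are the names `recommend_tags`, `history` and `clusters`.

```python
from typing import Dict, List

def recommend_tags(history: List[str], clusters: Dict[int, List[str]], limit: int = 5) -> List[str]:
    """Map past search terms onto clusters and return other cluster terms as tags."""
    term_to_cluster = {t: cid for cid, members in clusters.items() for t in members}
    seen = set(history)                  # do not recommend what the user already searched
    tags: List[str] = []
    for query in history:
        cid = term_to_cluster.get(query)
        if cid is None:
            continue                     # search term not covered by any optimized cluster
        for term in clusters[cid]:
            if term not in seen:
                seen.add(term)
                tags.append(term)
    return tags[:limit]

clusters = {
    0: ["spectral clustering", "graph partitioning", "k-means"],
    1: ["crowdsourcing", "human computation"],
}
tags = recommend_tags(["k-means", "cooking"], clusters)
print(tags)
```

A search term that appears in no cluster, such as "cooking" here, simply contributes nothing to the recommendation.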

Claims (3)

1. A book tag recommendation method based on spectral clustering and crowdsourcing technology, characterized by comprising the following steps:
(1) filtering the users' search data and search click data out of the result collection system or the Web logs;
(2) using the users' search data and search click data, building a term-book matrix, and deriving the term-term Laplacian matrix from the term-book matrix; specifically: obtaining the search term set $Q = \{q_1, q_2, \ldots, q_n\}$ of all users from the users' search data, where n is the total number of search terms and each q is an individual search term; obtaining the set of books clicked through from search terms, $B = \{b_1, b_2, \ldots, b_m\}$, from the users' search click data, where m is the total number of clicked books and each b is an individual book; from the search term set Q and the clicked book set B, building the term-book matrix M, each entry of which is defined as follows:
$$M_{ij} = I_{ij}$$
where $I_{ij}$ is the correspondence between the i-th search term and the j-th book; for each book, if several search terms all have click behaviour on that book, those search terms are related to one another; a term-term matrix D is built from these relations, an entry of D being 1 if the two search terms are related and 0 otherwise; the sum of each column of D is placed on the diagonal and every other position is set to 0, which forms a new matrix W; the Laplacian matrix L is obtained from the formula L = D - W;
(3) clustering the Laplacian matrix using spectral clustering to obtain the clustering result of the search terms;
(4) continuously optimizing the clustering result obtained in step 3 using crowdsourcing;
(5) mapping each user's past search records onto the clustering result optimized in step 4, and recommending the mapped cluster structure to the user as tags.
2. The book tag recommendation method based on spectral clustering and crowdsourcing technology according to claim 1, characterized in that step 3 is specifically: for spectral clustering, the chosen objective function RatioCut is:
$$\mathrm{RatioCut}(A_1, \ldots, A_k) = \frac{1}{2}\sum_{i=1}^{k}\frac{W(A_i, \bar{A_i})}{|A_i|} = \sum_{i=1}^{k}\frac{\mathrm{cut}(A_i, \bar{A_i})}{|A_i|}$$
where k is the number of clusters, $A_i$ is the i-th cluster, $|A_i|$ is the number of search terms in the i-th cluster, $\bar{A_i}$ is the set of all clusters other than $A_i$, and $W(A_i, \bar{A_i})$ is the sum of the weights between the i-th cluster and the other clusters, calculated as
$$W(A_i, \bar{A_i}) = \sum_{a \in A_i,\ b \in \bar{A_i}} W(a, b)$$
where W(a, b) is the weight between a and b; from the properties of the Laplacian matrix L, it follows that minimizing the objective function RatioCut is equivalent to a minimization over the Laplacian matrix; the Laplacian matrix is therefore reduced in dimension using SVD matrix decomposition, and the K-means clustering algorithm is used to cluster the dimension-reduced Laplacian matrix.
3. The book tag recommendation method based on spectral clustering and crowdsourcing technology according to claim 1, characterized in that step 4 is specifically: the users corresponding to the search terms in the clustering result obtained in step 3 are taken as the selected users for crowdsourcing, and the clustering result is sent to the selected users by e-mail; the feedback of a selected user on a search term, denoted Query, takes one of three values: positive feedback means the user considers that the search term fits the topic of the cluster it is in, negative feedback means the user considers that the search term does not fit the topic of the cluster, and zero feedback means it is hard to judge whether the search term fits the topic; depending on the selected users' feedback on a cluster, the cluster is handled in one of the following three ways:
(a) the selected users' feedback shows that the cluster represents some topic well, which shows in two respects: the negative feedback is less than the positive feedback, and the users' feedback does not contradict itself; in this case, the search terms with negative feedback are deleted from the cluster, and the search terms with positive or zero feedback are kept;
(b) the selected users' feedback is confused and does not show whether the clustering is good or bad, in that several users give different or even opposite feedback on the same search term; this means that the feedback of the current selected users is not yet sufficient to judge the cluster, so new users need to be brought in and the crowdsourcing task distributed again;
(c) the selected users' feedback shows that the cluster has no clear topic, in that for more than 50% of the search terms the selected users' feedback differs or is opposite; in this case the cluster is deleted outright.
CN201510270676.7A 2015-03-11 2015-05-26 Book tag recommendation method based on spectral clustering and crowdsourcing technology Active CN104915388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510270676.7A CN104915388B (en) 2015-03-11 2015-05-26 Book tag recommendation method based on spectral clustering and crowdsourcing technology

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN201510107290 2015-03-11
CN2015101072904 2015-03-11
CN201510270676.7A CN104915388B (en) 2015-03-11 2015-05-26 Book tag recommendation method based on spectral clustering and crowdsourcing technology

Publications (2)

Publication Number Publication Date
CN104915388A CN104915388A (en) 2015-09-16
CN104915388B true CN104915388B (en) 2018-03-16

Family

ID=54084451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510270676.7A Active CN104915388B (en) 2015-03-11 2015-05-26 Book tag recommendation method based on spectral clustering and crowdsourcing technology

Country Status (1)

Country Link
CN (1) CN104915388B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426826A (en) * 2015-11-09 2016-03-23 张静 Tag noise correction based crowd-sourced tagging data quality improvement method
CN106202184B (en) * 2016-06-27 2019-05-31 华中科技大学 A kind of books personalized recommendation method and system towards libraries of the universities
CN107301199B (en) * 2017-05-17 2021-02-12 北京融数云途科技有限公司 Data tag generation method and device
CN110851706B (en) 2019-10-10 2022-11-01 百度在线网络技术(北京)有限公司 Training method and device for user click model, electronic equipment and storage medium
US11113580B2 (en) 2019-12-30 2021-09-07 Industrial Technology Research Institute Image classification system and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101901450A (en) * 2010-07-14 2010-12-01 中兴通讯股份有限公司 Media content recommendation method and media content recommendation system
CN102376063A (en) * 2011-11-29 2012-03-14 北京航空航天大学 Social-label-based method for optimizing personalized recommendation system
JP2013084216A (en) * 2011-10-12 2013-05-09 Ntt Docomo Inc Fixed phrase discrimination device and fixed phrase discrimination method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
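Research on a book combination recommendation system model based on tags and association rule mining; 李默 et al.; 《计算机应用研究》 (Application Research of Computers); 2014-08-31; full text *
Survey of the application of tag technology in efficient library OPAC systems; 罗琳 et al.; 《图书情报工作》 (Library and Information Service); 2013-02-28; full text *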

Also Published As

Publication number Publication date
CN104915388A (en) 2015-09-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant