CN105205139A - Personalized literature searching method - Google Patents


Info

Publication number: CN105205139A
Authority: CN
Grant status: Application
Prior art keywords: searching, interest, results, information, keyword
Application number: CN 201510592309
Other languages: Chinese (zh)
Inventor: 罗旭斌 (Luo Xubin)
Original Assignee: 罗旭斌 (Luo Xubin)
Priority date: 2015-09-17 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2015-09-17
Publication date: 2015-12-30


Abstract

The invention discloses a personalized literature searching method comprising the following steps: (a) building, for each user, a static user-information library that includes, but is not limited to, the user's identity information and research field, and that the user enters into the search system; (b) building, for each user, an interest keyword library X containing a plurality of interest keywords and the interest degree corresponding to each; (c) searching: when the user searches, the input keyword set is denoted Q and a search yields the results $R_1, R_2, \ldots, R_n$; each interest keyword is then added to the keyword set and the search is repeated; if the new results share elements with $R_1, R_2, \ldots, R_n$, the ranks of those repeated elements are moved forward by a distance determined by the interest degree, and the adjusted list is output as the final search result. With this method, every search result list is adjusted on the basis of the user's interest keyword library, so that personalized results are output for each user and the output results are more accurate.

Description

A Personalized Document Retrieval Method

Technical Field

[0001] The present invention relates to the technical field of document and information retrieval, and more specifically to a personalized document retrieval method.

Background

[0002] Document retrieval is the process of obtaining the documents needed for study and work. Most existing document retrieval systems are built on the attributes of the documents themselves, that is, on static information such as keywords, authors, and references. They do not bring the characteristics of the person who needs or searches for the documents into the retrieval process; in other words, anyone who enters the same search keyword obtains the same results. In this era of information explosion, document retrieval also faces a flood of results. If the searcher's identity and characteristics could be incorporated into the retrieval process and the results matched to that person, more useful results would be obtained. For example, the results a logistics researcher obtains when searching for "network" should differ from those an optical-fiber-communication researcher obtains with the same keyword, so as to reflect the work in each researcher's own field; that is, retrieval should be personalized according to the searcher's identity.

[0003] Chinese patent document CN 101373486, published on February 25, 2009, discloses a personalized summarization system based on a user interest model. The system consists of a Web information retrieval unit, a user interest unit, and a personalized summarization unit. By analyzing the user's search logs, it uses conceptual clustering to build and/or update a user interest model described as a hierarchical concept structure; it then uses that model to analyze the similarity between the user's interests and the sentences in the search results, and thereby produces a summary personalized for the user. Because the summary is obtained through personalized sentence scoring, it takes full account of the user's interests: summary generation is matched to those interests, which can improve the usefulness of the summary and user satisfaction.

[0004] Prior art represented by the above patent document does use an interest model and the search results to analyze the similarity between user interests and sentences in the results, and so produces personalized summaries. However, it requires sentence-similarity analysis, the accuracy of the resulting personalized summaries is not high enough, and the retrieval procedure is complex. Moreover, users of document retrieval systems are mostly professional researchers searching mainly for specialized research literature, whereas that system outputs automatic summaries, which do not match specialized literature search results well.

Summary of the Invention

[0005] To address the above shortcomings of the prior art, the present invention provides a personalized document retrieval method. Searches performed with this method additionally use the user's interest keywords and their corresponding interest degrees: every search result list is adjusted on the basis of the user's interest keyword library, so that personalized results are output. The output results are more accurate, and the retrieval method is simple.

[0006] The present invention is achieved by the following technical solution:

A personalized document retrieval method, comprising the following steps:

a. Build a static user-information library for each user, including but not limited to the user's identity information and research field, entered into the retrieval system by the user;

b. Build an interest keyword library X for each user, containing a plurality of interest keywords and the interest degree of each. Formally, $X = \{x_1, x_2, \ldots, x_m\}$ ($m$ a natural number), where each element $x = (k, w)$ consists of an interest keyword $k$ and its interest degree $w$. X is initialized with the fields of interest entered by the user in step a, and every interest degree is assigned the same static value;

c. Information retrieval: when the user searches, denote the input keyword set by Q and perform a search to obtain the results $R_1, R_2, \ldots, R_n$ ($n$ a natural number). Then add each interest keyword in the user's library X to the keyword set and search again; if the results obtained share elements with $R_1, R_2, \ldots, R_n$, move the ranks of those repeated elements forward by a distance determined by the interest degree of that interest keyword;

If the user's interest keyword library X contains m interest keywords, m such searches are performed (one per interest keyword), and the result list remaining after the last adjustment is output as the final result.
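Steps b and c can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the corpus, `toy_search` (an AND-match ranked by document ID), the initial degree of 1.0, and the `distance_of` rule are all hypothetical, and because the text does not say in which order repeated elements are promoted, the sketch simply walks them in the order the extra search returns them.

```python
from typing import Callable, Dict, List, Set

# Toy corpus (hypothetical): document ID -> set of index terms.
CORPUS: Dict[str, Set[str]] = {
    "d1": {"network", "routing"},
    "d2": {"network", "logistics"},
    "d3": {"network", "optical"},
    "d4": {"network", "logistics", "supply"},
}


def toy_search(keywords: Set[str]) -> List[str]:
    """Stand-in search engine: AND-match on the keyword set, ranked by document ID."""
    return sorted(d for d, terms in CORPUS.items() if keywords <= terms)


def init_interest_library(fields: List[str], static_value: float = 1.0) -> Dict[str, float]:
    """Step b: X starts as the user's fields of interest, all at one static degree (assumed 1.0)."""
    return {k: static_value for k in fields}


def promote(ranking: List[str], doc: str, distance: int) -> None:
    """Move `doc` forward by `distance` places in place, stopping at the top of the list."""
    pos = ranking.index(doc)
    ranking.insert(max(0, pos - distance), ranking.pop(pos))


def personalized_search(query: Set[str],
                        library: Dict[str, float],
                        search: Callable[[Set[str]], List[str]],
                        distance_of: Callable[[float], int]) -> List[str]:
    """Step c: a baseline search on Q, then one extra search per interest keyword."""
    ranking = search(set(query))                  # R_1, ..., R_n
    for keyword, weight in library.items():       # m extra searches for m keywords
        extra = search(set(query) | {keyword})    # Q plus one interest keyword
        for doc in extra:
            if doc in ranking:                    # a repeated element
                promote(ranking, doc, distance_of(weight))
    return ranking


if __name__ == "__main__":
    X = init_interest_library(["logistics"])
    # Hypothetical distance rule: at least one place, otherwise round(w).
    print(personalized_search({"network"}, X, toy_search, lambda w: max(1, round(w))))
    # -> ['d2', 'd1', 'd4', 'd3']
```

Passing `distance_of` in as a parameter keeps the sketch neutral about how interest degree becomes distance: the summary only says the distance is "determined" by the interest degree, while the preferred embodiment makes it linear.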

[0007] Updating the interest keyword library X: each time the user enters a search keyword, the keyword is added to X as a new interest keyword and its interest degree is initialized to a static value; if a search keyword k already exists in X, the interest degree w of that keyword is increased by 1.
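A minimal sketch of the update rule in [0007], assuming X is held as a Python dict from keyword to interest degree and that the unspecified "static value" is 1.0 (both are illustrative choices, not from the source). Repeated keywords within a single query are counted once, which is also an assumption.

```python
from typing import Dict, Iterable

STATIC_VALUE = 1.0  # assumption: the text only says "a static value"


def update_on_query(library: Dict[str, float],
                    search_keywords: Iterable[str],
                    static_value: float = STATIC_VALUE) -> None:
    """Update X in place when the user submits search keywords."""
    for k in dict.fromkeys(search_keywords):  # ordered de-duplication within one query
        if k in library:
            library[k] += 1                   # known keyword: interest degree w + 1
        else:
            library[k] = static_value         # new keyword: enters X at the static value


if __name__ == "__main__":
    X = {"logistics": 1.0}
    update_on_query(X, ["network", "logistics"])
    print(X)  # {'logistics': 2.0, 'network': 1.0}
```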

[0008] In addition, after each search the interest degrees of all interest keywords are decayed, that is, each is reduced by a value e. This value reflects how fast interest decays; it may be a fixed value such as 0.01, or it may be determined adaptively by learning from the user's search habits. If an interest degree decays to 0 or below, the corresponding interest keyword is deleted from X, which keeps the library current.
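The decay step in [0008] could look like the following sketch. It uses the fixed e = 0.01 mentioned in the text; the adaptive, habit-based choice of e is not shown, and the dict representation of X is an assumption.

```python
from typing import Dict


def decay(library: Dict[str, float], e: float = 0.01) -> None:
    """After each search: subtract e from every interest degree, drop any that reach <= 0."""
    for k in list(library):          # copy keys so entries can be deleted while looping
        library[k] -= e
        if library[k] <= 0:
            del library[k]


if __name__ == "__main__":
    X = {"logistics": 1.0, "optical": 0.005}
    decay(X)
    print(X)  # {'logistics': 0.99}
```

With floating-point arithmetic, a keyword that enters at an assumed degree of 1.0 and is never searched again disappears after about 100 searches at e = 0.01; rounding can shift that by one search, so an integer counter or `fractions.Fraction` may be preferable if the cutoff must be exact.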

[0009] The keyword set includes both the interest keywords and the search keywords.

[0010] Compared with the prior art, the present invention achieves the following beneficial effects:

1. With steps a, b, and c of the invention, an interest keyword library X is first built for each user. At search time, results are first obtained with the search keywords; the user's interest keywords are then added to the keyword set to obtain further results; finally, the ranks of repeated elements are moved forward by distances determined by the interest degrees of the interest keywords. In this way, every search result list is adjusted on the basis of the user's interest keyword library and personalized results are output, so that the results better match the user's needs.

[0011] 2. The method updates the interest keyword library X dynamically according to each of the user's searches, so that the system's understanding of the user keeps deepening; future search results therefore match the user's interests more closely and are more accurate.

Detailed Description

[0012] In the preferred embodiment of the present invention, a personalized document retrieval method comprises the following steps:

a. Build a static user-information library for each user, including but not limited to the user's identity information and research field, entered into the retrieval system by the user;

b. Build an interest keyword library X for each user, containing a plurality of interest keywords and the interest degree of each. Formally, $X = \{x_1, x_2, \ldots, x_m\}$ ($m$ a natural number), where each element $x = (k, w)$ consists of an interest keyword $k$ and its interest degree $w$. X is initialized with the fields of interest entered by the user in step a, and every interest degree is assigned the same static value;

c. Information retrieval: when the user searches, denote the input keyword set by Q and perform a search to obtain the results $R_1, R_2, \ldots, R_n$ ($n$ a natural number). Then add each interest keyword in the user's library X to the keyword set and search again; if the results obtained share elements with $R_1, R_2, \ldots, R_n$, move the ranks of those repeated elements forward by a distance that is linearly proportional to the interest degree $w$ of that interest keyword;

If the user's interest keyword library X contains m interest keywords, m such searches are performed (one per interest keyword), and the result list remaining after the last adjustment is output as the final result.
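The preferred embodiment fixes the move distance to be linear in $w$. One way to sketch that, assuming a hypothetical proportionality constant `alpha` and truncation to a whole number of places (neither is given in the source):

```python
from typing import List


def move_distance(w: float, alpha: float = 2.0) -> int:
    """Distance linear in the interest degree w, truncated to whole places (alpha is assumed)."""
    return max(0, int(alpha * w))


def promote_linear(ranking: List[str], doc: str, w: float, alpha: float = 2.0) -> List[str]:
    """Return a copy of `ranking` with `doc` moved forward by move_distance(w) places."""
    out = list(ranking)
    pos = out.index(doc)
    out.insert(max(0, pos - move_distance(w, alpha)), out.pop(pos))
    return out


if __name__ == "__main__":
    base = ["a", "b", "c", "d", "e"]
    print(promote_linear(base, "e", w=1.0))  # ['a', 'b', 'e', 'c', 'd']
    print(promote_linear(base, "e", w=2.0))  # ['e', 'a', 'b', 'c', 'd']
```

Doubling the interest degree doubles the jump, which is the property the embodiment asks for; the clamp at position 0 simply stops a strongly weighted document at the top of the list.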

[0013] Updating the interest keyword library X: each time the user enters a search keyword, the keyword is added to X as a new interest keyword and its interest degree is initialized to a static value; if a search keyword k already exists in X, the interest degree w of that keyword is increased by 1.

[0014] In addition, after each search the interest degrees of all interest keywords are decayed, that is, each is reduced by a value e. This value reflects how fast interest decays; it may be a fixed value such as 0.01, or it may be determined adaptively by learning from the user's search habits. If an interest degree decays to 0 or below, the corresponding interest keyword is deleted from X, which keeps the library current.

[0015] In this embodiment, the keyword set includes both the interest keywords and the search keywords.

[0016] In practical use, the method maintains a dynamic user interest keyword library X containing the user's interest keywords and their interest degrees, adjusts every search result list on the basis of this library, and thereby outputs personalized results. At the same time, the library is adjusted according to each of the user's searches, so that the system's understanding of the user keeps deepening and future results match the user's interests more closely and more accurately.

Claims (3)

  1. A personalized document retrieval method, characterized by the following steps: a. building a static user-information library for each user, including but not limited to the user's identity information and research field, entered into the retrieval system by the user; b. building an interest keyword library X for each user, containing a plurality of interest keywords and the interest degree corresponding to each, X being formally expressed as $X = \{x_1, x_2, \ldots, x_m\}$ ($m$ a natural number), where each element $x = (k, w)$ consists of an interest keyword $k$ and its interest degree $w$, X being initialized with the fields of interest entered by the user in step a and all interest degrees being assigned the same static value; c. information retrieval: when the user searches, the input keyword set is denoted Q and a search is performed to obtain the results $R_1, R_2, \ldots, R_n$ ($n$ a natural number); each interest keyword in the user's library X is then added to the keyword set and a search is performed again; if the results obtained share elements with $R_1, R_2, \ldots, R_n$, the ranks of those repeated elements are moved forward, the moving distance being determined by the interest degree of that interest keyword; if the user's library X contains m interest keywords, m such searches are performed, and the result list remaining after the last adjustment is output as the final result.
  2. The personalized document retrieval method according to claim 1, characterized in that the interest keyword library X is updated as follows: each time the user enters a search keyword, the search keyword is added to X as a new interest keyword and its interest degree is initialized to a static value; if a search keyword k already exists in X, the interest degree w of that interest keyword is increased by 1.
  3. The personalized document retrieval method according to claim 2, characterized in that after each search the interest degrees of all interest keywords are decayed, the decay operation being a reduction by a value e; if an interest degree decays to 0 or below, the corresponding interest keyword is deleted from the interest keyword library X.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201510592309 CN105205139A (en) 2015-09-17 2015-09-17 Personalized literature searching method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201510592309 CN105205139A (en) 2015-09-17 2015-09-17 Personalized literature searching method

Publications (1)

Publication Number Publication Date
CN105205139A (en) 2015-12-30

Family

ID=54952822

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201510592309 CN105205139A (en) 2015-09-17 2015-09-17 Personalized literature searching method

Country Status (1)

Country Link
CN (1) CN105205139A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070233672A1 (en) * 2006-03-30 2007-10-04 Coveo Inc. Personalizing search results from search engines
CN101080709A (en) * 2004-03-29 2007-11-28 咕果公司 (Google Inc.) Variable personalization of search results in a search engine
CN102156728A (en) * 2011-03-31 2011-08-17 河南理工大学 (Henan Polytechnic University) Improved personalized summary system based on user interest model
CN102819575A (en) * 2012-07-20 2012-12-12 南京大学 (Nanjing University) Personalized search method for Web service recommendation
US8463810B1 (en) * 2006-06-01 2013-06-11 Monster Worldwide, Inc. Scoring concepts for contextual personalized information retrieval

Non-Patent Citations (1)

Title
谭利文 (Tan Liwen): "Research and Design of a Personalized Network Literature Retrieval System Based on a User Model" *

Similar Documents

Publication Publication Date Title
Efron Hashtag retrieval in a microblogging environment
Vrandečić et al. Wikidata: a free collaborative knowledgebase
US20080294621A1 (en) Recommendation systems and methods using interest correlation
US20080294622A1 (en) Ontology based recommendation systems and methods
Courgeau Methodology and epistemology of multilevel analysis: approaches from different social sciences
Bennett et al. Inferring and using location metadata to personalize web search
US20120197993A1 (en) Skill ranking system
US20080222138A1 (en) Method and Apparatus for Constructing a Link Structure Between Documents
US20090240674A1 (en) Search Engine Optimization
Cafarella et al. Structured data on the web
Morsey et al. Dbpedia and the live extraction of structured data from wikipedia
Pu et al. Subject categorization of query terms for exploring Web users' search interests
Nasraoui World wide web personalization
Mimno Computational historiography: Data mining in a century of classics journals
CN102332006A (en) Information push control method and device
Cui et al. Attention-over-attention neural networks for reading comprehension
Kantor et al. Capturing human intelligence in the net
CN101770520A (en) User interest modeling method based on user browsing behavior
US20100076966A1 (en) Systems and methods for generating social index scores for key term analysis and comparisons
US20100191740A1 (en) System and method for ranking web searches with quantified semantic features
Cash et al. Adolescent suicide statements on MySpace
US20120005219A1 (en) Using computational engines to improve search relevance
US9195640B1 (en) Method and system for finding content having a desired similarity
US20150278691A1 (en) User interests facilitated by a knowledge base
Tang et al. ArnetMiner: An Expertise Oriented Search System for Web Community.

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination