CN105912673A - Optimization method for Micro Blog search based on personalized characteristics of user - Google Patents

Optimization method for Micro Blog search based on personalized characteristics of user

Info

Publication number
CN105912673A
CN105912673A CN201610226690.1A CN201610226690A CN105912673A CN 105912673 A CN105912673 A CN 105912673A CN 201610226690 A CN201610226690 A CN 201610226690A CN 105912673 A CN105912673 A CN 105912673A
Authority
CN
China
Prior art keywords
user
microblogging
word
micro blog
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610226690.1A
Other languages
Chinese (zh)
Inventor
喻梅
曹雅茹
于健
王建荣
张旭
缑小路
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN201610226690.1A
Publication of CN105912673A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9535: Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An optimization method for microblog search based on the personalized characteristics of a user comprises the following steps. First, a user-term interest preference degree is calculated: a topic model is built from the posts, follows, reposts and comments of Sina Weibo users, and the user-topic-word relation is analyzed to obtain each user's interest preference degree for each word. Second, association rules are constructed, with the interest preference degree taken as the weight factor of each word in the weighted association rules. Third, the query terms are expanded: within the weighted association rule algorithm, the differently weighted terms produced by rule construction are trained to obtain the final rules, and the query words are expanded according to the meaning and characteristics those rules express, yielding expanded query words. Fourth, the microblog search result documents are rescored and re-ranked by combining the timeliness of microblogs with the similarity between the expanded query words and the microblog documents, thereby optimizing the microblog search results. With this method, for each user, documents that satisfy the user's query rank near the top and irrelevant documents rank lower.

Description

Microblog search optimization method based on user personalization features
Technical field
The present invention relates to a microblog search optimization method, and in particular to a microblog search optimization method based on user personalization features.
Background art
At present, research on modeling the personalized interests of microblog platform users has two main focuses.
The first is to analyze the social network relationships of microblog users and derive their personalized characteristics from them. The second is to model the text content that microblog users post and so obtain user interest features.
When a user issues a query, the query need may, for various reasons, be vaguely understood or unclearly expressed, so that the returned search results do not meet the user's requirements. For this reason, a query expansion mechanism plays an important role in improving the performance of a search system. Query expansion adds topic words related to the query keywords, chosen by semantic relevance or high correlation, in order to improve query accuracy. Existing microblog search engines and research results do not consider the user's personalized interests. A study of microblog search engines shows that their advantage over conventional web search engines is that they add the time factor of a document and the authority of its publisher to the document ranking score. Microblog search nevertheless has shortcomings: it does not personalize the search results for each user's own characteristics, and it does not use query expansion to further improve the accuracy of query results. The rich user information that is unique to microblog platforms is not analyzed by the search service to extract user personalization features, which is both a gap in microblog search services and a waste of that information.
Summary of the invention
The technical problem to be solved by the present invention is to provide a microblog search optimization method based on user personalization features that effectively addresses two shortcomings of existing microblog search engines and research results: they do not personalize search results for each user's own characteristics, and they do not use query expansion to improve the accuracy of query results.
The technical solution adopted by the present invention is a microblog search optimization method based on user personalization features, comprising the following steps:
1) Calculate the user-term interest preference degree: build a topic model from the posts, follows, reposts and comments of Sina Weibo users, analyze the user-topic-word relation, and obtain the user's interest preference degree for each word;
2) Construct association rules, taking the interest preference degree as the weight factor of each word in the weighted association rules;
3) Expand the query terms: within the weighted association rule algorithm, train on the differently weighted terms produced by rule construction to obtain the final rules, and expand the query words according to the meaning and characteristics those rules express, yielding expanded query words;
4) Combine the timeliness of microblogs with the similarity between the expanded query words and the microblog documents to rescore and re-rank the microblog search result documents, thereby optimizing the microblog search results.
The rescoring and re-ranking of microblog search results in step 4) uses the term frequency-inverse document frequency (TF-IDF) model to compute the similarity between the user's query words and the retrieved microblog documents, and this similarity serves as the scoring criterion for search result documents. High similarity indicates that a result document is close to what the user is looking for, so it is ranked near the front; low similarity indicates that it is relatively far from what the user is looking for, so it is ranked toward the back. Documents are ordered by descending similarity.
The microblog search optimization method based on user personalization features of the present invention expands the query words through a query expansion mechanism and thereby optimizes microblog search results. An LDA topic model is used to analyze the personalized characteristics of microblog users and obtain their interest preferences; these preferences serve as the weight of each term in the weighted association rule method, which is then used to expand the query words. As a result, Sina Weibo search results are ranked according to each user's own interest characteristics and query needs: for each user, documents that satisfy the query rank near the top and irrelevant documents rank lower.
Description of the drawings
Fig. 1 is a flow chart of the microblog search optimization method based on user personalization features of the present invention;
Fig. 2 compares, using MAP, the optimization effect of different query expansion methods with the original microblog search system;
Fig. 3 compares, using NDCG, the optimization effect of different query expansion methods with the original microblog search.
Detailed description of the embodiments
The microblog search optimization method based on user personalization features of the present invention is described in detail below with reference to an embodiment and the accompanying drawings.
As shown in Fig. 1, the microblog search optimization method based on user personalization features of the present invention comprises the following steps:
1) Calculation of the user-term interest preference degree
A topic model is built from the posts, follows, reposts and comments of Sina Weibo users, the user-topic-word relation is analyzed, and the user's interest preference degree for each word is obtained;
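One plausible reading of the user-topic-word analysis in this step is that a user's preference for a word is obtained by summing over topics, P(word | user) = Σ_t P(t | user) · P(word | t). The sketch below illustrates only that reading. It assumes a topic model has already been fitted elsewhere; the names `user_word_preference`, `theta_u` and `phi` are hypothetical and do not appear in the patent.

```python
def user_word_preference(theta_u, phi, vocab):
    """Mix per-topic word distributions by one user's topic proportions.

    theta_u: list of P(topic | user), one entry per topic (assumed input)
    phi:     phi[t][j] = P(vocab[j] | topic t) (assumed input)
    Returns a dict mapping each word to P(word | user).
    """
    prefs = {}
    for j, word in enumerate(vocab):
        prefs[word] = sum(theta_u[t] * phi[t][j] for t in range(len(theta_u)))
    return prefs


# Toy data: two topics (sports, cooking) and a three-word vocabulary.
vocab = ["football", "league", "recipe"]
theta_u = [0.8, 0.2]            # this user is mostly interested in sports
phi = [[0.6, 0.4, 0.0],         # word distribution of the sports topic
       [0.0, 0.0, 1.0]]         # word distribution of the cooking topic
prefs = user_word_preference(theta_u, phi, vocab)
```

Because each row of `phi` and the vector `theta_u` sum to one, the resulting preferences also sum to one, so they can be used directly as per-word weights.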
2) Construction of association rules
The user's interest preference degree for a word reflects the probabilistic relation between user and word, and to some extent expresses how much different words are preferred and valued by different users. The present invention takes the interest preference degree as the weight factor of each word in the weighted association rules;
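The patent does not spell out how the weight factor enters the association rules. As a hedged illustration, the sketch below uses one common definition from weighted association rule mining, in which the weighted support of an itemset is its ordinary support multiplied by the mean weight of its items. That definition, the function name `weighted_support` and the toy data are assumptions, not the patent's stated method.

```python
def weighted_support(itemset, transactions, weights):
    """Ordinary support scaled by the mean weight of the items (assumed definition).

    itemset:      iterable of words
    transactions: list of word lists, e.g. one list per microblog post
    weights:      dict word -> user interest preference degree
    """
    items = set(itemset)
    if not items or not transactions:
        return 0.0
    # Fraction of transactions that contain every word of the itemset.
    support = sum(1 for t in transactions if items <= set(t)) / len(transactions)
    mean_weight = sum(weights.get(w, 0.0) for w in items) / len(items)
    return mean_weight * support


transactions = [
    ["football", "league"],
    ["football", "league", "score"],
    ["recipe", "cake"],
    ["football", "score"],
]
weights = {"football": 0.5, "league": 0.3, "score": 0.1, "recipe": 0.05, "cake": 0.05}

ws_sports = weighted_support(["football", "league"], transactions, weights)
ws_cooking = weighted_support(["recipe", "cake"], transactions, weights)
```

With these toy weights, the sports pair scores higher than the cooking pair both because it occurs more often and because the user prefers its words, which is the effect the weighting is meant to produce.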
3) Expansion of the query terms
Within the weighted association rule algorithm, the present invention trains on the differently weighted terms produced by rule construction to obtain the final rules, and expands the query words according to the meaning and characteristics those rules express, yielding expanded query words;
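How the final rules are applied to a query is not detailed in the source. A minimal sketch, assuming each rule has the form (antecedent words, consequent word, score) and that a rule fires when all of its antecedent words occur in the query, could look like this. The rule format, the function name `expand_query` and the cap `max_new_terms` are hypothetical.

```python
def expand_query(query_terms, rules, max_new_terms=3):
    """Append the best-scoring consequents of rules matched by the query.

    rules: list of (antecedent_words, consequent_word, score) tuples (assumed format)
    """
    query = list(query_terms)
    present = set(query)
    candidates = {}
    for antecedent, consequent, score in rules:
        # A rule fires only if every antecedent word is already in the query.
        if set(antecedent) <= present and consequent not in present:
            candidates[consequent] = max(score, candidates.get(consequent, 0.0))
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
    return query + [term for term, _ in ranked[:max_new_terms]]


rules = [
    (("football",), "league", 0.20),
    (("football",), "score", 0.08),
    (("recipe",), "cake", 0.01),
]
expanded = expand_query(["football"], rules, max_new_terms=2)
```

Here the query `["football"]` picks up `league` and `score`, while the cooking rule is ignored because its antecedent is not part of the query.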
4) One respect in which microblog search engines differ from other search engines is that their content is highly time-sensitive. The present invention therefore combines the timeliness of microblogs with the similarity between the expanded query words and the microblog documents to rescore and re-rank the microblog search result documents, thereby optimizing the microblog search results.
The rescoring and re-ranking of microblog search results uses the term frequency-inverse document frequency (TF-IDF) model to compute the similarity between the user's query words and the retrieved microblog documents, and this similarity serves as the scoring criterion for search result documents. High similarity indicates that a result document is close to what the user is looking for, so it is ranked near the front; low similarity indicates that it is relatively far from what the user is looking for, so it is ranked toward the back. Documents are ordered by descending similarity.
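The similarity-based re-ranking can be sketched as follows. This is an illustration under assumptions rather than the patent's implementation: it uses a smoothed IDF and cosine similarity, the function names are hypothetical, and the timeliness factor is left out because the source gives no formula for it.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build one sparse TF-IDF vector (dict) per tokenized document."""
    n_docs = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    # Smoothed IDF (an assumed variant) so that no weight is zero or negative.
    idf = {term: math.log((1 + n_docs) / (1 + count)) + 1.0 for term, count in df.items()}
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({term: (freq / len(tokens)) * idf[term] for term, freq in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(expanded_query, docs):
    """Return document indices ordered by descending similarity to the query."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(t for t in expanded_query if t in idf)
    q_vec = {t: freq * idf[t] for t, freq in q_tf.items()}
    scored = [(cosine(q_vec, vec), i) for i, vec in enumerate(vectors)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored]


docs = [
    ["cake", "recipe", "butter"],
    ["football", "league", "score", "tonight"],
    ["football", "fans", "travel"],
]
order = rerank(["football", "league", "score"], docs)
```

The document that matches all three expanded query words comes first, the one that matches only `football` comes second, and the unrelated document comes last.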
The following experiment evaluates the effect of re-ranking microblog search results with the method of the present invention, with document precision and recall held fixed. This amounts to evaluating the quality of the search results produced by different orderings. The experiment uses MAP and NDCG, two evaluation criteria that reflect both relevance and ordering, to compare the optimization effects.
(1) MAP
MAP (mean average precision) is a single-value indicator of a retrieval system's performance over all relevant documents. The higher the system ranks the relevant documents it retrieves, the higher the MAP is likely to be. If the system returns no relevant documents, the precision defaults to 0. MAP is computed as shown in formula (1).
$$\mathrm{MAP} = \frac{1}{N} \sum \frac{\sum_{i=1}^{n} \frac{i}{n_i}}{n} \qquad (1)$$
where n is the number of relevant documents, i is the index of the i-th relevant document, N is the number of topics (the outer sum runs over the topics), and n_i is the position of the i-th relevant document in the actual search ranking.
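Formula (1) can be checked with a few lines of code. The sketch below assumes that, for each topic, the input is the list of positions at which the relevant documents appear in the actual ranking; the function names are illustrative.

```python
def average_precision(relevant_ranks):
    """Average of i / n_i over the relevant documents of one topic.

    relevant_ranks: 1-based ranking positions n_i of the relevant documents.
    Returns 0.0 when no relevant document was returned.
    """
    ranks = sorted(relevant_ranks)
    if not ranks:
        return 0.0
    return sum(i / n_i for i, n_i in enumerate(ranks, start=1)) / len(ranks)


def mean_average_precision(ranks_per_topic):
    """Mean of the per-topic average precision over all N topics."""
    if not ranks_per_topic:
        return 0.0
    return sum(average_precision(r) for r in ranks_per_topic) / len(ranks_per_topic)


# Topic A: relevant documents at positions 1 and 3; topic B: one at position 2.
map_value = mean_average_precision([[1, 3], [2]])
```

For topic A the average precision is (1/1 + 2/3) / 2 = 5/6, for topic B it is (1/2) / 1 = 1/2, so the MAP over the two topics is 2/3.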
(2) NDCG
NDCG (normalized discounted cumulative gain) measures ranking quality well when documents have graded relevance levels; the closer its value is to 1, the better the ranking. Before NDCG, the discounted cumulative gain (DCG) is introduced first; it is computed as shown in formula (2).
$$\mathrm{DCG}_k = \sum_{i=1}^{k} \frac{rel_i}{\log_2(i+1)} \qquad (2)$$
where k denotes the top k documents in the search results. Relevance levels range from 0 to r, with the degree of relevance increasing from 0 to r. DCG_k is the discounted cumulative gain of the actual ranking of the top k documents, and rel_i is the relevance level of the i-th document in the actual ranking. NDCG is computed as shown in formula (3).
$$\mathrm{NDCG}_k = \frac{\mathrm{DCG}_k}{\mathrm{nDCG}_k} \qquad (3)$$
where DCG_k is the discounted cumulative gain of the actual ranking of the top k documents, and nDCG_k is the discounted cumulative gain of the ideal ranking of the top k documents.
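Formulas (2) and (3) translate directly into code. In the sketch below the ideal ranking is obtained by sorting the relevance levels of the returned documents in descending order, which is an assumption about how nDCG_k is formed; the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Formula (2): sum of rel_i / log2(i + 1) over the top k positions."""
    return sum(rel / math.log2(i + 1) for i, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(relevances, k):
    """Formula (3): DCG of the actual ranking divided by DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0.0:
        return 0.0
    return dcg_at_k(relevances, k) / ideal


# Relevance levels of the returned documents, in the order they were ranked.
actual = [1, 2, 3]
score = ndcg_at_k(actual, k=3)
```

A ranking that already lists documents by descending relevance scores exactly 1, while the reversed example above scores below 1 because the most relevant document is discounted most heavily.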
As Fig. 2 shows, when MAP is used as the search-effect evaluation criterion, the optimized microblog search results obtained with the microblog search optimization method based on user personalization features of the present invention are all well above the original microblog search results. The document scoring obtained with the association-rule-based query expansion method performs worse than the scoring of the original microblog search engine, so its MAP value is lower than that of the original microblog search results; this is acceptable.
As Fig. 3 shows, when NDCG is used as the search-effect evaluation criterion, all three query expansion methods, namely the dictionary-based method, the association-rule-based method and the proposed algorithm (the query expansion method based on user personalization features), yield optimized microblog results that are better than the original microblog search.
The comparison in Fig. 2 and Fig. 3 between the optimization effects of the different query expansion methods and the original microblog search shows that the microblog search optimization method based on user personalization features of the present invention optimizes the search results well and makes them better match user needs.

Claims (2)

1. A microblog search optimization method based on user personalization features, characterized by comprising the following steps:
1) Calculate the user-term interest preference degree: build a topic model from the posts, follows, reposts and comments of Sina Weibo users, analyze the user-topic-word relation, and obtain the user's interest preference degree for each word;
2) Construct association rules, taking the interest preference degree as the weight factor of each word in the weighted association rules;
3) Expand the query terms: within the weighted association rule algorithm, train on the differently weighted terms produced by rule construction to obtain the final rules, and expand the query words according to the meaning and characteristics those rules express, yielding expanded query words;
4) Combine the timeliness of microblogs with the similarity between the expanded query words and the microblog documents to rescore and re-rank the microblog search result documents, thereby optimizing the microblog search results.
2. The microblog search optimization method based on user personalization features according to claim 1, characterized in that the rescoring and re-ranking of microblog search results in step 4) uses the term frequency-inverse document frequency (TF-IDF) model to compute the similarity between the user's query words and the retrieved microblog documents as the scoring criterion for search result documents, wherein high similarity indicates that a result document is close to what the user is looking for and is ranked near the front, low similarity indicates that it is relatively far from what the user is looking for and is ranked toward the back, and documents are ordered by descending similarity.
CN201610226690.1A 2016-04-11 2016-04-11 Optimization method for Micro Blog search based on personalized characteristics of user Pending CN105912673A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610226690.1A CN105912673A (en) 2016-04-11 2016-04-11 Optimization method for Micro Blog search based on personalized characteristics of user

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610226690.1A CN105912673A (en) 2016-04-11 2016-04-11 Optimization method for Micro Blog search based on personalized characteristics of user

Publications (1)

Publication Number Publication Date
CN105912673A true CN105912673A (en) 2016-08-31

Family

ID=56746830

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610226690.1A Pending CN105912673A (en) 2016-04-11 2016-04-11 Optimization method for Micro Blog search based on personalized characteristics of user

Country Status (1)

Country Link
CN (1) CN105912673A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484829A (en) * 2016-09-29 2017-03-08 中国国防科技信息中心 A kind of foundation of microblogging order models and microblogging diversity search method
CN108090220A (en) * 2017-12-29 2018-05-29 科大讯飞股份有限公司 Point of interest search sort method and system
CN108829793A (en) * 2018-06-01 2018-11-16 杭州电子科技大学 A kind of organizational member hobby method for digging
CN110909147A (en) * 2019-12-02 2020-03-24 支付宝(杭州)信息技术有限公司 Method and system for training sorting result selection model output standard question method

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060047651A1 (en) * 2000-05-25 2006-03-02 Microsoft Corporation Facility for highlighting documents accessed through search or browsing
CN103123649A (en) * 2013-01-29 2013-05-29 广州一找网络科技有限公司 Method and system for searching information based on micro blog platform
CN103559258A (en) * 2013-11-04 2014-02-05 同济大学 Webpage ranking method based on cloud computation
CN103853831A (en) * 2014-03-10 2014-06-11 中国电子科技集团公司第二十八研究所 Personalized searching realization method based on user interest

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
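缑小路 (Gou Xiaolu): "Optimization of microblog search results based on user personalization features" (基于用户个性化特征的微博搜索结果优化), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *
邸亮 等 (Di Liang et al.): "Application of the LDA model in microblog user recommendation" (LDA模型在微博用户推荐中的应用), Computer Engineering (《计算机工程》) *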

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106484829A (en) * 2016-09-29 2017-03-08 中国国防科技信息中心 A kind of foundation of microblogging order models and microblogging diversity search method
CN106484829B (en) * 2016-09-29 2019-05-17 中国国防科技信息中心 A kind of foundation and microblogging diversity search method of microblogging order models
CN108090220A (en) * 2017-12-29 2018-05-29 科大讯飞股份有限公司 Point of interest search sort method and system
CN108829793A (en) * 2018-06-01 2018-11-16 杭州电子科技大学 A kind of organizational member hobby method for digging
CN108829793B (en) * 2018-06-01 2021-09-24 杭州电子科技大学 Organization hobby mining method
CN110909147A (en) * 2019-12-02 2020-03-24 支付宝(杭州)信息技术有限公司 Method and system for training sorting result selection model output standard question method
CN110909147B (en) * 2019-12-02 2022-06-21 支付宝(杭州)信息技术有限公司 Method and system for training sorting result selection model output standard question method

Similar Documents

Publication Publication Date Title
CN100483408C (en) Method and apparatus for establishing link structure between multiple documents
CN102096717B (en) Search method and search engine
CN102902806B (en) A kind of method and system utilizing search engine to carry out query expansion
CN102253982B (en) Query suggestion method based on query semantics and click-through data
CN105045875B (en) Personalized search and device
CN101853272B (en) Search engine technology based on relevance feedback and clustering
CN102890711B (en) A kind of retrieval ordering method and system
CN102411621A (en) Chinese inquiry oriented multi-document automatic abstraction method based on cloud mode
CN101751455B (en) Method for automatically generating title by adopting artificial intelligence technology
CN102081668B (en) Information retrieval optimizing method based on domain ontology
CN105912673A (en) Optimization method for Micro Blog search based on personalized characteristics of user
CN103778227A (en) Method for screening useful images from retrieved images
CN101321190A (en) Recommend method and recommend system of heterogeneous network
CN103577416A (en) Query expansion method and system
CN102254039A (en) Searching engine-based network searching method
CN103324665A (en) Hot spot information extraction method and device based on micro-blog
CN101770521A (en) Focusing relevancy ordering method for vertical search engine
CN101127042A (en) Sensibility classification method based on language model
CN104008109A (en) User interest based Web information push service system
CN103186550A (en) Method and system for generating video-related video list
CN105335415A (en) Search method based on input prediction, and input method system
CN102968419A (en) Disambiguation method for interactive Internet entity name
CN105528411A (en) Full-text retrieval device and method for interactive electronic technical manual of shipping equipment
CN102918532A (en) Detection of junk in search result ranking
CN105653562A (en) Calculation method and apparatus for correlation between text content and query request

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160831

RJ01 Rejection of invention patent application after publication