CN105912673A - Optimization method for Micro Blog search based on personalized characteristics of user - Google Patents
Optimization method for Micro Blog search based on personalized characteristics of user
- Publication number
- CN105912673A (application CN201610226690.1A)
- Authority
- CN
- China
- Prior art keywords
- user
- microblogging
- word
- micro blog
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An optimization method for microblog search based on a user's personalized characteristics comprises the following steps. User-term interest preference is calculated: a topic model is built from the posts, follows, reposts and comments of Sina Weibo users, and the user-topic-word relation is analyzed to obtain the user's interest preference for each word. Association rules are built: the interest preference is used as the weight factor of each word in the weighted association rules. Query terms are expanded: in the weighted association rule algorithm, the terms with different weights produced after building the association rules are trained to obtain the final rules, and the query words are expanded according to the meaning and characteristics expressed by the final rules, yielding expanded query words. Finally, combining the timeliness of microblogs with the similarity between the expanded query words and microblog documents, the microblog search result documents are re-scored and re-ranked, thereby optimizing the search results. For each user, documents that satisfy the user's query rank near the top and irrelevant documents rank lower.
Description
Technical field
The present invention relates to a microblog search optimization method, and in particular to a microblog search optimization method based on users' personalized characteristics.
Background art
At present, research on personalized interest modeling for microblog users takes two main directions. The first analyzes the social network relationships of microblog users to derive their personalized characteristics. The second models the text content that users post in order to obtain user interest profiles.
When a user issues a query, the query requirement may be understood only vaguely or the query may be expressed unclearly, so that the search results returned do not meet the user's needs. For this reason, a query expansion mechanism plays an important role in improving the performance of a search system. Query expansion adds topic words related to the query keywords, chosen by semantic relevance or high correlation, in order to improve query accuracy. Existing microblog search engines and research results do not take the user's personalized interests into account. A study of microblog search engines shows that their advantage over conventional web search engines is that they add the time factor of a document and the authority of its publisher to the document ranking score. Microblog search still has shortcomings: it does not personalize search results according to each user's own characteristics, and it does not use query expansion to further improve the accuracy of query results. The rich user information unique to microblog platforms is not analyzed by the search service to extract personalized user characteristics, which is both a gap and a waste in microblog search services.
Summary of the invention
The technical problem to be solved by the present invention is to provide a microblog search optimization method based on users' personalized characteristics. The method addresses two shortcomings of existing microblog search engines and research results: search results are not personalized according to each user's own characteristics, and query expansion is not used to improve the accuracy of query results.
The technical solution adopted by the present invention is a microblog search optimization method based on users' personalized characteristics, comprising the following steps:
1) Calculating user-term interest preference: topic modeling is performed on the posts, follows, reposts and comments of Sina Weibo users, and the user-topic-word relation is analyzed to obtain each user's interest preference for words;
2) Building association rules: the interest preference is used as the weight factor of each word in the weighted association rules;
3) Expanding query terms: in the weighted association rule algorithm, the terms with different weights produced after building the association rules are trained to obtain the final rules, and the query words are expanded according to the meaning and characteristics expressed by the final rules, yielding expanded query words;
4) Combining the timeliness of microblogs with the similarity between the expanded query words and microblog documents, the microblog search result documents are re-scored and re-ranked, thereby optimizing the microblog search results.
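Steps 2) and 3) can be illustrated with a small sketch. The patent does not specify the weighted association rule algorithm, so the following assumes one common formulation: the weighted support of a word pair is its plain support multiplied by the mean interest weight of the two words, and a rule `query word -> candidate word` is kept when its weighted support and confidence both reach a threshold. The function names, the thresholds `min_wsup` and `min_conf`, and the data layout are hypothetical.

```python
def support(transactions, items):
    """Fraction of transactions (sets of words) that contain every word in items."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def expand_query(query, transactions, weights, min_wsup=0.1, min_conf=0.5):
    """Return the query words followed by words reached through weighted rules q -> w.

    weights maps each word to the user's interest preference for it (assumed in [0, 1]).
    """
    vocabulary = set().union(*transactions)
    expanded = list(query)
    for q in query:
        sup_q = support(transactions, [q])
        if sup_q == 0:
            continue  # the query word never occurs, so no rule can be formed
        kept = []
        for w in vocabulary - set(expanded):
            sup_qw = support(transactions, [q, w])
            mean_weight = (weights.get(q, 0.0) + weights.get(w, 0.0)) / 2
            weighted_support = mean_weight * sup_qw
            confidence = sup_qw / sup_q
            if weighted_support >= min_wsup and confidence >= min_conf:
                kept.append((weighted_support, w))
        # Strongest rules first, so the most relevant expansions come first.
        expanded.extend(w for _, w in sorted(kept, reverse=True))
    return expanded
```

With this formulation, two users issuing the same query can receive different expansions, because a word the user has little interest in lowers the weighted support of every rule it takes part in.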
The re-scoring and re-ranking of microblog search results in step 4) uses the term frequency-inverse document frequency (TF-IDF) model to calculate the similarity between the user's query words and the retrieved microblog documents, and uses this similarity as the scoring criterion for search result documents. A high similarity indicates that a search result document is close to what the user wants to find, so it is ranked toward the front; a low similarity indicates that the document is relatively far from what the user wants, so it is ranked toward the back. Documents are ordered by descending similarity.
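The similarity scoring described here can be sketched as follows. The patent names the TF-IDF model but not the exact weighting or similarity measure, so this example assumes length-normalized term frequency, a smoothed IDF, and cosine similarity; documents are assumed to be already tokenized, and all function names are illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (one sparse TF-IDF dict per doc, idf table)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption): never zero, defined for every indexed term.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [
        {t: (tf / len(doc)) * idf[t] for t, tf in Counter(doc).items()}
        for doc in docs
    ]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_similarity(query, docs):
    """Return (document indices by descending similarity, similarity per document)."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(t for t in query if t in idf)  # ignore words unseen in docs
    q_vec = {t: (c / len(query)) * idf[t] for t, c in q_tf.items()}
    scores = [cosine(q_vec, v) for v in vectors]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores
```

In the method described, `query` would be the expanded query words rather than the user's raw input.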
The microblog search optimization method based on users' personalized characteristics of the present invention expands the query words through a query expansion mechanism and thereby optimizes microblog search results. An LDA topic model is used to analyze the personalized characteristics of microblog users and obtain their interest preferences. These preferences serve as the weight of each term in the weighted association rule method, which is then used to expand the query words. As a result, Sina Weibo search results are ranked according to each user's own interests and query needs: for each user, documents that satisfy the user's query rank near the front and irrelevant documents rank toward the back.
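As an illustration of how a user-word preference might be derived from a topic model, the sketch below assumes an LDA model has already been fitted and exposes a user-topic distribution and topic-word distributions. The patent does not give the formula; marginalizing over topics, P(w|u) = Σ_z P(z|u) · P(w|z), is one natural reading of the user-topic-word relation. The parameter names `theta_u` and `phi` are hypothetical.

```python
def user_word_preference(theta_u, phi, vocab):
    """Interest preference of one user for every word in vocab.

    theta_u: list with P(topic z | user), one entry per topic.
    phi:     list of lists, phi[z][j] = P(vocab[j] | topic z).
    Returns a dict word -> P(word | user), obtained by summing over topics.
    """
    preference = {}
    for j, word in enumerate(vocab):
        preference[word] = sum(theta_u[z] * phi[z][j] for z in range(len(theta_u)))
    return preference
```

Because each row of `phi` and `theta_u` itself sum to 1, the resulting preferences also sum to 1 and can be used directly as word weights.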
Brief description of the drawings
Fig. 1 is a flow chart of the microblog search optimization method based on users' personalized characteristics of the present invention;
Fig. 2 compares, under MAP evaluation, the optimization effect of different query expansion methods with that of the original microblog search system;
Fig. 3 compares, under NDCG evaluation, the optimization effect of different query expansion methods with that of the original microblog search.
Detailed description of the invention
The microblog search optimization method based on users' personalized characteristics of the present invention is described in detail below with reference to an embodiment and the accompanying drawings.
As shown in Fig. 1, the microblog search optimization method based on users' personalized characteristics of the present invention comprises the following steps:
1) Calculating user-term interest preference
Topic modeling is performed on the posts, follows, reposts and comments of Sina Weibo users, and the user-topic-word relation is analyzed to obtain each user's interest preference for words;
2) Building association rules
The user's interest preference for a word reflects the probabilistic relation between user and word and, to some extent, expresses the preference for and value of different words to different users. The present invention uses the interest preference as the weight factor of each word in the weighted association rules;
3) Expanding query terms
In the weighted association rule algorithm, the present invention trains the terms with different weights produced after building the association rules to obtain the final rules, and expands the query words according to the meaning and characteristics expressed by the final rules, yielding expanded query words;
4) What distinguishes microblog search engines from other search engines is that their content is highly time-sensitive. The present invention therefore combines the timeliness of microblogs with the similarity between the expanded query words and microblog documents to re-score and re-rank the microblog search result documents, thereby optimizing the microblog search results.
The re-scoring and re-ranking of microblog search results uses the term frequency-inverse document frequency (TF-IDF) model to calculate the similarity between the user's query words and the retrieved microblog documents, and uses this similarity as the scoring criterion for search result documents. A high similarity indicates that a search result document is close to what the user wants to find, so it is ranked toward the front; a low similarity indicates that the document is relatively far from what the user wants, so it is ranked toward the back. Documents are ordered by descending similarity.
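Step 4) combines timeliness with similarity, but the text does not state how the two are combined. The sketch below is one possible reading, not the patent's formula: a linear blend of the similarity score with a freshness term that halves every `half_life_hours`. The blend weight `alpha`, the half-life, and the tuple layout of `results` are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def rerank(results, now, half_life_hours=24.0, alpha=0.7):
    """Re-rank search results by similarity blended with freshness.

    results: list of (doc_id, similarity, posted_at) with similarity in [0, 1].
    score = alpha * similarity + (1 - alpha) * freshness, where freshness is 1.0
    for a post made right now and halves every half_life_hours.
    Returns doc ids ordered by descending score.
    """
    scored = []
    for doc_id, similarity, posted_at in results:
        age_hours = max((now - posted_at).total_seconds() / 3600.0, 0.0)
        freshness = 0.5 ** (age_hours / half_life_hours)
        scored.append((alpha * similarity + (1 - alpha) * freshness, doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]
```

With these assumed settings, a recent post with slightly lower similarity can overtake an older post with higher similarity, while a fresh but irrelevant post still ranks last.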
The effect of re-ranking microblog search results with the method of the present invention is evaluated below by experiment, with document precision and recall held fixed. This raises the question of how to evaluate the search quality produced by different orderings. The experiment uses the MAP and NDCG evaluation criteria, which reflect both relevance and ranking, to compare the optimization effects.
(1) MAP
MAP (mean average precision) is a single-value measure of a retrieval system's performance over all relevant documents. The higher the system ranks the relevant documents it retrieves, the higher the MAP is likely to be. If the system returns no relevant documents, the precision defaults to 0. The formula for MAP is given in formula (1).
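Formula (1) did not survive in this text. A reconstruction consistent with the standard definition of mean average precision and with the symbols n, i, N and n_i used for this formula would be the following; the topic index j is an added assumption, and n and n_i are understood per topic:

$$
\mathrm{MAP} = \frac{1}{N}\sum_{j=1}^{N}\left(\frac{1}{n}\sum_{i=1}^{n}\frac{i}{n_i}\right) \tag{1}
$$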
where n is the number of relevant documents, i indexes the i-th relevant document, N is the number of topics, and n_i is the position of the i-th relevant document in the actual search ranking.
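As a worked illustration of the MAP definition, the following sketch computes average precision for one topic from the rank positions of its relevant documents and then averages over topics. It assumes every relevant document appears somewhere in the ranking; the function names are illustrative.

```python
def average_precision(relevant_ranks):
    """Average precision for one topic.

    relevant_ranks: 1-based positions n_i of the relevant documents in the
    actual ranking. The i-th relevant document at position n_i contributes i / n_i.
    """
    ranks = sorted(relevant_ranks)
    if not ranks:
        return 0.0  # no relevant document returned: precision defaults to 0
    return sum(i / n_i for i, n_i in enumerate(ranks, start=1)) / len(ranks)


def mean_average_precision(per_topic_ranks):
    """Mean of the average precision over all topics (queries)."""
    if not per_topic_ranks:
        return 0.0
    return sum(average_precision(r) for r in per_topic_ranks) / len(per_topic_ranks)
```

For example, a topic whose two relevant documents are ranked 1st and 3rd has average precision (1/1 + 2/3) / 2 ≈ 0.83, while ranking them 1st and 2nd gives 1.0.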
(2) NDCG
NDCG measures ranking quality well when documents have graded relevance levels; the closer its value is to 1, the better the ranking. Before introducing NDCG, the discounted cumulative gain (DCG) is introduced first; its formula is given in formula (2).
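Formula (2) did not survive in this text. One widely used form of the discounted cumulative gain over the top k documents is shown here as an assumption; another common variant uses rel_i directly in the numerator, and the text does not indicate which one was used:

$$
\mathrm{DCG}_k = \sum_{i=1}^{k}\frac{2^{rel_i}-1}{\log_2(i+1)} \tag{2}
$$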
where k denotes the top k documents in the search results. Relevance levels range from 0 to r, with the degree of relevance increasing with the level. DCG_k is the discounted gain of the actual ranking of the top k documents, and rel_i is the relevance level of the i-th document in the actual ranking. The formula for NDCG is given in formula (3).
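Formula (3) did not survive in this text. From the definitions given for it, it is the ratio of the actual discounted gain to the ideal one; the denominator, written nDCG_k in this document, is more commonly written IDCG_k:

$$
\mathrm{NDCG}_k = \frac{\mathrm{DCG}_k}{\mathrm{nDCG}_k} \tag{3}
$$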
where DCG_k is the discounted gain of the actual ranking of the top k documents, and nDCG_k is the discounted gain of the ideal ranking of the top k documents (a quantity commonly written IDCG_k).
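The NDCG computation can be sketched as follows. The gain function (2^rel - 1) and the log2 discount are one common convention and an assumption here, and the ideal ranking is taken to be the same list of relevance levels sorted in descending order.

```python
import math


def dcg(relevances, k):
    """Discounted cumulative gain of the first k relevance levels (rank i is 1-based)."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 1)
        for i, rel in enumerate(relevances[:k], start=1)
    )


def ndcg(relevances, k):
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

A ranking that already lists documents in descending relevance scores 1.0; any other ordering of the same documents scores lower.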
As Fig. 2 shows, with MAP as the evaluation criterion, the search result optimization obtained by the microblog search optimization method based on users' personalized characteristics of the present invention is well above the original microblog search effect. The document scoring obtained by the association-rule-based query expansion method performs worse than the original microblog search engine's scoring criterion, so its MAP value is lower than that of the original microblog search results; this is also acceptable.
As Fig. 3 shows, with NDCG as the evaluation criterion, all three query expansion methods achieve better result optimization than the original microblog search: the dictionary-based method, the association-rule-based method, and the proposed algorithm, i.e. query expansion based on users' personalized characteristics.
The comparison in Fig. 2 and Fig. 3 between the optimization effects of the different query expansion methods and the original microblog search shows that the microblog search optimization method based on users' personalized characteristics of the present invention optimizes search results well and makes them better match user needs.
Claims (2)
1. A microblog search optimization method based on users' personalized characteristics, characterized by comprising the following steps:
1) calculating user-term interest preference: topic modeling is performed on the posts, follows, reposts and comments of Sina Weibo users, and the user-topic-word relation is analyzed to obtain each user's interest preference for words;
2) building association rules: the interest preference is used as the weight factor of each word in the weighted association rules;
3) expanding query terms: in the weighted association rule algorithm, the terms with different weights produced after building the association rules are trained to obtain the final rules, and the query words are expanded according to the meaning and characteristics expressed by the final rules, yielding expanded query words;
4) combining the timeliness of microblogs with the similarity between the expanded query words and microblog documents, the microblog search result documents are re-scored and re-ranked, thereby optimizing the microblog search results.
2. The microblog search optimization method based on users' personalized characteristics according to claim 1, characterized in that the re-scoring and re-ranking of microblog search results in step 4) uses the term frequency-inverse document frequency (TF-IDF) model to calculate the similarity between the user's query words and the retrieved microblog documents, and uses this similarity as the scoring criterion for search result documents, wherein a high similarity indicates that a search result document is close to what the user wants to find and it is ranked toward the front, a low similarity indicates that the document is relatively far from what the user wants and it is ranked toward the back, and documents are ordered by descending similarity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610226690.1A CN105912673A (en) | 2016-04-11 | 2016-04-11 | Optimization method for Micro Blog search based on personalized characteristics of user |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610226690.1A CN105912673A (en) | 2016-04-11 | 2016-04-11 | Optimization method for Micro Blog search based on personalized characteristics of user |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105912673A (en) | 2016-08-31 |
Family
ID=56746830
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610226690.1A Pending CN105912673A (en) | 2016-04-11 | 2016-04-11 | Optimization method for Micro Blog search based on personalized characteristics of user |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105912673A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106484829A (en) * | 2016-09-29 | 2017-03-08 | 中国国防科技信息中心 | A kind of foundation of microblogging order models and microblogging diversity search method |
CN108090220A (en) * | 2017-12-29 | 2018-05-29 | 科大讯飞股份有限公司 | Point of interest search sort method and system |
CN108829793A (en) * | 2018-06-01 | 2018-11-16 | 杭州电子科技大学 | A kind of organizational member hobby method for digging |
CN110909147A (en) * | 2019-12-02 | 2020-03-24 | 支付宝(杭州)信息技术有限公司 | Method and system for training sorting result selection model output standard question method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060047651A1 (en) * | 2000-05-25 | 2006-03-02 | Microsoft Corporation | Facility for highlighting documents accessed through search or browsing |
CN103123649A (en) * | 2013-01-29 | 2013-05-29 | 广州一找网络科技有限公司 | Method and system for searching information based on micro blog platform |
CN103559258A (en) * | 2013-11-04 | 2014-02-05 | 同济大学 | Webpage ranking method based on cloud computation |
CN103853831A (en) * | 2014-03-10 | 2014-06-11 | 中国电子科技集团公司第二十八研究所 | Personalized searching realization method based on user interest |
-
2016
- 2016-04-11 CN CN201610226690.1A patent/CN105912673A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060047651A1 (en) * | 2000-05-25 | 2006-03-02 | Microsoft Corporation | Facility for highlighting documents accessed through search or browsing |
CN103123649A (en) * | 2013-01-29 | 2013-05-29 | 广州一找网络科技有限公司 | Method and system for searching information based on micro blog platform |
CN103559258A (en) * | 2013-11-04 | 2014-02-05 | 同济大学 | Webpage ranking method based on cloud computation |
CN103853831A (en) * | 2014-03-10 | 2014-06-11 | 中国电子科技集团公司第二十八研究所 | Personalized searching realization method based on user interest |
Non-Patent Citations (2)
Title |
---|
缑小路: ""基于用户个性化特征的微博搜索结果优化"", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
邸亮 等: ""LDA模型在微博用户推荐中的应用"", 《计算机工程》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106484829A (en) * | 2016-09-29 | 2017-03-08 | 中国国防科技信息中心 | A kind of foundation of microblogging order models and microblogging diversity search method |
CN106484829B (en) * | 2016-09-29 | 2019-05-17 | 中国国防科技信息中心 | A kind of foundation and microblogging diversity search method of microblogging order models |
CN108090220A (en) * | 2017-12-29 | 2018-05-29 | 科大讯飞股份有限公司 | Point of interest search sort method and system |
CN108829793A (en) * | 2018-06-01 | 2018-11-16 | 杭州电子科技大学 | A kind of organizational member hobby method for digging |
CN108829793B (en) * | 2018-06-01 | 2021-09-24 | 杭州电子科技大学 | Organization hobby mining method |
CN110909147A (en) * | 2019-12-02 | 2020-03-24 | 支付宝(杭州)信息技术有限公司 | Method and system for training sorting result selection model output standard question method |
CN110909147B (en) * | 2019-12-02 | 2022-06-21 | 支付宝(杭州)信息技术有限公司 | Method and system for training sorting result selection model output standard question method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN100483408C (en) | Method and apparatus for establishing link structure between multiple documents | |
CN102096717B (en) | Search method and search engine | |
CN102902806B (en) | A kind of method and system utilizing search engine to carry out query expansion | |
CN102253982B (en) | Query suggestion method based on query semantics and click-through data | |
CN105045875B (en) | Personalized search and device | |
CN101853272B (en) | Search engine technology based on relevance feedback and clustering | |
CN102890711B (en) | A kind of retrieval ordering method and system | |
CN102411621A (en) | Chinese inquiry oriented multi-document automatic abstraction method based on cloud mode | |
CN101751455B (en) | Method for automatically generating title by adopting artificial intelligence technology | |
CN102081668B (en) | Information retrieval optimizing method based on domain ontology | |
CN105912673A (en) | Optimization method for Micro Blog search based on personalized characteristics of user | |
CN103778227A (en) | Method for screening useful images from retrieved images | |
CN101321190A (en) | Recommend method and recommend system of heterogeneous network | |
CN103577416A (en) | Query expansion method and system | |
CN102254039A (en) | Searching engine-based network searching method | |
CN103324665A (en) | Hot spot information extraction method and device based on micro-blog | |
CN101770521A (en) | Focusing relevancy ordering method for vertical search engine | |
CN101127042A (en) | Sensibility classification method based on language model | |
CN104008109A (en) | User interest based Web information push service system | |
CN103186550A (en) | Method and system for generating video-related video list | |
CN105335415A (en) | Search method based on input prediction, and input method system | |
CN102968419A (en) | Disambiguation method for interactive Internet entity name | |
CN105528411A (en) | Full-text retrieval device and method for interactive electronic technical manual of shipping equipment | |
CN102918532A (en) | Detection of junk in search result ranking | |
CN105653562A (en) | Calculation method and apparatus for correlation between text content and query request |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160831 |