CN102521350A - Selection method of distributed information retrieval sets based on historical click data - Google Patents

Selection method of distributed information retrieval sets based on historical click data

Info

Publication number
CN102521350A
Authority
CN
China
Prior art keywords
retrieval
inquiry
historical query
historical
similarity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2011104122625A
Other languages
Chinese (zh)
Other versions
CN102521350B (en)
Inventor
陈岭
刘颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201110412262.5A priority Critical patent/CN102521350B/en
Publication of CN102521350A publication Critical patent/CN102521350A/en
Application granted granted Critical
Publication of CN102521350B publication Critical patent/CN102521350B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for selecting information sets in distributed information retrieval based on historical click data. The method comprises the following steps: 1) a retrieval proxy server preprocesses a query log to extract historical queries and their click data; 2) the retrieval proxy server computes the relevance between each historical query and each information set according to the click data; 3) the retrieval proxy server computes the comprehensive similarity between a new query and each historical query; 4) the retrieval proxy server selects the most similar historical queries according to the comprehensive similarity, and computes the relevance between the new query and each information set from the selected historical queries and their relevance to each information set; 5) the retrieval proxy server selects a plurality of information sets, sends retrieval requests, merges the results returned by the information retrieval servers, and outputs them to the user who issued the new query. The method offers high retrieval accuracy, low network bandwidth consumption, fast response, and economical, efficient retrieval.

Description

Method for selecting distributed information retrieval sets based on historical click data
Technical field
The present invention relates to distributed information retrieval technology, and in particular to a method for selecting the information sets to be searched in a distributed information retrieval system.
Background art
With the rapid development of computer, communication and network technology and the growing popularity of the Internet, the number of electronic documents is increasing sharply, turning electronic documents into a huge information repository. The explosive growth of WWW information has likewise made the Web a huge information repository. The question is how to manage such very large-scale data so that users are not overwhelmed by the sheer volume of data and can quickly find the information they need. There are currently two main solutions. One is centralized: a single high-performance server manages the mass data and serves all users. This scheme is structurally simple and easy to deploy and implement, but the service capacity of a single server always has an upper limit, system cost grows non-linearly, and the system is hard to scale. The other is distributed: many ordinary servers are deployed to manage the mass data and share the concurrent requests of multiple users. The greatest advantage of this scheme is that system resources can be configured dynamically according to actual performance requirements, and load balancing avoids system breakdown caused by overload; the cost is relatively low and the applicability is broader.
As shown in Fig. 1, a distributed information retrieval system consists of a retrieval proxy server and an information retrieval server unit. The retrieval proxy server provides a distributed information retrieval interface over the network to user 1, user 2, ..., user n. The information retrieval server unit comprises a plurality of information retrieval servers in a distributed structure (information retrieval server 1, information retrieval server 2, ..., information retrieval server n), and the retrieval proxy server is connected to each information retrieval server through the network. Each information retrieval server acts as one information set and stores a part of the system's documents. During retrieval, the retrieval proxy server forwards the query to the information retrieval servers; each information retrieval server searches independently and returns its results to the proxy, and the proxy merges the results according to certain rules and presents them to the user.
Because the data scale of distributed search is huge, many traditional methods cannot be applied directly to distributed systems; moreover, the processing capacity of the nodes differs and each node can usually search only its local data subset. Distributed information retrieval therefore faces many challenges. For example, query result quality is poor, mainly reflected in low recall and precision, a lack of necessary descriptive information, and the absence of a good ranking rule, all of which inconvenience the user. How to provide efficient navigation for such huge information resources and help users quickly find the information they need in massive data is a problem that search engines urgently need to solve. Users usually care only about the top-ranked results returned by a search engine, yet the relevance of the results returned by current search engines to user needs is unsatisfactory. Relevance ranking, that is, ranking the indexed documents by their relevance to the user query, has therefore become a focus of current research. The process of distributed information retrieval is mainly divided into the following three steps: set selection, in which, for a given query, the document subsets most relevant to it are selected from the whole document collection and searched; single-set retrieval, in which the documents closely related to the user query are found in each document set; and result merging, in which the intermediate results returned by each information set are merged into a single result list that is returned to the user. Set selection is an important problem in distributed information retrieval research. Given several information sets, set selection chooses the information subsets relevant to the query for retrieval while affecting retrieval effectiveness as little as possible. Set selection avoids searching all information sets, which reduces network bandwidth consumption, improves the system's response speed, and makes retrieval efficient and economical.
Summary of the invention
The technical problem to be solved by the present invention is to provide a method for selecting distributed information retrieval sets based on historical click data that offers high retrieval accuracy, low network bandwidth consumption, fast response, and economical, efficient retrieval.
To solve the above technical problem, the technical scheme adopted by the present invention is a method for selecting distributed information retrieval sets based on historical click data, implemented in the following steps:
1) the retrieval proxy server preprocesses the query log and extracts the historical queries and their corresponding click data;
2) the retrieval proxy server computes, according to the click data, the relevance between each historical query and each information set stored on the information retrieval servers;
3) the retrieval proxy server obtains a new query sent by a user and computes the comprehensive similarity between the new query and each historical query;
4) the retrieval proxy server selects, according to said comprehensive similarity, a plurality of historical queries most similar to the new query, and computes the relevance between the new query and each information set from the selected historical queries and their relevance to each information set;
5) the retrieval proxy server selects a plurality of information sets according to the relevance between the new query and the information sets, sends retrieval requests to the information retrieval servers corresponding to those information sets, merges the results returned by the information retrieval servers, and outputs them to the user who sent the new query.
As further improvements of the above technical scheme:
In said step 1), extracting the historical queries and their corresponding click data specifically means storing the historical queries and their corresponding click data and building an index, where each index entry consists of a data segment storing the historical query and a pointer to the corresponding clicked document IDs.
The detailed steps of said step 2) are: the retrieval proxy server first sends a retrieval request for each historical query to each information retrieval server, counts, according to said index, the number of clicked documents among the retrieval results returned by each retrieval server, and then, according to
[formula image BDA0000118896140000031]
obtains the relevance Rel(s_j|p) between the historical query and each information set, where p is the historical query, T is the preset number of documents that each information retrieval server should return, s_j is the information set held by one retrieval server,
[formula image BDA0000118896140000041]
CTD(p) is the click data of the historical query, and doc_i is a document in the information set held by the retrieval server.
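The relevance formula itself survives only as an image reference in this text, so the following Python sketch rests on a loudly stated assumption: Rel(s_j|p) is taken to be the fraction of the top T documents returned by set s_j that appear in the click data CTD(p). The function and parameter names are hypothetical and are not taken from the patent.

```python
from typing import Dict, List, Set


def historical_relevance(results_per_set: Dict[str, List[str]],
                         clicked_docs: Set[str],
                         top_t: int) -> Dict[str, float]:
    """Estimate Rel(s_j|p) for one historical query p.

    ASSUMPTION: relevance = (clicked documents among the top T results) / T.
    The exact formula is an image in the source and may differ.
    """
    if top_t <= 0:
        raise ValueError("top_t must be positive")
    relevance = {}
    for set_id, ranked_docs in results_per_set.items():
        top_docs = ranked_docs[:top_t]  # only the first T results are considered
        clicks = sum(1 for doc in top_docs if doc in clicked_docs)
        relevance[set_id] = clicks / top_t
    return relevance


# Usage: two information sets, T = 2, and CTD(p) = {d1, d3, d9}.
example = historical_relevance(
    {"s1": ["d1", "d2", "d3", "d4"], "s2": ["d5", "d6"]},
    clicked_docs={"d1", "d3", "d9"},
    top_t=2,
)
```

Under this reading, a click on d3 does not count for s1 because d3 falls outside the top two results.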
The detailed steps for computing the comprehensive similarity between the new query and each historical query in said step 3) are:
A) obtain the keyword similarity between the keywords of the new query and the keywords of each historical query by computing the cosine of the angle between the query vectors;
B) for each historical query, collect a preset number of retrieval result documents from the information retrieval servers to form a central sample;
C) compute the result similarity between the new query and said central sample;
D) multiply the keyword similarity and the result similarity by their respective coefficients and sum them to obtain the comprehensive similarity.
In said step A), the keyword similarity sim_term(p|q) between the keywords of the new query and the keywords of each historical query is computed according to
$$\mathrm{sim\_term}(p|q) = \frac{\sum_{i=1}^{l} w_{t_i,p} \times w_{t_i,q}}{\sqrt{\sum_{i=1}^{l} w_{t_i,p}^{2}} \times \sqrt{\sum_{i=1}^{l} w_{t_i,q}^{2}}}, \qquad w_{t_i,p} = tf_{i,p} \times iqf_i, \qquad iqf_i = \log\left(\frac{n}{qf_i}\right)$$
where p is the new query, q is the historical query, t_i is the i-th index term, w_{t_i,p} is the weight of index term t_i in query p, w_{t_i,q} is the weight of index term t_i in query q, tf_{i,p} is the frequency with which index term t_i occurs in query p, iqf_i is the inverse query frequency, and qf_i is the number of queries in which keyword t_i occurs.
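The keyword similarity of step A) can be sketched in Python as a cosine over tf × iqf weights. This is a minimal illustration, not the patented implementation: the text does not define n, so the sketch assumes n is the total number of historical queries; it also assumes a natural logarithm, whitespace-tokenized queries, and a weight of 0 for terms that never occur in the history. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def inverse_query_frequency(history: List[List[str]]) -> Dict[str, float]:
    """iqf_i = log(n / qf_i); n is ASSUMED to be the number of historical queries."""
    n = len(history)
    # qf_i: number of queries that contain term t_i (counted once per query).
    qf = Counter(term for query in history for term in set(query))
    return {term: math.log(n / count) for term, count in qf.items()}


def term_weights(terms: List[str], iqf: Dict[str, float]) -> Dict[str, float]:
    """w_{t_i,p} = tf_{i,p} * iqf_i; terms unseen in the history get weight 0 (assumption)."""
    tf = Counter(terms)
    return {term: count * iqf.get(term, 0.0) for term, count in tf.items()}


def keyword_similarity(p_terms: List[str], q_terms: List[str],
                       iqf: Dict[str, float]) -> float:
    """Cosine of the angle between the weighted term vectors of p and q."""
    wp = term_weights(p_terms, iqf)
    wq = term_weights(q_terms, iqf)
    dot = sum(weight * wq.get(term, 0.0) for term, weight in wp.items())
    norm = (math.sqrt(sum(w * w for w in wp.values()))
            * math.sqrt(sum(w * w for w in wq.values())))
    return dot / norm if norm > 0 else 0.0


# Usage with a tiny, made-up query history.
history = [["cheap", "flights"], ["cheap", "hotels"], ["python", "tutorial"]]
iqf = inverse_query_frequency(history)
same = keyword_similarity(["cheap", "flights"], ["cheap", "flights"], iqf)
partial = keyword_similarity(["cheap", "flights"], ["cheap", "hotels"], iqf)
disjoint = keyword_similarity(["cheap", "flights"], ["python", "tutorial"], iqf)
```

Note that a term occurring in every historical query has iqf = log(1) = 0 and therefore contributes nothing to the similarity.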
In said step C), the result similarity sim_result(p|q) between the new query and said central sample is computed according to
$$\mathrm{sim\_result}(p|q) = \frac{N(R(p) \cap R(q))}{N(R(p) \cup R(q))}$$
where R(q) is the retrieval result of historical query q on the central sample, N(R(p) ∩ R(q)) is the number of documents in the intersection of the retrieval results of new query p and historical query q, and N(R(p) ∪ R(q)) is the number of documents in the union of the retrieval results of new query p and historical query q.
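The result similarity is the ratio of intersection size to union size of the two result lists (a Jaccard coefficient). A minimal Python sketch follows; the function name is hypothetical, and returning 0 when both result lists are empty is an assumption, since the text does not cover that case.

```python
from typing import Iterable


def result_similarity(results_p: Iterable[str], results_q: Iterable[str]) -> float:
    """sim_result(p|q) = N(R(p) ∩ R(q)) / N(R(p) ∪ R(q))."""
    docs_p, docs_q = set(results_p), set(results_q)
    union = docs_p | docs_q
    if not union:
        return 0.0  # assumption: two empty result lists are treated as dissimilar
    return len(docs_p & docs_q) / len(union)


# Usage: two of four distinct documents are shared.
overlap = result_similarity(["d1", "d2", "d3"], ["d2", "d3", "d4"])
```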
The detailed steps of said step D) are:
1. obtain the comprehensive similarity sim(p|q) according to sim(p|q) = α × sim_term(p|q) + β × sim_result(p|q), where sim_term(p|q) is the keyword similarity, sim_result(p|q) is the result similarity, α is the keyword similarity coefficient, and β is the result similarity coefficient;
2. normalize the comprehensive similarity sim(p|q) according to [formula not reproduced in the source text], where cutSim is the preset normalization coefficient and ∑sim(p|q) is the sum of the sim(p|q) values greater than cutSim, to obtain the final comprehensive similarity sim(p|q).
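Step D) can be illustrated with a short Python sketch. The weighted sum follows the formula in the text; the normalization formula is not reproduced here, so the sketch assumes that similarities not greater than cutSim are discarded and the remaining ones are divided by their sum. The default coefficient values and all names are hypothetical.

```python
from typing import Dict


def comprehensive_similarity(sim_term: float, sim_result: float,
                             alpha: float = 0.5, beta: float = 0.5) -> float:
    """sim(p|q) = alpha * sim_term(p|q) + beta * sim_result(p|q).

    The default alpha and beta are illustrative; the text gives no values.
    """
    return alpha * sim_term + beta * sim_result


def normalize_similarities(raw: Dict[str, float], cut_sim: float) -> Dict[str, float]:
    """ASSUMED normalization: keep scores above cut_sim, then divide by their sum."""
    kept = {query: score for query, score in raw.items() if score > cut_sim}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {query: score / total for query, score in kept.items()}


# Usage: q3 falls below the threshold and is dropped before normalization.
combined = comprehensive_similarity(0.8, 0.4)
normalized = normalize_similarities({"q1": 0.6, "q2": 0.2, "q3": 0.05}, cut_sim=0.1)
```

Under this assumption the surviving similarities sum to 1, so they can be used directly as weights over the selected historical queries.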
In said step 4), the retrieval proxy server computes the relevance between the new query and each information retrieval server as follows: according to the similarity between the queries and the relevance between the similar historical queries and each information set, the relevance Rel(s_j|q) between the new query and information set s_j is computed by Rel(s_j|q) = ∑ Rel(s_j|p) × sim(p|q), where Rel(s_j|p) is the relevance between historical query p and information set s_j, and sim(p|q) is the comprehensive similarity between the new query and the historical query; in this formula q denotes the new query and the sum runs over the selected historical queries p.
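The weighted sum of step 4) can be sketched in Python as follows. The dictionary layout (historical query to similarity, and historical query to per-set relevance) is an assumption made for illustration, and the names are hypothetical.

```python
from typing import Dict


def new_query_relevance(similarities: Dict[str, float],
                        historical_rel: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Rel(s_j|q) = sum over selected historical queries p of Rel(s_j|p) * sim(p|q)."""
    scores: Dict[str, float] = {}
    for hist_query, sim in similarities.items():
        # Historical queries without stored relevance contribute nothing.
        for set_id, rel in historical_rel.get(hist_query, {}).items():
            scores[set_id] = scores.get(set_id, 0.0) + rel * sim
    return scores


# Usage: two selected historical queries with normalized similarities.
scores = new_query_relevance(
    {"q1": 0.75, "q2": 0.25},
    {"q1": {"s1": 0.5, "s2": 0.0}, "q2": {"s1": 0.0, "s2": 1.0}},
)
```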
The present invention has the following advantages:
1. The present invention extracts the click data of each historical query; computes the relevance between each historical query and each information set according to the click data; obtains the comprehensive similarity between the new query and each historical query by computing the keyword similarity and the retrieval result similarity between them; selects a plurality of historical queries most similar to the new query according to the comprehensive similarity; computes the relevance between the new query and each information set from the relevance between the selected similar historical queries and each information set; selects a plurality of the information sets most relevant to the new query and sends retrieval requests to the corresponding information retrieval servers; and merges the results returned by the information retrieval servers and outputs them to the user who sent the new query. The method therefore offers high retrieval accuracy, low network bandwidth consumption, fast response, and economical, efficient retrieval.
2. When extracting the click data of each historical query, the present invention further counts the clicks on the first several results returned by each retrieval server, that is, it considers only the clicks within the top T retrieval results. This estimates the relevance between each information set and the historical query more accurately from the user's point of view, improves the accuracy of the top K retrieval results, and improves the quality and efficiency of retrieval.
3. The present invention obtains the comprehensive similarity between the new query and a historical query by combining keyword similarity with result similarity. Taking both the keyword similarity and the topical similarity between queries into account estimates the similarity between queries more accurately and can improve retrieval precision.
Description of drawings
Fig. 1 is a schematic diagram of the framework of a prior-art distributed information retrieval system.
Fig. 2 is the main flow chart of the embodiment of the invention.
Fig. 3 is a schematic diagram of the framework of the retrieval proxy server in the embodiment of the invention.
Fig. 4 is a schematic diagram of the storage structure of the query log in the embodiment of the invention.
Fig. 5 is a brief flow chart of step 2) in the embodiment of the invention.
Fig. 6 is a brief flow chart of step 3) in the embodiment of the invention.
Fig. 7 is a brief flow chart of step 4) in the embodiment of the invention.
Embodiment
As shown in Fig. 2, the method for selecting distributed information retrieval sets based on historical click data according to the embodiment of the invention is implemented in the following steps:
1) the retrieval proxy server preprocesses the query log and extracts the historical queries and their corresponding click data;
2) the retrieval proxy server computes, according to the click data, the relevance between each historical query and each information set stored on the information retrieval servers;
3) the retrieval proxy server obtains a new query sent by a user and computes the comprehensive similarity between the new query and each historical query;
4) the retrieval proxy server selects, according to the comprehensive similarity, a plurality of historical queries most similar to the new query, and computes the relevance between the new query and each information set from the selected historical queries and their relevance to each information set;
5) the retrieval proxy server selects a plurality of information sets according to the relevance between the new query and the information sets, sends retrieval requests to the information retrieval servers corresponding to those information sets, merges the results returned by the information retrieval servers, and outputs them to the user who sent the new query.
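The selection part of step 5) can be sketched in Python as a top-k choice over the per-set relevance scores. The text only says that a plurality of information sets is selected according to relevance; the cut-off k, the exclusion of sets with zero relevance, and the tie-break by set identifier are assumptions of this sketch, and the names are hypothetical.

```python
from typing import Dict, List


def select_sets(relevance: Dict[str, float], k: int) -> List[str]:
    """Return up to k information sets, most relevant first.

    ASSUMPTIONS: sets with zero relevance are never queried, and ties are
    broken by set identifier so that the output is deterministic.
    """
    ranked = sorted(relevance.items(), key=lambda item: (-item[1], item[0]))
    return [set_id for set_id, score in ranked[:k] if score > 0]


# Usage: only the selected sets receive the retrieval request.
chosen = select_sets({"s1": 0.375, "s2": 0.25, "s3": 0.0, "s4": 0.5}, k=2)
```

Querying only the chosen sets is what keeps bandwidth and response time down compared with broadcasting the query to every information retrieval server.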
As shown in Fig. 3, the retrieval proxy server mainly comprises a query entry module, a data preparation module and a real-time query set selection module; the inputs of the data preparation module and of the real-time query set selection module are each connected to the query entry module. The data preparation module comprises a "historical query and click data preprocessing module" and a "historical query and set relevance computation module" connected in sequence; it preprocesses the historical queries and their click data and computes the relevance between the historical queries and each set. The real-time query set selection module comprises a "query similarity computation module" and a "query and set relevance computation module"; it uses the similar queries among the historical queries to compute the relevance of each set and performs set selection.
As shown in Fig. 4, the query log contains a large number of user queries and the clicked results corresponding to each query. The query log is preprocessed to extract the click data corresponding to each query; the storage structure is shown in Fig. 4, with queries and document IDs both arranged in ascending lexicographic order. A user reads information such as the page title and summary to gain some understanding of the page content and then decides whether to click through and read further; if the user clicks a page, that page is probably relevant to the query. Click data thus reflects the relevance between retrieval results and queries; by taking into account both the effectiveness of retrieval and the distribution of clicked documents across the sets, the relevance of each set can be estimated more accurately. Because a click occurs after the user has gained some understanding of the page content, and click data comprises the queries submitted by users and the clicked results corresponding to them, click data reflects users' preferences among the query results. The clicked results can be regarded as relevant to the query to a large extent, so the relevance between sets and queries can be estimated more accurately from the user's point of view. In step 1) of this embodiment, extracting the historical queries and their corresponding click data specifically means storing the historical queries and their corresponding click data and building an index, where each index entry consists of a data segment storing the historical query and a pointer to the corresponding clicked document IDs.
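The preprocessing of step 1) can be sketched in Python as building a sorted index from query to clicked document IDs. The log format used here, an iterable of (query, clicked document ID) pairs, and the lower-casing of queries are assumptions; the text specifies only that queries and document IDs are stored in ascending lexicographic order. A plain dictionary stands in for the index entry with its pointer to the clicked document IDs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_click_index(log_records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each historical query to its clicked document IDs, both sorted.

    ASSUMPTION: each log record is a (query, clicked_doc_id) pair.
    """
    clicks = defaultdict(set)
    for query, doc_id in log_records:
        # Normalizing case and whitespace merges trivially identical queries (assumption).
        clicks[query.strip().lower()].add(doc_id)
    # Python dicts preserve insertion order, so inserting sorted keys keeps the index ordered.
    return {query: sorted(clicks[query]) for query in sorted(clicks)}


# Usage with a tiny, made-up log; duplicates collapse into one entry.
index = build_click_index([
    ("Python tutorial", "d7"),
    ("cheap flights", "d2"),
    ("python tutorial", "d3"),
    ("cheap flights", "d2"),
])
```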
As shown in Fig. 5, the detailed steps of step 2) in this embodiment are: the retrieval proxy server first sends a retrieval request for each historical query to each information retrieval server, counts the clicks on the first several retrieval results returned by each retrieval server, and then, according to [formula not reproduced in the source text], obtains the relevance Rel(s_j|p) between the historical query and each information set, where p is the historical query, T is the preset number of documents that each information retrieval server should return, s_j is the information set held by one retrieval server,
[formula image BDA0000118896140000082]
CTD(p) is the click data of the historical query, and doc_i is a document in the information set held by the retrieval server.
As shown in Fig. 6, the detailed steps for computing the comprehensive similarity between the new query and each historical query in step 3) are:
A) obtain the keyword similarity between the keywords of the new query and the keywords of each historical query by computing the cosine of the angle between the query vectors;
B) for each historical query, collect a preset number of retrieval result documents from the information retrieval servers to form a central sample;
C) compute the result similarity between the new query and the central sample;
D) multiply the keyword similarity and the result similarity by their respective coefficients and sum them to obtain the comprehensive similarity.
The new query p can be represented as the vector (<t_1, w_{t_1,p}>, <t_2, w_{t_2,p}>, ..., <t_l, w_{t_l,p}>), where t_i is the i-th index term and w_{t_i,p} is the weight of index term t_i in query p.
In step A), the keyword similarity sim_term(p|q) between the keywords of the new query and the keywords of each historical query is computed according to
$$\mathrm{sim\_term}(p|q) = \frac{\sum_{i=1}^{l} w_{t_i,p} \times w_{t_i,q}}{\sqrt{\sum_{i=1}^{l} w_{t_i,p}^{2}} \times \sqrt{\sum_{i=1}^{l} w_{t_i,q}^{2}}}, \qquad w_{t_i,p} = tf_{i,p} \times iqf_i, \qquad iqf_i = \log\left(\frac{n}{qf_i}\right)$$
where p is the new query, q is the historical query, t_i is the i-th index term, w_{t_i,p} is the weight of index term t_i in query p, w_{t_i,q} is the weight of index term t_i in query q, tf_{i,p} is the frequency with which index term t_i occurs in query p, iqf_i is the inverse query frequency, and qf_i is the number of queries in which keyword t_i occurs.
Obtaining a query's global retrieval results from the whole distributed system would defeat the original purpose of set selection. Step B) therefore uses a query-based sampling method: for each historical query, the first three documents returned by each set's retrieval are collected to form the central sample, and the retrieval results on the central sample are used to compute the similarity between queries.
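The sampling of step B) can be sketched in Python as follows: for one historical query, the first three documents returned by each information set are pooled into a central sample. The input layout (set identifier to ranked document IDs), the de-duplication of documents that appear in more than one set, and the function name are assumptions of this sketch.

```python
from typing import Dict, List


def build_central_sample(results_per_set: Dict[str, List[str]],
                         per_set: int = 3) -> List[str]:
    """Pool the first `per_set` documents returned by each information set."""
    sample: List[str] = []
    seen = set()
    for set_id in sorted(results_per_set):  # fixed order keeps the output deterministic
        for doc_id in results_per_set[set_id][:per_set]:
            if doc_id not in seen:  # assumption: a document is sampled only once
                seen.add(doc_id)
                sample.append(doc_id)
    return sample


# Usage: s1 returns four documents, but only its first three are sampled.
sample = build_central_sample({
    "s2": ["d5", "d6"],
    "s1": ["d1", "d2", "d3", "d4"],
})
```

The sample is small by construction, so comparing queries against it is far cheaper than running each query against the full distributed collection.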
In step C), the result similarity sim_result(p|q) between the new query and the central sample is computed according to
$$\mathrm{sim\_result}(p|q) = \frac{N(R(p) \cap R(q))}{N(R(p) \cup R(q))}$$
where R(q) is the retrieval result of historical query q on the central sample, N(R(p) ∩ R(q)) is the number of documents in the intersection of the retrieval results of new query p and historical query q, and N(R(p) ∪ R(q)) is the number of documents in the union of the retrieval results of new query p and historical query q.
The detailed steps of step D) are:
1. obtain the comprehensive similarity sim(p|q) according to sim(p|q) = α × sim_term(p|q) + β × sim_result(p|q), where sim_term(p|q) is the keyword similarity, sim_result(p|q) is the result similarity, α is the keyword similarity coefficient, and β is the result similarity coefficient;
2. normalize according to
[formula image BDA0000118896140000095]
to obtain the final comprehensive similarity sim(p|q), where cutSim is the preset normalization coefficient and ∑sim(p|q) is the sum of the sim(p|q) values greater than cutSim, so that after normalization ∑sim(p|q) = 1.
Because a real retrieval system always receives many identical and similar queries, similar queries usually have similar retrieval results, and users tend to select similar retrieval results, the relevance of each set to historical queries can be used to predict the relevance of each set to a new query. As shown in Fig. 7, in step 4) the retrieval proxy server computes the relevance between the new query and each information retrieval server as follows: according to the similarity between the queries and the relevance between the similar historical queries and each information set, the relevance Rel(s_j|q) between the new query and information set s_j is computed by Rel(s_j|q) = ∑ Rel(s_j|p) × sim(p|q), where Rel(s_j|p) is the relevance between historical query p and information set s_j, and sim(p|q) is the comprehensive similarity between the new query and the historical query; in this formula q denotes the new query and the sum runs over the selected historical queries p.
The above is merely a preferred embodiment of the present invention. The scope of protection of the present invention is not limited to the above embodiment; every technical scheme that falls under the principle of the invention belongs to the scope of protection of the present invention. Improvements and refinements made by a person skilled in the art without departing from the principle of the present invention shall also be regarded as falling within the scope of protection of the present invention.

Claims (8)

1. A method for selecting distributed information retrieval sets based on historical click data, characterized in that it is implemented in the following steps:
1) the retrieval proxy server preprocesses the query log and extracts the historical queries and their corresponding click data;
2) the retrieval proxy server computes, according to the click data, the relevance between each historical query and each information set stored on the information retrieval servers;
3) the retrieval proxy server obtains a new query sent by a user and computes the comprehensive similarity between the new query and each historical query;
4) the retrieval proxy server selects, according to said comprehensive similarity, a plurality of historical queries most similar to the new query, and computes the relevance between the new query and each information set from the selected historical queries and their relevance to each information set;
5) the retrieval proxy server selects a plurality of information sets according to the relevance between the new query and the information sets, sends retrieval requests to the information retrieval servers corresponding to those information sets, merges the results returned by the information retrieval servers, and outputs them to the user who sent the new query.
2. The method for selecting distributed information retrieval sets based on historical click data according to claim 1, characterized in that: in said step 1), extracting the historical queries and their corresponding click data specifically means storing the historical queries and their corresponding click data and building an index, where each index entry consists of a data segment storing the historical query and a pointer to the corresponding clicked document IDs.
3. The method for selecting distributed information retrieval sets based on historical click data according to claim 1, characterized in that the detailed steps of said step 2) are: the retrieval proxy server first sends a retrieval request for each historical query to each information retrieval server, counts, according to said index, the number of clicked documents among the retrieval results returned by each retrieval server, and then, according to [formula not reproduced in the source text], obtains the relevance Rel(s_j|p) between the historical query and each information set, where p is the historical query, T is the preset number of documents that each information retrieval server should return, s_j is the information set held by one retrieval server,
[formula image FDA0000118896130000021]
CTD(p) is the click data of the historical query, and doc_i is a document in the information set held by the retrieval server.
4. The method for selecting distributed information retrieval sets based on historical click data according to claim 3, characterized in that the detailed steps for computing the comprehensive similarity between the new query and each historical query in said step 3) are:
A) obtain the keyword similarity between the keywords of the new query and the keywords of each historical query by computing the cosine of the angle between the query vectors;
B) for each historical query, collect a preset number of retrieval result documents from the information retrieval servers to form a central sample;
C) compute the result similarity between the new query and said central sample;
D) multiply the keyword similarity and the result similarity by their respective coefficients and sum them to obtain the comprehensive similarity.
5. The method for selecting distributed information retrieval sets based on historical click data according to claim 4, characterized in that: in said step A), the keyword similarity sim_term(p|q) between the keywords of the new query and the keywords of each historical query is computed according to
$$\mathrm{sim\_term}(p|q) = \frac{\sum_{i=1}^{l} w_{t_i,p} \times w_{t_i,q}}{\sqrt{\sum_{i=1}^{l} w_{t_i,p}^{2}} \times \sqrt{\sum_{i=1}^{l} w_{t_i,q}^{2}}}, \qquad w_{t_i,p} = tf_{i,p} \times iqf_i, \qquad iqf_i = \log\left(\frac{n}{qf_i}\right)$$
where p is the new query, q is the historical query, t_i is the i-th index term, w_{t_i,p} is the weight of index term t_i in query p, w_{t_i,q} is the weight of index term t_i in query q, tf_{i,p} is the frequency with which index term t_i occurs in query p, iqf_i is the inverse query frequency, and qf_i is the number of queries in which keyword t_i occurs.
6. The method for selecting distributed information retrieval sets based on historical click data according to claim 4, characterized in that: in said step C), the result similarity sim_result(p|q) between the new query and said central sample is computed according to
$$\mathrm{sim\_result}(p|q) = \frac{N(R(p) \cap R(q))}{N(R(p) \cup R(q))}$$
where R(q) is the retrieval result of historical query q on the central sample, N(R(p) ∩ R(q)) is the number of documents in the intersection of the retrieval results of new query p and historical query q, and N(R(p) ∪ R(q)) is the number of documents in the union of the retrieval results of new query p and historical query q.
7. The method for selecting distributed information retrieval sets based on historical click data according to claim 4, characterized in that the detailed steps of said step D) comprise:
1. obtain the comprehensive similarity sim(p|q) according to sim(p|q) = α × sim_term(p|q) + β × sim_result(p|q), where sim_term(p|q) is the keyword similarity, sim_result(p|q) is the result similarity, α is the keyword similarity coefficient, and β is the result similarity coefficient;
2. normalize the comprehensive similarity sim(p|q) according to the normalization formula, where cutSim is the preset normalization coefficient and ∑ sim(p|q) is the sum of the values of sim(p|q) greater than cutSim, to obtain the final comprehensive similarity sim(p|q).
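The two steps of claim 7 can be sketched as follows. The weighted sum follows the formula in step 1. The normalization formula itself is not reproduced in the text, so the version below is only one plausible reading: similarities that do not exceed cutSim are set to 0 and the remaining ones are divided by their sum. The function names and the default coefficient values are hypothetical.

```python
def comprehensive_similarity(sim_term, sim_result, alpha=0.5, beta=0.5):
    """Step 1: weighted sum of keyword and result similarity.

    alpha and beta are placeholder values; the claim does not fix them.
    """
    return alpha * sim_term + beta * sim_result


def normalize(similarities, cut_sim):
    """Step 2 (assumed form): keep only similarities above cut_sim and
    divide each of them by the sum of the kept values.

    `similarities` maps a historical query identifier to sim(p|q).
    """
    total = sum(s for s in similarities.values() if s > cut_sim)
    if total == 0:
        return {q: 0.0 for q in similarities}
    return {q: (s / total if s > cut_sim else 0.0) for q, s in similarities.items()}
```

Under this reading the retained similarities sum to 1, so they can be used directly as weights when the historical correlation degrees are combined.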
8. The method for selecting distributed information retrieval sets based on historical click data according to any one of claims 3 to 7, characterized in that, in said step 4), the retrieval proxy server computing the correlation degree between the new query and each information retrieval server specifically means: according to the similarity between the new query and the similar historical queries, and the correlation degree between those historical queries and each information set, computing by
$$\mathrm{Rel}(s_j|q) = \sum \mathrm{Rel}(s_j|p)\,\mathrm{sim}(p|q)$$
the correlation degree Rel(s_j|q) between the new query and information set s_j, where Rel(s_j|p) is the correlation degree between historical query p and information set s_j, and sim(p|q) is the comprehensive similarity between the new query and the historical query.
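The aggregation in claim 8 is a similarity-weighted sum of historical correlation degrees. The sketch below assumes dictionary inputs keyed by hypothetical query and set identifiers; the helper `select_sets`, which picks the k highest-scoring information sets, and the parameter k are illustrative additions and are not specified in the claim.

```python
def correlation_for_new_query(similarities, historical_correlation):
    """Sum Rel(s_j | historical query) * sim(new query, historical query).

    similarities: historical query id -> comprehensive similarity to the new query.
    historical_correlation: historical query id -> {information set id -> Rel}.
    """
    scores = {}
    for hq, sim in similarities.items():
        for info_set, rel in historical_correlation.get(hq, {}).items():
            scores[info_set] = scores.get(info_set, 0.0) + rel * sim
    return scores


def select_sets(scores, k):
    """Return the k information sets with the highest correlation degree (k is hypothetical)."""
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With similarities `{"h1": 0.75, "h2": 0.25}` and correlations `{"h1": {"s1": 0.8, "s2": 0.2}, "h2": {"s1": 0.1, "s3": 0.9}}`, set s1 scores 0.625, s3 scores 0.225 and s2 scores 0.15, so `select_sets(scores, 2)` returns `["s1", "s3"]`.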
CN201110412262.5A 2011-12-12 2011-12-12 Selection method of distributed information retrieval sets based on historical click data Expired - Fee Related CN102521350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110412262.5A CN102521350B (en) 2011-12-12 2011-12-12 Selection method of distributed information retrieval sets based on historical click data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201110412262.5A CN102521350B (en) 2011-12-12 2011-12-12 Selection method of distributed information retrieval sets based on historical click data

Publications (2)

Publication Number Publication Date
CN102521350A true CN102521350A (en) 2012-06-27
CN102521350B CN102521350B (en) 2014-07-16

Family

ID=46292264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110412262.5A Expired - Fee Related CN102521350B (en) 2011-12-12 2011-12-12 Selection method of distributed information retrieval sets based on historical click data

Country Status (1)

Country Link
CN (1) CN102521350B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1963816A (en) * 2006-12-01 2007-05-16 清华大学 Automatization processing method of rating of merit of search engine
CN101582085A (en) * 2008-09-19 2009-11-18 江苏大学 Set option method based on distributed information retrieval system
CN101820592A (en) * 2009-02-27 2010-09-01 华为技术有限公司 Method and device for mobile search
CN101814085A (en) * 2010-02-04 2010-08-25 林培光 WEB data bank selection method based on WDB (World Data Bank) characteristics and user query requests

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
张刚等: "分布式信息检索的集合选择研究", 《计算机工程》 *
张刚等: "基于查询空间的分布式文档集合划分算法", 《中文信息学报》 *
李培等: "分布式信息检索中资源选择方法的研究", 《情报理论与实践》 *
王秀红: "基于集合覆盖的分布式信息检索资源选择", 《计算机工程》 *
雷雪: "分布式检索中信息集选择方法研究综述", 《情报科学》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636403A (en) * 2013-11-15 2015-05-20 腾讯科技(深圳)有限公司 Query request processing method and device
CN104636403B (en) * 2013-11-15 2019-03-26 腾讯科技(深圳)有限公司 Handle the method and device of inquiry request
CN104050235A (en) * 2014-03-27 2014-09-17 浙江大学 Distributed information retrieval method based on set selection
CN104050235B (en) * 2014-03-27 2017-02-22 浙江大学 Distributed information retrieval method based on set selection
CN105956010A (en) * 2016-04-20 2016-09-21 浙江大学 Distributed information retrieval set selection method based on distributed representation and local ordering
CN105956010B (en) * 2016-04-20 2019-03-26 浙江大学 Distributed information retrieval set option method based on distributed characterization and partial ordering
CN107103014A (en) * 2016-10-11 2017-08-29 阿里巴巴集团控股有限公司 The replay method of history pushed information, device and system
CN108897751A (en) * 2018-05-04 2018-11-27 中国信息安全研究院有限公司 A kind of efficient data presentation method
CN108897751B (en) * 2018-05-04 2023-07-25 中国信息安全研究院有限公司 Efficient data presentation method
CN111143427A (en) * 2019-11-25 2020-05-12 中国科学院计算技术研究所 Distributed information retrieval method, system and device based on-line computing
CN111143427B (en) * 2019-11-25 2023-09-12 中国科学院计算技术研究所 Distributed information retrieval method, system and device based on online computing

Also Published As

Publication number Publication date
CN102521350B (en) 2014-07-16

Similar Documents

Publication Publication Date Title
CN102521350B (en) Selection method of distributed information retrieval sets based on historical click data
KR100462292B1 (en) A method for providing search results list based on importance information and a system thereof
CN102426610B (en) Microblog rank searching method and microblog searching engine
US8117256B2 (en) Methods and systems for exploring a corpus of content
US8380697B2 (en) Search and retrieval methods and systems of short messages utilizing messaging context and keyword frequency
CN104050235A (en) Distributed information retrieval method based on set selection
CN102663064B (en) A kind of disposal route of favorites data and device
CN101477554A (en) User interest based personalized meta search engine and search result processing method
CN102088419A (en) Method and system for searching information of good friends in social network
CN101833570A (en) Method and device for optimizing page push of mobile terminal
CN102799587A (en) Forum searching method and device
CN102200979A (en) Distributed parallel information retrieval system and distributed parallel information retrieval method
CN104915449A (en) Faceted search system and method based on water conservancy object classification labels
CN104021125A (en) Search engine sorting method and system and search engine
CN102063454A (en) Method and equipment combining search and application
CN103559258A (en) Webpage ranking method based on cloud computation
CN103942268A (en) Method and device for combining search and application and application interface
CN105787066A (en) Digital content distribution system based on total analysis
CN103200269A (en) Internet information statistical method and Internet information statistical system
CN104636403A (en) Query request processing method and device
CN109947935A (en) The generation method and device of media event
Anagnostopoulos et al. Stochastic query covering
Swaroop et al. Mobile distributed real time database systems: A research challenges
CN102325098A (en) Group information acquisition method and system
CN103902687B (en) The generation method and device of a kind of Search Results

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140716

Termination date: 20171212